Publication

Machine learning for fraud detection

Van der Schraelen, Lennert
Publication Type
Dissertation - Collection of articles
Supervisor
Stouthuysen, Kristof
Publication Year
2024
Publication Number of pages
164
Abstract
Developing methodologies that can optimally model and detect fraud is of utmost importance, as undetected or unprevented fraud negatively impacts multiple stakeholders. We aim to address three major issues that researchers and decision-makers encounter when developing such models: model overconfidence, bias, and inconsistency.
(Chapter 1: overconfidence) Many machine learning models make overconfident predictions outside the range of the training data, which severely limits the deployment and usefulness of these models in practice. This is a major issue in fraud detection, where selecting false positives for investigation wastes limited resources. Furthermore, it causes decision-makers to lose confidence in the model, as out-of-distribution predictions are not substantiated. In this paper, we develop machine learning models by assigning predetermined non-uniform class probabilities outside the training data, which positively affects the model's behavior and performance.
(Chapter 2: bias) Traditional statistical methods and newer machine learning methods are used to identify predictors of financial misconduct periods. However, the partial observability of committed financial misconduct biases these prior findings. That is, it is crucial to consider not only the misconduct firms labeled by the labeling mechanism but also unlabeled financial misconduct. In this paper, we use machine learning methods that model partial observability by exploiting new and existing features to capture the labeling propensity. We show that our methodology improves the detection of future misconduct and identifies the predictors that significantly affect labeling propensity. By modeling partial observability, we aim to model all firms participating in financial misconduct instead of merely focusing on those labeled by the labeling mechanism.
(Chapter 3: inconsistency) Fraud investigators under constrained resources cannot thoroughly examine every case. Therefore, such stakeholders should prioritize metrics that capture the benefit of flagging financial misconduct while limiting the cost of falsely accusing legitimate firms among a group of selected cases. However, the employed detection model is often not optimized and validated on these metrics, leading to subpar performance. This paper constructs customized financial misconduct detection models by optimizing suitable cost-sensitive performance metrics rather than relying on an ad hoc approach. We illustrate that our methodology improves the economic validity of financial misconduct models across various financial misconduct proxies and cost structures.
Keywords
Fraud Detection, Data Science, Machine Learning, Oversampling, PU-Learning, Cost-Sensitive Learning