This article introduces data science and machine learning for fraud detection: analyzing past data and identifying new patterns. To build effective fraud detection models, businesses need to analyze threats, then collect and analyze their data.

According to the Association of Professionals and Directors of Accounting and Management (APDC), internal fraud costs French companies 5% of their turnover.

As with external fraud such as cyberattacks, businesses are still poorly equipped to deal with the risk of internal fraud: 6 out of 10 businesses have not allocated a specific budget to fighting fraud (Euler Hermes and DFCG study, 2020).

How can businesses protect themselves from this threat?

Let's start at the beginning

Where does fraud come from? In other words, why does an individual cheat?

In the 1960s, the American criminologist Donald Cressey developed the theory of the “fraud triangle”. According to this theory, an employee, manager, or director can be driven to fraud by three factors:

  • Pressure: often financial, such as an expensive lifestyle or addictions, but sometimes coming from the company, for example through unattainable goals.
  • Opportunity: weaknesses in the company's internal controls.
  • Rationalization: reasoning that justifies the act of fraud and makes it acceptable (“everyone does it”, “the amount is negligible compared to the company's income”, ...)

The first response from businesses: prevention

To deal with the risk of fraud, the first line of defense is of course prevention.

The first preventive measure is to map threats and draw up a clear action plan that is communicated throughout the company. For a prevention policy to work, it is essential to get as many people as possible involved and to embed these practices in the corporate culture.

However, this is not enough

Prevention must be accompanied by internal controls and audits, as well as warning systems. However, manual controls are slow and often ineffective: an approving manager will tend to skim over expense report validations or small transactions, for example, since these tasks are far from being daily priorities. As for audits, having teams of auditors analyze thousands of lines of data is very slow and, above all, expensive.

That's where data science comes in

Data science, and machine learning in particular, is already widely used today for tasks such as spam detection, medical diagnosis and media content recommendation. Machine learning consists of analyzing data to understand, predict, and classify information. This technology is therefore well suited to fraud detection.

Indeed, it makes it possible to analyze past frauds and identify their origin and the environment in which they were committed. Teams can then take action by applying new automatic controls or by reminding employees of points of vigilance as part of the prevention plan.

Concretely, how does it work?

There are two main families of machine learning algorithms: supervised models and unsupervised models. Before going any further, it is worth pointing out that neither family gives a definitive answer as to the presence or absence of fraud. Instead, each provides a probability of fraud (for example, 'This operation has an 86% chance of being fraudulent').

Supervised models learn from so-called 'labelled' data, i.e. data containing the expected result: for example, a data set of accounting transactions in which each line is marked as fraudulent or not. The model then learns to recognize fraudulent situations similar to those it was shown. This method is effective in detecting fraud that resembles past cases.
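As a rough illustration of supervised learning on labelled transactions, here is a minimal sketch of a logistic regression trained by gradient descent using only the Python standard library. The data set, the two features (amount and an "outside business hours" flag) and all function names are invented for this example; a real project would more likely rely on an established machine learning library and far more data.

```python
import math

# Hypothetical labelled transactions, invented for illustration only.
# Each row: (amount in thousands of euros, 1 if entered outside business hours).
FEATURES = [
    (0.2, 0), (0.5, 0), (1.0, 0), (0.8, 1), (1.5, 0),
    (6.0, 1), (7.5, 1), (5.0, 1), (8.0, 0), (9.0, 1),
]
# Label for each row: 1 = fraudulent, 0 = legitimate.
LABELS = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fraud_probability(transaction, weights, bias):
    """Return the model's estimated probability that a transaction is fraudulent."""
    return sigmoid(sum(w * x for w, x in zip(weights, transaction)) + bias)


def train_logistic_regression(features, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(features, labels):
            error = fraud_probability(row, weights, bias) - label
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


weights, bias = train_logistic_regression(FEATURES, LABELS)
# Score two new, unlabelled transactions.
print(f"Large late-night entry: {fraud_probability((7.0, 1), weights, bias):.0%}")
print(f"Small daytime entry:    {fraud_probability((0.3, 0), weights, bias):.0%}")
```

The output is a probability rather than a verdict, so the team still has to choose the threshold above which an operation is sent for manual review.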

The limitation of this approach is the creativity of fraudsters, who constantly adapt and renew their techniques.

This is where unsupervised models come in. In this case, the model is trained on a data set without labels. It must then look for similarities or unusual behaviors in the data on its own. The unsupervised model thus makes it possible to discover new types of fraud by sifting through large volumes of information that cannot be analyzed manually.

By combining the two, it is possible to continuously monitor and report the risk of known types of fraud while detecting new sources of threat within the organization.
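To make the idea of finding unusual behavior without labels more concrete, here is a minimal sketch using a robust z-score (distance from the median, measured in units of the median absolute deviation). It is a simple statistical stand-in chosen for brevity, not a specific algorithm recommended by this article; the amounts, the function names and the 3.5 threshold are assumptions made for the example.

```python
import statistics

# Hypothetical expense-report amounts in euros, with no fraud labels.
AMOUNTS = [42.0, 38.5, 51.0, 47.2, 40.0, 44.9, 39.9, 980.0, 45.5, 43.1]


def robust_z_scores(values):
    """Score each value by its distance from the median, in MAD units."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        # All values are (almost) identical: nothing stands out.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so that scores are comparable to standard
    # z-scores when the data is roughly normally distributed.
    return [0.6745 * (v - median) / mad for v in values]


def flag_outliers(values, threshold=3.5):
    """Return (index, value) pairs whose robust z-score exceeds the threshold."""
    scores = robust_z_scores(values)
    return [
        (index, value)
        for index, (value, score) in enumerate(zip(values, scores))
        if abs(score) > threshold
    ]


print(flag_outliers(AMOUNTS))
```

No one told the function what fraud looks like: it only reports what deviates strongly from the rest, which an auditor can then investigate.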

Other techniques can of course complement these methods: automatic checks of operations based on business rules, a risk-scoring system for teams or departments to prioritize preventive and control actions, data sampling to limit manual checks to the riskiest operations, and so on.
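The three complementary techniques just listed can be sketched together in a few lines. Everything below is hypothetical: the operations, the field names, the two business rules and the 5,000 euro limit are invented to show the mechanics, and real rules would come from the company's own risk mapping.

```python
from collections import defaultdict

# Hypothetical operations; every field name and value is invented.
OPERATIONS = [
    {"id": 1, "team": "purchasing", "amount": 120.0, "created_by": "bob", "approved_by": "alice"},
    {"id": 2, "team": "purchasing", "amount": 9800.0, "created_by": "carol", "approved_by": "carol"},
    {"id": 3, "team": "sales", "amount": 15000.0, "created_by": "erin", "approved_by": "dave"},
    {"id": 4, "team": "sales", "amount": 60.0, "created_by": "frank", "approved_by": "dave"},
    {"id": 5, "team": "purchasing", "amount": 12000.0, "created_by": "bob", "approved_by": "bob"},
]

# Business rules: each returns True when the operation looks suspicious.
RULES = {
    "self_approval": lambda op: op["approved_by"] == op["created_by"],
    "large_amount": lambda op: op["amount"] > 5000.0,
}


def broken_rules(operation):
    """Automatic check: list the names of the rules an operation breaks."""
    return [name for name, rule in RULES.items() if rule(operation)]


def team_risk_scores(operations):
    """Risk score per team: share of its operations breaking at least one rule."""
    totals, flagged = defaultdict(int), defaultdict(int)
    for operation in operations:
        totals[operation["team"]] += 1
        if broken_rules(operation):
            flagged[operation["team"]] += 1
    return {team: flagged[team] / totals[team] for team in totals}


def riskiest_operations(operations, sample_size):
    """Sampling: keep the ids of the operations breaking the most rules."""
    ranked = sorted(operations, key=lambda op: len(broken_rules(op)), reverse=True)
    return [op["id"] for op in ranked[:sample_size]]


print(team_risk_scores(OPERATIONS))
print(riskiest_operations(OPERATIONS, sample_size=2))
```

Rules like these are transparent and easy to audit, which makes them a useful complement to model scores when deciding where to spend limited manual review time.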

Where should my business start?

  1. Analyze threats and map risks.
  2. Use this map to identify blind spots and develop a prevention plan.
  3. Gather, structure and analyze your data.

From there, you can build effective models to:

  • Quickly detect fraud that has been committed
  • Identify the sources of threats
  • Anticipate and take appropriate measures
