Automobile Insurance Fraud Detection

Abstract

Fraud has become a major problem in the insurance industry, and insurers invest considerable resources to identify and prevent it. Traditional fraud detection is rule-based: hard-and-fast rules for flagging fraudulent claims must be written manually and in advance. These techniques are no longer effective because of the growing volume and variety of the data to be processed. Data science techniques offer a natural solution to this problem.
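To make the rule-based approach concrete, here is a minimal sketch in Python. The `Claim` fields, the rule names, and every threshold are invented for illustration and are not taken from the project; the point is only that each rule must be fixed by hand before any claim is seen.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """Hypothetical claim record; the field names are illustrative only."""
    amount: float                 # claimed amount
    days_since_policy_start: int  # age of the policy when the claim was filed
    prior_claims: int             # number of earlier claims by the same holder
    has_police_report: bool


def rule_based_flag(claim: Claim) -> list[str]:
    """Return the names of the hand-written rules that the claim triggers."""
    triggered = []
    # Each threshold below is an invented example of a manually fixed rule.
    if claim.amount > 10_000 and not claim.has_police_report:
        triggered.append("large_claim_without_police_report")
    if claim.days_since_policy_start < 30:
        triggered.append("claim_soon_after_policy_start")
    if claim.prior_claims >= 3:
        triggered.append("many_prior_claims")
    return triggered


suspicious = Claim(amount=15_000, days_since_policy_start=12,
                   prior_claims=0, has_police_report=False)
routine = Claim(amount=800, days_since_policy_start=400,
                prior_claims=1, has_police_report=True)

print(rule_based_flag(suspicious))
print(rule_based_flag(routine))
```

Every new fraud pattern or data type requires another hand-written rule, which is why this style scales poorly as data volume and variety grow.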

In this context, an insurance-sector client of the host company “DEVOTEAM” decided to invest in data science in order to build a solution for automobile insurance fraud detection.

This report summarizes the work carried out to build a fraud detection solution for automobile insurance. The solution uses three types of data: structured data, images of damaged cars, and textual accident descriptions.

Before implementation, we reviewed the literature on the techniques available for this problem, namely supervised and unsupervised learning algorithms and deep learning algorithms. We then designed the models for each type of data. Finally, we moved on to implementation. For the structured data, we tested six supervised classification models: SVM, KNN, Random Forest, Decision Tree, Naïve Bayes, and Logistic Regression. After tuning and evaluating the models, we selected the SVM model, which achieved an accuracy of 0.93.
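The comparison of the six classifiers could be sketched as follows with scikit-learn. This is an illustration under stated assumptions, not the project's code: the data is synthetic (generated with `make_classification`), the hyperparameters are library defaults rather than tuned values, and the scores it prints have no relation to the 0.93 reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the structured claims table (assumption: the real
# data is not available here). About 20% of samples are in the positive class.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)

# Scale-sensitive models are wrapped in a pipeline with standardization.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

# Print the models from best to worst mean accuracy.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

A tuning step could be layered on top of this, for example with `GridSearchCV` over each model's hyperparameters. On imbalanced fraud data, accuracy is usually read alongside precision and recall, since a model that never predicts fraud can still score high accuracy.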

For the image data, we first applied the DenseNet-201 architecture to confirm the damage and to detect its location and severity. Next, we applied optical character recognition (OCR) to read the license plate. For the textual data, we applied the LDA (Latent Dirichlet Allocation) model to extract the main topics, each represented as a set of words, that occur in the texts.

Keywords: Insurance Fraud, Data Science, Machine Learning, Deep Learning, LDA, DenseNet