Fraud Detection in Automobile Insurance
Abstract
Fraud has become a major problem in the insurance industry, and insurers invest considerable
resources to identify and prevent it. Traditional approaches to fraud detection are rule-based:
hard-and-fast rules for flagging fraudulent claims must be written manually and in advance.
These techniques are no longer effective because of the growing volume and variety of the
data to be processed. Data science techniques are a natural answer to this problem.
In this context, an insurance-sector client of the host company “DEVOTEAM” decided to
invest in data science in order to build a solution for automobile insurance fraud
detection.
This report summarizes the work carried out to build a fraud detection solution for
automobile insurance. The solution uses three types of data: structured data, images of
damaged cars, and textual accident descriptions.
Before starting the implementation, we reviewed the techniques applicable to the problem,
namely supervised and unsupervised learning algorithms and deep learning algorithms. We then
designed the models for each type of data and, finally, moved on to the implementation. For
the structured data, we tested six supervised classification models: SVM, KNN, Random Forest,
Decision Tree, Naïve Bayes, and Logistic Regression. After tuning and evaluating the models,
we selected the SVM model, which achieved an accuracy of 0.93.
For the image data, we applied the DenseNet-201 architecture to confirm the damage and to
detect its location and severity. Next, we applied optical character recognition to read the
license plate. For the textual data, we applied the Latent Dirichlet Allocation (LDA) model
to extract the main topics (each represented as a set of words) that occur in the texts.
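The model comparison described above for the structured data (fit several candidate classifiers, score each on held-out claims, keep the most accurate) can be illustrated with a minimal sketch. It is not the report's implementation: the data are synthetic, only KNN from the list of tested models is included, the majority-class baseline is added purely for illustration, and every function and variable name is hypothetical.

```python
import random
from collections import Counter
from math import dist


def make_toy_claims(n, seed=0):
    """Generate synthetic (features, label) pairs; label 1 = fraudulent.

    Assumption: two numeric features and a 30% fraud rate, chosen only
    so that the example runs; they do not come from the report.
    """
    rng = random.Random(seed)
    claims = []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0
        center = 2.0 if label else 0.0
        features = (rng.gauss(center, 0.5), rng.gauss(center, 0.5))
        claims.append((features, label))
    return claims


def majority_baseline(train):
    """Return a predictor that always outputs the most common training label."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: most_common


def knn(train, k=5):
    """Return a k-nearest-neighbours predictor using Euclidean distance."""
    def predict(x):
        nearest = sorted(train, key=lambda item: dist(item[0], x))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def accuracy(predict, test):
    """Fraction of held-out claims whose label is predicted correctly."""
    correct = sum(predict(x) == y for x, y in test)
    return correct / len(test)


# Hold-out split: first 200 claims for fitting, last 100 for evaluation.
claims = make_toy_claims(300)
train, test = claims[:200], claims[200:]

# Fit every candidate on the same training set and score it on the same test set.
candidates = {"majority baseline": majority_baseline, "KNN (k=5)": knn}
scores = {name: accuracy(fit(train), test) for name, fit in candidates.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: accuracy = {score:.2f}")
print("selected:", best)
```

Including a trivial baseline is a design choice worth keeping in a real comparison: when fraudulent claims are a minority, always predicting "not fraudulent" already yields a high accuracy, so a candidate's score is easier to interpret next to that reference.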
Keywords: Insurance Fraud, Data Science, Machine Learning, Deep Learning, LDA,
DenseNet