Intelligent cyber-security data lake: Building a system for centralizing and monitoring heterogeneous logs and detecting anomalies based on deep learning models
Abstract
In an IT infrastructure, applications, network devices, operating systems, and any other
programmable or intelligent device generate thousands of logs every day. Analyzing these logs
makes it possible to detect malicious attacks, intruders, and security vulnerabilities.
This report summarizes our work, which aimed to build a cybersecurity data lake, based on the
ELK Stack, that can centralize and manage large volumes of logs in different formats.
We used Filebeat and Logstash for log collection and preprocessing, Elasticsearch for log
indexing and storage, and Kibana for descriptive analysis and dashboard creation. We also
generated security alerts using Logstash.
For anomaly detection, we propose a model called «AElog», which is based on deep learning
techniques, more precisely transformers and CNNs. The results indicate that the model is
reliable, with predictions reaching performance levels above 99%.
Keywords: log management, log analysis, data lake, ELK Stack, deep learning, NLP, anomaly
detection, BERT, CNN