The Apache Software Foundation has accepted into its incubator program a project created by Intel and Cloudera that aims to improve cyber security through big data analytics and machine learning.

Formerly known as Open Network Insight (ONI), the Apache Spot project attempts to use machine learning to detect bad traffic amongst good data, and to characterise the unique behaviour of network traffic with the help of the Hadoop big data framework.
Intel kicked off the project in February this year on Cloudera's cloud computing platform. Anomali, Centrify, Cloudwick, Cybraics, eBay, Endgame, Jask, StreamSets and Webroot are some of the companies that have contributed to the project.
Spot stores large amounts of information in Apache Hadoop, including data from deep packet inspection of domain name system (DNS) traffic, connections, and proxy log files, for processing in the open source Apache Spark cluster computing framework.
Machine learning is used to build models of networked systems and how they communicate. Billions of collected events are filtered for noise to produce a shortlist of the most likely security threats.
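The idea of reducing a large volume of events to a shortlist can be illustrated with a minimal sketch. It is not Spot's actual model: it swaps in a simple frequency-based "surprise" score, and the function name, the event tuple layout (source, destination, port) and the sample addresses are all assumptions made for illustration.

```python
import math
from collections import Counter


def shortlist_rare_events(events, top_n=3):
    """Rank distinct event patterns from rarest to most common.

    `events` is a list of hashable tuples such as (src_ip, dst_ip, port).
    This layout is a hypothetical stand-in, not Spot's data model.
    """
    counts = Counter(events)
    total = len(events)
    # Surprise score: negative log of each pattern's empirical frequency.
    # Rare patterns score high; routine traffic scores low.
    scores = {event: -math.log(n / total) for event, n in counts.items()}
    # Sort by descending score, breaking ties by the event tuple itself
    # so the ordering is deterministic.
    ranked = sorted(scores, key=lambda event: (-scores[event], event))
    return ranked[:top_n]


# Illustrative sample data: mostly routine DNS lookups plus a few outliers.
routine = ("10.0.0.1", "8.8.8.8", 53)
unusual = ("10.0.0.2", "10.0.0.5", 443)
suspect = ("10.0.0.3", "203.0.113.9", 4444)
sample_events = [routine] * 50 + [unusual] * 5 + [suspect]

shortlist = shortlist_rare_events(sample_events, top_n=2)
```

A real deployment would score events with a trained model rather than raw counts, but the shape of the pipeline is the same: score every event, then keep only the few that stand out.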
For threat incident response scenarios, Apache Spot can gather all the characteristics for a given IP address and build a timeline of all conversations that originated from it.
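The timeline idea can be sketched in a few lines. This is an illustration under stated assumptions rather than Spot's implementation: the FlowEvent record, its field names and the build_timeline helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FlowEvent:
    """Hypothetical record of one network conversation."""
    timestamp: datetime
    src_ip: str
    dst_ip: str
    dst_port: int


def build_timeline(events, ip):
    """Return the conversations that originated from `ip`, oldest first."""
    originated = (event for event in events if event.src_ip == ip)
    return sorted(originated, key=lambda event: event.timestamp)


# Illustrative sample data, deliberately out of chronological order.
sample_flows = [
    FlowEvent(datetime(2016, 9, 1, 12, 30), "10.0.0.3", "203.0.113.9", 4444),
    FlowEvent(datetime(2016, 9, 1, 9, 15), "10.0.0.3", "8.8.8.8", 53),
    FlowEvent(datetime(2016, 9, 1, 10, 0), "10.0.0.1", "8.8.8.8", 53),
]

timeline = build_timeline(sample_flows, "10.0.0.3")
```

Filtering on the source address keeps only conversations the host started itself, which matches the "originated" wording; an investigation might also want inbound traffic, which would need a second filter on dst_ip.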
Analysts can use the processed data in Apache Spot to create storyboards of threat events with interactive visualisations.
Common open data models for security information are included to foster analytics collaboration between enterprises when new threats appear, and to allow those threats to be compared against historical data sets for greater insight.
Here the project is taking a leaf out of the cyber criminals' book: hackers collaborate with each other through internet forums and share information regularly, something that rarely occurs in the security industry.
Apache Spot can be downloaded from its open source repository on GitHub.