We will introduce the importance of the business case, introduce autoencoders, perform an exploratory data analysis, and then create and evaluate the model. The model is built in Keras with a TensorFlow backend in a Jupyter Notebook and is generally applicable to a wide range of anomaly detection problems. Nilson reports that U. Traditionally, many major banks have relied on old rules-based expert systems to catch fraud, but these systems have proved all too easy to beat; the financial services industry is now relying on increasingly complex fraud detection algorithms.
Many in the financial services industry have updated their fraud detection to include some basic machine learning algorithms, including various clustering classifiers, linear approaches, and support vector machines. Long story short, if you want to be where the industry is going and where the jobs are, focus on more advanced fraud detection techniques. This tutorial will focus on one of those more advanced techniques, autoencoders. Autoencoders are a type of neural network that takes an input and attempts to reproduce it as its output.
Although it may sound pointless to feed in input just to get the same thing out, it is in fact very useful for a number of applications. The key here is that the autoencoder boils down (encodes) the input into some key features that it determines in an unsupervised manner.
Hence the name "autoencoder" — it automatically encodes the input. Let us take this autoencoder of a bicycle as an example. The input is some actual picture of a bicycle that is then reduced to some hidden encoding perhaps representing components such as handlebars and two wheels and then is able to reconstruct the original object from that encoding.
Of course there will be some loss (the "reconstruction error"), but hopefully the parts that remain will be the essential pieces of a bicycle. Now let us assume you fed something into this autoencoder that was a unicycle trying to pose as a bicycle. In the process of breaking down the unicycle into components intended for bicycles, the reconstructed version of the unicycle will be heavily altered. The assumption in using autoencoders is that fraud or anomalies will suffer from a detectably high reconstruction error.
First, let's set up the code and import all the necessary packages. The data contains European credit card transactions that occurred over two days, some of which are fraudulent. The data looks like we would expect on the surface, but let's double check that the shape matches the expected number of rows and 31 columns. It is a well-groomed dataset, so we expect no null values.
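To make the bicycle and unicycle intuition concrete, here is a minimal sketch that swaps the neural autoencoder for a linear one built from PCA (via NumPy's SVD). This is a stand-in technique, not the Keras model built in this tutorial; the synthetic data, the one-dimensional bottleneck, and all function names are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Normal" training data: 2-D points lying close to the line y = 2x.
x = rng.normal(size=500)
normal = np.column_stack([x, 2.0 * x + rng.normal(scale=0.05, size=500)])

# Fit a linear "autoencoder": centre the data and keep one principal direction.
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
components = vt[:1]  # shape (1, 2): the 1-D bottleneck


def encode(points):
    """Project points onto the 1-D hidden encoding."""
    return (points - mean) @ components.T


def decode(codes):
    """Map hidden encodings back to the original 2-D space."""
    return codes @ components + mean


def reconstruction_error(points):
    """Per-sample mean squared error between input and reconstruction."""
    return np.mean((points - decode(encode(points))) ** 2, axis=1)


typical = np.array([[1.0, 2.0]])    # follows the learned pattern
unusual = np.array([[1.0, -2.0]])   # breaks the learned pattern

err_typical = reconstruction_error(typical)[0]
err_unusual = reconstruction_error(unusual)[0]
print(err_typical, err_unusual)
```

The point that follows the training pattern is reconstructed almost perfectly, while the one that breaks it comes back heavily altered, which is exactly the signal an autoencoder-based detector relies on.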
Indeed the data seems to be cleaned and loaded as we expect. Now we want to check if we have the expected number of normal and fraudulent rows of data.
We will simply pull the "Class" column and count the number of normal (0) and fraud (1) rows. The counts of normal and fraud transactions are as expected. As is typical in fraud and anomaly detection in general, this is a very unbalanced dataset.
Let's get a visual confirmation of the unbalanced data in this fraud dataset. As you can see, the normal cases strongly outweigh the fraud cases. We will cut up the dataset into two data frames, one for normal transactions and the other for fraud. Let's look at some summary statistics and see if there are obvious differences between fraud and normal transactions. Although the mean is a little higher in the fraud transactions, it is certainly within a standard deviation, so the classes are unlikely to be easy to separate precisely with pure statistical methods.
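As a rough illustration of the counting step, the sketch below tallies a hypothetical label list with the standard library. The list contents and the 995-to-5 split are invented for the example and are not the dataset's real counts; with the data in a pandas DataFrame named `df`, `df["Class"].value_counts()` would give the same tally.

```python
from collections import Counter

# Hypothetical stand-in for the values of the "Class" column:
# 0 marks a normal transaction, 1 marks a fraudulent one.
class_column = [0] * 995 + [1] * 5

counts = Counter(class_column)
n_normal, n_fraud = counts[0], counts[1]
fraud_share = n_fraud / len(class_column)

print(f"normal: {n_normal}, fraud: {n_fraud}, fraud share: {fraud_share:.2%}")
```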
I could run statistical tests e.
By Romeo Kienzler Published March 2,
Although the name has changed and some images may show the previous name, the steps and processes in this tutorial will still work. After introducing you to deep learning and long short-term memory (LSTM) networks, I showed you how to generate data for anomaly detection.
Now, in this tutorial, I explain how to create a deep learning neural network for anomaly detection using Keras and TensorFlow. As a reminder, our task is to detect anomalies in vibration (accelerometer) sensor data in a bearing. An accelerometer sensor on the bearing records vibrations on each of the three geometrical axes: x, y, and z.
When talking about deep learning, a lot of people talk about libraries such as TensorFlow and PyTorch. Those are great tools, but in my opinion they provide relatively low-level support, meaning you have to think a lot about linear algebra and the shapes of matrices. Keras, on the other hand, is a high-level abstraction layer on top of popular deep learning frameworks such as TensorFlow and Microsoft Cognitive Toolkit (previously known as CNTK). Keras not only uses those frameworks as execution engines to do the math, but it can also export the deep learning models so that other frameworks can pick them up.
You can do the fast prototyping in Keras and then scale out on Apache Spark using Deeplearning4j or SystemML as an execution framework for your Keras models. Finally, for completeness, there exist frameworks like TensorFrames and TensorSpark that bring TensorFlow directly to Apache Spark, but this is beyond the scope of this article.
Before we talk about the deep learning use case, spend some time setting up your development environment. We use a Jupyter Notebook running inside Watson Studio. From there you can click New notebook.
In theory, you could already start from here, but let me introduce Keras and TensorFlow a bit and then walk you through the code. TensorFlow has two components: an engine executing linear algebra operations on a computation graph, and some sort of interface to define and execute the graph. Although different language bindings for TensorFlow exist, the most prominent one is Python. There are three components of the engine:
Distributed TensorFlow can run on multiple machines, but this is not covered in this article because we can use Deeplearning4j and Apache SystemML for distributed processing on Apache Spark without the need to install distributed TensorFlow.
In other words, TensorFlow is nothing more than a domain-specific language (DSL) expressed in Python to define a computational execution graph for linear algebra operations, plus a corresponding parallel execution engine for running an optimized version of it. Therefore, I suggest using Keras wherever possible.
This blog post titled Keras as a simplified interface to TensorFlow: tutorial is a nice introduction to Keras. I will explain Keras based on this blog post during my walk-through of the code in this tutorial. We need to build something useful in Keras using TensorFlow on Watson Studio with a generated data set. Remember, we used a Lorenz Attractor model to get simulated real-time vibration sensor data in a bearing. We need to get that data to the IBM Cloud platform.
See the tutorial on how to generate data for anomaly detection.
This repository contains the code used in my master's thesis on LSTM-based anomaly detection for time series data. The thesis report can be downloaded from here. Due to the challenges in obtaining labeled anomaly datasets, an unsupervised approach is employed.
The resulting prediction errors are modeled to give anomaly scores. We investigate different ways of maintaining LSTM state, and the effect of using a fixed number of time steps on LSTM prediction and detection performance. LSTMs are also compared to feed-forward neural networks with fixed size time windows over inputs.
Our experiments, with three real-world datasets, show that while LSTM RNNs are suitable for general purpose time series modeling and anomaly detection, maintaining LSTM state is crucial for getting the desired results. Moreover, LSTMs may not be required at all for simple time series. This file has different configuration settings. For training the model and generating predictions, two main files are provided. For anomaly detection we need to calculate prediction errors (residuals), model them using a Gaussian distribution, and then set thresholds.
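The residual-modeling step can be sketched with the standard library alone. The snippet below is an illustration under stated assumptions, not the repository's code: the residual values are invented, and the three-standard-deviation band is one common thresholding choice rather than the thesis's actual rule.

```python
from statistics import NormalDist

# Hypothetical prediction errors (actual - predicted) on normal validation data.
validation_residuals = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.2, -0.15, 0.0]

# Model the residuals with a Gaussian fitted to the validation errors.
residual_model = NormalDist.from_samples(validation_residuals)

# Assumed rule: flag anything more than 3 standard deviations from the mean.
K_SIGMA = 3.0
lower = residual_model.mean - K_SIGMA * residual_model.stdev
upper = residual_model.mean + K_SIGMA * residual_model.stdev


def is_anomaly(residual):
    """Return True when a residual falls outside the fitted Gaussian's band."""
    return not (lower <= residual <= upper)


new_residuals = [0.05, -0.12, 1.4]
flags = [is_anomaly(r) for r in new_residuals]
print(flags)
```

Only the last residual lies far outside the band learned from the validation errors, so it is the only one flagged.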
This is done in "Part 3" of the corresponding notebook files.
Requirements Keras 2.
Set it to false when running on remote machines with no display. Otherwise the plotting commands will result in an error.
The goal of this post is to walk you through the steps to create and train an AI deep learning neural network for anomaly detection using Python, Keras and TensorFlow. I will not delve too much into the underlying theory and assume the reader has some basic knowledge of the underlying technologies.
However, I will provide links to more detailed information as we go and you can find the source code for this study in my GitHub repo. In the NASA study, sensor readings were taken on four bearings that were run to failure under constant load over multiple days. Our dataset consists of individual files that are 1-second vibration signal snapshots recorded at 10 minute intervals. Each file contains 20, sensor data points per bearing that were obtained by reading the bearing sensors at a sampling rate of 20 kHz.
You can download the sensor data here. You will need to unzip them and combine them into a single data directory. We will use an autoencoder deep learning neural network model to identify vibrational anomalies from the sensor readings. The goal is to predict future bearing failures before they happen. The concept for this study was taken in part from an excellent article by Dr. In that article, the author used dense neural network cells in the autoencoder model.
A key attribute of recurrent neural networks is their ability to persist information, or cell state, for use later in the network.
This makes them particularly well suited for analysis of temporal data that evolves over time. LSTM networks are used in tasks such as speech recognition, text translation and here, in the analysis of sequential sensor readings for anomaly detection. There are numerous excellent articles by individuals far better qualified than I to discuss the fine details of LSTM networks.
I will be using an Anaconda distribution Python 3 Jupyter notebook for creating and training our neural network model.
We will use TensorFlow as our backend and Keras as our core model development library. The first task is to load our Python libraries. We then set our random seed in order to create reproducible results.
The assumption is that the mechanical degradation in the bearings occurs gradually over time; therefore, we will use one datapoint every 10 minutes in our analysis. Each 10 minute data file sensor reading is aggregated by using the mean absolute value of the vibration recordings over the 20, datapoints.
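A minimal sketch of this aggregation, assuming each snapshot file has already been read into a NumPy array with one column per bearing. The file count and the samples-per-file value below are small placeholders rather than the real dataset sizes, and the random data merely stands in for vibration readings.

```python
import numpy as np

rng = np.random.default_rng(42)

N_FILES = 6        # hypothetical number of 10-minute snapshot files
N_SAMPLES = 1000   # hypothetical samples per file (placeholder, not the real count)
N_BEARINGS = 4     # four bearings, as in the study

# Stand-in for the raw files: one (samples x bearings) array per snapshot.
snapshots = [rng.normal(scale=0.1, size=(N_SAMPLES, N_BEARINGS))
             for _ in range(N_FILES)]

# Collapse each snapshot to a single row: mean absolute vibration per bearing.
aggregated = np.array([np.mean(np.abs(s), axis=0) for s in snapshots])

print(aggregated.shape)  # one row per file, one column per bearing
```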
We then merge everything together into a single Pandas dataframe. Next, we define the datasets for training and testing our neural network. To do this, we perform a simple split where we train on the first part of the dataset, which represents normal operating conditions.
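The split described here is chronological rather than random. A small sketch, assuming the aggregated readings are held in time order; the 50% cut-off and the toy array are illustrative assumptions, not the values used in the study.

```python
import numpy as np

# Hypothetical aggregated readings: one row per 10-minute snapshot, in time order.
readings = np.arange(40, dtype=float).reshape(10, 4)

# Assumed cut-off: the first half of the timeline is treated as normal operation.
TRAIN_FRACTION = 0.5
split_at = int(len(readings) * TRAIN_FRACTION)

train = readings[:split_at]   # earliest rows, assumed to be healthy bearings
test = readings[split_at:]    # later rows, leading up to the failure

print(train.shape, test.shape)
```

No shuffling is applied, so every training row precedes every test row in time.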
We then test on the remaining part of the dataset that contains the sensor readings leading up to the bearing failure. First, we plot the training set sensor readings, which represent normal operating conditions for the bearings. Next, we take a look at the test dataset sensor readings over time.
In this tutorial, you will learn how to perform anomaly and outlier detection using autoencoders, Keras, and TensorFlow.
Back in January, I showed you how to use standard machine learning models to perform anomaly detection and outlier detection in image datasets. To answer such a question would require us to dive further down the rabbit hole and answer questions such as:
To learn how to perform anomaly detection with Keras, TensorFlow, and Deep Learning, just keep reading!
To quote my intro to anomaly detection tutorial: Depending on your exact use case and application, anomalies only typically occur 0. The problem is only compounded by the fact that there is a massive imbalance in our class labels.
By definition, anomalies will rarely occur, so the majority of our data points will be of valid events. To detect anomalies, machine learning researchers have created algorithms such as Isolation Forests, One-class SVMs, Elliptic Envelopes, and Local Outlier Factor to help detect such events; however, all of these methods are rooted in traditional machine learning.
As I discussed in my intro to autoencoder tutorial, autoencoders are a type of unsupervised neural network that can:
To accomplish this task, an autoencoder uses two components: an encoder and a decoder. The encoder accepts the input data and compresses it into the latent-space representation. The decoder then attempts to reconstruct the input data from the latent space. When trained in an end-to-end fashion, the hidden layers of the network learn filters that are robust and even capable of denoising the input data.
However, what makes autoencoders so special from an anomaly detection perspective is the reconstruction loss. When we train an autoencoder, we typically measure the mean-squared-error (MSE) between:
Since the autoencoder has never seen an elephant before, and more to the point, was never trained to reconstruct an elephant, our MSE will be very high.
Alon Agmon does a great job explaining this concept in more detail in this article. To configure your system and install TensorFlow 2. Our convautoencoder. Open up convautoencoder. Imports include tf. Our ConvAutoencoder class contains one static method, build, which accepts five parameters: We then flatten the network and construct our latent vector. The latent-space representation is the compressed form of our data.
In the above code block we used the encoder portion of our autoencoder to construct our latent-space representation; this same representation will now be used to reconstruct the original input image:
Here, we take the latent input and use a fully-connected layer to reshape it into a 3D volume. Finally, we build the decoder model and construct the autoencoder.
Recall that an autoencoder consists of both the encoder and decoder components. We then return a 3-tuple of the encoder, decoder, and autoencoder. Again, if you need further details on the implementation of our autoencoder, be sure to review the aforementioned tutorials. Imports include our implementation of ConvAutoencoder, the mnist dataset, and a few imports from TensorFlow, scikit-learn, and OpenCV. The function accepts a set of input data and labels, including a valid label and an anomaly label.
The contam percentage is used to help us sample and select anomaly datapoints. From our set of labels, and using the valid label, we generate a list of validIdxs. The exact same process is applied to grab anomalyIdxs.
Suppose you are a credit card holder and, on an unfortunate day, your card gets stolen. Payment processors like PayPal keep track of your usage pattern so that they can notify you in case of any dramatic change in it.
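As a hedged illustration of how a per-image reconstruction error could be computed and thresholded, the sketch below uses random arrays in place of real images and real autoencoder output. The image shape, the three deliberately corrupted reconstructions, and the 97th-percentile cut-off are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical batch of 100 grayscale images (28x28x1) with values in [0, 1].
images = rng.random((100, 28, 28, 1))

# Stand-in for autoencoder output: near-perfect reconstructions everywhere...
reconstructions = images + rng.normal(scale=0.01, size=images.shape)
# ...except for three images that the "model" reconstructs badly.
bad = [7, 42, 99]
reconstructions[bad] = rng.random((3, 28, 28, 1))

# One MSE value per image: average the squared error over height, width, channels.
errors = np.mean((images - reconstructions) ** 2, axis=(1, 2, 3))

# Assumed decision rule: flag the images whose error is above the 97th percentile.
threshold = np.quantile(errors, 0.97)
flagged = np.where(errors > threshold)[0]
print(flagged)
```

A quantile-based threshold always flags a fixed share of the data, so in practice the percentile has to be chosen with the expected anomaly rate in mind.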
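A function of this kind might look roughly like the sketch below. It is a reconstruction under assumptions, not the tutorial's original listing: the function name, the default label values, the `contam` and `seed` parameters, and the use of NumPy's random generator are illustrative choices.

```python
import numpy as np


def build_unsupervised_dataset(data, labels, valid_label=1,
                               anomaly_label=3, contam=0.01, seed=42):
    """Mix all valid samples with a small, random share of anomalies."""
    rng = np.random.default_rng(seed)

    # Indexes of every sample carrying the valid or the anomaly label.
    valid_idxs = np.where(labels == valid_label)[0]
    anomaly_idxs = np.where(labels == anomaly_label)[0]

    # Keep only enough anomalies to make up `contam` of the valid count.
    n_anomalies = int(len(valid_idxs) * contam)
    anomaly_idxs = rng.permutation(anomaly_idxs)[:n_anomalies]

    # Stack both groups and shuffle so the anomalies are not all at the end.
    mixed = np.concatenate([data[valid_idxs], data[anomaly_idxs]])
    return rng.permutation(mixed)


# Tiny stand-in for an image dataset: 400 samples, 100 for each label 0..3.
labels = np.tile(np.arange(4), 100)
data = np.arange(400).reshape(400, 1)   # sample i carries label i % 4

mixed = build_unsupervised_dataset(data, labels, contam=0.05)
print(mixed.shape)
```

The labels are used only to assemble the mixture; they are not returned, so anything trained on the result sees unlabeled data.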
The patterns include transaction amounts, the location of transactions and so on. If a credit card is stolen, it is very likely that the transactions may vary largely from the usual ones. This is where among many other instances the companies use the concepts of anomalies to detect the unusual transactions that may take place after the credit card theft.
Noise and anomalies are not the same. So, what does noise look like in the real world? People tend to buy a lot of groceries at the start of a month, and as the month progresses the grocery shop owner starts to see a marked decrease in sales. Then he starts to give discounts on a number of grocery items and makes sure to advertise the scheme. This discount scheme might cause an uneven increase in sales, but are those sales normal?
They surely are not. These are noise, more specifically stochastic noise. By now, we have a good idea of what anomalies look like in a real-world setting. Allow me to quote the following from the classic book Data Mining: Concepts and Techniques by Han et al. Could not get any better, right? To be able to make more sense of anomalies, it is important to understand what makes an anomaly different from noise. The way data is generated has a huge role to play in this. The normal instances of a dataset were most likely generated by the same process, whereas outliers were often generated by one or more different processes.
In the above figure, I show you what it is like to be outliers within a set of closely related data points. The closeness is governed by the process that generated the data points. From this, it can be inferred that the process that generated those two encircled data points must have been different from the one that generated the others.
But how do we justify that those red data points were generated by some other process? While doing anomaly analysis, it is a common practice to make several assumptions on the normal instances of the data and then distinguish the ones that violate these assumptions.
I am implementing an anomaly detection system that will be used on different time series (one observation every 15 min for a total of 5 months). All these time series have a common pattern: high levels during working hours and low levels otherwise. The idea presented in many papers is the following: build a model to predict future values and calculate an anomaly score based on the residuals.
I use an LSTM to predict the next time step given the previous 96 observations (1 day), and then I calculate the anomaly score as the likelihood that the residuals come from one of the two normal distributions fitted on the residuals obtained with the validation set. I am using two different distributions, one for working hours and one for non-working hours.
The model detects point anomalies, such as sudden falls and peaks, very well, but it fails during holidays, for example. If a holiday falls during the week, I expect my model to detect more anomalies, because it is an unusual daily pattern with respect to a normal working day.
But the predictions simply follow the previous observations. Use a second and more lightweight model based on time series decomposition, which is fed with daily aggregations instead of 15-minute aggregations, to detect daily anomalies. This combination of two models allows me to detect both kinds of anomalies and it works very well, but my idea was to use only one model because I expected the LSTM to be able to "learn" the weekly pattern too.
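The two-distribution scoring can be sketched as follows. This is a simplified illustration, not the poster's code: the working-hour window, the synthetic residuals, the hourly rather than 15-minute timestamps, and the use of an absolute z-score in place of a likelihood are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)


def is_working_hour(hour):
    """Assumed definition of working hours: 08:00 up to, not including, 18:00."""
    return 8 <= hour < 18


def fit_gaussian(residuals):
    """Return the (mean, standard deviation) of a 1-D array of residuals."""
    return float(np.mean(residuals)), float(np.std(residuals))


# Hypothetical validation residuals: noisier while people are at work.
val_hours = np.tile(np.arange(24), 30)                 # 30 days of hourly stamps
working = np.array([is_working_hour(h) for h in val_hours])
val_residuals = np.where(working,
                         rng.normal(0.0, 2.0, val_hours.size),
                         rng.normal(0.0, 0.5, val_hours.size))

# One Gaussian per regime, both fitted on validation residuals only.
params = {True: fit_gaussian(val_residuals[working]),
          False: fit_gaussian(val_residuals[~working])}


def anomaly_score(residual, hour):
    """Absolute z-score of a residual under the Gaussian for its regime."""
    mean, std = params[is_working_hour(hour)]
    return abs(residual - mean) / std


# The same residual is unremarkable at noon but suspicious at 3 a.m.
score_noon = anomaly_score(3.0, hour=12)
score_night = anomaly_score(3.0, hour=3)
print(score_noon, score_night)
```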
Instead it strictly follows the previous time steps without taking into consideration that it is a working hour and the level should be much higher.
I tried to add exogenous variables to the input (hour of day, day of week) and to add layers and cells, but the situation is not much better.
Training with MSE is equivalent to optimizing the likelihood of your data under a Gaussian with fixed variance and mean given by your model. So you are already training an autoencoder, though you do not formulate it so. Since you provide data from the last 24 hours only, the LSTM cannot possibly learn a weekly pattern. It could at best learn that the value should be similar to what it was 24 hours before (though that is very unlikely, see the next point), and then you break it with Fri-Sat and Sun-Mon data.
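The equivalence between MSE training and a fixed-variance Gaussian likelihood can be written out explicitly. The notation is introduced here for illustration only: $f_\theta(x_t)$ is the model's prediction for target $y_t$, $\sigma^2$ is the fixed variance, and $N$ is the number of training samples.

```latex
\begin{aligned}
p(y_t \mid x_t) &= \mathcal{N}\!\left(y_t;\, f_\theta(x_t),\, \sigma^2\right) \\
-\log p(y_t \mid x_t) &= \frac{\left(y_t - f_\theta(x_t)\right)^2}{2\sigma^2}
  + \frac{1}{2}\log\!\left(2\pi\sigma^2\right) \\
-\sum_{t=1}^{N} \log p(y_t \mid x_t) &= \frac{N}{2\sigma^2}\,
  \underbrace{\frac{1}{N}\sum_{t=1}^{N}\left(y_t - f_\theta(x_t)\right)^2}_{\text{MSE}}
  + \frac{N}{2}\log\!\left(2\pi\sigma^2\right)
\end{aligned}
```

Because $\sigma^2$ is fixed, the second term is a constant and the first is a positive multiple of the MSE, so minimizing one with respect to $\theta$ minimizes the other.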
From the LSTM's point of view, your holiday 'anomaly' looks pretty much the same as the weekend data you were providing during the training. So you would first need to provide longer contexts during learning I assume that you carry the hidden state on during test time. Assuming that your data really follows a simple pattern -- high value during and only during working hours, plus some variations of smaller scale -- the LSTM doesn't need any long-term knowledge for most of the datapoints.
Putting in all my human imagination, I can only envision the LSTM benefiting from long-term dependencies at the beginning of the working hours, so just for one or two samples per day. Thus it may help to weight the samples at the beginning of working hours more, so that the respective loss can actually influence representations from far history.
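One way to express such a weighting is a per-sample weight vector, sketched below. The 08:00 start of the working day and the factor of 10 are arbitrary assumptions for illustration; in Keras, a vector like this could be passed through the `sample_weight` argument of `Model.fit`.

```python
import numpy as np

# Hypothetical targets: one week of 15-minute steps, i.e. 96 per day.
STEPS_PER_DAY = 96
steps = np.arange(7 * STEPS_PER_DAY)
hour_of_day = (steps % STEPS_PER_DAY) // 4     # 4 steps per hour

# Assumed choices: the working day starts at 08:00, and the samples in that
# first hour count 10 times as much as any other sample.
WORK_START_HOUR = 8
BOOST = 10.0

sample_weight = np.ones(steps.size)
sample_weight[hour_of_day == WORK_START_HOUR] = BOOST

print(int((sample_weight == BOOST).sum()), "of", steps.size, "samples boosted")
```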
It is difficult to model sequences at multiple scales in a single model. Such multiscale modeling is especially difficult for an RNN, because it needs to process all the information, always, with the same weights.
If you really want one model to learn it all, you may have more success with deep feedforward architectures employing some sort of time-convolution.
As these have skip connections over longer temporal context and apply different transformations at different levels, they have better chances of discovering and exploiting such an unexpected long-term dependency. I have not played with them myself (I actually moved away from Keras some time ago). This one is a bit philosophical. Your current approach shows that you have a very strong belief that there are two different setups: work hours and the rest.
You're even OK with changing part of your model (the Gaussian) according to it. So perhaps your data actually comes from two distributions and you should therefore train two models and switch between them as appropriate?