Detection of Outliers in Time Series Data from Control Systems

Outliers are observations that do not fit the overall tendency of the observed time series: they differ dramatically from the typical pattern of the trend and/or seasonal components.

Time series data often undergo sudden changes that alter the dynamics of the data. These changes are typically non-systematic and cannot be captured by standard time series models, which is why they are known as outlier effects. Detecting outliers is important because they affect the selection of the model, the estimation of its parameters and, consequently, the forecasts. We therefore followed the approach of Chen & Liu (1993), published in the Journal of the American Statistical Association: an automatic procedure for detecting outliers in time series, which is implemented in the R package tsoutliers. The function tso is the main interface to the automatic procedure [1].

I wrote this blog post while I was at Eka Analytics.

Outlier Types

The package detects 5 different types of outliers iteratively in time series data [5]:

  • Additive Outlier (AO)

An additive outlier appears as a surprisingly large or small value occurring for a single observation. Subsequent observations are unaffected by an additive outlier. Consecutive additive outliers are typically referred to as additive outlier patches.

  • Innovational Outlier (IO)

An innovational outlier is characterized by an initial impact whose effects linger over subsequent observations. The influence of the outlier may increase as time proceeds.

  • Level Shift (LS)

For a level shift, all observations appearing after the outlier move to a new level. In contrast to additive outliers, a level shift outlier affects many observations and has a permanent effect.

  • Temporary Change (TC)

Temporary change outliers (also called transient change outliers) are similar to level shift outliers, but the effect of the outlier diminishes exponentially over the subsequent observations. Eventually, the series returns to its normal level.

  • Seasonal Level Shift (SLS)

A seasonal level shift appears as a surprisingly large or small value occurring repeatedly at regular (seasonal) intervals.
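The shapes described above can be sketched in a few lines of base R. This is only an illustration of the unit-sized patterns for AO, LS and TC; the series length, the outlier position and the decay rate delta = 0.7 are assumptions chosen for the example, not values taken from the data in this post.

```r
# Unit-sized effect patterns for three outlier types at time index t0.
# n, t0 and delta are illustrative assumptions.
n     <- 20
t0    <- 5
delta <- 0.7  # assumed decay rate of the temporary change

tt <- seq_len(n)
additive    <- as.numeric(tt == t0)                 # AO: single spike at t0
level_shift <- as.numeric(tt >= t0)                 # LS: permanent step from t0 on
temp_change <- ifelse(tt >= t0, delta^(tt - t0), 0) # TC: spike that decays

effects <- data.frame(t = tt, AO = additive, LS = level_shift, TC = temp_change)
print(head(effects, 8))
```

Plotting the three columns against t (for example with matplot) gives a rough picture of how each type would distort an otherwise flat series.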

The following figure displays several types of outliers commonly occurring in time series. The blue lines represent a series without outliers. The red lines suggest a pattern that might be present if the series contained outliers.

 

In the data frame that describes the outliers, the column containing the observation at which the outlier occurs is named “ind”, the column containing the weights is named “coefhat”, and the type of outlier is given in a column named “type”. The column “ind” should contain the index of the time point, not the time point in terms of year and season as given by time. The function locate.outliers computes the t-statistics for each type of outlier at every time point. The cases where the corresponding t-statistic is (in absolute value) below the threshold cval are then removed. This yields a set of potential outliers [6].
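The thresholding step can be illustrated with a small base R sketch. The candidate table and the threshold value below are made up for the example; only the column names follow the ones described above, and the code is not the package's own implementation.

```r
# Hypothetical candidate outliers with their t-statistics (made-up values).
candidates <- data.frame(
  type    = c("AO", "LS", "TC", "AO"),
  ind     = c(12, 12, 30, 45),
  coefhat = c(5.1, 1.2, -3.8, 0.9),
  tstat   = c(6.3, 1.4, -4.2, 1.1)
)
cval <- 3.5  # illustrative threshold, not a recommended setting

# Keep only candidates whose |t-statistic| reaches the threshold.
potential <- candidates[abs(candidates$tstat) >= cval, ]
print(potential)
```

Raising cval makes the procedure more conservative (fewer potential outliers), while lowering it flags more points.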

Outlier Detection in Reclaimer Performance

In this section, we discuss outlier detection in reclaimer performance data from mining. The data is fed to the tsoutliers algorithm in R, an open-source statistical software environment.

Below are the R commands used to detect the outliers, followed by the result.

R command:

# load the tsoutliers package (installation is shown in the PnL section)
library("tsoutliers")

# read time series data
AgsAnomalydata <- read.csv(file = "D:/R-AnamolyDetection/TS-MiningData.csv", header = TRUE)
dat.ts <- ts(AgsAnomalydata$Reclaim_performance)

# detect outliers
data.ts.outliers <- tso(dat.ts)

# plot
plot(data.ts.outliers)

Result:

Outliers:

Sl.No  type  ind  time  coefhat   tstat
1      LS      7     7    67.97  24.456
2      AO    147   147   201.53   6.817
3      AO    214   214    114.5   3.888
4      AO    238   238    323.1   11.07
5      AO    369   369   365.54  12.532
6      AO    388   388   283.06   9.665
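One way to read the table: each row says that a pattern of the given type, scaled by coefhat, starts at index ind. The sketch below shows this for the first two rows (the LS at index 7 and the AO at index 147) on a made-up flat series. The baseline of 100 and the length of 150 are assumptions for illustration only, and the real procedure estimates these effects jointly with an ARIMA model rather than on a flat line.

```r
# Toy illustration of how rows of the outlier table map to effects on a series.
# The flat baseline (100) and the length (150) are made up; the two outliers
# reuse the first two rows of the table above.
n        <- 150
tt       <- seq_len(n)
baseline <- rep(100, n)

ls_effect <- 67.97  * as.numeric(tt >= 7)    # level shift: step from index 7 on
ao_effect <- 201.53 * as.numeric(tt == 147)  # additive outlier: spike at index 147

y     <- baseline + ls_effect + ao_effect    # series containing both outliers
y_adj <- y - ls_effect - ao_effect           # series with the effects removed

print(y[c(6, 7, 147, 148)])
print(range(y_adj))
```

Subtracting the scaled patterns recovers the baseline, which is the idea behind an outlier-adjusted series.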

 

Outlier Detection in Realized PnL (Profit and Loss)

In this section, we discuss outlier detection in the realized PnL time series of a large agricultural commodity trader, using one year of data (approximately 150 records). The data is fed to the tsoutliers algorithm in R.

The R commands below install the tsoutliers package and load it:

# install tsoutliers package
install.packages("tsoutliers")

# load tsoutliers library
library("tsoutliers")


Below are the R commands used to detect the outliers, followed by the result.

R command:

# read time series data
pnldata <- read.csv(file = "D:/R-AnamolyDetection/RPL_MonthData.csv", header = TRUE)
dat.ts <- ts(pnldata$REALIZED_PNL)

# detect outliers
data.ts.outliers <- tso(dat.ts)

# plot
plot(data.ts.outliers)

Result:

Outliers:

Sl.No  type  ind  time    coefhat    tstat
1      AO      1     1  -53786741  -24.521
2      LS      3     3  -62319481  -30.212
3      TC      4     4    6363101    3.537
4      AO      6     6   -6178624   -3.438
5      LS      7     7    6968697    5.169
6      AO     19    19  -14381299   -9.172
7      AO     52    52   -7551356   -4.816
8      AO     61    61   -8560154   -5.459
9      AO     78    78    7904339    5.041
10     LS     96    96  -26229557  -23.705
11     AO     97    97    8239853    4.338
12     AO     98    98    8874709    4.672
13     TC     99    99   10393736    5.368
14     AO    104   104   -7343157   -4.148
15     LS    105   105    9045628    8.097

Outlier Detection in Procurement Data

In this section, we discuss outlier detection in procurement time series data. The data is fed to the tsoutliers algorithm in R.

Below are the R commands used to detect the outliers, followed by the result:

R command:

# read time series data
AgsAnomalydata <- read.csv(file = "D:/R-AnamolyDetection/TS-Data.csv", header = TRUE)
dat.ts <- ts(AgsAnomalydata$Variance)

# detect outliers
data.ts.outliers <- tso(dat.ts)

# plot
plot(data.ts.outliers)

Result:

Outliers:

Sl.No  type  ind  time  coefhat    tstat
1      AO      4     4    57.19   10.776
2      AO      8     8    79.25   14.952
3      LS     13    13   124.05   14.317
4      AO     29    29   -94.52  -17.836
5      TC     32    32    33.32    4.604
6      AO     33    33    44.16    8.279

Outlier Detection in Grain Storage Site Receival and Outturn Data

In this section, we discuss outlier detection in the receival quantity at storage site level, using four years of data. The data is fed to the tsoutliers algorithm in R.

Below are the R commands used to detect the outliers, followed by the result.

R command:

# read time series data
AgsAnomalydata <- read.csv(file = "D:/R-AnamolyDetection/TS-Forecast.csv", header = TRUE)
dat.ts <- ts(AgsAnomalydata$RECEIVAL_QTY)

# detect outliers
data.ts.outliers <- tso(dat.ts)

# plot
plot(data.ts.outliers)

Result:

Outliers:

Sl.No  type  ind  time  coefhat   tstat
1      LS      1     1    59803   13.55
2      TC     11    11   271142  11.635
3      AO     12    12   292987  10.478
4      TC     13    13   -69690  -3.406
5      TC     22    22   193536   9.531
6      AO     23    23   507816  19.207
7      TC     34    34   205454  10.088
8      AO     35    35   472585  17.875
9      TC     46    46   246877  11.648
10     AO     47    47   338800  12.532

Detection of Outliers Using Twitter's Algorithm

In this section, we discuss outlier detection in reclaimer performance data from mining, this time feeding the data to Twitter's anomaly detection algorithm in R.

In the resulting plot, small blue circles mark the anomalies; max_anoms was set to 0.0001.

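The R commands below install the devtools package, use it to install Twitter's AnomalyDetection package from GitHub, and load the library:

# install devtools package
install.packages("devtools")

# install the AnomalyDetection package from GitHub
devtools::install_github("twitter/AnomalyDetection")

# load AnomalyDetection library
library("AnomalyDetection")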

Below are the R commands used to detect the outliers, followed by the result.

R command:

# anomalyData is the mining data frame; the original post does not show how it
# was created (presumably with read.csv, as in the earlier sections)

# prepare the data frame to be fed into the anomaly detection package
performanceDF <- data.frame(anomalyData$Reclaim_date, anomalyData$Reclaim_performance)

# detect outliers
performanceAnomalyDetectionResult <- AnomalyDetectionTs(performanceDF, max_anoms = 0.0001, direction = "both", plot = TRUE, xlabel = "Reclaim Date", ylabel = "Performance", title = "Anomaly Performance Observation")

# plot
performanceAnomalyDetectionResult$plot


Result:

 

 

Key points of the Twitter and tsoutliers algorithms used for outlier detection:

tsoutliers:

  • It handles both seasonal and non-seasonal time series.
  • It characterizes the bulk of the data in a way that is not influenced by any outliers and then points to the individual values that do not fit within that characterization.
  • It detects outliers in time series following the Chen and Liu (1993) procedure.
  • The R package name is tsoutliers.

 

Twitter’s algorithm:

  • It identifies large outliers.
  • It detects one or more statistically significant anomalies in the input time series.
  • Twitter observed that social network data is inherently seasonal and has strong trend components, so the algorithm gives good results mainly for inputs of this kind.
  • The algorithm implemented is called S-H-ESD, or Seasonal Hybrid Extreme Studentized Deviates.
  • The R package name is AnomalyDetection; it is installed from GitHub using devtools.
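To give a feel for the "Extreme Studentized Deviates" part, here is a simplified base R sketch of a generalized ESD test that uses the median and MAD instead of the mean and standard deviation. It deliberately omits the seasonal decomposition step, so it is not S-H-ESD itself and not the package's implementation; the function name, its arguments and the toy series are all assumptions made for this example.

```r
# Simplified generalized ESD test with median and MAD (no seasonal step).
# Function and argument names are illustrative, not the package API.
esd_median_mad <- function(x, max_anoms = 0.05, alpha = 0.05) {
  n <- length(x)
  k <- max(1, floor(max_anoms * n))   # upper bound on the number of anomalies
  idx <- seq_len(n)                   # indices still in the sample
  removed <- integer(0)
  n_anoms <- 0L
  for (i in seq_len(k)) {
    s <- mad(x[idx])
    if (s == 0) break                 # avoid dividing by zero
    r <- abs(x[idx] - median(x[idx])) / s
    j <- which.max(r)                 # most extreme remaining point
    # critical value of the generalized ESD test at step i
    p <- 1 - alpha / (2 * (n - i + 1))
    tq <- qt(p, df = n - i - 1)
    lambda <- (n - i) * tq / sqrt((n - i - 1 + tq^2) * (n - i + 1))
    if (r[j] > lambda) n_anoms <- i   # largest step that is still significant
    removed <- c(removed, idx[j])
    idx <- idx[-j]
  }
  sort(removed[seq_len(n_anoms)])
}

x <- sin(2 * pi * (1:100) / 20)  # toy periodic series
x[50] <- 10                      # injected spike
anomalies <- esd_median_mad(x)
print(anomalies)
```

The max_anoms argument plays the same role as in the call above: it caps the fraction of points that may be reported as anomalies.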

 

References:

  1. https://cran.r-project.org/web/packages/tsoutliers/tsoutliers.pdf
  2. http://stats.stackexchange.com/questions/104882/detecting-outliers-in-time-series-ls-ao-tc-using-tsoutliers-package-in-r-how
  3. http://epublications.marquette.edu/cgi/viewcontent.cgi?article=1047&context=theses_open
  4. http://probdist.com/twitters-anomaly-detection-package/
  5. http://www-01.ibm.com/support/knowledgecenter/SS3RA7_15.0.0/com.ibm.spss.modeler.help/ts_outliers_overview.htm
  6. http://finzi.psych.upenn.edu/library/tsoutliers/html/outliers-effects.html
