**Detection of Outliers in Time Series Data from Control Systems**

Outliers are observations that do not follow the general tendency of a time series: they differ dramatically from the typical pattern of its trend and/or seasonal components.

Time series data often undergo sudden changes that alter the dynamics of the data. These changes are typically non-systematic and cannot be captured by standard time series models, which is why they are known as outlier effects. Detecting outliers is important because they affect the selection of the model, the estimation of its parameters and, consequently, the forecasts. The approach followed here is the automatic procedure for detecting outliers in time series described in Chen & Liu (1993), published in the Journal of the American Statistical Association, which is implemented in the R package tsoutliers. The function tso is the main interface for the automatic procedure [1].

I wrote this blog while I was at Eka Analytics.

**Outlier Types**

The package iteratively detects five types of outliers in time series data [5]:

**Additive Outlier (AO)**

An additive outlier appears as a surprisingly large or small value occurring for a single observation. Subsequent observations are unaffected by an additive outlier. Consecutive additive outliers are typically referred to as additive outlier patches.

**Innovational Outlier (IO)**

An innovational outlier is characterized by an initial impact whose effects linger over subsequent observations. The influence of the outlier may increase as time proceeds.

**Level Shift (LS)**

For a level shift, all observations appearing after the outlier move to a new level. In contrast to additive outliers, a level shift outlier affects many observations and has a permanent effect.

**Temporary Change (TC)**

Temporary change outliers (also called transient change outliers) are similar to level shift outliers, but the effect of the outlier diminishes exponentially over the subsequent observations. Eventually, the series returns to its normal level.

**Seasonal Level Shift (SLS)**

A seasonal level shift (described in some references as a seasonal additive outlier) appears as a surprisingly large or small value occurring repeatedly at regular intervals.
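To make the five descriptions above more concrete, here is a minimal base R sketch that builds the unit effect of each outlier type on a short series. It is only an illustration, not code from the tsoutliers package: the series length, the outlier position, the seasonal period, the decay rate `delta` and the AR(1) coefficient `phi` are all assumed values, and the IO pattern in particular depends on whichever ARIMA model is fitted to the data.

```r
n     <- 20    # series length (hypothetical)
t0    <- 8     # observation at which the outlier occurs (hypothetical)
s     <- 4     # seasonal period, used only for the SLS pattern (hypothetical)
delta <- 0.7   # decay rate of the temporary change (assumed value)
phi   <- 0.5   # AR(1) coefficient, used only for the IO pattern (assumed value)

tt <- seq_len(n)

ao  <- as.numeric(tt == t0)                        # AO: a single spike
lvl <- as.numeric(tt >= t0)                        # LS: permanent step
tc  <- ifelse(tt >= t0, delta^(tt - t0), 0)        # TC: spike that decays exponentially
sls <- as.numeric(tt >= t0 & (tt - t0) %% s == 0)  # SLS: recurs every s observations
# IO: the shock propagates through the model; for an AR(1) the weights are phi^k
io  <- c(rep(0, t0 - 1), 1, ARMAtoMA(ar = phi, lag.max = n - t0))

effects <- cbind(AO = ao, IO = io, LS = lvl, TC = tc, SLS = sls)
print(round(effects[t0:(t0 + 4), ], 3))
```

Each column is the shape that gets scaled by the outlier's estimated weight, so plotting the columns with `matplot(effects, type = "l")` gives a rough picture of how the types differ.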

The following figure displays several types of outliers commonly occurring in time series. The blue lines represent a series without outliers. The red lines suggest a pattern that might be present if the series contained outliers.

In the detection output, the column named “ind” contains the observation at which the outlier occurs, “coefhat” contains the estimated weight (coefficient) of the outlier, and “type” specifies the type of outlier. The column “ind” should contain the index of the time point, not the time point in terms of year and season as given by time. The function locate.outliers computes the *t*-statistics for each type of outlier at every time point. The cases where the corresponding *t*-statistic is (in absolute value) below the threshold cval are then removed, which leaves a set of potential outliers [6].
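The thresholding step described above can be sketched in a few lines of base R. The candidate table below is made up for illustration (it is not output from locate.outliers), and the value of `cval` is an assumption; the point is only to show that a candidate survives when the absolute value of its *t*-statistic is not below the threshold.

```r
# Hypothetical candidates in the same column layout (type, ind, coefhat, tstat).
candidates <- data.frame(
  type    = c("AO", "LS", "TC", "AO"),
  ind     = c(12, 12, 30, 41),
  coefhat = c(5.1, 1.2, -3.4, 0.8),
  tstat   = c(6.3, 1.9, -4.2, 2.7),
  stringsAsFactors = FALSE
)

cval <- 3.5  # assumed critical value

# Keep only the candidates whose |t-statistic| reaches the threshold.
kept <- candidates[abs(candidates$tstat) >= cval, ]
print(kept)
```

Note that the sign of the *t*-statistic does not matter here: a large negative value (a sharp drop) is kept just like a large positive one.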

**Outlier Detection in Reclaimer Performance**

In this section, let's discuss outlier detection for reclaimer performance in mining data. The data is fed to the tsoutliers algorithm in the open-source software R.

Below are the R commands used to detect the outliers, followed by the result.

**R command:**

*# load the tsoutliers package, which provides tso()*

*library("tsoutliers")*

*# read time series data*

*AgsAnomalydata <- read.csv(file="D:/R-AnamolyDetection/TS-MiningData.csv", header=TRUE)*

*dat.ts <- ts(AgsAnomalydata$Reclaim_performance)*

*# detect outliers*

*data.ts.outliers <- tso(dat.ts)*

*# plot*

*plot(data.ts.outliers)*

**Result:**

Outliers:

| Sl.No | type | ind | time | coefhat | tstat |
|---|---|---|---|---|---|
| 1 | LS | 7 | 7 | 67.97 | 24.456 |
| 2 | AO | 147 | 147 | 201.53 | 6.817 |
| 3 | AO | 214 | 214 | 114.5 | 3.888 |
| 4 | AO | 238 | 238 | 323.1 | 11.07 |
| 5 | AO | 369 | 369 | 365.54 | 12.532 |
| 6 | AO | 388 | 388 | 283.06 | 9.665 |
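Since the CSV file used here is not available to readers, the following sketch reproduces the same workflow on simulated data. Everything about the data is an assumption made for illustration: a noisy baseline with one level shift and one additive outlier injected by hand. As far as I know, tso searches only for AO, LS and TC unless other types are requested through its `types` argument, which would be consistent with the result tables in this post; the argument is spelled out below so the choice is visible. The call is guarded so that the script still runs when the package is not installed.

```r
set.seed(42)
n <- 120
y <- rnorm(n, mean = 100, sd = 5)  # hypothetical baseline series
y[40:n] <- y[40:n] + 60            # injected level shift from observation 40
y[90]   <- y[90] + 150             # injected additive outlier at observation 90
y.ts <- ts(y)

if (requireNamespace("tsoutliers", quietly = TRUE)) {
  fit <- tsoutliers::tso(y.ts, types = c("AO", "LS", "TC"))
  # Data frame with the columns type, ind, time, coefhat and tstat
  print(fit$outliers)
} else {
  message("tsoutliers is not installed; skipping the detection step")
}
```

If the detection works as expected, the printed table should contain entries near observations 40 and 90, but the exact rows depend on the ARIMA model that gets selected.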

**Outlier Detection in Realized PnL (Profit and Loss)**

In this section, let's discuss outlier detection in the realized PnL time series of a large agricultural commodity trader, using one year of data (approximately 150 records). The data is fed to the tsoutliers algorithm in the open-source software R.

The following R commands install the *tsoutliers* package and load it:

*# install the tsoutliers package*

*install.packages("tsoutliers")*

*# load the tsoutliers library*

*library("tsoutliers")*

Below are the R commands used to detect the outliers, followed by the result.

**R command:**

*# read time series data*

*pnldata <- read.csv(file="D:/R-AnamolyDetection/RPL_MonthData.csv", header=TRUE)*

*dat.ts <- ts(pnldata$REALIZED_PNL)*

*# detect outliers*

*data.ts.outliers <- tso(dat.ts)*

*# plot*

*plot(data.ts.outliers)*

**Result:**

Outliers:

| Sl.No | type | ind | time | coefhat | tstat |
|---|---|---|---|---|---|
| 1 | AO | 1 | 1 | -53786741 | -24.521 |
| 2 | LS | 3 | 3 | -62319481 | -30.212 |
| 3 | TC | 4 | 4 | 6363101 | 3.537 |
| 4 | AO | 6 | 6 | -6178624 | -3.438 |
| 5 | LS | 7 | 7 | 6968697 | 5.169 |
| 6 | AO | 19 | 19 | -14381299 | -9.172 |
| 7 | AO | 52 | 52 | -7551356 | -4.816 |
| 8 | AO | 61 | 61 | -8560154 | -5.459 |
| 9 | AO | 78 | 78 | 7904339 | 5.041 |
| 10 | LS | 96 | 96 | -26229557 | -23.705 |
| 11 | AO | 97 | 97 | 8239853 | 4.338 |
| 12 | AO | 98 | 98 | 8874709 | 4.672 |
| 13 | TC | 99 | 99 | 10393736 | 5.368 |
| 14 | AO | 104 | 104 | -7343157 | -4.148 |
| 15 | LS | 105 | 105 | 9045628 | 8.097 |

**Outlier Detection in Procurement Data**

In this section, let's discuss outlier detection in procurement time series data. The data is fed to the tsoutliers algorithm in the open-source software R.

Below are the R commands used to detect the outliers, followed by the result:

**R command:**

*# read time series data*

*AgsAnomalydata <- read.csv(file="D:/R-AnamolyDetection/TS-Data.csv", header=TRUE)*

*dat.ts <- ts(AgsAnomalydata$Variance)*

*# detect outliers*

*data.ts.outliers <- tso(dat.ts)*

*# plot*

*plot(data.ts.outliers)*

**Result:**

Outliers:

| Sl.No | type | ind | time | coefhat | tstat |
|---|---|---|---|---|---|
| 1 | AO | 4 | 4 | 57.19 | 10.776 |
| 2 | AO | 8 | 8 | 79.25 | 14.952 |
| 3 | LS | 13 | 13 | 124.05 | 14.317 |
| 4 | AO | 29 | 29 | -94.52 | -17.836 |
| 5 | TC | 32 | 32 | 33.32 | 4.604 |
| 6 | AO | 33 | 33 | 44.16 | 8.279 |

**Outlier Detection in Grain Storage Site Receival and Outturn Data**

In this section, let's discuss outlier detection for the receival quantity at storage site level, using four years of data. The data is fed to the tsoutliers algorithm in the open-source software R.

Below are the R commands used to detect the outliers, followed by the result:

**R command:**

*# read time series data*

*AgsAnomalydata <- read.csv(file="D:/R-AnamolyDetection/TS-Forecast.csv", header=TRUE)*

*dat.ts <- ts(AgsAnomalydata$RECEIVAL_QTY)*

*# detect outliers*

*data.ts.outliers <- tso(dat.ts)*

*# plot*

*plot(data.ts.outliers)*

**Result:**

Outliers:

| Sl.No | type | ind | time | coefhat | tstat |
|---|---|---|---|---|---|
| 1 | LS | 1 | 1 | 59803 | 13.55 |
| 2 | TC | 11 | 11 | 271142 | 11.635 |
| 3 | AO | 12 | 12 | 292987 | 10.478 |
| 4 | TC | 13 | 13 | -69690 | -3.406 |
| 5 | TC | 22 | 22 | 193536 | 9.531 |
| 6 | AO | 23 | 23 | 507816 | 19.207 |
| 7 | TC | 34 | 34 | 205454 | 10.088 |
| 8 | AO | 35 | 35 | 472585 | 17.875 |
| 9 | TC | 46 | 46 | 246877 | 11.648 |
| 10 | AO | 47 | 47 | 338800 | 12.532 |
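One thing worth checking in a result like this is whether the flagged points recur at a regular spacing, because a yearly peak would then be seasonality rather than a true anomaly. The short sketch below computes the spacing between the AO and positive TC indices copied from the table above. It then shows how a seasonal period could be declared when building the series. The monthly frequency is my assumption (four years of data with indices up to 47 would fit 48 monthly observations), and the stand-in values are made up because the original CSV is not available.

```r
# Indices copied from the result table above.
ao_ind <- c(12, 23, 35, 47)  # additive outliers
tc_ind <- c(11, 22, 34, 46)  # temporary changes with a positive coefhat

print(diff(ao_ind))  # 11 12 12
print(diff(tc_ind))  # 11 12 12

# Hypothetical stand-in for AgsAnomalydata$RECEIVAL_QTY: 48 monthly values
# with a two-month peak at the end of every year.
receival_qty <- rep(c(rep(1000, 10), 250000, 300000), times = 4)

# Declaring the period lets seasonal models be considered for the series.
dat.ts <- ts(receival_qty, frequency = 12)  # assumes monthly observations
print(frequency(dat.ts))  # 12
```

With a spacing of roughly 12 observations, it may be worth rerunning the detection on a series created with `frequency = 12` and comparing which points are still flagged.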

**Detection of Outliers Using Twitter's Algorithm**

In this section, let's discuss outlier detection for reclaimer performance in mining data, this time feeding the data to Twitter's algorithm in the open-source software R.

In the resulting figure, small blue circles represent the anomalies; max_anoms is set to 0.0001.

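The following R commands install the *devtools* package, use it to install the *AnomalyDetection* package from GitHub, and load the library:

*# install the devtools package*

*install.packages("devtools")*

*# install the AnomalyDetection package from GitHub*

*devtools::install_github("twitter/AnomalyDetection")*

*# load the AnomalyDetection library*

*library("AnomalyDetection")*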

Below are the R commands used to detect the outliers, followed by the result.

**R command:**

*# prepare the data frame to be fed into the anomaly detection package*

*# (anomalyData is assumed to hold the mining data already read from the CSV file; the first column must contain the timestamps)*

*performanceDF <- data.frame(anomalyData$Reclaim_date, anomalyData$Reclaim_performance)*

*# detect outliers*

*performanceAnomalyDetectionResult <- AnomalyDetectionTs(performanceDF, max_anoms=0.0001, direction="both", plot=TRUE, xlabel="Reclaim Date", ylabel="Performance", title="Anomaly Performance Observation")*

*# plot*

*performanceAnomalyDetectionResult$plot*
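A detail that is easy to trip over in the data preparation step: as I understand it, AnomalyDetectionTs expects a two-column data frame with timestamps in the first column and observations in the second, while read.csv returns dates as plain text. The base R sketch below shows one way to convert them before calling the function. The sample rows, the date format and the time zone are assumptions made for illustration and will likely differ from the real file.

```r
# Hypothetical raw data: dates stored as text, as read.csv would return them.
anomalyData <- data.frame(
  Reclaim_date = c("2015-01-01 00:00:00", "2015-01-01 01:00:00", "2015-01-01 02:00:00"),
  Reclaim_performance = c(410.5, 398.2, 905.0),
  stringsAsFactors = FALSE
)

# Convert the text dates to POSIXct; the format string is an assumption.
performanceDF <- data.frame(
  timestamp = as.POSIXct(anomalyData$Reclaim_date,
                         format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
  count     = anomalyData$Reclaim_performance
)

str(performanceDF)
```

If the conversion produces `NA` values, the format string does not match the text in the file and needs adjusting before the data frame is passed on.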

**Result:**


**Key Points of the Twitter and tsoutliers Algorithms Used for Outlier Detection:**

**tsoutliers:**

- It handles both seasonal and non-seasonal time series.
- It characterizes the bulk of the data in a way that is not influenced by any outliers, and then points to individual values that do not fit that characterization.
- It detects outliers in time series following the Chen and Liu (1993) procedure.
- The R package name is tsoutliers.

**Twitter’s algorithm:**

- It identifies big outliers.
- It detects one or more statistically significant anomalies in the input time series.
- Twitter observed that social network data is inherently seasonal and has strong trend components, so the algorithm gives good results mainly for this kind of input.
- The algorithm implemented is called S-H-ESD, or Seasonal Hybrid Extreme Studentized Deviates.
- The R package name is AnomalyDetection; it is installed from GitHub with the help of devtools.
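To give a feel for what S-H-ESD does, here is a simplified base R sketch of the idea. It is not the code of the AnomalyDetection package. It removes the seasonal component (estimated with `stl`) and the median from the series, then runs a generalized ESD test on the remainder using the median and the MAD in place of the mean and standard deviation. The function name `hybrid_esd`, the simulated series and all parameter values are made up for illustration.

```r
# Generalized ESD test using robust statistics (median and MAD).
# Returns the positions in x that are flagged as anomalies.
hybrid_esd <- function(x, max_anoms = 0.05, alpha = 0.05) {
  n <- length(x)
  k <- max(1, floor(max_anoms * n))  # upper bound on the number of anomalies
  idx <- seq_len(n)                  # positions that are still in play
  removed <- integer(0)
  n_anoms <- 0
  for (i in seq_len(k)) {
    scale <- mad(x[idx])
    if (scale == 0) break
    dev <- abs(x[idx] - median(x[idx])) / scale
    j <- which.max(dev)
    r_i <- dev[j]                    # most extreme remaining deviation
    removed <- c(removed, idx[j])
    idx <- idx[-j]
    # Critical value of the generalized ESD test at step i
    p <- 1 - alpha / (2 * (n - i + 1))
    t_crit <- qt(p, df = n - i - 1)
    lambda_i <- t_crit * (n - i) / sqrt((n - i - 1 + t_crit^2) * (n - i + 1))
    if (r_i > lambda_i) n_anoms <- i
  }
  sort(removed[seq_len(n_anoms)])
}

# Hypothetical seasonal series with one injected spike at observation 60.
set.seed(1)
period <- 12
y <- ts(10 * sin(2 * pi * seq_len(120) / period) + rnorm(120), frequency = period)
y[60] <- y[60] + 25

# Remove the seasonal component and the median, then test the remainder.
seasonal <- stl(y, s.window = "periodic")$time.series[, "seasonal"]
remainder <- as.numeric(y) - as.numeric(seasonal) - median(y)
anoms <- hybrid_esd(remainder, max_anoms = 0.05)
print(anoms)
```

Using the median and MAD matters because a handful of extreme values would otherwise inflate the mean and standard deviation and hide themselves. The `max_anoms` argument plays the same role as in the call shown earlier: it caps the share of points that may be reported.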

**References:**

- [1] https://cran.r-project.org/web/packages/tsoutliers/tsoutliers.pdf
- [2] http://stats.stackexchange.com/questions/104882/detecting-outliers-in-time-series-ls-ao-tc-using-tsoutliers-package-in-r-how
- [3] http://epublications.marquette.edu/cgi/viewcontent.cgi?article=1047&context=theses_open
- [4] http://probdist.com/twitters-anomaly-detection-package/
- [5] http://www-01.ibm.com/support/knowledgecenter/SS3RA7_15.0.0/com.ibm.spss.modeler.help/ts_outliers_overview.htm
- [6] http://finzi.psych.upenn.edu/library/tsoutliers/html/outliers-effects.html