
# luminol #

### Overview
Luminol is a lightweight Python library for time series data analysis. The two major functionalities it supports are anomaly detection and correlation. It can be used to investigate possible causes of anomalies. You collect time series data and Luminol can:
* Given a time series, detect whether the data contains any anomalies, and return the time window in which each anomaly occurred, the timestamp at which it reached its peak severity, and a score indicating how severe it is compared to other anomalies in the time series.
* Given two time series, compute their correlation coefficient. Since the correlation mechanism allows some shift room, you are able to correlate two peaks that are slightly apart in time.

Luminol is configurable in the sense that you can choose which specific algorithm to use for anomaly detection or correlation. In addition, the library does not rely on any predefined threshold on the values of a time series. Instead, it assigns each data point an anomaly score and identifies anomalies using the scores.

By using the library, we can establish a logic flow for root cause analysis. For example, suppose there is a spike in network latency:
* Anomaly detection discovers the spike in the network latency time series.
* Get the anomaly period of the spike, and correlate it with other system metrics (GC, IO, CPU, etc.) in the same time range.
* Get a ranked list of correlated metrics; the root cause candidates are likely to be near the top.

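The correlate-and-rank steps of this flow can be sketched without luminol itself. Below is a minimal illustration in plain Python using the Pearson coefficient; the metric names and numbers are invented for the example:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical metrics sampled over the same window as the latency spike.
latency = [10, 11, 10, 50, 52, 49, 11, 10]        # the anomalous series
metrics = {
    'gc_time': [1, 1, 1, 9, 10, 9, 1, 1],         # spikes together with latency
    'cpu':     [30, 31, 29, 33, 30, 32, 31, 30],  # roughly flat
    'io_wait': [5, 5, 6, 20, 22, 19, 6, 5],       # spikes together with latency
}

# Rank candidate root causes by how strongly they correlate with latency.
ranked = sorted(metrics, key=lambda m: pearson(latency, metrics[m]), reverse=True)
print(ranked[-1])  # cpu: the flat metric ranks last
```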
Investigating possible ways to automate root cause analysis is one of the main reasons we developed this library, and it will be a fundamental part of future work.

***

### Installation
Make sure you have Python, pip, and NumPy installed, then install directly through pip:
```
pip install luminol
```
The most up-to-date version of the library is 0.1.

***

### Quick Start
This is a quick start guide for using luminol for time series analysis.

1. Import the library:
   ```python
   import luminol
   ```

2. Conduct anomaly detection on a single time series `ts`:
   ```python
   detector = luminol.anomaly_detector.AnomalyDetector(ts)
   anomalies = detector.get_anomalies()
   ```

3. If there is an anomaly, correlate the first anomaly period with a secondary time series `ts2`:
   ```python
   if anomalies:
     time_period = anomalies[0].get_time_window()
     correlator = luminol.correlator.Correlator(ts, ts2, time_period)
   ```

4. Print the correlation coefficient:
   ```python
   print(correlator.get_correlation_result().coefficient)
   ```

These are very simple uses of luminol. For information on parameter types, return types, and optional parameters, please refer to the API section below.

***

### Modules
Modules in Luminol refer to customized classes developed for better data representation: `Anomaly`, `CorrelationResult` and `TimeSeries`.

#### Anomaly
_class_ luminol.modules.anomaly.**Anomaly**
<br/> It contains these attributes:
```python
self.start_timestamp: # epoch seconds representing the start of the anomaly period.
self.end_timestamp: # epoch seconds representing the end of the anomaly period.
self.anomaly_score: # a score indicating how severe this anomaly is.
self.exact_timestamp: # epoch seconds indicating when the anomaly reaches its peak severity.
```
It has these public methods:
* `get_time_window()`: returns a tuple `(start_timestamp, end_timestamp)`.

#### CorrelationResult
_class_ luminol.modules.correlation_result.**CorrelationResult**
<br/> It contains these attributes:
```python
self.coefficient: # correlation coefficient.
self.shift: # the amount of shift needed to get the above coefficient.
self.shifted_coefficient: # a correlation coefficient with shift taken into account.
```

#### TimeSeries
_class_ luminol.modules.time_series.**TimeSeries**
```python
__init__(self, series)
```
* `series(dict)`: timestamp -> value

It has various handy methods for manipulating time series, including the generators `iterkeys`, `itervalues`, and `iteritems`. It also supports binary operations such as add and subtract. Please refer to the [code](https://github.com/linkedin/naarad/blob/master/lib/luminol/src/luminol/modules/time_series.py) and inline comments for more information.
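For illustration, here is the expected input shape and a rough sketch of what a pointwise binary operation on two series might look like, using plain dicts rather than luminol's actual `TimeSeries` implementation (the align-on-shared-timestamps behavior here is an assumption):

```python
# Two series as timestamp -> value dicts, the format TimeSeries.__init__ accepts.
a = {1420000000: 1.0, 1420000060: 2.0, 1420000120: 3.0}
b = {1420000000: 0.5, 1420000060: 0.5, 1420000180: 0.5}

# Pointwise addition over the timestamps the two series share.
added = {t: a[t] + b[t] for t in sorted(set(a) & set(b))}
print(added)  # {1420000000: 1.5, 1420000060: 2.5}
```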

***

### API
The library contains two classes: `AnomalyDetector` and `Correlator`, and there are two sets of APIs, one corresponding to each class. There are also customized modules for better data representation. The [Modules](#modules) section in this documentation may provide useful information as you walk through the APIs.

#### AnomalyDetector
_class_ luminol.anomaly_detector.**AnomalyDetector**
```python
__init__(self, time_series, baseline_time_series=None, score_only=False, score_threshold=None,
         score_percentile_threshold=None, algorithm_name=None, algorithm_params=None,
         refine_algorithm_name=None, refine_algorithm_params=None)
```
* `time_series`: the metric you want to conduct anomaly detection on. It can have the following three types:

   ```python
   1. string: # path to a csv file
   2. dict: # timestamp -> value
   3. luminol.modules.time_series.TimeSeries
   ```
* `baseline_time_series`: an optional baseline time series of one of the types mentioned above.
* `score_only(bool)`: if asserted, anomaly scores for the time series will be available, while anomaly periods will not be identified.
* `score_threshold`: if passed, anomaly scores above this value will be identified as anomalies. It overrides score_percentile_threshold.
* `score_percentile_threshold`: if passed, anomaly scores above this percentile will be identified as anomalies. It cannot override score_threshold.
* `algorithm_name(string)`: if passed, the specified algorithm will be used to compute anomaly scores.
* `algorithm_params(dict)`: additional parameters for the algorithm specified by algorithm_name.
* `refine_algorithm_name(string)`: if passed, the specified algorithm will be used to compute the timestamp of peak severity within each anomaly period.
* `refine_algorithm_params(dict)`: additional parameters for the algorithm specified by refine_algorithm_name.

Available algorithms and their additional parameters are:

  ```python
  1. 'bitmap_detector': # behaves well for huge data sets, and it is the default detector.
     {
        'precision'(4): # how many sections to categorize values,
        'lag_window_size'(2% of the series length): # lagging window size,
        'future_window_size'(2% of the series length): # future window size,
        'chunk_size'(2): # chunk size.
     }
  2. 'default_detector': # used when other algorithms fail; not meant to be used explicitly.
  3. 'derivative_detector': # meant to be used when abrupt changes of value are of main interest.
     {
        'smoothing factor'(0.2): # smoothing factor used to compute exponential moving averages
                                 # of derivatives.
     }
  4. 'exp_avg_detector': # meant to be used when values are in a roughly stationary range,
                         # and it is the default refine algorithm.
     {
        'smoothing factor'(0.2): # smoothing factor used to compute exponential moving averages.
        'lag_window_size'(20% of the series length): # lagging window size.
        'use_lag_window'(False): # if asserted, a lagging window of size lag_window_size will be used.
     }
  ```

The meanings of some of the parameters above may seem vague. Here are some useful references:
* [Bitmap](http://alumni.cs.ucr.edu/~ratana/SSDBM05.pdf)
* [Exponential Moving Avg](http://en.wikipedia.org/wiki/Exponential_smoothing)
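To make the exponential-moving-average idea concrete, here is a standalone sketch of scoring each point by its deviation from a running EMA, in the spirit of `exp_avg_detector` (this is not luminol's actual implementation; the 0.2 smoothing factor mirrors the default listed above):

```python
def ema_scores(values, smoothing_factor=0.2):
    """Score each point by its absolute deviation from a running EMA."""
    ema = values[0]
    scores = []
    for v in values:
        scores.append(abs(v - ema))
        # Standard exponential smoothing update.
        ema = smoothing_factor * v + (1 - smoothing_factor) * ema
    return scores

values = [10, 10, 10, 10, 50, 10, 10]  # roughly stationary, with one spike
scores = ema_scores(values)
print(scores.index(max(scores)))  # 4: the spike gets the highest score
```

No fixed threshold on the raw values is involved; the spike stands out purely through its score, which is the design point the Overview makes.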

The **AnomalyDetector** class has the following public methods:
* `get_all_scores()`: returns an anomaly score time series of type [TimeSeries](#modules).
* `get_anomalies()`: returns a list of [Anomaly](#modules) objects.

#### Correlator
_class_ luminol.correlator.**Correlator**
```python
__init__(self, time_series_a, time_series_b, time_period=None, use_anomaly_score=False,
         algorithm_name=None, algorithm_params=None)
```
* `time_series_a`: a time series; for its type, please refer to time_series for AnomalyDetector above.
* `time_series_b`: a time series; for its type, please refer to time_series for AnomalyDetector above.
* `time_period(tuple)`: a time period over which to correlate the two time series.
* `use_anomaly_score(bool)`: if asserted, the anomaly scores of the time series will be used to compute the correlation coefficient instead of the original data in the time series.
* `algorithm_name`: if passed, the specified algorithm will be used to calculate the correlation coefficient.
* `algorithm_params`: any additional parameters for the algorithm specified by algorithm_name.

Available algorithms and their additional parameters are:

   ```python
   1. 'cross_correlator': # when correlating two time series, it tries to shift the series around so that it
                          # can catch spikes that are slightly apart in time.
      {
         'max_shift_seconds'(60): # maximal allowed shift room in seconds,
         'shift_impact'(0.05): # weight of shift in the shifted coefficient.
      }
   ```
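As an illustration of what allowing a shift buys you, the sketch below tries several alignments and keeps the best coefficient. It is plain Python, not luminol's `cross_correlator` (which also folds the shift into `shifted_coefficient` via `shift_impact`):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # avoid dividing by zero on constant slices
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)

def best_shift_correlation(a, b, max_shift=2):
    """Shift b against a by up to max_shift points and keep the best coefficient."""
    best = (-1.0, 0)
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            xs, ys = a[:len(a) - s], b[s:]   # b lags a by s points
        else:
            xs, ys = a[-s:], b[:len(b) + s]  # a lags b by -s points
        best = max(best, (pearson(xs, ys), s))
    return best  # (coefficient, shift)

a = [0, 0, 10, 0, 0, 0]  # spike at index 2
b = [0, 0, 0, 10, 0, 0]  # the same spike, one step later
coef, shift = best_shift_correlation(a, b)
print(shift)  # 1: the spikes line up after shifting b back one step
```

Without the shift, the two spikes would barely correlate; with it, the coefficient is essentially 1.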

The **Correlator** class has the following public methods:
* `get_correlation_result()`: returns a [CorrelationResult](#modules) object.
* `is_correlated(threshold=0.7)`: if the coefficient is above the passed-in threshold, returns a [CorrelationResult](#modules) object; otherwise returns false.

### Example
1. Put anomaly scores in a list.

   ```python
   import time

   from luminol.anomaly_detector import AnomalyDetector

   my_detector = AnomalyDetector(ts)
   score = my_detector.get_all_scores()
   anom_score = list()
   for (timestamp, value) in score.iteritems():
     t_str = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(timestamp))
     anom_score.append([t_str, value])
   ```

2. Correlate with ts2 on every anomaly.

   ```python
   from luminol.anomaly_detector import AnomalyDetector
   from luminol.correlator import Correlator

   my_detector = AnomalyDetector(ts)
   anomalies = my_detector.get_anomalies()
   for a in anomalies:
     time_period = a.get_time_window()
     my_correlator = Correlator(ts, ts2, time_period)
     if my_correlator.is_correlated(threshold=0.8):
       print("ts2 correlates with ts at time period (%d, %d)" % time_period)
   ```
205