1.. _stats-bls:
2
3***********************************
4Box Least Squares (BLS) Periodogram
5***********************************
6
7The "box least squares" (BLS) periodogram [1]_ is a statistical tool used for
8detecting transiting exoplanets and eclipsing binaries in time series
9photometric data. The main interface to this implementation is the
10`~astropy.timeseries.BoxLeastSquares` class.
11
12Mathematical Background
13=======================
14
15The BLS method finds transit candidates by modeling a transit as a periodic
16upside down top hat with four parameters: period, duration, depth, and a
17reference time. In this implementation, the reference time is chosen to be the
18mid-transit time of the first transit in the observational baseline. These
19parameters are shown in the following sketch:
20
21.. plot::
22
23    import numpy as np
24    import matplotlib.pyplot as plt
25
26    period = 6
27    t0 = -3
28    duration = 2.5
29    depth = 0.1
30    x = np.linspace(-5, 5, 50000)
31    y = np.ones_like(x)
32    y[np.abs((x-t0+0.5*period)%period-0.5*period)<0.5*duration] = 1.0 - depth
33    plt.figure(figsize=(7, 4))
34    plt.axvline(t0, color="k", ls="dashed", lw=0.75)
35    plt.axvline(t0+period, color="k", ls="dashed", lw=0.75)
36    plt.axhline(1.0-depth, color="k", ls="dashed", lw=0.75)
37    plt.plot(x, y)
38
39    kwargs = dict(
40        va="center", arrowprops=dict(arrowstyle="->", lw=0.5),
41        bbox={"fc": "w", "ec": "none"},
42    )
43    plt.annotate("period", xy=(t0+period, 1.01), xytext=(t0+0.5*period, 1.01), ha="center", **kwargs)
44    plt.annotate("period", xy=(t0, 1.01), xytext=(t0+0.5*period, 1.01), ha="center", **kwargs)
45    plt.annotate("duration", xy=(t0-0.5*duration, 1.0-0.5*depth), xytext=(t0, 1.0-0.5*depth), ha="center", **kwargs)
46    plt.annotate("duration", xy=(t0+0.5*duration, 1.0-0.5*depth), xytext=(t0, 1.0-0.5*depth), ha="center", **kwargs)
47    plt.annotate("reference time", xy=(t0, 1.0-depth-0.01), xytext=(t0+0.25*duration, 1.0-depth-0.01), ha="left", **kwargs)
48    plt.annotate("depth", xy=(0.0, 1.0), xytext=(0.0, 1.0-0.5*depth), ha="center", rotation=90, **kwargs)
49    plt.annotate("depth", xy=(0.0, 1.0-depth), xytext=(0.0, 1.0-0.5*depth), ha="center", rotation=90, **kwargs)
50
51
52    plt.ylim(1.0 - depth - 0.02, 1.02)
53    plt.xlim(-5, 5)
54    plt.gca().set_yticks([])
55    plt.gca().set_xticks([])
56    plt.ylabel("brightness")
57    plt.xlabel("time")
58
59    # ****
60
61Assuming that the uncertainties on the measured flux are known, independent,
62and Gaussian, the maximum likelihood in-transit flux can be computed as
63
64.. math::
65
66    y_\mathrm{in} = \frac{\sum_\mathrm{in} y_n/{\sigma_n}^2}{\sum_\mathrm{in} 1/{\sigma_n}^2}
67
68where :math:`y_n` are the brightness measurements, :math:`\sigma_n` are the
69associated uncertainties, and both sums are computed over the in-transit data
70points.
71
72Similarly, the maximum likelihood out-of-transit flux is
73
74.. math::
75
76    y_\mathrm{out} = \frac{\sum_\mathrm{out} y_n/{\sigma_n}^2}{\sum_\mathrm{out} 1/{\sigma_n}^2}
77
78where these sums are over the out-of-transit observations. Using these results,
79the log likelihood of a transit model (maximized over depth) at a given period
80:math:`P`, duration :math:`\tau`, and reference time :math:`t_0` is
81
82.. math::
83
84    \log \mathcal{L}(P,\,\tau,\,t_0) =
85    -\frac{1}{2}\,\sum_\mathrm{in}\frac{(y_n-y_\mathrm{in})^2}{{\sigma_n}^2}
86    -\frac{1}{2}\,\sum_\mathrm{out}\frac{(y_n-y_\mathrm{out})^2}{{\sigma_n}^2}
87    + \mathrm{constant}
88
89This equation might be familiar because it is proportional to the "chi
90squared" :math:`\chi^2` for this model and this is a direct consequence of our
91assumption of Gaussian uncertainties.
92
93This :math:`\chi^2` is called the "signal residue" by [1]_, so maximizing the
94log likelihood over duration and reference time is equivalent to computing the
95box least squares spectrum from [1]_.
96
97In practice, this is achieved by finding the maximum likelihood model over a
98grid in duration and reference time as specified by the ``durations`` and
99``oversample`` parameters for the
100`~astropy.timeseries.BoxLeastSquares.power` method.
101
102Behind the scenes, this implementation minimizes the number of required
103calculations by pre-binning the observations onto a fine grid following [1]_
104and [2]_.
105
106Basic Usage
107===========
108
109The transit periodogram takes as input time series observations where the
110timestamps ``t`` and the observations ``y`` (usually brightness) are stored as
111``numpy`` arrays or :class:`~astropy.units.Quantity` objects. If known, error
112bars ``dy`` can also optionally be provided.
113
114Example
115-------
116
117.. EXAMPLE START: Evaluating BLS Periodograms
118
119To evaluate the periodogram for a simulated data set:
120
121>>> import numpy as np
122>>> import astropy.units as u
123>>> from astropy.timeseries import BoxLeastSquares
124>>> np.random.seed(42)
125>>> t = np.random.uniform(0, 20, 2000)
126>>> y = np.ones_like(t) - 0.1*((t%3)<0.2) + 0.01*np.random.randn(len(t))
127>>> model = BoxLeastSquares(t * u.day, y, dy=0.01)
128>>> periodogram = model.autopower(0.2)
129
130The output of the `.astropy.timeseries.BoxLeastSquares.autopower` method
131is a `~astropy.timeseries.BoxLeastSquaresResults` object with several
132useful attributes, the most useful of which are generally the ``period`` and
133``power`` attributes.
134
135This result can be plotted using matplotlib:
136
137>>> import matplotlib.pyplot as plt                  # doctest: +SKIP
138>>> plt.plot(periodogram.period, periodogram.power)  # doctest: +SKIP
139
140.. plot::
141
142    import numpy as np
143    import astropy.units as u
144    import matplotlib.pyplot as plt
145    from astropy.timeseries import BoxLeastSquares
146
147    np.random.seed(42)
148    t = np.random.uniform(0, 20, 2000)
149    y = np.ones_like(t) - 0.1*((t%3)<0.2) + 0.01*np.random.randn(len(t))
150    model = BoxLeastSquares(t * u.day, y, dy=0.01)
151    periodogram = model.autopower(0.2)
152
153    plt.figure(figsize=(8, 4))
154    plt.plot(periodogram.period, periodogram.power, "k")
155    plt.xlabel("period [day]")
156    plt.ylabel("power")
157
158In this figure, you can see the peak at the correct period of three days.
159
160.. EXAMPLE END
161
162Objectives
163==========
164
165By default, the `~astropy.timeseries.BoxLeastSquares.power` method computes the
166log likelihood of the model fit and maximizes over reference time and duration.
167It is also possible to use the signal-to-noise ratio with which the transit
168depth is measured as an objective function.
169
170Example
171-------
172
173.. EXAMPLE START: Transit Search with BoxLeastSquares.power and Signal-to-Noise
174
175To compute the log likelihood of the model fit, call
176`~astropy.timeseries.BoxLeastSquares.power` or
177`~astropy.timeseries.BoxLeastSquares.autopower` with ``objective='snr'`` as
178follows:
179
180>>> model = BoxLeastSquares(t * u.day, y, dy=0.01)
181>>> periodogram = model.autopower(0.2, objective="snr")
182
183.. plot::
184
185    import numpy as np
186    import astropy.units as u
187    import matplotlib.pyplot as plt
188    from astropy.timeseries import BoxLeastSquares
189
190    np.random.seed(42)
191    t = np.random.uniform(0, 20, 2000)
192    y = np.ones_like(t) - 0.1*((t%3)<0.2) + 0.01*np.random.randn(len(t))
193    model = BoxLeastSquares(t * u.day, y, dy=0.01)
194    periodogram = model.autopower(0.2, objective="snr")
195
196    plt.figure(figsize=(8, 4))
197    plt.plot(periodogram.period, periodogram.power, "k")
198    plt.xlabel("period [day]")
199    plt.ylabel("depth S/N")
200
201This objective will generally produce a periodogram that is qualitatively
202similar to the log likelihood spectrum, but it has been used to improve the
203reliability of transit search in the presence of correlated noise.
204
205.. EXAMPLE END
206
207Period Grid
208===========
209
210The transit periodogram is always computed on a grid of periods and the results
211can be sensitive to the sampling. As discussed in [1]_, the performance of the
212transit periodogram method is more sensitive to the period grid than the
213`~astropy.timeseries.LombScargle` periodogram.
214
215This implementation of the transit periodogram includes a conservative heuristic
216for estimating the required period grid that is used by the
217`~astropy.timeseries.BoxLeastSquares.autoperiod` and
218`~astropy.timeseries.BoxLeastSquares.autopower` methods and the details of this
219method are given in the API documentation for
220`~astropy.timeseries.BoxLeastSquares.autoperiod`.
221
222Example
223-------
224
225.. EXAMPLE START: Computing Transit Periodograms on a Grid of Periods
226
227It is possible to provide a specific period grid as follows:
228
229>>> model = BoxLeastSquares(t * u.day, y, dy=0.01)
230>>> periods = np.linspace(2.5, 3.5, 1000) * u.day
231>>> periodogram = model.power(periods, 0.2)
232
233.. plot::
234
235    import numpy as np
236    import astropy.units as u
237    import matplotlib.pyplot as plt
238    from astropy.timeseries import BoxLeastSquares
239
240    np.random.seed(42)
241    t = np.random.uniform(0, 20, 2000)
242    y = np.ones_like(t) - 0.1*((t%3)<0.2) + 0.01*np.random.randn(len(t))
243    model = BoxLeastSquares(t * u.day, y, dy=0.01)
244    periods = np.linspace(2.5, 3.5, 1000) * u.day
245    periodogram = model.power(periods, 0.2)
246
247    plt.figure(figsize=(8, 4))
248    plt.plot(periodogram.period, periodogram.power, "k")
249    plt.xlabel("period [day]")
250    plt.ylabel("power")
251
252However, if the period grid is too coarse, the correct period might be missed.
253
254>>> model = BoxLeastSquares(t * u.day, y, dy=0.01)
255>>> periods = np.linspace(0.5, 10.5, 15) * u.day
256>>> periodogram = model.power(periods, 0.2)
257
258.. plot::
259
260    import numpy as np
261    import astropy.units as u
262    import matplotlib.pyplot as plt
263    from astropy.timeseries import BoxLeastSquares
264
265    np.random.seed(42)
266    t = np.random.uniform(0, 20, 2000)
267    y = np.ones_like(t) - 0.1*((t%3)<0.2) + 0.01*np.random.randn(len(t))
268    model = BoxLeastSquares(t * u.day, y, dy=0.01)
269    periods = np.linspace(0.5, 10.5, 15) * u.day
270    periodogram = model.power(periods, 0.2)
271
272    plt.figure(figsize=(8, 4))
273    plt.plot(periodogram.period, periodogram.power, "k")
274    plt.xlabel("period [day]")
275    plt.ylabel("power")
276
277.. EXAMPLE END
278
279Peak Statistics
280===============
281
282To help in the transit vetting process and to debug problems with candidate
283peaks, the `~astropy.timeseries.BoxLeastSquares.compute_stats` method can be
284used to calculate several statistics of a candidate transit.
285
286Many of these statistics are based on the VARTOOLS package described in [2]_.
287This will often be used as follows to compute stats for the maximum point in
288the periodogram:
289
290>>> model = BoxLeastSquares(t * u.day, y, dy=0.01)
291>>> periodogram = model.autopower(0.2)
292>>> max_power = np.argmax(periodogram.power)
293>>> stats = model.compute_stats(periodogram.period[max_power],
294...                             periodogram.duration[max_power],
295...                             periodogram.transit_time[max_power])
296
297This calculates a dictionary with statistics about this candidate.
298Each entry in this dictionary is described in the documentation for
299`~astropy.timeseries.BoxLeastSquares.compute_stats`.
300
301
302Literature References
303=====================
304
305.. [1] Kovacs, Zucker, & Mazeh (2002), A&A, 391, 369 (arXiv:astro-ph/0206099)
306.. [2] Hartman & Bakos (2016), Astronomy & Computing, 17, 1 (arXiv:1605.06811)
307