| Name | Date | Size | #Lines | LOC |
|------|------|------|--------|-----|
| README.md | 02-Dec-2021 | 6.2 KiB | 117 | 101 |
| create.c | 02-Dec-2021 | 79.7 KiB | 2,336 | 1,715 |
| create.h | 02-Dec-2021 | 750 B | 22 | 11 |
| insert.c | 02-Dec-2021 | 16 KiB | 484 | 319 |
| insert.h | 02-Dec-2021 | 753 B | 21 | 11 |
| invalidation.c | 02-Dec-2021 | 52.5 KiB | 1,560 | 1,039 |
| invalidation.h | 02-Dec-2021 | 3 KiB | 77 | 52 |
| invalidation_threshold.c | 02-Dec-2021 | 10.7 KiB | 317 | 206 |
| invalidation_threshold.h | 02-Dec-2021 | 882 B | 22 | 12 |
| materialize.c | 02-Dec-2021 | 10.3 KiB | 313 | 248 |
| materialize.h | 02-Dec-2021 | 1.1 KiB | 46 | 30 |
| options.c | 02-Dec-2021 | 2.8 KiB | 90 | 70 |
| options.h | 02-Dec-2021 | 546 B | 17 | 8 |
| refresh.c | 02-Dec-2021 | 30.3 KiB | 868 | 569 |
| refresh.h | 02-Dec-2021 | 1.1 KiB | 34 | 23 |

README.md

# Continuous Aggregates #

A continuous aggregate is a special kind of materialized view for
aggregates that can be partially and continuously refreshed, either
manually or automated by a policy that runs in the background. Unlike
a regular materialized view, a continuous aggregate doesn't require
complete re-materialization on every refresh. Instead, it is possible
to refresh a subset of the continuous aggregate at relatively low
cost, thus enabling continuous aggregation as new data is written or
old data is updated and/or backfilled.
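
For orientation, here is a minimal sketch of the user-facing workflow
this module implements; the `conditions` hypertable and the
`conditions_hourly` aggregate are made-up names for the example.

```sql
-- A hypothetical source hypertable (names are illustrative only).
CREATE TABLE conditions (time timestamptz NOT NULL, device int, temp float);
SELECT create_hypertable('conditions', 'time');

-- A continuous aggregate that buckets the raw data by hour.
CREATE MATERIALIZED VIEW conditions_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       device,
       avg(temp)  AS avg_temp,
       sum(temp)  AS sum_temp,
       count(*)   AS num_readings
FROM conditions
GROUP BY bucket, device;

-- Manually refresh only a subset of the aggregate (a one-day window).
CALL refresh_continuous_aggregate('conditions_hourly',
                                  '2021-12-01', '2021-12-02');
```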

To enable continuous aggregation, a continuous aggregate stores
partial aggregations for every time bucket in an internal
hypertable. The advantage of this configuration is that each time
bucket can be recomputed individually, without requiring updates to
other buckets, and buckets can be combined to form coarser
aggregates (e.g., hourly buckets can be combined to form daily
buckets). Finalization of the partial buckets happens automatically at
query time. Although this finalization adds a small overhead to query
time, it is offset by more efficient refreshes that only recompute
the buckets that have been "invalidated" by changes in the raw data.
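
For example, reusing the hypothetical `conditions_hourly` aggregate
from the sketch above, hourly buckets can be rolled up into daily
values at query time; the daily average is rebuilt from the hourly
sums and counts rather than by averaging the hourly averages:

```sql
-- Combine hourly buckets into daily ones at query time. This mirrors
-- the idea behind the materialized partials: sums and counts combine
-- directly, and the coarser average is derived from them.
SELECT time_bucket('1 day', bucket) AS day,
       device,
       sum(sum_temp) / sum(num_readings) AS avg_temp
FROM conditions_hourly
GROUP BY day, device;
```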

A continuous aggregate policy automates the refreshing, allowing the
aggregate to stay up-to-date without manual intervention. A policy can
be configured to only refresh the most recent data (e.g., just the
last hour's worth of data) or ensure that the continuous aggregate is
always up-to-date with the underlying source data. Policies that focus
on recent data allow older parts of the continuous aggregate to stay
the same or be governed by manual refreshes.
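
As an illustration, a policy for the hypothetical aggregate above can
be added with `add_continuous_aggregate_policy`, here refreshing only
the most recent two days once an hour:

```sql
-- Refresh the window from 2 days ago up to 1 hour ago, every hour.
-- Anything older than the start offset is left to manual refreshes.
SELECT add_continuous_aggregate_policy('conditions_hourly',
       start_offset      => INTERVAL '2 days',
       end_offset        => INTERVAL '1 hour',
       schedule_interval => INTERVAL '1 hour');
```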

## Bookkeeping and Internal State ##

TimescaleDB does bookkeeping for each continuous aggregate to know
which buckets of the aggregate require refreshing. Whenever a
modification happens to the source data, an invalidation for the
modified region is written to an invalidation log. However,
invalidations are not written for regions after the *invalidation
threshold*, which tracks the latest bucket materialized thus far. This
threshold keeps write amplification to a minimum by not writing
invalidations for "hot" time buckets that are assumed to still have
data being written to them.
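
This bookkeeping lives in catalog tables; as a rough sketch (catalog
table and column names may differ between TimescaleDB versions), the
current threshold and the pending invalidations can be inspected with:

```sql
-- Current invalidation threshold per source hypertable (the watermark
-- is stored in the internal time representation).
SELECT hypertable_id, watermark
FROM _timescaledb_catalog.continuous_aggs_invalidation_threshold;

-- Invalidated ranges recorded for each source hypertable.
SELECT hypertable_id, lowest_modified_value, greatest_modified_value
FROM _timescaledb_catalog.continuous_aggs_hypertable_invalidation_log;
```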

Thus, to store, maintain, and query aggregations, continuous
aggregates consist of the following objects:

1. A user view, which queries and finalizes the aggregations and is
   also the object that users interact with.
2. A partial view, which is used to materialize new data.
3. A direct view, which holds the original query that users specified.
4. An internal materialization hypertable, containing the materialized
   data as partial aggregates for each time bucket.
5. An invalidation threshold, which is a timestamp that tracks the
   latest materialization. Invalidations that occur before this
   timestamp will be logged, while invalidations after it will not be
   logged.
6. A trigger on the source hypertable that writes invalidations to the
   hypertable invalidation log at transaction end, based on INSERT,
   UPDATE, and DELETE statements that mutate the data.
7. A hypertable invalidation log that tracks invalidated regions of
   data for each hypertable. Entries in this log contain time ranges
   that need to be re-materialized across all the hypertable's
   continuous aggregates.
8. A materialization invalidation log. Once a refresh runs on a given
   continuous aggregate, this log tracks how invalidations from the
   hypertable invalidation log are processed against the refresh
   window for the refreshed continuous aggregate. Thus, a single
   invalidation in the hypertable invalidation log becomes one entry
   per continuous aggregate in the materialization invalidation log.
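
The views and the materialization hypertable behind each continuous
aggregate are recorded in the catalog; a sketch for listing them
(again, exact catalog columns may vary between versions):

```sql
-- The user, partial, and direct views plus the source and
-- materialization hypertables for each continuous aggregate.
SELECT user_view_name, partial_view_name, direct_view_name,
       raw_hypertable_id, mat_hypertable_id
FROM _timescaledb_catalog.continuous_agg;
```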

## The Materialized Hypertable ##

The materialized hypertable does not store the aggregate's output, but
rather the partial aggregate state. For instance, in the case of an
average, each bucket stores the sum and count in an internal binary
form. The partial aggregates are what give continuous aggregates their
flexibility; buckets can be individually updated and multiple partial
aggregates can be combined to form new partials. Future enhancements
may allow aggregating at different time resolutions using the same
underlying continuous aggregate.
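
To make the sum-and-count idea concrete, the following plain-SQL
sketch combines two hypothetical hourly partial states for an average
into a single result; the real partials are opaque binary transition
states rather than visible columns:

```sql
-- Two hypothetical hourly partial states for avg(temp), kept here as
-- plain (sum, count) pairs purely for illustration.
WITH partials(bucket, sum_temp, num_readings) AS (
    VALUES ('2021-12-01 10:00'::timestamptz, 42.0, 20),
           ('2021-12-01 11:00'::timestamptz, 18.0, 10)
)
-- Combining the partials yields the average over both buckets without
-- touching the raw data: (42 + 18) / (20 + 10) = 2.0.
SELECT sum(sum_temp) / sum(num_readings) AS avg_temp
FROM partials;
```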

## The Invalidation Log and Threshold ##

Mutating transactions must record their mutations in the invalidation
log, so that a refresh knows to re-materialize the invalidated
range. This happens by installing a trigger on the source hypertable
when the first continuous aggregate on that hypertable is created.

To reduce the extra writes caused by the trigger, only one invalidation
range (lowest and highest modified value) is written at the end of a
mutating transaction. As a result, a refresh might materialize more
data than necessary, but the mutating transaction incurs less overhead
in return. Write amplification is further reduced by never writing
invalidations after the invalidation threshold, which can be
configured to lag behind the time bucket that sees the most writes.
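
The actual trigger is a C function in this module; purely to
illustrate the idea, a simplified statement-level trigger that records
one (lowest, highest) range per INSERT could look roughly like this
(all names are invented for the sketch):

```sql
-- Illustrative only: the real trigger also handles UPDATE/DELETE,
-- batches one range per transaction, and respects the threshold.
CREATE TABLE demo_invalidation_log (
    lowest_modified_value   timestamptz,
    greatest_modified_value timestamptz
);

CREATE FUNCTION demo_log_invalidation() RETURNS trigger AS $$
BEGIN
    -- Record a single range covering all rows touched by the statement.
    INSERT INTO demo_invalidation_log
    SELECT min(time), max(time) FROM new_rows HAVING count(*) > 0;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER demo_invalidation
AFTER INSERT ON conditions
REFERENCING NEW TABLE AS new_rows
FOR EACH STATEMENT EXECUTE FUNCTION demo_log_invalidation();
```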

Whenever a refresh occurs across a time range that is newer than the
current invalidation threshold, the threshold must first be moved to
the end of the refreshed region so that new invalidations are recorded
in the region after the refresh. However, mutations in the refreshed
region can also happen concurrently with the refresh, so, in order to
not lose any invalidations, the invalidation threshold must be moved
in its own transaction before the new region is materialized.

Thus, a refresh may span two transactions: a first one that moves the
invalidation threshold (if necessary) and a second one that performs
the actual materialization of new data.
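
This is also why the user-facing refresh is exposed as a procedure: it
needs to commit the threshold move separately from the
materialization, so wrapping the call in an explicit transaction block
is expected to be rejected.

```sql
-- Works: the procedure manages its own transactions.
CALL refresh_continuous_aggregate('conditions_hourly',
                                  '2021-12-01', '2021-12-02');

-- Expected to be rejected: an outer transaction block prevents the
-- procedure from committing the threshold move on its own.
BEGIN;
CALL refresh_continuous_aggregate('conditions_hourly',
                                  '2021-12-01', '2021-12-02');
COMMIT;
```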

The second transaction of the refresh will only materialize regions
that are recorded as invalid in the invalidation log. Thus, the
initial state of a continuous aggregate is to have an entry in the
invalidation log that invalidates the entire range of the
aggregate. During the refresh, the log is processed and invalidations
are cut against the given refresh window, leaving only invalidation
entries that fall outside the refresh window. Consequently, if the
refresh window does not overlap any invalidations, there is nothing to
refresh.
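
The cutting is done in C over the catalog entries, but its effect can
be pictured with PostgreSQL range types (PostgreSQL 14+; purely an
illustration, not how the module computes it):

```sql
-- An invalidation covering [0, 100) cut against a refresh window of
-- [40, 60): the window is materialized and the remainder stays invalid.
SELECT int8multirange(int8range(0, 100)) - int8multirange(int8range(40, 60))
       AS remaining_invalidations;
-- Result: {[0,40),[60,100)}
```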