timescaledb-2.5.1/src/bgw

# Background worker jobs

TimescaleDB needs to run multiple background jobs. This module
implements a simple scheduler so that jobs inserted into a jobs table
can be run on a schedule. Each database in an instance runs it's own
scheduler because different databases may run different TimescaleDB
extension versions which may require different scheduler logic.

## Schedules

The scheduler allows you to set a `schedule_interval` for every job.
That defines the interval the scheduler will wait after a job finishes to start
it again, if the job is successful. If the job fails, the scheduler uses `retry_period`
in an exponential backoff to decide when to run the job again.

## Design

The scheduler itself is a background job that continuously runs and waits
for a time when jobs need to be scheduled. It then launches jobs as new
background workers that it controls through the background worker handle.

Aggregate statistics about a job are kept in the job stat catalog table.
These statistics include the start and finish times of the last run of the job
as well as whether or not the job succeeded. The `next_start` is used to
figure out when next to run a job after a scheduler is restarted.

The statistics table also tracks consecutive failures and crashes for the job
which are used for calculating the exponential backoff after a crash or failure
(which is used to set the `next_start` after the crash/failure). Note also that
there is a minimum time after the database scheduler starts up and a crashed job
is restarted. This is to allow the operator enough time to disable the job
if needed.

Note that the number of crashes is an overestimate of the actual number of crashes
for a job. This is so that we are conservative and never miss a crash and fail to
use the appropriate backoff logic. There is some complexity
in ensuring that all crashes are counted. A crash in Postgres causes /all/
processes to quit immediately therefore we cannot write anything to the database once
any process has crashed. Thus, we must be able to deduce that a crash occurred
from a commit that happened before any crash. We accomplish
this by committing a changes to the stats table before a job starts and
undoing the change after it finishes. If a job crashed, it will be left
in an intermediate state from which we deduce that it could have been the
crashing process.

## Scheduler State Machine

The scheduler implements a state machine for each job.
Each job starts in the SCHEDULED state. As soon as a job starts
it enters the STARTING state. If the scheduler determines the
job should be terminated (e.g. it has reached a timeout), it moves
the job to a TERMINATING state. Once a background worker has for
a job has stopped, the job returns to the SCHEDULED state.
The states and associated transitions are as follows.

```
      +---------+         +--------+
+---> |SCHEDULED+-------> |DISABLED|
|     +----+----+         +--------+
|          |
|          |
|          v
|      +---+----+
+<-----+STARTING|
|      +---+----+
|          |
|          |
|          v
|      +---+-------+
+<-----+TERMINATING|
       +-----------+
```
## Limitations
This first implementation has two limitations:

- The list of jobs to be run is read from the database when the scheduler is first started.
We do not update this list if the jobs table changes.
- There is no prioritization for when to run jobs.
Name		Date	Size	#Lines	LOC
..		03-May-2022	-
README.md	H A D	02-Dec-2021	3.3 KiB	80	66
job.c	H A D	02-Dec-2021	25.2 KiB	906	649
job.h	H A D	02-Dec-2021	2.3 KiB	62	41
job_stat.c	H A D	02-Dec-2021	16.8 KiB	540	398
job_stat.h	H A D	02-Dec-2021	1.3 KiB	42	26
launcher_interface.c	H A D	02-Dec-2021	1.6 KiB	64	45
launcher_interface.h	H A D	02-Dec-2021	570	17	9
scheduler.c	H A D	02-Dec-2021	24 KiB	906	613
scheduler.h	H A D	02-Dec-2021	1.2 KiB	40	22
timer.c	H A D	02-Dec-2021	2.7 KiB	128	93
timer.h	H A D	02-Dec-2021	692	31	18