---
stage: none
group: unassigned
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---

# Redis guidelines

## Redis instances

GitLab uses [Redis](https://redis.io) for the following distinct purposes:

- Caching (mostly via `Rails.cache`).
- As a job processing queue with [Sidekiq](sidekiq_style_guide.md).
- To manage the shared application state.
- To store CI trace chunks.
- As a Pub/Sub queue backend for ActionCable.
- Rate limiting state storage.
- Sessions.

In most environments (including the GDK), all of these point to the same
Redis instance.

On GitLab.com, we use [separate Redis instances](../administration/redis/replication_and_failover.md#running-multiple-redis-clusters).
See the [Redis SRE guide](https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/redis/redis-survival-guide-for-sres.md)
for more details on our setup.

Every application process is configured to use the same Redis servers, so they
can be used for inter-process communication in cases where [PostgreSQL](sql.md)
is less appropriate. For example, transient state or data that is written much
more often than it is read.

If [Geo](geo.md) is enabled, each Geo node gets its own, independent Redis
database.

We have [development documentation on adding a new Redis instance](redis/new_redis_instance.md).

## Key naming

Redis is a flat namespace with no hierarchy, which means we must pay attention
to key names to avoid collisions. Typically we use colon-separated elements to
provide a semblance of structure at application level. An example might be
`projects:1:somekey`.

Although we split our Redis usage by purpose into distinct categories, and
those may map to separate Redis servers in a Highly Available
configuration like GitLab.com, the default Omnibus and GDK setups share
a single Redis server. This means that keys should **always** be
globally unique across all categories.

It is usually better to use immutable identifiers - project ID rather than
full path, for instance - in Redis key names. If full path is used, the key
stops being consulted if the project is renamed. If the contents of the key are
invalidated by a name change, it is better to include a hook that expires
the entry, instead of relying on the key changing.

### Multi-key commands

We don't use [Redis Cluster](https://redis.io/topics/cluster-tutorial) at the
moment, but may wish to in the future: [#118820](https://gitlab.com/gitlab-org/gitlab/-/issues/118820).

This imposes an additional constraint on naming: where GitLab is performing
operations that require several keys to be held on the same Redis server - for
instance, diffing two sets held in Redis - the keys should ensure that by
enclosing the changeable parts in curly braces.
For example:

```plaintext
project:{1}:set_a
project:{1}:set_b
project:{2}:set_c
```

`set_a` and `set_b` are guaranteed to be held on the same Redis server, while `set_c` is not.

Currently, we validate this in the development and test environments
with the [`RedisClusterValidator`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/instrumentation/redis_cluster_validator.rb),
which is enabled for the `cache` and `shared_state`
[Redis instances](https://docs.gitlab.com/omnibus/settings/redis.html#running-with-multiple-redis-instances).
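
To illustrate why the curly braces matter, here is a minimal sketch of how
Redis Cluster maps a key to one of its 16384 hash slots, following the rules
in the Redis Cluster specification. The helper names `hash_tag`,
`crc16_xmodem`, and `hash_slot` are hypothetical and are not part of GitLab
or of any Redis client library. The sketch only shows that keys sharing the
same text in braces land in the same slot.

```ruby
# Hypothetical helpers, for illustration only.

# Return the part of the key that Redis Cluster hashes: the text between the
# first `{` and the next `}` if that text is non-empty, else the whole key.
def hash_tag(key)
  start = key.index('{')
  return key if start.nil?

  stop = key.index('}', start + 1)
  return key if stop.nil? || stop == start + 1

  key[(start + 1)...stop]
end

# CRC16 (XMODEM variant: polynomial 0x1021, initial value 0), computed bit by bit.
def crc16_xmodem(data)
  crc = 0
  data.each_byte do |byte|
    crc ^= byte << 8
    8.times do
      crc = (crc & 0x8000).zero? ? (crc << 1) : ((crc << 1) ^ 0x1021)
      crc &= 0xFFFF
    end
  end
  crc
end

# Map a key to one of the 16384 Redis Cluster hash slots.
def hash_slot(key)
  crc16_xmodem(hash_tag(key)) % 16_384
end

# Both keys hash only the tag `1`, so they always share a slot.
puts hash_slot('project:{1}:set_a') == hash_slot('project:{1}:set_b')
```

Keys without braces, such as `projects:1:somekey`, are hashed in full, so two
such keys share a slot only by coincidence.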

## Redis in structured logging

For GitLab Team Members: There are <i class="fa fa-youtube-play youtube" aria-hidden="true"></i>
[basic](https://www.youtube.com/watch?v=Uhdj19Dc6vU) and
<i class="fa fa-youtube-play youtube" aria-hidden="true"></i> [advanced](https://youtu.be/jw1Wv2IJxzs)
videos that show how you can work with the Redis
structured logging fields on GitLab.com.

Our [structured logging](logging.md#use-structured-json-logging) for web
requests and Sidekiq jobs contains fields for the duration, call count,
bytes written, and bytes read per Redis instance, along with a total for
all Redis instances. For a particular request, this might look like:

| Field | Value |
| --- | --- |
| `json.queue_duration_s` | 0.01 |
| `json.redis_cache_calls` | 1 |
| `json.redis_cache_duration_s` | 0 |
| `json.redis_cache_read_bytes` | 109 |
| `json.redis_cache_write_bytes` | 49 |
| `json.redis_calls` | 2 |
| `json.redis_duration_s` | 0.001 |
| `json.redis_read_bytes` | 111 |
| `json.redis_shared_state_calls` | 1 |
| `json.redis_shared_state_duration_s` | 0 |
| `json.redis_shared_state_read_bytes` | 2 |
| `json.redis_shared_state_write_bytes` | 206 |
| `json.redis_write_bytes` | 255 |

As all of these fields are indexed, it is then straightforward to
investigate Redis usage in production. For instance, to find the
requests that read the most data from the cache, we can just sort by
`redis_cache_read_bytes` in descending order.
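
The same kind of investigation can be sketched locally against a few JSON log
lines. This is only an illustration: the sample lines, the `path` values, and
the `top_cache_readers` helper are made up, and the sketch assumes that the
raw log lines carry the field names without the `json.` prefix shown in the
table.

```ruby
require 'json'

# Hypothetical sample of structured log lines; real entries have many more fields.
SAMPLE_LOG_LINES = [
  '{"path":"/a","redis_calls":2,"redis_cache_read_bytes":109}',
  '{"path":"/b","redis_calls":5,"redis_cache_read_bytes":4096}',
  '{"path":"/c","redis_calls":1}'
].freeze

# Return the `limit` entries that read the most bytes from the cache,
# treating a missing field as zero bytes read.
def top_cache_readers(lines, limit: 10)
  lines
    .map { |line| JSON.parse(line) }
    .sort_by { |entry| -entry.fetch('redis_cache_read_bytes', 0) }
    .first(limit)
end

top_cache_readers(SAMPLE_LOG_LINES, limit: 2).each do |entry|
  puts "#{entry['path']}: #{entry.fetch('redis_cache_read_bytes', 0)} bytes read from cache"
end
```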

### The slow log

NOTE:
There is a [video showing how to see the slow log](https://youtu.be/BBI68QuYRH8) (GitLab internal)
on GitLab.com.

On GitLab.com, entries from the
[Redis slow log](https://redis.io/commands/slowlog) are available in the
`pubsub-redis-inf-gprd*` index with the [`redis.slowlog` tag](https://log.gprd.gitlab.net/app/kibana#/discover?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-1d,to:now))&_a=(columns:!(json.type,json.command,json.exec_time_s),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:AWSQX_Vf93rHTYrsexmk,key:json.tag,negate:!f,params:(query:redis.slowlog),type:phrase),query:(match:(json.tag:(query:redis.slowlog,type:phrase))))),index:AWSQX_Vf93rHTYrsexmk)).
This shows commands that have taken a long time and may be a performance
concern.

The
[`fluent-plugin-redis-slowlog`](https://gitlab.com/gitlab-org/fluent-plugin-redis-slowlog)
project is responsible for taking the `slowlog` entries from Redis and
passing them to Fluentd (and ultimately Elasticsearch).

## Analyzing the entire keyspace

The [Redis Keyspace Analyzer](https://gitlab.com/gitlab-com/gl-infra/redis-keyspace-analyzer)
project contains tools for dumping the full key list and memory usage of a Redis
instance, and then analyzing those lists while eliminating potentially sensitive
data from the results. It can be used to find the most frequent key patterns, or
those that use the most memory.

Currently this is not run automatically for the GitLab.com Redis instances, but
is run manually on an as-needed basis.

## Utility classes

We have some extra classes to help with specific use cases. These are
mostly for fine-grained control of Redis usage, so they wouldn't be used
in combination with the `Rails.cache` wrapper: we'd either use
`Rails.cache` or these classes and literal Redis commands.

We prefer using `Rails.cache` so we can reap the benefits of future
optimizations done to Rails. It is worth noting that Ruby objects are
[marshalled](https://github.com/rails/rails/blob/v6.0.3.1/activesupport/lib/active_support/cache/redis_cache_store.rb#L447)
when written to Redis, so we need to take care not to store huge
objects or untrusted user input.

Typically we would only use these classes when at least one of the
following is true:

1. We want to manipulate data on a non-cache Redis instance.
1. `Rails.cache` does not support the operations we want to perform.

### `Gitlab::Redis::{Cache,SharedState,Queues}`

These classes wrap the Redis instances (using
[`Gitlab::Redis::Wrapper`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/redis/wrapper.rb))
to make it convenient to work with them directly. The typical use is to
call `.with` on the class, which takes a block that yields the Redis
connection. For example:

```ruby
# Get the value of `key` from the shared state (persistent) Redis
Gitlab::Redis::SharedState.with { |redis| redis.get(key) }

# Check if `value` is a member of the set `key`
Gitlab::Redis::Cache.with { |redis| redis.sismember(key, value) }
```

### `Gitlab::Redis::Boolean`

In Redis, every value is a string.
[`Gitlab::Redis::Boolean`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/redis/boolean.rb)
makes sure that booleans are encoded and decoded consistently.

### `Gitlab::Redis::HLL`

The Redis [`PFCOUNT`](https://redis.io/commands/pfcount),
[`PFADD`](https://redis.io/commands/pfadd), and
[`PFMERGE`](https://redis.io/commands/pfmerge) commands operate on
HyperLogLogs, a data structure that allows estimating the number of unique
elements with low memory usage.
(In addition to the `PFCOUNT` documentation,
Thoughtbot's article on [HyperLogLogs in Redis](https://thoughtbot.com/blog/hyperloglogs-in-redis)
provides a good background here.)

[`Gitlab::Redis::HLL`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/redis/hll.rb)
provides a convenient interface for adding and counting values in HyperLogLogs.

### `Gitlab::SetCache`

For cases where we need to efficiently check whether an item is in a group
of items, we can use a Redis set.
[`Gitlab::SetCache`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/set_cache.rb)
provides an `#include?` method that uses the
[`SISMEMBER`](https://redis.io/commands/sismember) command, as well as `#read`
to fetch all entries in the set.

This is used by the
[`RepositorySetCache`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/repository_set_cache.rb)
to provide a convenient way to use sets to cache repository data like branch
names.