---
stage: Growth
group: Product Intelligence
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---

# Snowplow

Snowplow is an enterprise-grade marketing and Product Intelligence platform that tracks how users engage with our website and application.

[Snowplow](https://snowplowanalytics.com) consists of several loosely-coupled sub-systems:

- **Trackers** fire Snowplow events. Snowplow has twelve trackers that cover web, mobile, desktop, server, and IoT.
- **Collectors** receive Snowplow events from trackers. We use different event collectors that synchronize events to Amazon S3, Apache Kafka, or Amazon Kinesis.
- **Enrich** cleans raw Snowplow events, enriches them, and puts them into storage. There is a Hadoop-based enrichment process, and a Kinesis-based or Kafka-based process.
- **Storage** stores Snowplow events. We store the Snowplow events in a flat file structure on S3, and in the Redshift and PostgreSQL databases.
- **Data modeling** joins event-level data with other data sets, aggregates them into smaller data sets, and applies business logic. This produces a clean set of tables for data analysis. We use data models for Redshift and Looker.
- **Analytics** are performed on Snowplow events or on aggregate tables.

![snowplow_flow](../img/snowplow_flow.png)

## Enable Snowplow tracking

Tracking can be enabled at:

- The instance level, which enables tracking on both the frontend and backend layers.
- The user level. User tracking can be disabled on a per user basis.
  GitLab respects the [Do Not Track](https://www.eff.org/issues/do-not-track) standard, so any user who has enabled the Do Not Track option in their browser is not tracked at a user level.

Snowplow tracking is enabled on GitLab.com, and we use it for most of our tracking strategy.
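
The two levels described above can be pictured as a simple gate: the instance setting must be on, and the user must not have opted out. The following is a minimal Python sketch of that idea, not GitLab's implementation. The `should_track` function and its parameters are hypothetical, and the sketch assumes the browser signals Do Not Track by sending the `DNT: 1` request header.

```python
def should_track(instance_enabled: bool, request_headers: dict[str, str]) -> bool:
    """Return True only if tracking is enabled for the instance
    and the user has not opted out with Do Not Track."""
    if not instance_enabled:
        return False
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value.strip() for name, value in request_headers.items()}
    # Assumption: a value of "1" means the user has enabled Do Not Track.
    return normalized.get("dnt") != "1"


print(should_track(True, {"DNT": "1"}))  # False
print(should_track(True, {}))            # True
```

In this sketch the instance-level check comes first, so a disabled instance never tracks regardless of the user's browser settings.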

To enable Snowplow tracking on a self-managed instance:

1. On the top bar, select **Menu > Admin**, then select **Settings > General**.
   Alternatively, go to `admin/application_settings/general` in your browser.

1. Expand **Snowplow**.

1. Select **Enable Snowplow tracking** and enter your Snowplow configuration information. For example:

   | Name               | Value                         |
   |--------------------|-------------------------------|
   | Collector hostname | `your-snowplow-collector.net` |
   | App ID             | `gitlab`                      |
   | Cookie domain      | `.your-gitlab-instance.com`   |

1. Select **Save changes**.

## Snowplow request flow

The following example shows a basic request/response flow between the following components:

- Snowplow JS / Ruby Trackers on GitLab.com
- [GitLab.com Snowplow Collector](https://gitlab.com/gitlab-com/gl-infra/readiness/-/blob/master/library/snowplow/index.md)
- The GitLab S3 Bucket
- The GitLab Snowflake Data Warehouse
- Sisense:

```mermaid
sequenceDiagram
    participant Snowplow JS (Frontend)
    participant Snowplow Ruby (Backend)
    participant GitLab.com Snowplow Collector
    participant S3 Bucket
    participant Snowflake DW
    participant Sisense Dashboards
    Snowplow JS (Frontend) ->> GitLab.com Snowplow Collector: FE Tracking event
    Snowplow Ruby (Backend) ->> GitLab.com Snowplow Collector: BE Tracking event
    loop Process using Kinesis Stream
      GitLab.com Snowplow Collector ->> GitLab.com Snowplow Collector: Log raw events
      GitLab.com Snowplow Collector ->> GitLab.com Snowplow Collector: Enrich events
      GitLab.com Snowplow Collector ->> GitLab.com Snowplow Collector: Write to disk
    end
    GitLab.com Snowplow Collector ->> S3 Bucket: Kinesis Firehose
    S3 Bucket->>Snowflake DW: Import data
    Snowflake DW->>Snowflake DW: Transform data using dbt
    Snowflake DW->>Sisense Dashboards: Data available for querying
```

## Structured event taxonomy

Click events must be consistent.
If each feature captures events differently, it can be difficult
to perform analysis.

Each click event provides attributes that describe the event.

| Attribute | Type    | Required | Description |
| --------- | ------- | -------- | ----------- |
| category  | text    | true     | The page or backend section of the application. Unless infeasible, use the Rails page attribute by default in the frontend, and namespace + class name on the backend. |
| action    | text    | true     | The action the user takes, or aspect that's being instrumented. The first word must describe the action or aspect. For example, clicks must be `click`, activations must be `activate`, creations must be `create`. Use underscores to describe what was acted on. For example, activating a form field is `activate_form_input`, an interface action like clicking on a dropdown is `click_dropdown`, a behavior like creating a project record from the backend is `create_project`. |
| label     | text    | false    | The specific element or object to act on. This can be one of the following: the label of the element, for example, a tab labeled 'Create from template' for `create_from_template`; a unique identifier if no text is available, for example, `groups_dropdown_close` for closing the Groups dropdown in the top bar; or the name or title attribute of a record being created. |
| property  | text    | false    | Any additional property of the element, or object being acted on. |
| value     | decimal | false    | Describes a numeric value (decimal) directly related to the event. This could be the value of an input. For example, `10` when clicking `internal` visibility. |

### Examples

| Category* | Label | Action | Property** | Value |
|-------------|------------------|-----------------------|----------|:-----:|
| `[root:index]` | `main_navigation` | `click_navigation_link` | `[link_label]` | - |
| `[groups:boards:show]` | `toggle_swimlanes` | `click_toggle_button` | - | `[is_active]` |
| `[projects:registry:index]` | `registry_delete` | `click_button` | - | - |
| `[projects:registry:index]` | `registry_delete` | `confirm_deletion` | - | - |
| `[projects:blob:show]` | `congratulate_first_pipeline` | `click_button` | `[human_access]` | - |
| `[projects:clusters:new]` | `chart_options` | `generate_link` | `[chart_link]` | - |
| `[projects:clusters:new]` | `chart_options` | `click_add_label_button` | `[label_id]` | - |

_* If you choose to omit the category you can use the default._<br>
_** Use property for variable strings._

### Reference SQL

#### Last 20 `reply_comment_button` events

```sql
SELECT
  session_id,
  event_id,
  event_label,
  event_action,
  event_property,
  event_value,
  event_category,
  contexts
FROM legacy.snowplow_structured_events_all
WHERE
  event_label = 'reply_comment_button'
  AND event_action = 'click_button'
  -- AND event_category = 'projects:issues:show'
  -- AND event_value = 1
ORDER BY collector_tstamp DESC
LIMIT 20
```

#### Last 100 page view events

```sql
SELECT
  -- page_url,
  -- page_title,
  -- referer_url,
  -- marketing_medium,
  -- marketing_source,
  -- marketing_campaign,
  -- browser_window_width,
  -- device_is_mobile
  *
FROM legacy.snowplow_page_views_30
ORDER BY page_view_start DESC
LIMIT 100
```

#### Query JSON formatted data

```sql
SELECT
  derived_tstamp,
  contexts:data[0]:data:extra:old_format as CURRENT_FORMAT,
  contexts:data[0]:data:extra:value as UPDATED_FORMAT
FROM legacy.snowplow_structured_events_all
WHERE event_action in ('wiki_format_updated')
ORDER BY derived_tstamp DESC
LIMIT 100
```

### Web-specific parameters

Snowplow JavaScript adds [web-specific parameters](https://docs.snowplowanalytics.com/docs/collecting-data/collecting-from-own-applications/snowplow-tracker-protocol/#Web-specific_parameters) to all web events by default.

## Related topics

- [Snowplow data structure](https://docs.snowplowanalytics.com/docs/understanding-your-pipeline/canonical-event/)
- [Our Iglu schema registry](https://gitlab.com/gitlab-org/iglu)
- [List of events used in our codebase (Event Dictionary)](https://metrics.gitlab.com/snowplow.html)
- [Product Intelligence Guide](https://about.gitlab.com/handbook/product/product-intelligence-guide/)
- [Service Ping Guide](../service_ping/index.md)
- [Product Intelligence Direction](https://about.gitlab.com/direction/product-intelligence/)
- [Data Analysis Process](https://about.gitlab.com/handbook/business-technology/data-team/#data-analysis-process/)
- [Data for Product Managers](https://about.gitlab.com/handbook/business-technology/data-team/programs/data-for-product-managers/)
- [Data Infrastructure](https://about.gitlab.com/handbook/business-technology/data-team/platform/infrastructure/)