# Admin Tasks for the Chrome Performance Dashboard

## "Dashboard is down" check list

- Is App Engine up? If not, we just have to sit tight. You can check the status
here: [https://code.google.com/status/appengine](https://code.google.com/status/appengine)
- Check the [main App Engine dashboard page](https://console.developers.google.com/appengine?project=chromeperf&moduleId=default).
- Are we over quota?
- Look at the error rates on the dashboard.
- Check the task queues.
- Test data not showing up
  - Check [/new\_tests](https://chromeperf.appspot.com/new_tests).
  - Search the logs
  - Is the test internal-only and the user is logged out?

## Handling Stackdriver alerts

We use Stackdriver monitoring to check if the dashboard is running as expected.
When you get an alert email from Stackdriver, you should do the following:

**Understand what alerted**, and find the relevant code. We have two main types
of alerts:

1. **Metric Absence on Custom Metrics**. When we have some piece of code which
   absolutely must run regularly, we call `utils.TickMonitoringCustomMetric`
   every time the code completes. If that call is **not** made, it generally
   means the code failed, and we send an alert. You'll want to search the code
   for the call to tick the metric with that name, so you can understand where
   the likely failure is.
2. **Metric Threshold on Task Queue new-points-queue**. When this happens, we are
   seeing too many errors adding data to the datastore in `add_point_queue.py`.

**Analyze the logs for errors**. Once you have a basic idea of which code path is
failing, you'll want to look at the logs. There are two main entry points for
this:

1. **[Error Reporting Page](http://go/chromeperf-errors)**. This page
   lists common errors that have occurred recently, grouped together. You'll
   want to look out especially for ones marked `NEW ERROR`. Click through to
   look at callstacks and relevant logs.
2. **[Logs page](http://go/chromeperf-logs)**. This page allows you to search
   all the logs. You'll want to try and find log entries on the URLs where the
   problem occurred.
   ([Logs page help](https://cloud.google.com/logging/docs/view/logs_viewer))

**File a bug and follow up**. The bug should be labeled `P0`, `Perf Dashboard`,
`Bug`. If it is clear the problem is with bisect, add that label as well. Reply
to the email and link the bug, and update the bug with your findings as you
understand the problem better.

## Scheduled downtime

If it's necessary at some point to have scheduled downtime, announce
it ahead of time. At least 2 days before the downtime (ideally more),
announce in these ways:

 1. Send an email to any Chromium perf sheriffs who will be affected,
    or all perf sheriffs (`perf-sheriffs@chromium.org`).
 2. Send an email to `chrome-perf-dashboard-announce@google.com`.

If possible, it's probably best to schedule it for Saturday, when usage
is likely to be relatively low.

## Routine tasks

There are several routine tasks to do to set up the dashboard for a
user. The official process for this is to file bugs on crbug.com
with labels:

- `Performance-Dashboard-IPWhitelist`
- `Performance-Dashboard-BotWhitelist`
- `Performance-Dashboard-MonitoringRequest`

### Editing sheriff rotations

You can view, create and edit sheriff rotations
at [/edit\_sheriffs](https://chromeperf.appspot.com/edit_sheriffs).

#### Adding a new sheriff

It’s fine to add a new sheriff rotation any time a team wants alerts
to go to a new email address. It’s fine to make a temporary sheriff
rotation for monitoring new tests before they are stable. Here are the
fields that need to be filled out:

 - **Name**: This is the name of the sheriff
   rotation. It will be listed in the drop-down
   at [/alerts](https://chromeperf.appspot.com/alerts).
 - **Rotation URL**: Some sheriff rotations have a URL for specifying
   the email of the sheriff. For example, the Chromium Perf Sheriff URL
   is [http://chromium-build.appspot.com/p/chromium/sheriff\_perf.js](http://chromium-build.appspot.com/p/chromium/sheriff_perf.js).
   Most sheriff rotations don’t have a URL, and if not it’s fine to leave
   this blank and just specify an email address.
 - **Notification Email**:
   This is usually a mailing list that alerts should go to. However,
   there’s nothing stopping it from being an individual’s email
   account. It must be specified if there is no Rotation URL, but it’s
   optional otherwise.
 - **Internal-only**: If the tests this sheriff is monitoring are internal-only,
   or the name of the sheriff rotation is sensitive, please
   set this to "Yes". If set to "Yes", the sheriff rotation will only
   show up on the alerts page for users logged in with google.com accounts.
 - **Summarize Email**: By default, the perf dashboard sends one email
   for each alert, as soon as it gets the alert. If that will add up to
   too much mail, setting this to "Yes" will switch to a daily summary.

#### Monitoring tests

After creating a sheriff rotation, you need to add the individual
tests to monitor. You do this by clicking on "Set a sheriff for a
group of tests". It asks for a pattern. Patterns match test paths,
which are of the form "Master/Bot/test-suite/graph/trace". You can replace
any part of the test path with a `*` for a wildcard.

The dashboard will list the matching tests before allowing you to apply
the pattern, so you’ll be able to check if the pattern is correct.

To remove a pattern, click "Remove a sheriff from a group of tests".
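
To get a feel for how a pattern selects test paths before typing it into the dashboard, the matching can be sketched locally. This is only an illustration: it assumes a `*` stands in for exactly one whole part of the path and that a pattern has the same number of parts as the path, which may not be how the dashboard's own matcher behaves. The function names and the test paths are hypothetical.

```python
def matches_pattern(pattern: str, test_path: str) -> bool:
    """Return True if each pattern part is '*' or equals the matching path part.

    Assumption: '*' replaces exactly one whole part of the path, and the
    pattern must have as many parts as the path. The real dashboard matcher
    may differ.
    """
    pattern_parts = pattern.split("/")
    path_parts = test_path.split("/")
    if len(pattern_parts) != len(path_parts):
        return False
    return all(p == "*" or p == part
               for p, part in zip(pattern_parts, path_parts))


def preview_matches(pattern, test_paths):
    """List the test paths a pattern would select, in their original order."""
    return [path for path in test_paths if matches_pattern(pattern, path)]


# Hypothetical test paths, for illustration only.
TEST_PATHS = [
    "ChromiumPerf/linux-release/dromaeo/total/ref",
    "ChromiumPerf/win-release/dromaeo/total/ref",
    "ChromiumPerf/linux-release/sunspider/total/ref",
]

if __name__ == "__main__":
    for path in preview_matches("ChromiumPerf/*/dromaeo/total/ref", TEST_PATHS):
        print(path)
```

Under these assumptions, the pattern `ChromiumPerf/*/dromaeo/total/ref` selects the dromaeo paths on both bots and leaves the sunspider path out.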

If you want to keep alerting on most of the tests in a pattern and
just disable alerting on a few noisy ones, you can add the "Disable
Alerting" anomaly threshold config to the noisy tests (see "Setting up
alert threshold configs" below).

### Setting up alert threshold configs

The default alert thresholds should work reasonably well for most test
data, but there are some graphs for which they may not be correct. If
there are invalid alerts, or the dashboard is not sending alerts when
you expect them, you may want to modify an alert threshold config.

To edit alert threshold configs, go
to [/edit\_anomaly\_configs](https://chromeperf.appspot.com/edit_anomaly_configs).
Add a new config with a descriptive name and a JSON mapping of parameters
to values.

### Anomaly config debugger page

Start off by using the anomaly threshold debugging
page: [/debug\_alert](https://chromeperf.appspot.com/debug_alert). The
page shows the segmentation of the data that was given by the anomaly
finding algorithm. Based on the documentation, change the config
parameters to get the alerts where you want them.

### Automatically applying labels to bugs

The dashboard can automatically apply labels to bugs filed on alerts,
based on which test triggered the alert. This is useful for getting
the relevant team's attention. For example, the dashboard automatically
applies the label "Cr-Blink-JavaScript" to dromaeo regressions,
which cuts down on a lot of CC-ing by hand.

To have a label applied automatically to bugs, go
to [/edit\_sheriffs](https://chromeperf.appspot.com/edit_sheriffs) and
click "Set a bug label to automatically apply to a group of
tests". Then type in a pattern as described in the "Editing sheriff
rotations -> Monitoring tests" section above, and type in the bug
label. You’ll see a list of tests the label will be applied to before
you confirm.

To remove a label, go
to [/edit\_sheriffs](https://chromeperf.appspot.com/edit_sheriffs) and
click "Remove a bug label that automatically applies to a group of
tests".

### Migrating and renaming data

When a test name changes, it is possible to migrate
the existing test data to use the new name. You
can do this by entering a pattern for the test name
at [/migrate\_test\_names](https://chromeperf.appspot.com/migrate_test_names).

### Allowing data senders

There are two types of allowlists used in the perf dashboard:

The IP allowlist is a list of IP addresses of machines which
are allowed to post data to /add\_point. This is to prevent
/add\_point from being spammed. You can add a bot to the IP allowlist
at [/ip\_whitelist](https://chromeperf.appspot.com/ip_whitelist). If
you’re seeing 403 errors on your buildbots, the IPs to add are likely
already in the logs. Note that if you are seeing 500 errors, those are
not related to the IP allowlist. They are usually caused by an error in
the JSON data sent by the buildbot. If you can’t tell by looking at
the JSON data what is going wrong, the easiest thing to do is to add a
unit test with the JSON to `add_point_test.py` and debug it from there.

The bot allowlist is a list of bot names which are publicly visible. If a
bot is not on the list, users must be logged into google.com accounts to
see the data for that bot. You can add or remove a bot from the allowlist
using the dev console by importing `dashboard.change_internal_only`.

Note that in some places the allowlists may also be referred to as
whitelists.
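
When chasing the 500 errors on /add\_point mentioned above, a quick first step before writing a unit test is to confirm that the buildbot's payload is well-formed JSON at all. The sketch below uses only Python's standard `json` module. The function name `check_json_payload` and the sample payload fields are hypothetical, and the check catches syntax errors only, not fields that the dashboard rejects for other reasons.

```python
import json


def check_json_payload(raw: str):
    """Return None if raw parses as JSON, else a description of the syntax error.

    This only checks that the text is well-formed JSON. It knows nothing
    about which fields /add_point expects.
    """
    try:
        json.loads(raw)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


if __name__ == "__main__":
    # Hypothetical payload with a trailing comma, which is invalid JSON.
    bad_payload = '[{"test": "dromaeo/total", "value": 1.5,}]'
    problem = check_json_payload(bad_payload)
    print(problem if problem else "payload is well-formed JSON")
```

If the payload parses cleanly and the 500 errors persist, the problem is more likely in the content of the data than in its syntax, and debugging through `add_point_test.py` is the next step.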