# bqh.py

`infra_libs/bqh` provides helper methods for working with BigQuery. It is
recommended that you use this library rather than the client API directly, as
it includes common logic for handling protobufs, formatting errors,
safeguards, and edge-case handling.

[TOC]

# Usage

Create a client:

```
from google.cloud import bigquery
from google.oauth2 import service_account

service_account_file = ...
bigquery_creds = service_account.Credentials.from_service_account_file(
    service_account_file)
bigquery_client = bigquery.client.Client(
    project='example-project', credentials=bigquery_creds)
```

Send rows:

```
from infra_libs import bqh

# ExampleRow is a protobuf Message.
rows = [ExampleRow(example_field='1'), ExampleRow(example_field='2')]
try:
  bqh.send_rows(bigquery_client, 'example-dataset', 'example-table', rows)
except bqh.BigQueryInsertError:
  pass  # handle rows rejected by BigQuery
except bqh.UnsupportedTypeError:
  pass  # handle rows containing types that cannot be converted
```

# Limits

Please see the [BigQuery
docs](https://cloud.google.com/bigquery/quotas#streaminginserts) for the most
up-to-date limits on streaming inserts. Clients of infra_libs/bqh are
responsible for ensuring that their usage stays within these limits. A note on
maximum rows per request: send_rows() batches rows, ensuring that no more than
10,000 rows are sent per request, and allows a custom batch size. BigQuery
recommends 500 as a practical limit (so we use that as the default);
experiment with your specific schema and data sizes to determine the batch
size with the best balance of throughput and latency for your use case.

# Authentication

Authentication for the Cloud projects happens
[during client creation](https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/usage.html#authentication-configuration).
What form this takes depends on the application.

# vpython

infra_libs/bqh is available via vpython as a CIPD package. To update the
available version, build and upload a new wheel with
[dockerbuild](../../infra/tools/dockerbuild/README#subcommand_wheel_build).

google-cloud-bigquery is required to create a BigQuery client.
Unfortunately, google-cloud-bigquery has quite a few dependencies. Here is the
vpython spec you need to use infra_libs/bqh and google-cloud-bigquery:

```
wheel: <
  name: "infra/python/wheels/requests-py2_py3"
  version: "version:2.13.0"
>
wheel: <
  name: "infra/python/wheels/google_api_python_client-py2_py3"
  version: "version:1.6.2"
>
wheel: <
  name: "infra/python/wheels/six-py2_py3"
  version: "version:1.10.0"
>
wheel: <
  name: "infra/python/wheels/uritemplate-py2_py3"
  version: "version:3.0.0"
>
wheel: <
  name: "infra/python/wheels/httplib2-py2_py3"
  version: "version:0.10.3"
>
wheel: <
  name: "infra/python/wheels/rsa-py2_py3"
  version: "version:3.4.2"
>
wheel: <
  name: "infra/python/wheels/pyasn1_modules-py2_py3"
  version: "version:0.0.8"
>
wheel: <
  name: "infra/python/wheels/pyasn1-py2_py3"
  version: "version:0.2.3"
>
wheel: <
  name: "infra/python/wheels/oauth2client/linux-arm64_cp27_cp27mu"
  version: "version:3.0.0"
>
wheel: <
  name: "infra/python/wheels/protobuf-py2_py3"
  version: "version:3.2.0"
>
wheel: <
  name: "infra/python/wheels/infra_libs-py2"
  version: "version:1.3.0"
>
```

# Recommended Monitoring

You can use ts_mon to track upload latency and errors.

```
from infra_libs import bqh
from infra_libs import ts_mon

upload_durations = ts_mon.CumulativeDistributionMetric(
    'example/service/upload/durations',
    'Time taken to upload an event to bigquery.',
    [ts_mon.StringField('status')],
    bucketer=ts_mon.GeometricBucketer(10**0.04),
    units=ts_mon.MetricsDataUnits.SECONDS)

upload_errors = ts_mon.CounterMetric(
    'example/service/upload/errors',
    'Errors encountered upon uploading an event to bigquery.',
    [ts_mon.StringField('error_type')])

with ts_mon.ScopedMeasureTime(upload_durations):
  try:
    bqh.send_rows(...)
  except bqh.UnsupportedTypeError:
    upload_errors.increment(...)
```

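The growth factor of `10**0.04` above gives logarithmically spaced duration
buckets: each bucket boundary is about 9.6% larger than the previous one, and
every 25 buckets cover one decade, since `(10**0.04)**25 == 10`. A quick
check of that spacing, independent of ts_mon:

```python
# Bucket boundaries of a geometric bucketer grow as growth**i.
growth = 10 ** 0.04
boundaries = [growth ** i for i in range(26)]

print(round(growth, 3))       # 1.096 -- each bucket ~9.6% wider than the last
print(round(boundaries[25]))  # 10 -- 25 buckets span one decade
```
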