The story of a gradual move to an (almost) serverless analytics system, and how to get started with BigQuery: one way to build a serverless analytics system in Google Cloud. Live queries on 'huge datasets' were performed.
13. BigQuery -> SELECT … FROM surprise…
query:
    SELECT { * | field_path.* | expression } [ [ AS ] alias ] [ , ... ]
    [ FROM from_body
      [ WHERE bool_expression ]
      [ OMIT RECORD IF bool_expression ]
      [ GROUP [ EACH ] BY [ ROLLUP ] { field_name_or_alias } [ , ... ] ]
      [ HAVING bool_expression ]
      [ ORDER BY field_name_or_alias [ { DESC | ASC } ] [, ... ] ]
      [ LIMIT n ]
    ];
from_body:
    from_item [, ...] |  # Warning: Comma means UNION ALL here
    from_item [ join_type ] JOIN [ EACH ] from_item [ ON join_predicate ] |
    (FLATTEN({ table_name | (query) }, field_name_or_alias)) |
    table_wildcard_function
from_item:
    { table_name | (query) } [ [ AS ] alias ]
join_type:
    { INNER | [ FULL ] [ OUTER ] | RIGHT [ OUTER ] | LEFT [ OUTER ] | CROSS }
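The warning in the grammar is easy to miss: in legacy SQL a comma between tables means UNION ALL, whereas many other SQL dialects treat it as a cross join. The following is a minimal in-memory sketch of the difference, not BigQuery code; the helper names `unionAll` and `crossJoin` and the sample rows are made up for illustration.

```javascript
// Two tiny in-memory "tables" (hypothetical sample data).
const tableA = [{ id: 1 }, { id: 2 }];
const tableB = [{ id: 3 }];

// Legacy SQL comma semantics: rows of all tables are concatenated.
function unionAll(...tables) {
  return [].concat(...tables);
}

// Cross-join semantics: every row on the left paired with every row on the right.
function crossJoin(left, right) {
  const out = [];
  for (const l of left) {
    for (const r of right) {
      out.push({ left: l, right: r });
    }
  }
  return out;
}

console.log(unionAll(tableA, tableB).length);  // 3 rows, one column set
console.log(crossJoin(tableA, tableB).length); // 2 row pairs
```

The row counts differ (sum versus product of the table sizes), which is why a query ported between dialects can silently change meaning.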
14. BigQuery -> SELECT … FROM surprise…
Date-Partitioned Tables [demo]
Table Decorators - See the past w/ @
Table Wildcard Functions - TABLE_DATE_RANGE() & TABLE_QUERY()
Interesting functions
- DateTime » UTC_USEC_TO_DAY/HOUR/MONTH/WEEK/YEAR()
» Shifts a UNIX timestamp in microseconds to the beginning of the period it occurs in.
- JSON_EXTRACT[_SCALAR]()
- URL functions » HOST(), DOMAIN(), TLD()
- REGEXP_MATCH(), REGEXP_EXTRACT()
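To make the UTC_USEC_TO_* description concrete, here is a rough JavaScript sketch of what "shifting a timestamp to the beginning of its period" means for the day and hour variants. It only imitates the arithmetic locally; the function names `utcUsecToDay` and `utcUsecToHour` are illustrative, and the sketch assumes non-negative timestamps.

```javascript
const USEC_PER_HOUR = 60 * 60 * 1000 * 1000;
const USEC_PER_DAY = 24 * USEC_PER_HOUR;

// Truncate a UNIX timestamp in microseconds to the start of its UTC hour.
function utcUsecToHour(usec) {
  return usec - (usec % USEC_PER_HOUR);
}

// Truncate a UNIX timestamp in microseconds to the start of its UTC day.
function utcUsecToDay(usec) {
  return usec - (usec % USEC_PER_DAY);
}

// 2017-03-15 13:45:30 UTC, converted from milliseconds to microseconds.
const ts = Date.UTC(2017, 2, 15, 13, 45, 30) * 1000;

console.log(new Date(utcUsecToDay(ts) / 1000).toISOString());  // 2017-03-15T00:00:00.000Z
console.log(new Date(utcUsecToHour(ts) / 1000).toISOString()); // 2017-03-15T13:00:00.000Z
```

Truncated timestamps like these are typically what ends up in a GROUP BY when bucketing events per day or per hour.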
15. BigQuery -> User Defined Functions
bigquery.defineFunction(
  'expandAssetLibrary', // Name of the function exported to SQL
  ['user_id', 'video_id', 'stage_settings'], // Names of input columns
  [ {'name': 'user_id', 'type': 'integer'}, // Output schema
    {'name': 'video_id', 'type': 'string'},
    {'name': 'asset', 'type': 'string'} ],
  expandAssetLibrary // Reference to JavaScript UDF
);
function expandAssetLibrary(row, emit) { …………………………
  emit({ user_id: row.user_id, video_id: row.video_id, asset: ss.url.replace('http://', '') });
}
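A UDF of this shape receives one input row plus an `emit` callback and may emit zero or more output rows. Because the body of the original function is elided on the slide, the sketch below uses a hypothetical stand-in, `expandAssetLibrarySketch`, which ASSUMES that `stage_settings` is a JSON string containing an array of objects with a `url` property. The `runUdf` harness is likewise only a local test helper, not part of BigQuery.

```javascript
// Hypothetical stand-in for the elided UDF body.
// ASSUMPTION: stage_settings is a JSON array of objects with a `url` field.
function expandAssetLibrarySketch(row, emit) {
  const settings = JSON.parse(row.stage_settings);
  for (const ss of settings) {
    emit({
      user_id: row.user_id,
      video_id: row.video_id,
      asset: ss.url.replace('http://', ''),
    });
  }
}

// Local harness: calls the UDF once per input row and collects every emitted row.
function runUdf(udf, rows) {
  const out = [];
  for (const row of rows) {
    udf(row, (emitted) => out.push(emitted));
  }
  return out;
}

const result = runUdf(expandAssetLibrarySketch, [
  {
    user_id: 1,
    video_id: 'v1',
    stage_settings: '[{"url":"http://a.example/x.png"},{"url":"http://b.example/y.png"}]',
  },
]);
console.log(result); // one input row fans out into two output rows
```

Running the function through a harness like this is a cheap way to check the fan-out logic before pasting it into the BigQuery UDF editor.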
16. BigQuery -> DML
Standard SQL only
Maximum UPDATE/DELETE statements per day per table: 48
Maximum UPDATE/DELETE statements per day per project: 500
Maximum INSERT statements per day per table: 1,000
Maximum INSERT statements per day per project: 10,000
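These daily limits translate into a minimum average spacing between statements. The arithmetic below is a small sketch using only the figures quoted on this slide (which may differ from current quotas); `minSpacingSeconds` is an illustrative helper name.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60;

// Average spacing needed to stay within a "statements per day" limit.
function minSpacingSeconds(statementsPerDay) {
  return SECONDS_PER_DAY / statementsPerDay;
}

console.log(minSpacingSeconds(48) / 60); // 30   -> one UPDATE/DELETE per table every 30 minutes
console.log(minSpacingSeconds(1000));    // 86.4 -> one INSERT per table roughly every 86 seconds
```

In other words, with limits like these DML fits occasional corrections and batch loads, not row-by-row transactional writes.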
18. BigQuery -> some (Big)Queries
SELECT year, count(1)
FROM [bigquery-public-data:samples.natality]
WHERE father_age < 18
GROUP BY year
ORDER BY year

SELECT year, count(1)
FROM [bigquery-public-data:samples.natality]
WHERE mother_age < 18
GROUP BY year
ORDER BY year

SELECT table_id, row_count, CEIL(size_bytes/POW(1024, 3)) AS gb
FROM [bigquery-public-data:ghcn_m.__TABLES__]
ORDER BY gb DESC
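The `gb` column in the `__TABLES__` query is the table size in units of 1024^3 bytes, rounded up. A small sketch of the same arithmetic, with the illustrative helper name `sizeInGbCeil`, shows the effect of the rounding:

```javascript
const BYTES_PER_GB = Math.pow(1024, 3);

// Mirrors CEIL(size_bytes / POW(1024, 3)) from the query.
function sizeInGbCeil(sizeBytes) {
  return Math.ceil(sizeBytes / BYTES_PER_GB);
}

console.log(sizeInGbCeil(1));                    // 1 (any non-empty table shows as at least 1)
console.log(sizeInGbCeil(5 * BYTES_PER_GB));     // 5
console.log(sizeInGbCeil(5 * BYTES_PER_GB + 1)); // 6
```

Because of the CEIL, small tables all appear as 1, so the column is best read as an upper bound for sorting tables by size.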
19. BigQuery -> some (Big)Queries
SELECT REGEXP_EXTRACT(path, r'\.([^\.]*)$') AS file_extension,
  COUNT(1) AS k
FROM [bigquery-public-data:github_repos.files]
GROUP BY file_extension
ORDER BY k DESC
LIMIT 20

SELECT table_id, row_count,
  CEIL(size_bytes/POW(1024, 3)) AS gb
FROM [bigquery-public-data:github_repos.__TABLES__]
ORDER BY gb DESC
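For the file-extension query, the regular expression needs a literal (escaped) dot followed by a capture of everything after the last dot. The sketch below uses JavaScript regular expressions as a stand-in for BigQuery's RE2 engine, assuming both behave the same for these simple patterns; `fileExtension` is an illustrative helper. It also shows why a pattern with unescaped dots, such as `.*.(.*)$`, captures nothing useful.

```javascript
const escapedDot = /\.([^.]*)$/; // literal dot, then capture up to end of string
const unescapedDot = /.*.(.*)$/; // every dot is a wildcard here

// Returns the first capture group, or null when the pattern does not match.
function fileExtension(path, pattern) {
  const match = pattern.exec(path);
  return match ? match[1] : null;
}

console.log(fileExtension('src/main.py', escapedDot));   // "py"
console.log(fileExtension('archive.tar.gz', escapedDot)); // "gz"
console.log(fileExtension('Makefile', escapedDot));       // null (no extension)
console.log(fileExtension('src/main.py', unescapedDot));  // "" (greedy .* swallows the whole path)
```

Paths without a dot yield no match, so in the query they would group under a NULL file_extension.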