Admin Boundaries in BigQuery Open Datasets

Posted March 28, 2021 by Vladi ‐ 6 min read

Fetch Countries, States, Zip codes and City Districts Geography from BigQuery Open Datasets

All US Zip Codes Geometries Visualized with Dekart
All US Zip Codes Geometries Visualized with Dekart

When you need Admin Boundaries in BigQuery?

There are few reason why you need Admin Boundaries in your Data Analysis:

  • Your data does not have explicit coordinates or geometries but has reference to Countries, States, Counties or Zip codes
  • Admin Boundaries (like city boundary) presents a meaningful way to clip your data for analysis.
  • You want to perform analysis per admin geography (like number of new COVID cases per state/ per capita)

There are platforms which allow you to enrich your data with Admin Geometries, but then you lose all power of filtering and aggregation with BigQuery SQL.

When your data is in BigQuery you need Admin Boundaries in BigQuery.

You always can acquire Admin Boundaries dataset and upload it to BigQuery. However, before doing it, check if it already exists for free in BigQuery Open Datasets.

Admin Boundaries from OpenStreetMap dataset in BigQuery

Open Street Maps (OSM) did for cartography the same as Wikipedia did for encyclopedias. You may find data structure tricky, but with some effort you can extract extensive geometries sets.

Before I used Overpass Turbo with its quirky DSL to get subset OSM Data. But now (since 2020) every user of BigQuery can select (and join!) OpenStreetMap as one of Open Datasets.

These are few SQL examples of how you do it with tips and tricks:

World Countries Borders

This example shows how to fetch country borders from BigQuery hosted OSM dataset

World Countries Borders example from OSM
World Countries Borders example from OSM

view on a map

SELECT
-- this one way to get a prop from nested struct
(
  SELECT value
  FROM UNNEST(all_tags)
  WHERE key = 'ISO3166-1'
) AS country_code,
-- getting int_name or name whatever comes first
ARRAY(
  SELECT value
  FROM UNNEST(all_tags)
  WHERE key IN('int_name', 'name')
  LIMIT 1
) [OFFSET(0)] AS name,
-- fixing cross antimeridian edges
ST_ASGEOJSON(
  -- simplifying geometry, full one >100Mb
  ST_SIMPLIFY(geometry, 1000)
) AS geometry
FROM
-- OSM dataset in bigquery public data
`bigquery-public-data.geo_openstreetmap.planet_features_multipolygons`
WHERE ('boundary', 'administrative') IN (
  SELECT (key, value)
  FROM UNNEST(all_tags)
)
AND ('admin_level', '2') IN (
  SELECT (key, value)
  FROM UNNEST(all_tags)
)
-- make sure name exists
AND 'name' IN (
  SELECT key
  FROM UNNEST(all_tags)
)

Explanations:

  • all OSM geometries (multipolygons) are in planet_features_multipolygons table which is available for free for every user of BigQuery.
  • to fetch specific geometries (country borders) we filter by tags stored in nested all_tags structure; specifically we filter by boundary and admin_level tags
  • countries are big and raw geometries of borders are huge; we use BigQuery function ST_SIMPLIFY to simplify geometry to 1000m precision;
  • to avoid artifacts when visualize with tools like Kepler.gl or other supporting only projected (flat-map) spatial reference system (SRS), we use ST_ASGEOJSON to achieve correct rendering of edges crossing antimeridian;
Edges cross anti-meridian distortion
Edges cross anti-meridian distortion

US States Borders

Using the same method you can go down to the state level. Here ia a query for the fifty states, the District of Columbia, and the five permanently inhabited territories of the United States.

US States Borders including DC and territories
US States Borders including DC and territories

view on a map

SELECT
-- getting name from all_tags nested struct
(
  SELECT value
  FROM UNNEST(all_tags)
  WHERE key = 'name'
) AS name,
-- fixing edges crossing antimeridian
ST_ASGEOJSON(
  -- simplifying geometry for smaller size
  ST_SIMPLIFY(geometry, 1000)
) AS geometry
FROM
`bigquery-public-data.geo_openstreetmap.planet_features_multipolygons`
WHERE
-- filtering for administrative boundaries
('boundary', 'administrative') IN (
  SELECT (key, value)
  FROM UNNEST(all_tags)
)
-- getting admin_level=4
AND ('admin_level', '4') IN (
  SELECT (key, value)
  FROM UNNEST(all_tags)
)
-- filtering for country code prefixed with US
AND 'US' IN (
  SELECT SUBSTR(value, 0, 2)
  FROM UNNEST(all_tags)
  WHERE key = 'ISO3166-2'
)

Explanations:

  • everything’s from query above
  • filtering for admin_level=4 and country code prefixed with US

Berlin City Districts

Can you go beyond state level? Absolutely, you can get city borders and even city district borders. And not just in the US. For this example I fetched city districts for Berlin, Germany.

US States Borders including DC and territories.
US States Borders including DC and territories.

view on a map

SELECT
-- getting name from nested all_tags
(
  SELECT value
  FROM UNNEST(all_tags)
  WHERE key = 'name'
  LIMIT 1
) AS name,
-- getting original geometry, no simplification needed
geometry
FROM `bigquery-public-data.geo_openstreetmap.planet_features_multipolygons` AS features
WHERE
-- getting data from particular import of German Borders
(
  'source',
  'http://wiki.openstreetmap.org/wiki/Import/Catalogue/Kreisgrenzen_Deutschland_2005'
) IN (
  SELECT (key, value)
  FROM UNNEST(all_tags)
)
-- only Berlin has boundaries at this level
AND ('admin_level', '9') IN (
  SELECT (key, value)
  FROM UNNEST(all_tags)
)

So how does one know what tags to filter for? I start by googling something like OSM Berlin Mitte (one of districts) and finding something like this on the OSM web site. Then play with SQL on Dekart Playground until I find the right filtering options.

Please Note:

  • Admin Boundaries and maps are subject of regulation in many countries (like showing disputed borders)
  • data in OSM is not necessarily up-to-date and correctly representing borders according to your country law

More Admin Geometries in BigQuery

OSM Dataset is not the only source of useful reference geometries. There are more like ZIP Codes or Geoip datasets available.

US ZIP Codes

Another common way to reference data is by ZIP codes. BigQuery Open Datasets has a table for it. In this example we fetch ZIP Codes around Seattle.

Fetching US ZIP codes geometries around Seattle
Fetching US ZIP codes geometries around Seattle

view on a map

SELECT zipcode_geom
-- public dataset with US ZIP Codes
FROM `bigquery-public-data.utility_us.zipcode_area`
WHERE
ST_DWITHIN(
  -- creating point from ZIP Code area centroid
  ST_GeogPoint(longitude, latitude),
  -- Seattle
  ST_GeogPoint(-122.3321, 47.6062),
  -- 10km / 6.2 miles
  10000
)

Explanations:

  • main trick here is using ST_DWITHIN which checks distance between geometries
  • zipcode_area table has precalculated centroid coordinates longitude, latitude; more precisely but much slower would be to use ST_GEOGFROMTEXT(zipcode_geom)

Of course you can fetch all geometries but mind the size when visualizing. This is Dekart visualization of all US ZIP Codes (40Mb).

US ZIP codes geometries with Dekart
US ZIP codes geometries with Dekart

Summary

  • If your data is in BigQuery you need your reference geometries (like Admin Boundaries) in BigQuery to use the full power of BigQuery SQL (including GIS functions).
  • You can get various useful Reference Geometries from BigQuery Public Datasets and join then with your private datasets for Analysis
  • All examples have links to Dekart Playground where you can query and visualize GIS data from any BigQuery Public dataset; Dekart is open-source GIS Visualization tool you can deploy to your cloud and use with your data.