Admin Boundaries in BigQuery Public Datasets

by Vladi ‐ 6 min read

Fetch Countries, States, Zip codes and City Districts Geography from BigQuery Public Datasets

All US Zip Codes Geometries Visualized with Dekart
All US Zip Codes Geometries Visualized with Dekart

When do you need Admin Boundaries in BigQuery?

There are few reasons why you would need Admin Boundaries in your Data Analyses:

  • Your data does not have explicit coordinates or geometries but has references to Countries, States, Counties or Zip codes
  • Admin Boundaries (like city boundary) presents a meaningful way to clip your data for analysis.
  • You want to perform analyses per admin geography (like number of new COVID cases per state/capita)

There are platforms which allow you to enrich your data with Admin Geometries, but then you lose all the power of filtering and aggregating with BigQuery SQL.

When your data is in BigQuery, you then need Admin Boundaries in BigQuery.

You can always acquire an Admin Boundaries datasets and upload it to BigQuery. However, before doing it, check if it already exists for free in BigQuery Public Datasets.

Admin Boundaries from OpenStreetMap dataset in BigQuery

Open Street Maps (OSM) did for cartography the same as Wikipedia did for encyclopedias. You may find its data structure tricky, but with some effort you can extract extensive geometry sets.

Historically, I used Overpass Turbo with its quirky DSL to get subset OSM Data. But now (since 2020), every user of BigQuery can select (and join!) OpenStreetMap as one of the Public Datasets.

Here are some SQL examples of how you can do it with tips and tricks:

World Countries Borders

This example shows how to fetch country borders from BigQuery hosted OSM dataset.

World Countries Borders example from OSM
World Countries Borders example from OSM

view on a map

SELECT
    -- this is one way to get a prop from nested struct
    (
      SELECT value
      FROM UNNEST(all_tags)
      WHERE key = 'ISO3166-1'
    ) AS country_code,
    -- getting int_name or name whatever comes first
    ARRAY(
      SELECT value
      FROM UNNEST(all_tags)
      WHERE key IN('int_name', 'name')
      LIMIT 1
    ) [OFFSET(0)] AS name,
    -- fixing cross antimeridian edges
    ST_ASGEOJSON(
      -- simplifying geometry, full one >100Mb
      ST_SIMPLIFY(geometry, 1000)
    ) AS geometry
FROM
-- OSM dataset in bigquery public data
`bigquery-public-data.geo_openstreetmap.planet_features_multipolygons`
WHERE ('boundary', 'administrative') IN (
      SELECT (key, value)
      FROM UNNEST(all_tags)
    )
    AND ('admin_level', '2') IN (
      SELECT (key, value)
      FROM UNNEST(all_tags)
    )
    -- make sure name exists
    AND 'name' IN (
      SELECT key
      FROM UNNEST(all_tags)
    )

Explanations:

  • all OSM geometries (multipolygons) are in planet_features_multipolygons table which is available for free for every user of BigQuery.
  • to fetch specific geometries (country borders), we filter by tags stored in nested all_tags structure; specifically we filter by boundary and admin_level tags.
  • countries are big and raw geometries of borders are huge; we use BigQuery function ST_SIMPLIFY to simplify geometry to 1000m precision.
  • to avoid artifacts when visualize with tools such as Kepler.gl or others supporting only projected (flat-map) spatial reference system (SRS), we use ST_ASGEOJSON to achieve correct rendering of edges crossing antimeridian.
Edges cross anti-meridian distortion
Edges cross anti-meridian distortion

US States Borders

Using the same method you can go down to the state level. Here is a query for the fifty states, the District of Columbia, and the five permanently inhabited territories of the United States.

US States Borders including DC and territories
US States Borders including DC and territories

view on a map

SELECT
    -- getting name from all_tags nested struct
    (
      SELECT value
      FROM UNNEST(all_tags)
      WHERE key = 'name'
    ) AS name,
    -- fixing edges crossing antimeridian
    ST_ASGEOJSON(
      -- simplifying geometry for smaller size
      ST_SIMPLIFY(geometry, 1000)
    ) AS geometry
FROM
`bigquery-public-data.geo_openstreetmap.planet_features_multipolygons`
WHERE
    -- filtering for administrative boundaries
    ('boundary', 'administrative') IN (
      SELECT (key, value)
      FROM UNNEST(all_tags)
    )
    -- getting admin_level=4
    AND ('admin_level', '4') IN (
      SELECT (key, value)
      FROM UNNEST(all_tags)
    )
    -- filtering for country code prefixed with US
    AND 'US' IN (
      SELECT SUBSTR(value, 0, 2)
      FROM UNNEST(all_tags)
      WHERE key = 'ISO3166-2'
    )

Explanations:

  • everything is from the query above.
  • filtering is for admin_level=4 and country code prefixed with US.

Berlin City Districts

Can you go beyond state level? Absolutely. You can get city borders and even city district borders. And not just in the US. For this example, I fetched city districts for Berlin, Germany.

US States Borders including DC and territories.
US States Borders including DC and territories.

view on a map

SELECT
-- getting name from nested all_tags
    (
      SELECT value
      FROM UNNEST(all_tags)
      WHERE key = 'name'
      LIMIT 1
    ) AS name,
    -- getting original geometry, no simplification needed
    geometry
FROM `bigquery-public-data.geo_openstreetmap.planet_features_multipolygons` AS features
WHERE
    -- getting data from particular import of German Borders
    (
      'source',
      'http://wiki.openstreetmap.org/wiki/Import/Catalogue/Kreisgrenzen_Deutschland_2005'
    ) IN (
      SELECT (key, value)
      FROM UNNEST(all_tags)
    )
    -- only Berlin has boundaries at this level
    AND ('admin_level', '9') IN (
      SELECT (key, value)
      FROM UNNEST(all_tags)
    )

So how does one knows what tags to filter for? I started by googling something like OSM Berlin Mitte (one of the districts) and found something like this on the OSM web site. I then played with SQL on Dekart Playground until I found the right filtering options.

Please Note:

  • Admin Boundaries and maps are subject to regulations in many countries (like showing disputed borders).
  • data in OSM is not necessarily up-to-date and correctly representing borders according to your country laws.

More Admin Geometries in BigQuery

OSM Dataset is not the only source of useful geometry references. There are more like ZIP Codes or Geoip datasets available.

US ZIP Codes

Another common way to reference data is by ZIP codes. BigQuery Public Datasets has a table for it. In this example we fetch ZIP Codes around Seattle.

Fetching US ZIP codes geometries around Seattle
Fetching US ZIP codes geometries around Seattle

view on a map

SELECT zipcode_geom
-- public dataset with US ZIP Codes
FROM `bigquery-public-data.utility_us.zipcode_area`
WHERE
    ST_DWITHIN(
      -- creating point from ZIP Code area centroid
      ST_GeogPoint(longitude, latitude),
      -- Seattle
      ST_GeogPoint(-122.3321, 47.6062),
      -- 10km / 6.2 miles
      10000
    )

Explanations:

  • then main trick here is to use ST_DWITHIN which validates distances between geometries.
  • zipcode_area table has precalculated centroid coordinates longitude, latitude; to be more precise (although much slower), we could use ST_GEOGFROMTEXT(zipcode_geom).

Of course you can fetch all geometries, but mind the size when visualizing. This is Dekart visualization of all US ZIP Codes (40Mb).

US ZIP codes geometries with Dekart
US ZIP codes geometries with Dekart

Summary

  • If your data is in BigQuery, you need your reference geometries (like Admin Boundaries) in BigQuery to leverage the full power of BigQuery SQL (including GIS functions).
  • You can get various useful Reference Geometries from BigQuery Public Datasets to later join them with your private datasets for analysis.
  • All examples have links to Dekart Playground where you can query and visualize GIS data from any BigQuery Public dataset; Dekart is an open-source GIS Visualization tool you can deploy to your cloud and use with your data.