Kepler.gl + DuckDB (WASM): configuration and custom SQL UI

by Vladi ‐ 4 min read

This document describes the minimal steps to:

enable DuckDB-WASM inside a Kepler.gl app, and
run DuckDB SQL from a custom UI (your own editor + Run button) so the result is added to Kepler as a new dataset and auto-creates a layer.

1) Install dependencies

npm install @kepler.gl/duckdb

2) Configure Kepler (required)

Call initApplicationConfig once before the map is used:

import { initApplicationConfig } from '@kepler.gl/utils'
import { KeplerGlDuckDbTable, DuckDBWasmAdapter } from '@kepler.gl/duckdb'

initApplicationConfig({
  // Required: makes dataset ingestion and “add Arrow result to map” follow the DuckDB path.
  table: KeplerGlDuckDbTable,

  // Required: DuckDB-WASM adapter used by your custom editor.
  database: new DuckDBWasmAdapter({
    debug: false,
    config: {
      // Recommended for visualization: avoid bigint/decimal type issues downstream.
      query: { castBigIntToDouble: true }
    }
  }),

  // Recommended: DuckDB execution is async.
  useArrowProgressiveLoading: false
})

Notes:

DuckDBWasmAdapter runs DuckDB in a Web Worker and provides connect() + query().
KeplerGlDuckDbTable controls how datasets are ingested into DuckDB and read back as Arrow.

Under the hood (short)

DuckDB runs in-browser via @duckdb/duckdb-wasm inside a Web Worker (the adapter selects the right WASM bundle and initializes the worker).
Kepler switches its dataset storage path by swapping the table class to KeplerGlDuckDbTable via initApplicationConfig.
Query results are passed to Kepler as Apache Arrow (info.format = 'arrow') to avoid materializing large result sets into JS arrays.
The SQL Panel path typically casts types for compatibility (e.g. geometry → WKB; bigint/decimal → double) and attaches GeoArrow metadata for geometry columns.

`KeplerGlDuckDbTable` (what it actually changes)

On dataset import/update, it creates (or replaces) a DuckDB table named after the dataset label.
It can ingest rows, GeoJSON (via ST_READ(..., keep_wkb=TRUE)), or Arrow (via insertArrowTable), and it loads DuckDB’s spatial extension when needed.
After ingestion, it runs a generated SELECT that casts types for Kepler (geometry → WKB, bigint/decimal → double, etc.) and returns an Arrow table.
Kepler UI then uses Arrow fields + cols (and no JS rows), while DuckDB remains the source-of-truth for SQL.

3) Run DuckDB SQL from a custom UI and add the result to Kepler

Kepler’s built-in SQL Panel uses this approach:

run SQL (optionally multiple statements)
ensure the final statement is a SELECT
materialize it into a temp table
re-select with type casts (geometry → WKB; bigint/decimal → double)
dispatch addDataToMap with info.format = 'arrow'

Minimal implementation:

import { getApplicationConfig } from '@kepler.gl/utils'
import { addDataToMap } from '@kepler.gl/actions'
import { arrowSchemaToFields } from '@kepler.gl/processors'
import { generateHashId } from '@kepler.gl/common-utils'
import {
  getDuckDBColumnTypes,
  getDuckDBColumnTypesMap,
  castDuckDBTypesForKepler,
  setGeoArrowWKBExtension,
  splitSqlStatements,
  removeSQLComments,
  checkIsSelectQuery
} from '@kepler.gl/duckdb'

const TEMP_TABLE = 'temp_keplergl_table'

export async function runDuckDbSqlAndAddToMap ({ sql, label, dispatch }) {
  const db = getApplicationConfig().database
  if (!db) throw new Error('DuckDB is not configured')

  const cleaned = removeSQLComments(sql)
  if (!cleaned.trim()) throw new Error('Query is empty')

  const c = await db.connect()
  try {
    const statements = splitSqlStatements(cleaned)
    let arrowTable = null
    let tableDuckDBTypes = {}

    for (let i = 0; i < statements.length; i++) {
      const statement = statements[i]
      const isLast = i === statements.length - 1

      if (isLast && (await checkIsSelectQuery(c, statement))) {
        await c.query(`CREATE OR REPLACE TABLE "${TEMP_TABLE}" AS ${statement}`)

        const duckDbColumns = await getDuckDBColumnTypes(c, TEMP_TABLE)
        tableDuckDBTypes = getDuckDBColumnTypesMap(duckDbColumns)

        const adjustedSql = castDuckDBTypesForKepler(TEMP_TABLE, duckDbColumns)
        arrowTable = await c.query(adjustedSql)
        setGeoArrowWKBExtension(arrowTable, duckDbColumns)

        await c.query(`DROP TABLE "${TEMP_TABLE}"`)
      } else {
        await c.query(statement)
      }
    }

    if (!arrowTable) {
      throw new Error('Last statement must be a SELECT')
    }

    const fields = arrowSchemaToFields(arrowTable, tableDuckDBTypes)

    dispatch(addDataToMap({
      datasets: [{
        info: { id: generateHashId(), label: label || 'query_result', format: 'arrow' },
        data: { fields, rows: arrowTable }
      }],
      options: {
        // Default is true in Kepler; include explicitly if you depend on it.
        autoCreateLayers: true
      }
    }))
  } finally {
    await c.close()
  }
}

What happens after addDataToMap:

Kepler adds the dataset to state.
If options.autoCreateLayers !== false, Kepler auto-creates a layer based on columns.

4) Pitfalls / workarounds

4.1 Arrow version mismatch

If your app bundle ends up with multiple apache-arrow versions, Arrow vector instances may become incompatible at runtime.

Workaround: pin apache-arrow to one version at the application package.json.

"overrides": {
  "apache-arrow": "17.0.0"
}

4.2 Temp table collisions

If you allow concurrent runs, fixed temp table names (like temp_keplergl_table) can collide.

Workaround: include a run-scoped suffix in the temp table name.

4.3 Avoid Arrow → JS row materialization for large results

Avoid converting Arrow results to JS row arrays for large datasets. Prefer the Arrow-native payload (format: 'arrow') shown above.