Kepler.gl + DuckDB (WASM): configuration and custom SQL UI
by Vladi · 4 min read
This document describes the minimal steps to:
- enable DuckDB-WASM inside a Kepler.gl app, and
- run DuckDB SQL from a custom UI (your own editor + Run button) so the result is added to Kepler as a new dataset and auto-creates a layer.
1) Install dependencies
npm install @kepler.gl/duckdb
2) Configure Kepler (required)
Call initApplicationConfig once before the map is used:
import { initApplicationConfig } from '@kepler.gl/utils'
import { KeplerGlDuckDbTable, DuckDBWasmAdapter } from '@kepler.gl/duckdb'
initApplicationConfig({
  // Required: makes dataset ingestion and “add Arrow result to map” follow the DuckDB path.
  table: KeplerGlDuckDbTable,
  // Required: DuckDB-WASM adapter used by your custom editor.
  database: new DuckDBWasmAdapter({
    debug: false,
    config: {
      // Recommended for visualization: avoids bigint/decimal type issues downstream.
      query: { castBigIntToDouble: true }
    }
  }),
  // Recommended: progressive loading doesn't apply here, since DuckDB execution is async.
  useArrowProgressiveLoading: false
})
Notes:
- DuckDBWasmAdapter runs DuckDB in a Web Worker and provides connect() + query().
- KeplerGlDuckDbTable controls how datasets are ingested into DuckDB and read back as Arrow.
Under the hood (short)
- DuckDB runs in-browser via @duckdb/duckdb-wasm inside a Web Worker (the adapter selects the right WASM bundle and initializes the worker).
- Kepler switches its dataset storage path by swapping the table class to KeplerGlDuckDbTable via initApplicationConfig.
- Query results are passed to Kepler as Apache Arrow (info.format = 'arrow') to avoid materializing large result sets into JS arrays.
- The SQL Panel path typically casts types for compatibility (e.g. geometry → WKB; bigint/decimal → double) and attaches GeoArrow metadata for geometry columns.
KeplerGlDuckDbTable (what it actually changes)
- On dataset import/update, it creates (or replaces) a DuckDB table named after the dataset label.
- It can ingest rows, GeoJSON (via ST_READ(..., keep_wkb=TRUE)), or Arrow (via insertArrowTable), and it loads DuckDB’s spatial extension when needed.
- After ingestion, it runs a generated SELECT that casts types for Kepler (geometry → WKB, bigint/decimal → double, etc.) and returns an Arrow table.
- The Kepler UI then uses Arrow fields + cols (and no JS rows), while DuckDB remains the source of truth for SQL.
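To make the cast step concrete, here is a toy sketch of the kind of SELECT the table class generates. The buildCastSelect helper and its type mapping are illustrative only; the real logic lives in castDuckDBTypesForKepler in @kepler.gl/duckdb:

```javascript
// Illustrative sketch: build a SELECT that casts DuckDB types for Kepler.
// This is NOT the library implementation, just the shape of the idea.
function buildCastSelect(tableName, columns) {
  const exprs = columns.map(({ name, type }) => {
    // Geometry columns are serialized to WKB.
    if (type === 'GEOMETRY') return `ST_AsWKB("${name}") AS "${name}"`;
    // BigInt/decimal columns are widened to double for JS compatibility.
    if (type === 'BIGINT' || type.startsWith('DECIMAL')) {
      return `CAST("${name}" AS DOUBLE) AS "${name}"`;
    }
    // Everything else passes through unchanged.
    return `"${name}"`;
  });
  return `SELECT ${exprs.join(', ')} FROM "${tableName}"`;
}

console.log(buildCastSelect('trips', [
  { name: 'geom', type: 'GEOMETRY' },
  { name: 'count', type: 'BIGINT' },
  { name: 'city', type: 'VARCHAR' }
]));
// SELECT ST_AsWKB("geom") AS "geom", CAST("count" AS DOUBLE) AS "count", "city" FROM "trips"
```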
3) Run DuckDB SQL from a custom UI and add the result to Kepler
Kepler’s built-in SQL Panel uses this approach:
- run SQL (optionally multiple statements)
- ensure the final statement is a SELECT
- materialize it into a temp table
- re-select with type casts (geometry → WKB; bigint/decimal → double)
- dispatch addDataToMap with info.format = 'arrow'
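That per-statement flow can be sketched with naive stand-ins for splitSqlStatements and checkIsSelectQuery. The toy versions below ignore semicolons inside string literals and only inspect the leading keyword, whereas the real helpers are more robust (checkIsSelectQuery asks DuckDB itself):

```javascript
// Naive stand-ins for @kepler.gl/duckdb's splitSqlStatements and
// checkIsSelectQuery -- illustration only, not production-safe.
function naiveSplitStatements(sql) {
  // Split on semicolons and drop empty fragments.
  return sql.split(';').map(s => s.trim()).filter(Boolean);
}

function naiveIsSelect(statement) {
  // The real helper asks DuckDB; here we just check the leading keyword.
  // (DuckDB also accepts WITH ... and FROM ... as query starters.)
  return /^(select|with|from)\b/i.test(statement);
}

const statements = naiveSplitStatements(`
  CREATE TABLE t AS SELECT 1 AS x;
  SELECT x FROM t;
`);
console.log(statements.length);                             // 2
console.log(naiveIsSelect(statements[statements.length - 1])); // true
```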
Minimal implementation:
import { getApplicationConfig } from '@kepler.gl/utils'
import { addDataToMap } from '@kepler.gl/actions'
import { arrowSchemaToFields } from '@kepler.gl/processors'
import { generateHashId } from '@kepler.gl/common-utils'
import {
getDuckDBColumnTypes,
getDuckDBColumnTypesMap,
castDuckDBTypesForKepler,
setGeoArrowWKBExtension,
splitSqlStatements,
removeSQLComments,
checkIsSelectQuery
} from '@kepler.gl/duckdb'
const TEMP_TABLE = 'temp_keplergl_table'
export async function runDuckDbSqlAndAddToMap ({ sql, label, dispatch }) {
  const db = getApplicationConfig().database
  if (!db) throw new Error('DuckDB is not configured')

  const cleaned = removeSQLComments(sql)
  if (!cleaned.trim()) throw new Error('Query is empty')

  const c = await db.connect()
  try {
    const statements = splitSqlStatements(cleaned)

    let arrowTable = null
    let tableDuckDBTypes = {}

    for (let i = 0; i < statements.length; i++) {
      const statement = statements[i]
      const isLast = i === statements.length - 1

      if (isLast && (await checkIsSelectQuery(c, statement))) {
        await c.query(`CREATE OR REPLACE TABLE "${TEMP_TABLE}" AS ${statement}`)

        const duckDbColumns = await getDuckDBColumnTypes(c, TEMP_TABLE)
        tableDuckDBTypes = getDuckDBColumnTypesMap(duckDbColumns)

        const adjustedSql = castDuckDBTypesForKepler(TEMP_TABLE, duckDbColumns)
        arrowTable = await c.query(adjustedSql)
        setGeoArrowWKBExtension(arrowTable, duckDbColumns)

        await c.query(`DROP TABLE "${TEMP_TABLE}"`)
      } else {
        await c.query(statement)
      }
    }

    if (!arrowTable) {
      throw new Error('Last statement must be a SELECT')
    }

    const fields = arrowSchemaToFields(arrowTable, tableDuckDBTypes)

    dispatch(addDataToMap({
      datasets: [{
        info: { id: generateHashId(), label: label || 'query_result', format: 'arrow' },
        data: { fields, rows: arrowTable }
      }],
      options: {
        // Default is true in Kepler; include explicitly if you depend on it.
        autoCreateLayers: true
      }
    }))
  } finally {
    await c.close()
  }
}
What happens after addDataToMap:
- Kepler adds the dataset to state.
- If options.autoCreateLayers !== false, Kepler auto-creates a layer based on columns.
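If you want to control layer creation yourself, pass the flag explicitly. The options object below is a sketch; autoCreateLayers, centerMap, and keepExistingConfig are standard addDataToMap options, but verify them against the Kepler.gl version you ship:

```javascript
// Sketch of addDataToMap options for a "dataset only, no auto layer" flow.
const options = {
  autoCreateLayers: false, // add the dataset; build layers yourself via config
  centerMap: true,         // fit the viewport to the new data
  keepExistingConfig: true // don't reset layers/filters from earlier datasets
};
console.log(options.autoCreateLayers); // false
```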
4) Pitfalls / workarounds
4.1 Arrow version mismatch
If your app bundle ends up with multiple apache-arrow versions, Arrow vector instances may become incompatible at runtime.
Workaround: pin apache-arrow to a single version in the application's package.json (npm "overrides" shown below; Yarn uses "resolutions").
"overrides": {
"apache-arrow": "17.0.0"
}
4.2 Temp table collisions
If you allow concurrent runs, fixed temp table names (like temp_keplergl_table) can collide.
Workaround: include a run-scoped suffix in the temp table name.
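One possible suffix scheme (the prefix matches the TEMP_TABLE constant used above; the timestamp-plus-random format is just one option):

```javascript
// Generate a run-scoped temp table name to avoid collisions between
// concurrent query runs. Sketch only; any unique suffix works.
function makeTempTableName(prefix = 'temp_keplergl_table') {
  const suffix = `${Date.now().toString(36)}_${Math.random().toString(36).slice(2, 8)}`;
  return `${prefix}_${suffix}`;
}

const a = makeTempTableName();
const b = makeTempTableName();
console.log(a !== b); // true
```

Remember to DROP the table using the same generated name in the cleanup step.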
4.3 Avoid Arrow → JS row materialization for large results
Avoid converting Arrow results to JS row arrays for large datasets. Prefer the Arrow-native payload (format: 'arrow') shown above.