DuckDB Webbed Extension

An XML and HTML processing extension for DuckDB that lets you analyze structured documents directly in SQL, with automatic schema inference and XPath-based data extraction.

Features

XML & HTML Processing
  • Parse and validate XML/HTML documents

  • Extract data using XPath expressions

  • Convert between XML, HTML, and JSON formats

  • Read files directly into DuckDB tables
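This section does not list the extension's SQL function names. The following is therefore a standard-library Python sketch of the concepts, not the Webbed API. It shows three things:

  • A well-formedness check.
  • XPath-style text extraction, using the limited XPath subset supported by `xml.etree.ElementTree`.
  • A naive XML-to-JSON mapping.

The sample document and the `@`/`#text` key conventions are illustrative assumptions.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample document used only for illustration.
DOC = """<catalog>
  <book id="b1" lang="en"><title>DuckDB in Practice</title><price>29.99</price></book>
  <book id="b2" lang="de"><title>XML Grundlagen</title><price>19.50</price></book>
</catalog>"""

def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML (well-formedness only, no DTD/XSD validation)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

def extract_text(text: str, path: str) -> list[str]:
    """Return the text of every element matching an ElementTree XPath-subset expression."""
    root = ET.fromstring(text)
    return [el.text or "" for el in root.findall(path)]

def element_to_dict(el: ET.Element) -> dict:
    """Naive XML-to-JSON mapping (assumed convention): attributes as '@name',
    child elements as lists under their tag, non-blank text under '#text'."""
    node = {f"@{k}": v for k, v in el.attrib.items()}
    for child in el:
        node.setdefault(child.tag, []).append(element_to_dict(child))
    if el.text and el.text.strip():
        node["#text"] = el.text.strip()
    return node

if __name__ == "__main__":
    print(is_well_formed(DOC))
    print(extract_text(DOC, ".//book[@lang='en']/title"))
    print(json.dumps({"catalog": element_to_dict(ET.fromstring(DOC))}, indent=2))
```

ElementTree supports only a subset of XPath 1.0. A libxml2-based implementation such as this extension can evaluate full XPath expressions, so the paths above are deliberately simple.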

Smart Schema Inference
  • Automatically flatten XML documents into relational tables

  • Automatic type detection for dates, numbers, and booleans

  • Configurable element and attribute handling
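To make "flatten into relational tables" and "type detection" concrete, here is a minimal Python sketch under stated assumptions; it is not the extension's actual inference algorithm. It makes three assumptions:

  • Each repeated element named by the hypothetical `row_tag` parameter becomes one row.
  • Its attributes and child-element text become columns.
  • Each column gets the narrowest type that all of its non-empty values parse as, checked in the assumed order BOOLEAN, BIGINT, DOUBLE, DATE, falling back to VARCHAR. These are DuckDB type names.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical sample document used only for illustration.
DOC = """<orders>
  <order id="1"><placed>2024-03-01</placed><total>12.50</total><paid>true</paid></order>
  <order id="2"><placed>2024-03-02</placed><total>7</total><paid>false</paid></order>
</orders>"""

def flatten(text: str, row_tag: str) -> list[dict[str, str]]:
    """One row per <row_tag> element; attributes and child-element text become columns."""
    root = ET.fromstring(text)
    rows = []
    for el in root.iter(row_tag):
        row = dict(el.attrib)  # attributes become columns as-is
        for child in el:
            row[child.tag] = (child.text or "").strip()
        rows.append(row)
    return rows

def detect_type(values: list[str]) -> str:
    """Return the narrowest DuckDB type name that every non-empty value parses as
    (assumed precedence: BOOLEAN, BIGINT, DOUBLE, DATE, else VARCHAR)."""
    vals = [v for v in values if v]

    def all_parse(parse) -> bool:
        try:
            for v in vals:
                parse(v)
            return True
        except ValueError:
            return False

    if not vals:
        return "VARCHAR"
    if all(v.lower() in ("true", "false") for v in vals):
        return "BOOLEAN"
    if all_parse(int):
        return "BIGINT"
    if all_parse(float):
        return "DOUBLE"
    if all_parse(date.fromisoformat):
        return "DATE"
    return "VARCHAR"

def infer_schema(rows: list[dict[str, str]]) -> dict[str, str]:
    """Collect columns in first-seen order and infer one type per column."""
    columns: list[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return {c: detect_type([r.get(c, "") for r in rows]) for c in columns}

if __name__ == "__main__":
    rows = flatten(DOC, "order")
    print(rows)
    print(infer_schema(rows))
```

Choosing the narrowest type that fits every value means a single non-conforming value widens the whole column, here ultimately to VARCHAR. A real implementation would likely also need to handle nested elements, repeated children, and namespaces, which this sketch ignores.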

Production Ready
  • Built on libxml2 for robust parsing

  • Comprehensive error handling

  • Memory-safe RAII implementation

  • 58 test suites with 1901 assertions
