DuckDB Webbed Extension
Webbed is a DuckDB extension for processing XML and HTML. It lets you analyze structured documents directly in SQL, with automatic schema inference and XPath-based data extraction.
Features
- XML & HTML Processing
  - Parse and validate XML/HTML documents
  - Extract data using XPath expressions
  - Convert between XML, HTML, and JSON formats
  - Read files directly into DuckDB tables
- Smart Schema Inference
  - Automatically flatten XML documents into relational tables
  - Intelligent type detection (dates, numbers, booleans)
  - Configurable element and attribute handling
- Production Ready
  - Built on libxml2 for robust parsing
  - Comprehensive error handling
  - Memory-safe RAII implementation
  - 58 test suites with 1901 assertions
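To make the "Smart Schema Inference" idea concrete, here is a minimal conceptual sketch of flattening repeated XML elements into rows and guessing a type for each value. It uses only Python's standard library and is not the extension's implementation or API. The sample document, the `infer` and `flatten` helpers, and the order in which types are tried are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical sample document: each <book> becomes one row.
SAMPLE = """<catalog>
  <book id="1"><title>Dune</title><price>9.99</price>
    <published>1965-08-01</published><in_stock>true</in_stock></book>
  <book id="2"><title>Emma</title><price>7.50</price>
    <published>1815-12-23</published><in_stock>false</in_stock></book>
</catalog>"""


def infer(text):
    """Guess a type for one text value: bool, int, float, ISO date, else str."""
    value = text.strip()
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    # Try the narrowest conversions first; fall back to the raw string.
    for convert in (int, float, date.fromisoformat):
        try:
            return convert(value)
        except ValueError:
            pass
    return value


def flatten(xml_text, record_path):
    """Turn every element matching record_path into a flat dict (one row)."""
    rows = []
    for record in ET.fromstring(xml_text).findall(record_path):
        # Attributes and child elements both become columns.
        row = {name: infer(raw) for name, raw in record.attrib.items()}
        for child in record:
            row[child.tag] = infer(child.text or "")
        rows.append(row)
    return rows


rows = flatten(SAMPLE, ".//book")
for row in rows:
    print(row)
```

Each printed dictionary corresponds to one relational row, with attributes (`id`) and child elements (`title`, `price`, ...) as typed columns. A real implementation would infer one type per column across all rows rather than per value, and would need rules for nested or repeated child elements.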