Parameters Reference
This page provides a comprehensive reference for all parameters available in the file reading functions.
Common Parameters
These parameters are available for both read_xml and read_html:
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
BOOLEAN |
false |
Include a |
|
BOOLEAN |
false |
Skip files that fail to parse instead of raising an error |
|
BIGINT |
16777216 |
Maximum file size in bytes for DOM parsing (16MB default). Files above this use SAX streaming when |
|
BOOLEAN |
true |
Enable automatic schema detection and type inference |
Schema Inference Parameters
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
VARCHAR |
NULL |
XPath or tag name identifying which elements become table rows. If not specified, immediate children of root become rows. |
|
VARCHAR |
NULL |
Specify the XML root element name for schema inference |
|
VARCHAR[] |
NULL |
Pre-specify expected column names for better performance |
|
INTEGER |
10 |
Maximum nesting depth to parse (-1 for unlimited, capped at 10) |
|
VARCHAR |
‘struct’ |
How to handle nested elements: |
Type Handling Parameters
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
BOOLEAN |
false |
Force all scalar types to VARCHAR. Preserves nested structure (STRUCT, LIST) but converts leaf values. |
|
VARCHAR or VARCHAR[] |
‘auto’ |
Controls date/time detection and parsing. Accepts a format string ( |
|
VARCHAR[] |
[] |
Column names that should always be LIST type, even if they appear only once |
|
VARCHAR or VARCHAR[] |
(none) |
String value(s) to interpret as NULL. Excluded from type inference and converted to NULL during extraction. Case-sensitive. |
|
BOOLEAN |
true |
Enable SAX-based streaming for files exceeding |
|
BOOLEAN |
true |
When true (v2.0.0 default), trims leading/trailing whitespace and normalizes CRLF/CR to LF per XML 1.0 §2.11 but preserves internal whitespace (newlines, tabs, multi-space runs). When false, collapses all internal whitespace runs to a single space (v1.x behavior). |
Attribute Handling Parameters
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
VARCHAR |
‘prefix’ |
How to handle attributes: |
|
VARCHAR |
‘@’ |
Prefix added to attribute column names when |
|
VARCHAR |
‘#text’ |
Key name for text content when elements have mixed content |
Namespace Parameters
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
VARCHAR |
‘strip’ |
Namespace handling mode: |
Empty Element Handling
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
VARCHAR |
‘object’ |
How to handle empty elements: |
Multi-File Parameters
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
BOOLEAN |
false |
Combine columns by name when reading multiple files with different schemas |
Datetime Format Parameter
The datetime_format parameter controls how date, time, and timestamp values are detected and parsed.
Preset names:
Preset |
Format |
Target Type |
|---|---|---|
|
Built-in candidate list (default) |
mixed |
|
Disables temporal detection |
- |
|
|
DATE |
|
|
DATE |
|
|
DATE |
|
|
TIMESTAMP |
|
|
TIMESTAMP |
|
|
TIMESTAMP |
|
|
TIMESTAMPTZ |
|
|
TIME |
|
|
TIME |
How it works:
When datetime_format='auto' (the default), the extension tries a built-in list of common formats against sample values. Formats that fail to parse any sample are eliminated. The first surviving format determines the column type.
When an explicit format or preset is specified, only those formats are tried. If no format matches all samples, the column falls back to VARCHAR.
Note
When auto-detecting, ambiguous date formats (e.g., 03/04/2024) default to US (month-first) ordering, consistent with DuckDB conventions. Use datetime_format='eu' to override.
Interactions with other parameters:
all_varchar=trueoverridesdatetime_format— no temporal detection occurs.datetime_format='none'disables all temporal detection.An explicit
datetime_formatoverridestemporal_detection=falseif both are set.
-- Parse US-format dates
SELECT * FROM read_xml('data.xml', datetime_format='us');
-- Parse European dates
SELECT * FROM read_xml('data.xml', datetime_format='eu');
-- Disable date detection
SELECT * FROM read_xml('data.xml', datetime_format='none');
-- Multiple formats (first surviving format wins)
SELECT * FROM read_xml('data.xml', datetime_format=['%m/%d/%Y', '%Y-%m-%d %H:%M:%S']);
-- Custom format string
SELECT * FROM read_xml('data.xml', datetime_format='%Y/%m/%d');
Examples
Basic Usage
-- Include filenames
SELECT * FROM read_xml('*.xml', filename=true);
-- Skip problematic files
SELECT * FROM read_xml('*.xml', ignore_errors=true);
-- Limit file size
SELECT * FROM read_xml('*.xml', maximum_file_size=1048576); -- 1MB
Schema Control
-- Extract specific records
SELECT * FROM read_xml('feed.xml', record_element := 'item');
-- Force array types
SELECT * FROM read_xml('data.xml', force_list := ['tag', 'category']);
-- Preserve all values as strings
SELECT * FROM read_xml('data.xml', all_varchar := true);
Attribute Handling
-- Use underscore prefix for attributes
SELECT * FROM read_xml('data.xml', attr_prefix := '_');
-- Ignore attributes entirely
SELECT * FROM read_xml('data.xml', attr_mode := 'ignore');
Multi-File Processing
-- Combine files with different schemas
SELECT * FROM read_xml('config/*.xml', union_by_name := true);
-- Process with error tolerance
SELECT * FROM read_xml('data/*.xml',
ignore_errors := true,
union_by_name := true,
filename := true
);