Namespace Handling
XML namespaces allow elements and attributes to be uniquely identified, avoiding naming conflicts. The webbed extension provides flexible namespace handling options.
Understanding XML Namespaces
Namespaces are declared with xmlns attributes:
<!-- Default namespace -->
<root xmlns="http://example.com/default">
<item>Content</item>
</root>
<!-- Prefixed namespace -->
<root xmlns:ns="http://example.com/ns">
<ns:item>Content</ns:item>
</root>
Namespace Handling in read_xml
The namespaces parameter controls how namespaces are processed:
Strip (Default)
Removes namespace prefixes from element names:
SELECT * FROM read_xml('data.xml', namespaces := 'strip');
<!-- Input -->
<ns:root xmlns:ns="http://example.com">
<ns:item>Value</ns:item>
</ns:root>
<!-- Results in column: item (not ns:item) -->
Keep
Preserves namespace prefixes in element names:
SELECT * FROM read_xml('data.xml', namespaces := 'keep');
<!-- Input -->
<ns:root xmlns:ns="http://example.com">
<ns:item>Value</ns:item>
</ns:root>
<!-- Results in column: ns:item -->
Expand
Replaces prefixes with full namespace URIs:
SELECT * FROM read_xml('data.xml', namespaces := 'expand');
<!-- Input -->
<ns:root xmlns:ns="http://example.com">
<ns:item>Value</ns:item>
</ns:root>
<!-- Results in column: http://example.com:item -->
Namespace Handling in xml_to_json
The xml_to_json function also supports namespace options:
-- Strip namespaces (default)
SELECT xml_to_json('<ns:root xmlns:ns="http://example.com"><ns:item>Test</ns:item></ns:root>');
-- Result: {"root":{"item":{"#text":"Test"}}}
-- Keep namespace prefixes
SELECT xml_to_json(
'<ns:root xmlns:ns="http://example.com"><ns:item>Test</ns:item></ns:root>',
namespaces := 'keep'
);
-- Result: {"ns:root":{"ns:item":{"#text":"Test"}}}
-- Include xmlns declarations
SELECT xml_to_json(
'<ns:root xmlns:ns="http://example.com"><ns:item>Test</ns:item></ns:root>',
namespaces := 'keep',
xmlns_key := '#xmlns'
);
-- Result: {"ns:root":{"#xmlns":{"ns":"http://example.com"},"ns:item":{"#text":"Test"}}}
XPath and Namespaces
When using XPath extraction functions on namespaced documents, simple paths may not match:
-- This may return empty for namespaced XML:
SELECT xml_extract_text(xml, '//item');
Solution 1: Use namespaces := ‘auto’ (Recommended)
The namespaces := 'auto' parameter automatically handles both declared and undeclared namespace prefixes:
-- Works with declared namespaces
SELECT xml_extract_text(
'<root xmlns:ns="http://example.com"><ns:item>Value</ns:item></root>',
'//ns:item',
namespaces := 'auto'
);
-- Result: [Value]
-- Also works with undeclared prefixes (common in GML, etc.)
SELECT xml_extract_text(
'<root><gml:pos>1 2 3</gml:pos></root>',
'//gml:pos',
namespaces := 'auto'
);
-- Result: [1 2 3]
Solution 2: Use local-name() (Fallback)
For maximum compatibility, use local-name() to match elements regardless of namespace:
-- Match elements regardless of namespace
SELECT xml_extract_text(xml, '//*[local-name()="item"]');
-- With predicates
SELECT xml_extract_text(xml, '//*[local-name()="item" and @id="123"]');
-- Nested elements
SELECT xml_extract_text(xml,
'//*[local-name()="catalog"]/*[local-name()="product"]/*[local-name()="name"]'
);
Discovering Namespaces
Use xml_namespaces to see what namespaces are declared:
SELECT xml_namespaces('<root xmlns="http://default.com" xmlns:ns="http://example.com"/>');
-- Result: [
-- {prefix: "", uri: "http://default.com"},
-- {prefix: "ns", uri: "http://example.com"}
-- ]
Common Patterns
AWS API Responses
AWS XML responses use default namespaces:
<DescribeVolumesResponse xmlns="http://ec2.amazonaws.com/doc/2016-11-15/">
<requestId>abc123</requestId>
<volumeSet>
<item>...</item>
</volumeSet>
</DescribeVolumesResponse>
-- Works because read_xml strips namespaces by default
SELECT * FROM read_xml('response.xml', record_element := 'item');
-- XPath extraction requires local-name()
SELECT xml_extract_text(xml, '//*[local-name()="requestId"]')
FROM read_xml_objects('response.xml');
SOAP Messages
SOAP uses multiple namespaces:
-- Strip all namespaces
SELECT * FROM read_xml('soap.xml', namespaces := 'strip');
-- Keep for debugging
SELECT * FROM read_xml('soap.xml', namespaces := 'keep');
Atom/RSS Feeds
Feeds often mix namespaces:
SELECT * FROM read_xml('feed.atom', record_element := 'entry');
Best Practices
Use default (strip) for
read_xml- simplifies column namesUse keep when namespace prefixes are semantically important
Use namespaces := ‘auto’ for XPath functions - handles both declared and undeclared prefixes
Use local-name() as a fallback when
'auto'doesn’t workCheck xml_namespaces() when debugging namespace issues