Namespace Handling
==================
XML namespaces allow elements and attributes to be uniquely identified, avoiding naming conflicts. The webbed extension provides flexible namespace handling options.
Understanding XML Namespaces
----------------------------
Namespaces are declared with ``xmlns`` attributes:
.. code-block:: xml
- Content
Content
Namespace Handling in read_xml
------------------------------
The ``namespaces`` parameter controls how namespaces are processed:
Strip (Default)
~~~~~~~~~~~~~~~
Removes namespace prefixes from element names:
.. code-block:: sql
SELECT * FROM read_xml('data.xml', namespaces := 'strip');
.. code-block:: xml
Value
Keep
~~~~
Preserves namespace prefixes in element names:
.. code-block:: sql
SELECT * FROM read_xml('data.xml', namespaces := 'keep');
.. code-block:: xml
Value
Expand
~~~~~~
Replaces prefixes with full namespace URIs:
.. code-block:: sql
SELECT * FROM read_xml('data.xml', namespaces := 'expand');
.. code-block:: xml
Value
Namespace Handling in xml_to_json
---------------------------------
The ``xml_to_json`` function also supports namespace options:
.. code-block:: sql
-- Strip namespaces (default)
SELECT xml_to_json('Test');
-- Result: {"root":{"item":{"#text":"Test"}}}
-- Keep namespace prefixes
SELECT xml_to_json(
'Test',
namespaces := 'keep'
);
-- Result: {"ns:root":{"ns:item":{"#text":"Test"}}}
-- Include xmlns declarations
SELECT xml_to_json(
'Test',
namespaces := 'keep',
xmlns_key := '#xmlns'
);
-- Result: {"ns:root":{"#xmlns":{"ns":"http://example.com"},"ns:item":{"#text":"Test"}}}
XPath and Namespaces
--------------------
When using XPath extraction functions on namespaced documents, simple paths may not match:
.. code-block:: sql
-- This may return empty for namespaced XML:
SELECT xml_extract_text(xml, '//item');
**Solution 1: Use namespaces := 'auto' (Recommended)**
The ``namespaces := 'auto'`` parameter automatically handles both declared and undeclared namespace prefixes:
.. code-block:: sql
-- Works with declared namespaces
SELECT xml_extract_text(
'Value',
'//ns:item',
namespaces := 'auto'
);
-- Result: [Value]
-- Also works with undeclared prefixes (common in GML, etc.)
SELECT xml_extract_text(
'1 2 3',
'//gml:pos',
namespaces := 'auto'
);
-- Result: [1 2 3]
**Solution 2: Use local-name() (Fallback)**
For maximum compatibility, use ``local-name()`` to match elements regardless of namespace:
.. code-block:: sql
-- Match elements regardless of namespace
SELECT xml_extract_text(xml, '//*[local-name()="item"]');
-- With predicates
SELECT xml_extract_text(xml, '//*[local-name()="item" and @id="123"]');
-- Nested elements
SELECT xml_extract_text(xml,
'//*[local-name()="catalog"]/*[local-name()="product"]/*[local-name()="name"]'
);
Discovering Namespaces
----------------------
Use ``xml_namespaces`` to see what namespaces are declared:
.. code-block:: sql
SELECT xml_namespaces('');
-- Result: [
-- {prefix: "", uri: "http://default.com"},
-- {prefix: "ns", uri: "http://example.com"}
-- ]
Common Patterns
---------------
AWS API Responses
~~~~~~~~~~~~~~~~~
AWS XML responses use default namespaces:
.. code-block:: xml
abc123
- ...
.. code-block:: sql
-- Works because read_xml strips namespaces by default
SELECT * FROM read_xml('response.xml', record_element := 'item');
-- XPath extraction requires local-name()
SELECT xml_extract_text(xml, '//*[local-name()="requestId"]')
FROM read_xml_objects('response.xml');
SOAP Messages
~~~~~~~~~~~~~
SOAP uses multiple namespaces:
.. code-block:: sql
-- Strip all namespaces
SELECT * FROM read_xml('soap.xml', namespaces := 'strip');
-- Keep for debugging
SELECT * FROM read_xml('soap.xml', namespaces := 'keep');
Atom/RSS Feeds
~~~~~~~~~~~~~~
Feeds often mix namespaces:
.. code-block:: sql
SELECT * FROM read_xml('feed.atom', record_element := 'entry');
Best Practices
--------------
1. **Use default (strip)** for ``read_xml`` - simplifies column names
2. **Use keep** when namespace prefixes are semantically important
3. **Use namespaces := 'auto'** for XPath functions - handles both declared and undeclared prefixes
4. **Use local-name()** as a fallback when ``'auto'`` doesn't work
5. **Check xml_namespaces()** when debugging namespace issues