Namespace Handling ================== XML namespaces allow elements and attributes to be uniquely identified, avoiding naming conflicts. The webbed extension provides flexible namespace handling options. Understanding XML Namespaces ---------------------------- Namespaces are declared with ``xmlns`` attributes: .. code-block:: xml Content Content Namespace Handling in read_xml ------------------------------ The ``namespaces`` parameter controls how namespaces are processed: Strip (Default) ~~~~~~~~~~~~~~~ Removes namespace prefixes from element names: .. code-block:: sql SELECT * FROM read_xml('data.xml', namespaces := 'strip'); .. code-block:: xml Value Keep ~~~~ Preserves namespace prefixes in element names: .. code-block:: sql SELECT * FROM read_xml('data.xml', namespaces := 'keep'); .. code-block:: xml Value Expand ~~~~~~ Replaces prefixes with full namespace URIs: .. code-block:: sql SELECT * FROM read_xml('data.xml', namespaces := 'expand'); .. code-block:: xml Value Namespace Handling in xml_to_json --------------------------------- The ``xml_to_json`` function also supports namespace options: .. code-block:: sql -- Strip namespaces (default) SELECT xml_to_json('Test'); -- Result: {"root":{"item":{"#text":"Test"}}} -- Keep namespace prefixes SELECT xml_to_json( 'Test', namespaces := 'keep' ); -- Result: {"ns:root":{"ns:item":{"#text":"Test"}}} -- Include xmlns declarations SELECT xml_to_json( 'Test', namespaces := 'keep', xmlns_key := '#xmlns' ); -- Result: {"ns:root":{"#xmlns":{"ns":"http://example.com"},"ns:item":{"#text":"Test"}}} XPath and Namespaces -------------------- When using XPath extraction functions on namespaced documents, simple paths may not match: .. code-block:: sql -- This may return empty for namespaced XML: SELECT xml_extract_text(xml, '//item'); **Solution 1: Use namespaces := 'auto' (Recommended)** The ``namespaces := 'auto'`` parameter automatically handles both declared and undeclared namespace prefixes: .. code-block:: sql -- Works with declared namespaces SELECT xml_extract_text( 'Value', '//ns:item', namespaces := 'auto' ); -- Result: [Value] -- Also works with undeclared prefixes (common in GML, etc.) SELECT xml_extract_text( '1 2 3', '//gml:pos', namespaces := 'auto' ); -- Result: [1 2 3] **Solution 2: Use local-name() (Fallback)** For maximum compatibility, use ``local-name()`` to match elements regardless of namespace: .. code-block:: sql -- Match elements regardless of namespace SELECT xml_extract_text(xml, '//*[local-name()="item"]'); -- With predicates SELECT xml_extract_text(xml, '//*[local-name()="item" and @id="123"]'); -- Nested elements SELECT xml_extract_text(xml, '//*[local-name()="catalog"]/*[local-name()="product"]/*[local-name()="name"]' ); Discovering Namespaces ---------------------- Use ``xml_namespaces`` to see what namespaces are declared: .. code-block:: sql SELECT xml_namespaces(''); -- Result: [ -- {prefix: "", uri: "http://default.com"}, -- {prefix: "ns", uri: "http://example.com"} -- ] Common Patterns --------------- AWS API Responses ~~~~~~~~~~~~~~~~~ AWS XML responses use default namespaces: .. code-block:: xml abc123 ... .. code-block:: sql -- Works because read_xml strips namespaces by default SELECT * FROM read_xml('response.xml', record_element := 'item'); -- XPath extraction requires local-name() SELECT xml_extract_text(xml, '//*[local-name()="requestId"]') FROM read_xml_objects('response.xml'); SOAP Messages ~~~~~~~~~~~~~ SOAP uses multiple namespaces: .. code-block:: sql -- Strip all namespaces SELECT * FROM read_xml('soap.xml', namespaces := 'strip'); -- Keep for debugging SELECT * FROM read_xml('soap.xml', namespaces := 'keep'); Atom/RSS Feeds ~~~~~~~~~~~~~~ Feeds often mix namespaces: .. code-block:: sql SELECT * FROM read_xml('feed.atom', record_element := 'entry'); Best Practices -------------- 1. **Use default (strip)** for ``read_xml`` - simplifies column names 2. **Use keep** when namespace prefixes are semantically important 3. **Use namespaces := 'auto'** for XPath functions - handles both declared and undeclared prefixes 4. **Use local-name()** as a fallback when ``'auto'`` doesn't work 5. **Check xml_namespaces()** when debugging namespace issues