XML Extraction Functions ======================== These functions extract data from XML documents using XPath expressions. xml_extract_text ---------------- Extract text content from all matching elements. **Syntax:** .. code-block:: sql xml_extract_text(xml, xpath) **Parameters:** - ``xml`` (VARCHAR or XML): The XML content to search - ``xpath`` (VARCHAR): XPath expression to match **Returns:** VARCHAR[] - List of text content from all matching elements, or empty list if no match. **Examples:** .. code-block:: sql -- Basic extraction (returns list) SELECT xml_extract_text('DuckDB Guide', '//title'); -- Result: ["DuckDB Guide"] -- Get first match using array indexing SELECT xml_extract_text('DuckDB Guide', '//title')[1]; -- Result: "DuckDB Guide" -- Multiple matches SELECT xml_extract_text('Book ABook B', '//title'); -- Result: ["Book A", "Book B"] -- Attribute extraction SELECT xml_extract_text('', '/item/@id'); -- Result: ["123"] -- With predicates SELECT xml_extract_text( 'Novel', '//book[@category="fiction"]/title' ); -- Result: ["Novel"] .. note:: For namespaced documents, use ``local-name()`` to match elements regardless of namespace: .. code-block:: sql SELECT xml_extract_text(xml, '//*[local-name()="title"]') **With Namespace Support:** Use the ``namespaces`` named parameter to handle namespace prefixes in XPath: .. code-block:: sql xml_extract_text(xml, xpath, namespaces:=) **Namespace Modes:** .. list-table:: :header-rows: 1 :widths: 15 85 * - Mode - Description * - ``'auto'`` - Auto-register declared namespaces, common namespaces, and mock undefined prefixes. **Recommended for most use cases.** * - ``'strict'`` - Auto-register declared namespaces, raise error on undefined prefix. * - ``'ignore'`` - Auto-register declared namespaces, silently return empty on undefined prefix. **Examples:** .. code-block:: sql -- Use 'auto' mode for convenience (handles undeclared prefixes) SELECT xml_extract_text( '1 2 3', '//gml:pos', namespaces:='auto' ); -- Result: ["1 2 3"] -- Use explicit MAP for precise control SELECT xml_extract_text( 'Value', '//ns:item', namespaces:=MAP {'ns': 'http://example.com'} ); -- Result: ["Value"] -- Use helper functions SELECT xml_extract_text( '1 2 3', '//gml:pos', namespaces:=xml_mock_namespaces(['gml']) ); -- Result: ["1 2 3"] -- Discover namespaces with xml_namespaces() SELECT xml_namespaces(''); -- Result: {gml: "http://www.opengis.net/gml"} xml_extract_all_text -------------------- Extract all text content from an XML document. **Syntax:** .. code-block:: sql xml_extract_all_text(xml) **Parameters:** - ``xml`` (VARCHAR or XML): The XML content **Returns:** VARCHAR - All text content concatenated. **Example:** .. code-block:: sql SELECT xml_extract_all_text('

Hello world!

'); -- Result: "Hello world!" xml_extract_elements -------------------- Extract all matching elements as XML fragments. **Syntax:** .. code-block:: sql xml_extract_elements(xml, xpath) **Returns:** xmlfragment[] - List of matching elements as XML fragments. **Example:** .. code-block:: sql SELECT xml_extract_elements('FirstSecond', '//item'); -- Result: [First, Second] -- Get first match SELECT xml_extract_elements('First', '//item')[1]; xml_extract_elements_string --------------------------- Extract all matching elements as newline-separated text. **Syntax:** .. code-block:: sql xml_extract_elements_string(xml, xpath) **Returns:** VARCHAR - All matching elements serialized as XML, separated by newlines. **Example:** .. code-block:: sql SELECT xml_extract_elements_string( 'AB', '//item' ); -- Result: "A\nB" xml_extract_attributes ---------------------- Extract attributes from matching elements as a list of structs. **Syntax:** .. code-block:: sql xml_extract_attributes(xml, xpath) **Returns:** LIST - List of attribute key-value pairs for each matching element. **Example:** .. code-block:: sql SELECT xml_extract_attributes( '', '//item' ); -- Result: [{id: "1", type: "A"}, {id: "2", type: "B"}] xml_extract_comments -------------------- Extract all XML comments with their line numbers. **Syntax:** .. code-block:: sql xml_extract_comments(xml) **Returns:** LIST **Example:** .. code-block:: sql SELECT xml_extract_comments(''); -- Result: [{content: " This is a comment ", line_number: 1}] xml_extract_cdata ----------------- Extract all CDATA sections with their line numbers. **Syntax:** .. code-block:: sql xml_extract_cdata(xml) **Returns:** LIST **Example:** .. code-block:: sql SELECT xml_extract_cdata(''); -- Result: [{content: "Some raw content", line_number: 1}]