java.lang.Object
- org.apache.any23.extractor.html.DomUtils

```
public class DomUtils
extends Object
```
This class provides utility methods for DOM manipulation. It is separated from HTMLDocument so that its methods can be run on single DOM nodes without having to wrap them into an HTMLDocument.
We use a mix of XPath and DOM manipulation.
This is likely to be a performance bottleneck but at least everything is localized here.

Method Summary

Modifier and Type	Method	Description
`static InputStream`	`documentToInputStream(Document doc)`	Given a `Document` this method will return an input stream representing that document.
`static String`	`find(Node node, String xpath)`	Gets the string value of an XPath expression.
`static List<Node>`	`findAll(Node node, String xpath)`	Returns a list of all the nodes that match an XPath expression, which must be valid.
`static List<Node>`	`findAllByAttributeContains(Node node, String attrName, String attrContains)`
`static List<Node>`	`findAllByAttributeName(Node root, String attrName)`	Finds all nodes that have a declared attribute.
`static List<Node>`	`findAllByClassName(Node root, String className)`	Finds all nodes that have a declared class.
`static List<Node>`	`findAllByTag(Node root, String tagName)`
`static List<Node>`	`findAllByTagAndClassName(Node root, String tagName, String className)`
`static Node`	`findNodeById(Node root, String id)`	Mimics the JS DOM API, or prototype's $()
`static int`	`getIndexInParent(Node n)`	Given a node, returns the index of that node within the list of children of its parent node.
`static int[]`	`getNodeLocation(Node n)`	Returns the row/col location of the given node.
`static String`	`getXPathForNode(Node node)`	Walks the DOM tree in reverse, from the node up to the root, to generate a unique XPath expression leading to this node.
`static String[]`	`getXPathListForNode(Node n)`	Returns a list of tag names representing the path from the document root to the given node n.
`static boolean`	`hasAttribute(Node node, String attributeName)`	Checks the presence of an attribute in the given `node`.
`static boolean`	`hasAttribute(Node node, String attributeName, String className)`	Checks the presence of an attribute value in attributes that contain whitespace-separated lists of values.
`static boolean`	`hasClassName(Node node, String className)`	Tells whether an element has a class name, without checking its parents in the hierarchy. This mimics the CSS .foo match.
`static boolean`	`isAncestorOf(Node candidateAncestor, Node candidateSibling)`	Checks whether a node is an ancestor of, or the same as, another node.
`static boolean`	`isAncestorOf(Node candidateAncestor, Node candidateSibling, boolean strict)`	Checks whether a node is an ancestor of, or the same as, another node.
`static boolean`	`isElementNode(Node target)`	Verifies if the given target node is an element.
`static InputStream`	`nodeToInputStream(Node node)`	Converts a W3C DOM node to an InputStream.
`static String`	`readAttribute(Node node, String attribute)`	Reads the value of an `attribute`, returning the empty string if not present.
`static String`	`readAttribute(Node node, String attribute, String defaultValue)`	Reads the value of the specified `attribute`, returning the `defaultValue` string if not present.
`static String`	`readAttributeWithPrefix(Node node, String attributePrefix, String defaultValue)`	Reads the value of the first attribute whose name matches the specified `attributePrefix`.
`static String`	`serializeToXML(Node node, boolean indent)`	Given a DOM `Node` produces the XML serialization omitting the XML declaration.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Method Detail
  - getIndexInParent
```
public static int getIndexInParent(Node n)
```
    Given a node, returns the index of that node within the list of children of its parent node.
    
    Parameters:
    
    n - the node whose index is returned.
    
    Returns:
    
    a non-negative number.
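    The idea can be sketched with the plain JDK DOM API. This is only an illustration, not the Any23 implementation: the class name, the `parse` helper and `indexInParent` are hypothetical, and the sketch assumes that every child node type (not only elements) counts toward the index.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

final class IndexInParentSketch {

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Position of n among all child nodes of its parent (assumption: 0 if n has no parent).
    static int indexInParent(Node n) {
        Node parent = n.getParentNode();
        if (parent == null) {
            return 0;
        }
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).isSameNode(n)) {
                return i;
            }
        }
        return 0; // not reached for a consistent tree
    }

    public static void main(String[] args) {
        Document doc = parse("<ul><li>a</li><li>b</li><li>c</li></ul>");
        Node third = doc.getElementsByTagName("li").item(2);
        System.out.println(indexInParent(third)); // prints 2
    }
}
```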
  - getXPathForNode
```
public static String getXPathForNode(Node node)
```
    Walks the DOM tree in reverse, from the node up to the root, to generate a unique XPath expression leading to this node. The generated XPath is the canonical one based on sibling index, for example /html[1]/body[1]/div[2]/span[3].
    
    Parameters:
    
    node - the input node.
    
    Returns:
    
    the XPath location of node as String.
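    A minimal sketch of such a reverse walk, using only the JDK DOM API, might look like the following. The names `XPathForNodeSketch`, `parse` and `xpathFor` are hypothetical, and the sketch assumes the index counts preceding element siblings with the same tag name, which is what the example path above suggests.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

final class XPathForNodeSketch {

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds /tag[i]/tag[j]/... by walking from the node up to the root element.
    static String xpathFor(Node node) {
        StringBuilder path = new StringBuilder();
        for (Node cur = node;
             cur != null && cur.getNodeType() == Node.ELEMENT_NODE;
             cur = cur.getParentNode()) {
            int index = 1; // XPath positions are 1-based
            for (Node sib = cur.getPreviousSibling(); sib != null; sib = sib.getPreviousSibling()) {
                if (sib.getNodeType() == Node.ELEMENT_NODE
                        && sib.getNodeName().equals(cur.getNodeName())) {
                    index++;
                }
            }
            path.insert(0, "/" + cur.getNodeName() + "[" + index + "]");
        }
        return path.toString();
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<html><body><div/><div><span/><span/><span/></div></body></html>");
        Node span = doc.getElementsByTagName("span").item(2);
        System.out.println(xpathFor(span)); // prints /html[1]/body[1]/div[2]/span[3]
    }
}
```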
  - getXPathListForNode
```
public static String[] getXPathListForNode(Node n)
```
    Returns a list of tag names representing the path from the document root to the given node n.
    
    Parameters:
    
    n - the node for which to retrieve the path.
    
    Returns:
    
    a sequence of HTML tag names.
  - getNodeLocation
```
public static int[] getNodeLocation(Node n)
```
    Returns the row/col location of the given node.
    
    Parameters:
    
    n - input node.
    
    Returns:
    
    an array of four elements of the form [<begin-row>, <begin-col>, <end-row>, <end-col>], or null if it is not possible to extract such data.
  - isAncestorOf
```
public static boolean isAncestorOf(Node candidateAncestor,
                                   Node candidateSibling,
                                   boolean strict)
```
    Checks whether a node is an ancestor of, or the same as, another node.
    
    Parameters:
    
    candidateAncestor - the candidate ancestor node.
    
    candidateSibling - the candidate descendant node.
    
    strict - if true, the two nodes are not allowed to be the same node.
    
    Returns:
    
    true if candidateAncestor is an ancestor of candidateSibling, false otherwise.
  - isAncestorOf
```
public static boolean isAncestorOf(Node candidateAncestor,
                                   Node candidateSibling)
```
    Checks whether a node is an ancestor of, or the same as, another node. Equivalent to isAncestorOf(org.w3c.dom.Node, org.w3c.dom.Node, boolean) with strict=false.
    
    Parameters:
    
    candidateAncestor - the candidate ancestor node.
    
    candidateSibling - the candidate descendant node.
    
    Returns:
    
    true if candidateAncestor is an ancestor of candidateSibling, false otherwise.
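    The effect of the strict flag on both overloads can be illustrated with a small sketch built on the JDK DOM API. It is not the Any23 source; `AncestorSketch` and its `parse` helper are hypothetical names, and the parent-chain walk is an assumed way of obtaining the documented behaviour.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

final class AncestorSketch {

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Walks the parent chain of 'descendant'; with strict=true the node itself is skipped.
    static boolean isAncestorOf(Node ancestor, Node descendant, boolean strict) {
        Node cur = strict ? descendant.getParentNode() : descendant;
        while (cur != null) {
            if (cur.isSameNode(ancestor)) {
                return true;
            }
            cur = cur.getParentNode();
        }
        return false;
    }

    // Two-argument form: same as the three-argument form with strict=false.
    static boolean isAncestorOf(Node ancestor, Node descendant) {
        return isAncestorOf(ancestor, descendant, false);
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b><c/></b></a>");
        Node a = doc.getDocumentElement();
        Node c = doc.getElementsByTagName("c").item(0);
        System.out.println(isAncestorOf(a, c, true));  // true
        System.out.println(isAncestorOf(a, a, true));  // false: same node, strict
        System.out.println(isAncestorOf(a, a));        // true: same node allowed
    }
}
```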
  - findAllByClassName
```
public static List<Node> findAllByClassName(Node root,
                                            String className)
```
    Finds all nodes that have a declared class. Note that the className is transformed to lower case before being matched against the DOM.
    
    Parameters:
    
    root - the root node from which to start searching.
    
    className - the name of the filtered class.
    
    Returns:
    
    list of matching nodes or an empty list.
  - findAllByAttributeName
```
public static List<Node> findAllByAttributeName(Node root,
                                                String attrName)
```
    Finds all nodes that have a declared attribute. Note that the className is transformed to lower case before being matched against the DOM.
    
    Parameters:
    
    root - the root node from which to start searching.
    
    attrName - the name of the filtered attribute.
    
    Returns:
    
    list of matching nodes or an empty list.
  - findAllByAttributeContains
```
public static List<Node> findAllByAttributeContains(Node node,
                                                    String attrName,
                                                    String attrContains)
```
  - findAllByTag
```
public static List<Node> findAllByTag(Node root,
                                      String tagName)
```
  - findAllByTagAndClassName
```
public static List<Node> findAllByTagAndClassName(Node root,
                                                  String tagName,
                                                  String className)
```
  - findNodeById
```
public static Node findNodeById(Node root,
                                String id)
```
    Mimics the JS DOM API, or prototype's $()
    
    Parameters:
    
    root - the root node from which to start searching
    
    id - the id of the node to locate
    
    Returns:
    
    the Node if one exists
  - findAll
```
public static List<Node> findAll(Node node,
                                 String xpath)
```
    Returns a list of all the nodes that match an XPath expression, which must be valid.
    
    Parameters:
    
    node - the node against which the expression is evaluated
    
    xpath - an xpath expression
    
    Returns:
    
    a list of the matching nodes, if any exist
  - find
```
public static String find(Node node,
                          String xpath)
```
    Gets the string value of an XPath expression.
    
    Parameters:
    
    node - the node against which the expression is evaluated
    
    xpath - an xpath expression
    
    Returns:
    
    the string value of the XPath expression
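    The difference between a node-list query and a string-valued query can be shown with the standard `javax.xml.xpath` API. The following is a sketch under assumptions, not the library code: `XPathFindSketch`, `parse`, and the unchecked exception wrapping are hypothetical choices made for the example.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class XPathFindSketch {

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates the expression as a node set and copies the result into a List.
    static List<Node> findAll(Node node, String xpath) {
        try {
            NodeList matches = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, node, XPathConstants.NODESET);
            List<Node> result = new ArrayList<>();
            for (int i = 0; i < matches.getLength(); i++) {
                result.add(matches.item(i));
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + xpath, e);
        }
    }

    // Evaluates the expression and returns its string value.
    static String find(Node node, String xpath) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(xpath, node);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + xpath, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<r><i>x</i><i>y</i></r>");
        System.out.println(findAll(doc, "//i").size()); // 2
        System.out.println(find(doc, "//i[2]"));        // y
    }
}
```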
  - hasClassName
```
public static boolean hasClassName(Node node,
                                   String className)
```
    Tells whether an element has a class name, without checking its parents in the hierarchy. This mimics the CSS .foo match.
    
    Parameters:
    
    node - the node to check
    
    className - the CSS class name
    
    Returns:
    
    true if the class name exists
  - hasAttribute
```
public static boolean hasAttribute(Node node,
                                   String attributeName,
                                   String className)
```
    Checks the presence of a value in an attribute that contains a whitespace-separated list of values. The semantics are those of CSS classes: "foo" matches "bar foo" and "foo", but not "foob".
    
    Parameters:
    
    node - the node to check
    
    attributeName - the name of the attribute
    
    className - the value to look for in the attribute
    
    Returns:
    
    true if the attribute contains the given value
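    The whitespace-separated matching rule can be made concrete with a short sketch. It uses only the JDK DOM API and is an assumed way of implementing the stated semantics; `TokenAttributeSketch`, `parse` and `hasToken` are hypothetical names.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

final class TokenAttributeSketch {

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // True if 'token' is one of the whitespace-separated values of the attribute.
    static boolean hasToken(Node node, String attributeName, String token) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return false;
        }
        // getAttribute returns "" when the attribute is absent.
        String value = ((Element) node).getAttribute(attributeName);
        for (String part : value.trim().split("\\s+")) {
            if (part.equals(token)) {
                return true; // whole-token match only, so "foob" does not match "foo"
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node p = parse("<p class='bar foo'/>").getDocumentElement();
        Node q = parse("<p class='bar foob'/>").getDocumentElement();
        System.out.println(hasToken(p, "class", "foo")); // true
        System.out.println(hasToken(q, "class", "foo")); // false
    }
}
```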
  - hasAttribute
```
public static boolean hasAttribute(Node node,
                                   String attributeName)
```
    Checks the presence of an attribute in the given node.
    
    Parameters:
    
    node - the node container.
    
    attributeName - the name of the attribute.
    
    Returns:
    
    true if the attribute is present
  - isElementNode
```
public static boolean isElementNode(Node target)
```
    Verifies if the given target node is an element.
    
    Parameters:
    
    target - target node to check
    
    Returns:
    
    true if the node is an element, false otherwise.
  - readAttribute
```
public static String readAttribute(Node node,
                                   String attribute,
                                   String defaultValue)
```
    Reads the value of the specified attribute, returning the defaultValue string if not present.
    
    Parameters:
    
    node - node to read the attribute.
    
    attribute - attribute name.
    
    defaultValue - the default value to return if attribute is not found.
    
    Returns:
    
    the attribute value or defaultValue if not found.
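    The default-value behaviour, shared by the two readAttribute overloads, can be sketched as follows with the JDK DOM API. This is an illustration under assumptions rather than the library source; `ReadAttributeSketch` and `parse` are hypothetical names.

```java
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

final class ReadAttributeSketch {

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the attribute value, or defaultValue when the attribute is absent.
    static String readAttribute(Node node, String attribute, String defaultValue) {
        NamedNodeMap attrs = node.getAttributes(); // null for non-element nodes
        if (attrs == null) {
            return defaultValue;
        }
        Node attr = attrs.getNamedItem(attribute);
        return attr == null ? defaultValue : attr.getNodeValue();
    }

    // Two-argument form: falls back to the empty string.
    static String readAttribute(Node node, String attribute) {
        return readAttribute(node, attribute, "");
    }

    public static void main(String[] args) {
        Node a = parse("<a href='x'/>").getDocumentElement();
        System.out.println(readAttribute(a, "href"));          // x
        System.out.println(readAttribute(a, "title", "none")); // none
    }
}
```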
  - readAttributeWithPrefix
```
public static String readAttributeWithPrefix(Node node,
                                             String attributePrefix,
                                             String defaultValue)
```
    Reads the value of the first attribute whose name matches the specified attributePrefix. Returns the defaultValue if not found.
    
    Parameters:
    
    node - node to look for attributes.
    
    attributePrefix - attribute prefix.
    
    defaultValue - default returned value.
    
    Returns:
    
    the value found or default.
  - readAttribute
```
public static String readAttribute(Node node,
                                   String attribute)
```
    Reads the value of an attribute, returning the empty string if not present.
    
    Parameters:
    
    node - node to read the attribute.
    
    attribute - attribute name.
    
    Returns:
    
    the attribute value or "" if not found.
  - serializeToXML
```
public static String serializeToXML(Node node,
                                    boolean indent)
                             throws TransformerException,
                                    IOException
```
    Given a DOM Node produces the XML serialization omitting the XML declaration.
    
    Parameters:
    
    node - node to be serialized.
    
    indent - if true the output is indented.
    
    Returns:
    
    the XML serialization.
    
    Throws:
    
    TransformerException - if an error occurs during the serializer initialization and activation.
    
    IOException - if there is an error locating the node
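    Serialization without the XML declaration is commonly achieved with the standard `javax.xml.transform` API, as in the sketch below. Whether Any23 does it this way is an assumption; `SerializeSketch`, `parse` and `serialize` are hypothetical names, and the sketch wraps checked exceptions in an unchecked one only to keep the example short.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

final class SerializeSketch {

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes the node with an identity transform, leaving out the <?xml ...?> header.
    static String serialize(Node node, boolean indent) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b>t</b></a>");
        System.out.println(serialize(doc, false));
    }
}
```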
  - documentToInputStream
```
public static InputStream documentToInputStream(Document doc)
```
    Given a Document this method will return an input stream representing that document.
    
    Parameters:
    
    doc - the input Document
    
    Returns:
    
    an InputStream
  - nodeToInputStream
```
public static InputStream nodeToInputStream(Node node)
```
    Converts a W3C DOM node to an InputStream.
    
    Parameters:
    
    node - Node to convert
    
    Returns:
    
    the converted InputStream
