Class MicrodataParser
java.lang.Object
    org.apache.any23.extractor.microdata.MicrodataParser
public class MicrodataParser extends Object
This class provides utility methods for handling Microdata nodes contained within a DOM document.
Author:
Michele Mostarda (mostarda@fbk.eu), Hans Brende (hansbrende@apache.org)
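
Every method on this page works on an org.w3c.dom.Document or Node. As a minimal sketch, the JDK's built-in XML parser can produce such a Document from a well-formed XHTML snippet. The class name XhtmlDocuments is hypothetical, and real-world HTML that is not well-formed XML would need an HTML-aware parser first.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: builds an org.w3c.dom.Document from well-formed XHTML, JDK only.
final class XhtmlDocuments {
    private XhtmlDocuments() {}

    static Document parse(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalArgumentException("Not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        // In XML syntax, boolean attributes such as itemscope need an explicit value.
        Document doc = parse("<div itemscope=\"itemscope\" itemtype=\"http://schema.org/Person\">"
            + "<span itemprop=\"name\">Alice</span></div>");
        System.out.println(doc.getDocumentElement().getTagName()); // prints div
    }
}
```

A Document built this way is the kind of input that the MicrodataParser(Document) constructor and getMicrodata(Document) accept.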
Field Summary
Modifier and Type | Field | Description
static Set<String> | HREF_TAGS | List of tags providing the href property.
static String | ITEMPROP_ATTRIBUTE |
static String | ITEMSCOPE_ATTRIBUTE |
static Set<String> | SRC_TAGS | List of tags providing the src property.

Constructor Summary
Constructor | Description
MicrodataParser(Document document) |

Method Summary
Modifier and Type | Method | Description
ItemProp[] | deferProperties(String... refs) | Given a document and a list of itemprop names, this method returns the corresponding itemprops.
org.apache.any23.extractor.microdata.MicrodataParser.ErrorMode | getErrorMode() |
MicrodataParserException[] | getErrors() |
static List<Node> | getItemPropNodes(Node node) | Returns all the itemProps detected within the given root node.
List<ItemProp> | getItemProps(Node scopeNode, boolean skipRoot) | Returns all the itemprops for the given itemscope node.
ItemScope | getItemScope(Node node) | Returns the ItemScope instance described within the specified node.
static List<Node> | getItemScopeNodes(Node node) | Returns all the itemScopes detected within the given root node.
static MicrodataParserReport | getMicrodata(Document document) | Returns all the Microdata items detected within the given document; works in full report mode.
static MicrodataParserReport | getMicrodata(Document document, org.apache.any23.extractor.microdata.MicrodataParser.ErrorMode errorMode) | Returns all the Microdata items detected within the given document.
static void | getMicrodataAsJSON(Document document, PrintStream ps) | Writes to ps a JSON document containing the list of all extracted Microdata, as described in the Microdata JSON Specification.
ItemPropValue | getPropertyValue(Node node) | Reads the value of an itemprop node.
static List<Node> | getTopLevelItemScopeNodes(Node node) | Returns only the itemScopes that are top-level items.
static boolean | isItemProp(Node node) | Checks whether a node is an itemProp.
static boolean | isItemScope(Node node) | Checks whether a node is an itemScope.
void | setErrorMode(org.apache.any23.extractor.microdata.MicrodataParser.ErrorMode errorMode) |

Field Detail

ITEMSCOPE_ATTRIBUTE
public static final String ITEMSCOPE_ATTRIBUTE
See Also: Constant Field Values

ITEMPROP_ATTRIBUTE
public static final String ITEMPROP_ATTRIBUTE
See Also: Constant Field Values

Constructor Detail

MicrodataParser
public MicrodataParser(Document document)

Method Detail

getItemScopeNodes
public static List<Node> getItemScopeNodes(Node node)
Returns all the itemScopes detected within the given root node.
Parameters:
node - root node to search in.
Returns:
list of detected itemscope nodes.
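
A minimal sketch of this kind of lookup using only the JDK DOM API: it walks the tree in document order and keeps every element carrying an itemscope attribute, nested scopes included. The class and method names are hypothetical, the attribute name "itemscope" is assumed to match ITEMSCOPE_ATTRIBUTE, and whether the library also returns the root node itself is not stated here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ItemScopeFinder {
    // Assumed to match MicrodataParser.ITEMSCOPE_ATTRIBUTE.
    static final String ITEMSCOPE = "itemscope";

    private ItemScopeFinder() {}

    // Returns every element under (and including) root that has an itemscope attribute.
    static List<Node> findItemScopeNodes(Node root) {
        List<Node> result = new ArrayList<>();
        collect(root, result);
        return result;
    }

    private static void collect(Node node, List<Node> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE && ((Element) node).hasAttribute(ITEMSCOPE)) {
            out.add(node);
        }
        // Keep descending: nested item scopes are reported as well.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    // Parses well-formed XHTML; boolean attributes need an explicit value in XML.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<body><div itemscope=\"\">"
            + "<p itemprop=\"author\" itemscope=\"\"><span itemprop=\"name\">A</span></p>"
            + "</div></body>");
        System.out.println(findItemScopeNodes(doc).size()); // prints 2
    }
}
```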

isItemScope
public static boolean isItemScope(Node node)
Checks whether a node is an itemScope.
Parameters:
node - node to check.
Returns:
true if the node is an itemScope, false otherwise.
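
A minimal sketch of such a check, assuming that, as in HTML, itemscope is a boolean attribute, so only its presence counts and its value is ignored. The class name is hypothetical and this is not necessarily how the library implements the method.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class ItemScopeCheck {
    private ItemScopeCheck() {}

    static boolean isItemScope(Node node) {
        // Only elements carry attributes; text, comment and document nodes never qualify.
        if (node == null || node.getNodeType() != Node.ELEMENT_NODE) {
            return false;
        }
        // itemscope is a boolean attribute: its presence matters, not its value.
        return ((Element) node).hasAttribute("itemscope");
    }

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element div = doc.createElement("div");
        div.setAttribute("itemscope", "");
        System.out.println(isItemScope(div));                             // prints true
        System.out.println(isItemScope(doc.createTextNode("itemscope"))); // prints false
    }
}
```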

getItemPropNodes
public static List<Node> getItemPropNodes(Node node)
Returns all the itemProps detected within the given root node.
Parameters:
node - root node to search in.
Returns:
list of detected itemprop nodes.

isItemProp
public static boolean isItemProp(Node node)
Checks whether a node is an itemProp.
Parameters:
node - node to check.
Returns:
true if the node is an itemProp, false otherwise.
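
In HTML, an itemprop attribute can hold several space-separated property names, so a single itemProp node may contribute more than one property. A minimal sketch of the check plus that name split; the class and method names are hypothetical, and whether this library splits names exactly this way is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class ItemPropCheck {
    private ItemPropCheck() {}

    static boolean isItemProp(Node node) {
        return node != null
            && node.getNodeType() == Node.ELEMENT_NODE
            && ((Element) node).hasAttribute("itemprop");
    }

    // Splits itemprop into its space-separated names, dropping duplicates but keeping order.
    static List<String> propertyNames(Element element) {
        Set<String> names = new LinkedHashSet<>();
        for (String token : element.getAttribute("itemprop").trim().split("\\s+")) {
            if (!token.isEmpty()) {
                names.add(token);
            }
        }
        return new ArrayList<>(names);
    }

    static Element newElement(String tag) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .newDocument().createElement(tag);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element span = newElement("span");
        span.setAttribute("itemprop", " name  alternateName name ");
        System.out.println(isItemProp(span));    // prints true
        System.out.println(propertyNames(span)); // prints [name, alternateName]
    }
}
```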

getTopLevelItemScopeNodes
public static List<Node> getTopLevelItemScopeNodes(Node node)
Returns only the itemScopes that are top-level items.
Parameters:
node - root node to search in.
Returns:
list of detected top-level item scopes.
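
The WHATWG HTML specification defines a top-level microdata item as an item whose element has no itemprop attribute, i.e. one that is not itself the value of another item's property. A minimal sketch of that filter, assuming this method follows the same rule; the class name is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TopLevelItems {
    private TopLevelItems() {}

    // Keeps itemscope elements that are not themselves the value of an itemprop.
    static List<Element> topLevelItemScopes(Document doc) {
        List<Element> result = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*"); // every element, in document order
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.hasAttribute("itemscope") && !e.hasAttribute("itemprop")) {
                result.add(e);
            }
        }
        return result;
    }

    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<body>"
            + "<div id=\"book\" itemscope=\"\">"
            + "<div id=\"author\" itemprop=\"author\" itemscope=\"\"/></div>"
            + "<div id=\"event\" itemscope=\"\"/></body>");
        for (Element e : topLevelItemScopes(doc)) {
            System.out.println(e.getAttribute("id")); // prints book, then event
        }
    }
}
```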

getMicrodata
public static MicrodataParserReport getMicrodata(Document document, org.apache.any23.extractor.microdata.MicrodataParser.ErrorMode errorMode) throws MicrodataParserException
Returns all the Microdata items detected within the given document.
Parameters:
document - document to be processed.
errorMode - error management policy.
Returns:
a MicrodataParserReport with the detected itemscope items.
Throws:
MicrodataParserException - if errorMode == MicrodataParser.ErrorMode.STOP_AT_FIRST_ERROR and an error occurs.
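
The errorMode argument decides what happens when a problem is found: with STOP_AT_FIRST_ERROR the first error is thrown as a MicrodataParserException, while the full report mode used by getMicrodata(Document) presumably keeps going and accumulates the errors (compare getErrors()). A minimal sketch of that policy in plain Java; Policy, ParseProblem and ErrorCollector are hypothetical stand-ins for ErrorMode, MicrodataParserException and the parser's internal bookkeeping, and the FULL_REPORT constant name is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for MicrodataParser.ErrorMode.
enum Policy { STOP_AT_FIRST_ERROR, FULL_REPORT }

// Hypothetical stand-in for MicrodataParserException.
class ParseProblem extends Exception {
    ParseProblem(String message) {
        super(message);
    }
}

final class ErrorCollector {
    private final Policy policy;
    private final List<ParseProblem> errors = new ArrayList<>();

    ErrorCollector(Policy policy) {
        this.policy = policy;
    }

    // Aborts on the first problem, or records it and lets processing continue.
    void report(ParseProblem problem) throws ParseProblem {
        if (policy == Policy.STOP_AT_FIRST_ERROR) {
            throw problem;
        }
        errors.add(problem);
    }

    List<ParseProblem> errors() {
        return new ArrayList<>(errors);
    }

    public static void main(String[] args) {
        ErrorCollector full = new ErrorCollector(Policy.FULL_REPORT);
        try {
            full.report(new ParseProblem("missing itemref target"));
            full.report(new ParseProblem("invalid URL"));
        } catch (ParseProblem e) {
            throw new AssertionError("FULL_REPORT never throws", e);
        }
        System.out.println(full.errors().size()); // prints 2
    }
}
```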

getMicrodata
public static MicrodataParserReport getMicrodata(Document document)
Returns all the Microdata items detected within the given document; works in full report mode.
Parameters:
document - document to be processed.
Returns:
a MicrodataParserReport with the detected itemscope items.

getMicrodataAsJSON
public static void getMicrodataAsJSON(Document document, PrintStream ps)
Writes to ps a JSON document containing the list of all extracted Microdata, as described in the Microdata JSON Specification.
Parameters:
document - document to be processed.
ps - the PrintStream to write JSON to.
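
The Microdata JSON format described in the WHATWG HTML specification wraps items as {"items": [{"type": [...], "properties": {name: [values]}}]}. A minimal sketch that writes flat items in that shape to a PrintStream; SimpleItem and MicrodataJsonSketch are hypothetical, nested items, itemid and pretty-printing are left out, and the string escaping is deliberately minimal.

```java
import java.io.PrintStream;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MicrodataJsonSketch {
    // Hypothetical flat item: type URLs plus property name -> string values.
    static final class SimpleItem {
        final List<String> types;
        final Map<String, List<String>> properties;

        SimpleItem(List<String> types, Map<String, List<String>> properties) {
            this.types = types;
            this.properties = properties;
        }
    }

    // Writes {"items":[{"type":[...],"properties":{...}}]} on a single line.
    static void writeJson(List<SimpleItem> items, PrintStream ps) {
        StringBuilder sb = new StringBuilder("{\"items\":[");
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) sb.append(',');
            SimpleItem item = items.get(i);
            sb.append("{\"type\":");
            appendArray(sb, item.types);
            sb.append(",\"properties\":{");
            int n = 0;
            for (Map.Entry<String, List<String>> e : item.properties.entrySet()) {
                if (n++ > 0) sb.append(',');
                appendString(sb, e.getKey());
                sb.append(':');
                appendArray(sb, e.getValue());
            }
            sb.append("}}"); // closes "properties" and the item object
        }
        sb.append("]}");
        ps.print(sb);
    }

    private static void appendArray(StringBuilder sb, List<String> values) {
        sb.append('[');
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) sb.append(',');
            appendString(sb, values.get(i));
        }
        sb.append(']');
    }

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    private static void appendString(StringBuilder sb, String s) {
        sb.append('"');
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        sb.append('"');
    }

    public static void main(String[] args) {
        Map<String, List<String>> props = new LinkedHashMap<>();
        props.put("name", Arrays.asList("Alice"));
        writeJson(Arrays.asList(new SimpleItem(Arrays.asList("http://schema.org/Person"), props)), System.out);
        System.out.println();
    }
}
```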

setErrorMode
public void setErrorMode(org.apache.any23.extractor.microdata.MicrodataParser.ErrorMode errorMode)

getErrorMode
public org.apache.any23.extractor.microdata.MicrodataParser.ErrorMode getErrorMode()

getErrors
public MicrodataParserException[] getErrors()

getPropertyValue
public ItemPropValue getPropertyValue(Node node) throws MicrodataParserException
Reads the value of an itemprop node.
Parameters:
node - itemprop node.
Returns:
value detected within the given node.
Throws:
MicrodataParserException - if an error occurs while extracting a nested item scope.
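
How a value is read depends on the element. The WHATWG HTML microdata rules take it from content on meta, from src on media elements, from href on a, area and link, from data on object, from value on data and meter, from datetime on time, and from the text content otherwise. The HREF_TAGS and SRC_TAGS fields suggest a similar tag-based dispatch here, but their exact contents are not listed in this document, so the tag sets below are assumptions. A minimal sketch that ignores nested item scopes and URL resolution; the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;

final class PropertyValueSketch {
    // Assumed tag sets, taken from the WHATWG HTML microdata value rules.
    static final Set<String> SRC_TAGS = new HashSet<>(Arrays.asList(
        "audio", "embed", "iframe", "img", "source", "track", "video"));
    static final Set<String> HREF_TAGS = new HashSet<>(Arrays.asList("a", "area", "link"));

    private PropertyValueSketch() {}

    // String value of a non-itemscope itemprop element; URLs are returned as written.
    static String propertyValue(Element e) {
        String tag = e.getTagName().toLowerCase(Locale.ROOT);
        if (tag.equals("meta")) return e.getAttribute("content");
        if (SRC_TAGS.contains(tag)) return e.getAttribute("src");
        if (HREF_TAGS.contains(tag)) return e.getAttribute("href");
        if (tag.equals("object")) return e.getAttribute("data");
        if (tag.equals("data") || tag.equals("meter")) return e.getAttribute("value");
        if (tag.equals("time") && e.hasAttribute("datetime")) return e.getAttribute("datetime");
        return e.getTextContent(); // fallback, including time without datetime
    }

    static Element newElement(String tag) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .newDocument().createElement(tag);
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Element meta = newElement("meta");
        meta.setAttribute("content", "4.5");
        System.out.println(propertyValue(meta)); // prints 4.5
    }
}
```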

getItemProps
public List<ItemProp> getItemProps(Node scopeNode, boolean skipRoot) throws MicrodataParserException
Returns all the itemprops for the given itemscope node.
Parameters:
scopeNode - node representing the itemscope.
skipRoot - if true, the given root node is not read as a property, even if it contains the itemprop attribute.
Returns:
the list of itemprops detected within the given itemscope.
Throws:
MicrodataParserException - if an error occurs while retrieving a property value.
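
A simplified sketch of such a crawl, modelled on the WHATWG algorithm for an item's properties: every descendant with itemprop is collected, but the walk does not enter a nested itemscope, whose own properties belong to the nested item. With skipRoot false, a root carrying itemprop is reported itself, and a root that is also an itemscope is not entered. itemref handling is omitted, the names are hypothetical, and this is not claimed to be the library's exact logic.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ItemPropCrawler {
    private ItemPropCrawler() {}

    // Collects the itemprop elements that belong to the item rooted at root.
    static List<Element> itemPropElements(Element root, boolean skipRoot) {
        List<Element> out = new ArrayList<>();
        if (!skipRoot) {
            if (root.hasAttribute("itemprop")) {
                out.add(root);
            }
            // A root that opens its own item keeps its contents to itself.
            if (root.hasAttribute("itemscope")) {
                return out;
            }
        }
        collectChildren(root, out);
        return out;
    }

    private static void collectChildren(Element parent, List<Element> out) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element e = (Element) child;
            if (e.hasAttribute("itemprop")) {
                out.add(e);
            }
            // Do not descend into a nested item: its properties are its own.
            if (!e.hasAttribute("itemscope")) {
                collectChildren(e, out);
            }
        }
    }

    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<div itemscope=\"\">"
            + "<span itemprop=\"name\">A</span>"
            + "<div itemprop=\"author\" itemscope=\"\"><span itemprop=\"name\">B</span></div>"
            + "</div>");
        for (Element e : itemPropElements(doc.getDocumentElement(), true)) {
            System.out.println(e.getAttribute("itemprop")); // prints name, then author
        }
    }
}
```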

deferProperties
public ItemProp[] deferProperties(String... refs) throws MicrodataParserException
Given a document and a list of itemprop names, this method returns the corresponding itemprops.
Parameters:
refs - list of references.
Returns:
list of retrieved itemprops.
Throws:
MicrodataParserException - if a loop is detected or a property name is missing.
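
In HTML, the references an item dereferences come from its itemref attribute, a list of element IDs; this document does not spell out what refs contains, so treating the references as IDs is an assumption. A minimal sketch of the lookup with the two failure cases named above, a missing target and a loop, where a loop is simplified to "the same ID dereferenced twice by one resolver". ItemRefResolver is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ItemRefResolver {
    private final Map<String, Element> byId = new HashMap<>();
    private final Set<String> seen = new HashSet<>();

    ItemRefResolver(Document doc) {
        // Without a DTD, "id" is not declared as an ID attribute, so
        // Document.getElementById may return null; index the ids by hand instead.
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.hasAttribute("id")) {
                byId.putIfAbsent(e.getAttribute("id"), e);
            }
        }
    }

    // Resolves each reference to its element. Throws if an id is unknown, or if this
    // resolver has already dereferenced it (treated here as a loop).
    List<Element> resolve(String... refs) {
        List<Element> result = new ArrayList<>();
        for (String ref : refs) {
            if (!seen.add(ref)) {
                throw new IllegalStateException("Loop detected on reference '" + ref + "'");
            }
            Element target = byId.get(ref);
            if (target == null) {
                throw new IllegalArgumentException("No element with id '" + ref + "'");
            }
            result.add(target);
        }
        return result;
    }

    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<body><div itemscope=\"\" itemref=\"a b\"/>"
            + "<p id=\"a\" itemprop=\"name\">Alice</p><p id=\"b\" itemprop=\"age\">30</p></body>");
        ItemRefResolver resolver = new ItemRefResolver(doc);
        System.out.println(resolver.resolve("a", "b").size()); // prints 2
    }
}
```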

getItemScope
public ItemScope getItemScope(Node node) throws MicrodataParserException
Returns the ItemScope instance described within the specified node.
Parameters:
node - node describing an itemscope.
Returns:
the ItemScope instance.
Throws:
MicrodataParserException - if an error occurs while dereferencing properties.
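
An itemscope element carries its metadata in three HTML attributes: itemtype (space-separated type URLs), itemid (a global identifier) and itemref (space-separated IDs of elements holding extra properties, which is where dereferencing comes in). A minimal sketch that reads them into a hypothetical ItemSummary; how getItemScope maps them onto ItemScope is not shown in this document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;

final class ItemScopeSketch {
    // Hypothetical, simplified view of what an item scope carries.
    static final class ItemSummary {
        final List<String> types;  // from itemtype
        final String itemId;       // from itemid, or null when absent
        final List<String> refs;   // from itemref: ids of elements with extra properties

        ItemSummary(List<String> types, String itemId, List<String> refs) {
            this.types = Collections.unmodifiableList(types);
            this.itemId = itemId;
            this.refs = Collections.unmodifiableList(refs);
        }
    }

    static ItemSummary describe(Element scope) {
        if (!scope.hasAttribute("itemscope")) {
            throw new IllegalArgumentException("Not an itemscope element: " + scope.getTagName());
        }
        String itemId = scope.hasAttribute("itemid") ? scope.getAttribute("itemid") : null;
        return new ItemSummary(tokens(scope.getAttribute("itemtype")), itemId,
            tokens(scope.getAttribute("itemref")));
    }

    // Splits a space-separated attribute value; an absent attribute yields an empty list.
    static List<String> tokens(String value) {
        List<String> out = new ArrayList<>();
        for (String t : value.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    static Element newElement(String tag) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .newDocument().createElement(tag);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element div = newElement("div");
        div.setAttribute("itemscope", "");
        div.setAttribute("itemtype", "http://schema.org/Person");
        div.setAttribute("itemref", "a b");
        ItemSummary s = describe(div);
        System.out.println(s.types + " " + s.itemId + " " + s.refs);
        // prints [http://schema.org/Person] null [a, b]
    }
}
```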