Class MicrodataParser
- java.lang.Object
 - org.apache.any23.extractor.microdata.MicrodataParser

public class MicrodataParser extends Object
This class provides utility methods for handling Microdata nodes contained within a DOM document.
- Author:
 - Michele Mostarda (mostarda@fbk.eu), Hans Brende (hansbrende@apache.org)
 
 
- 
- 
Field Summary
Fields
Modifier and Type | Field | Description
static Set<String> | HREF_TAGS | List of tags providing the href property.
static String | ITEMPROP_ATTRIBUTE |
static String | ITEMSCOPE_ATTRIBUTE |
static Set<String> | SRC_TAGS | List of tags providing the src property.
- 
Constructor Summary
Constructors
Constructor | Description
MicrodataParser(Document document) |
- 
Method Summary
Modifier and Type | Method | Description
ItemProp[] | deferProperties(String... refs) | Given a list of itemprop references, returns the matching itemprops.
org.apache.any23.extractor.microdata.MicrodataParser.ErrorMode | getErrorMode() |
MicrodataParserException[] | getErrors() |
static List<Node> | getItemPropNodes(Node node) | Returns all the itemProps detected within the given root node.
List<ItemProp> | getItemProps(Node scopeNode, boolean skipRoot) | Returns all the itemprops for the given itemscope node.
ItemScope | getItemScope(Node node) | Returns the ItemScope instance described within the specified node.
static List<Node> | getItemScopeNodes(Node node) | Returns all the itemScopes detected within the given root node.
static MicrodataParserReport | getMicrodata(Document document) | Returns all the Microdata items detected within the given document; works in full report mode.
static MicrodataParserReport | getMicrodata(Document document, org.apache.any23.extractor.microdata.MicrodataParser.ErrorMode errorMode) | Returns all the Microdata items detected within the given document.
static void | getMicrodataAsJSON(Document document, PrintStream ps) | Writes a JSON document containing the list of all extracted Microdata, as described in the Microdata JSON Specification, to the given PrintStream.
ItemPropValue | getPropertyValue(Node node) | Reads the value of an itemprop node.
static List<Node> | getTopLevelItemScopeNodes(Node node) | Returns only the itemScopes that are top-level items.
static boolean | isItemProp(Node node) | Checks whether a node is an itemProp.
static boolean | isItemScope(Node node) | Checks whether a node is an itemScope.
void | setErrorMode(org.apache.any23.extractor.microdata.MicrodataParser.ErrorMode errorMode) |
 - 
 
- 
- 
Field Detail
- 
ITEMSCOPE_ATTRIBUTE
public static final String ITEMSCOPE_ATTRIBUTE
- See Also:
 - Constant Field Values
 
 
- 
ITEMPROP_ATTRIBUTE
public static final String ITEMPROP_ATTRIBUTE
- See Also:
 - Constant Field Values
 
 
 - 
 
- 
Constructor Detail
- 
MicrodataParser
public MicrodataParser(Document document)
 
 - 
 
- 
Method Detail
- 
getItemScopeNodes
public static List<Node> getItemScopeNodes(Node node)
Returns all the itemScopes detected within the given root node.
- Parameters:
 - node - root node to search in.
- Returns:
 - list of detected items.
 
 
- 
isItemScope
public static boolean isItemScope(Node node)
Checks whether a node is an itemScope.
- Parameters:
 - node - node to check.
- Returns:
 - true if the node is an itemScope, false otherwise.
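
The two methods above can be illustrated with the JDK's DOM API alone. The following is a minimal sketch, not the library's implementation: it assumes an itemscope is any node carrying an `itemscope` attribute, and that the search includes the root node itself. The class name `ItemScopeSketch` and its helpers are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch class; it does not use or mirror Any23 internals.
final class ItemScopeSketch {

    /** Parses well-formed XHTML into a DOM document, failing unchecked. */
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse sample markup", e);
        }
    }

    /** Assumption: a node is an itemscope if it has an "itemscope" attribute. */
    static boolean isItemScope(Node node) {
        NamedNodeMap attrs = node.getAttributes(); // null for non-element nodes
        return attrs != null && attrs.getNamedItem("itemscope") != null;
    }

    /** Collects every itemscope node under (and including) the root, in document order. */
    static List<Node> itemScopeNodes(Node root) {
        List<Node> found = new ArrayList<>();
        collect(root, found);
        return found;
    }

    private static void collect(Node node, List<Node> found) {
        if (isItemScope(node)) {
            found.add(node);
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, found);
        }
    }
}
```

Because the sample markup is parsed as XML, boolean attributes need an explicit value, for example `itemscope=""`.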
 
- 
getItemPropNodes
public static List<Node> getItemPropNodes(Node node)
Returns all the itemProps detected within the given root node.
- Parameters:
 - node - root node to search in.
- Returns:
 - list of detected items.
 
 
- 
isItemProp
public static boolean isItemProp(Node node)
Checks whether a node is an itemProp.
- Parameters:
 - node - node to check.
- Returns:
 - true if the node is an itemProp, false otherwise.
 
- 
getTopLevelItemScopeNodes
public static List<Node> getTopLevelItemScopeNodes(Node node)
Returns only the itemScopes that are top-level items.
- Parameters:
 - node - root node to search in.
- Returns:
 - list of detected top-level item scopes.
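
To make "top-level" concrete, here is a small sketch that uses only the JDK's DOM API. It assumes the HTML Microdata definition, under which a top-level item is an element with `itemscope` but without `itemprop`; the library may apply additional rules. The class `TopLevelItemsSketch` is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch class; the filtering rule is an assumption taken from the Microdata spec.
final class TopLevelItemsSketch {

    /** Parses well-formed XHTML into a DOM document, failing unchecked. */
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse sample markup", e);
        }
    }

    /** Returns elements that declare itemscope and are not themselves a property of another item. */
    static List<Element> topLevelItems(Node root) {
        List<Element> found = new ArrayList<>();
        walk(root, found);
        return found;
    }

    private static void walk(Node node, List<Element> found) {
        if (node instanceof Element) {
            Element element = (Element) node;
            if (element.hasAttribute("itemscope") && !element.hasAttribute("itemprop")) {
                found.add(element);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, found);
        }
    }
}
```

In this sketch, an address item nested as `itemprop="address" itemscope=""` inside a person item is skipped, and only the enclosing person item is reported.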
 
 
- 
getMicrodata
public static MicrodataParserReport getMicrodata(Document document, org.apache.any23.extractor.microdata.MicrodataParser.ErrorMode errorMode) throws MicrodataParserException
Returns all the Microdata items detected within the given document.
- Parameters:
 - document - document to be processed.
 - errorMode - error management policy.
- Returns:
 - report containing the detected itemscope items.
- Throws:
 - MicrodataParserException - if errorMode == MicrodataParser.ErrorMode.STOP_AT_FIRST_ERROR and an error occurs.
 
- 
getMicrodata
public static MicrodataParserReport getMicrodata(Document document)
Returns all the Microdata items detected within the given document; works in full report mode.
- Parameters:
 - document - document to be processed.
- Returns:
 - report containing the detected itemscope items.
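
The difference between the two error policies can be sketched in plain Java. This illustrates the general pattern only and is not the library's code: the `Mode` enum, the `Report` class, and `parseItem` are hypothetical stand-ins, and strings stand in for DOM nodes. Only the name `STOP_AT_FIRST_ERROR` comes from the documentation above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "stop at first error" versus "collect everything and report".
final class ErrorModeSketch {

    enum Mode { STOP_AT_FIRST_ERROR, FULL_REPORT }

    /** Holds the successfully parsed items together with the errors that were tolerated. */
    static final class Report {
        final List<String> items = new ArrayList<>();
        final List<Exception> errors = new ArrayList<>();
    }

    /** Processes each input; rethrows immediately or records the error, depending on the mode. */
    static Report process(List<String> inputs, Mode mode) {
        Report report = new Report();
        for (String input : inputs) {
            try {
                report.items.add(parseItem(input));
            } catch (IllegalArgumentException e) {
                if (mode == Mode.STOP_AT_FIRST_ERROR) {
                    throw e; // the caller sees the first failure and nothing else
                }
                report.errors.add(e); // full report: keep going and remember the problem
            }
        }
        return report;
    }

    /** Stand-in for item parsing: an empty input counts as malformed. */
    static String parseItem(String input) {
        if (input.isEmpty()) {
            throw new IllegalArgumentException("empty item");
        }
        return input.trim();
    }
}
```

In full report mode the caller inspects both lists afterwards. In stop-at-first-error mode a single malformed item aborts the whole run.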
 
 
- 
getMicrodataAsJSON
public static void getMicrodataAsJSON(Document document, PrintStream ps)
Writes a JSON document containing the list of all extracted Microdata, as described in the Microdata JSON Specification, to the given PrintStream.
- Parameters:
 - document - document to be processed.
 - ps - the PrintStream to write JSON to.
 
- 
setErrorMode
public void setErrorMode(org.apache.any23.extractor.microdata.MicrodataParser.ErrorMode errorMode)
 
- 
getErrorMode
public org.apache.any23.extractor.microdata.MicrodataParser.ErrorMode getErrorMode()
 
- 
getErrors
public MicrodataParserException[] getErrors()
 
- 
getPropertyValue
public ItemPropValue getPropertyValue(Node node) throws MicrodataParserException
Reads the value of an itemprop node.
- Parameters:
 - node - itemprop node.
- Returns:
 - value detected within the given node.
- Throws:
 - MicrodataParserException - if an error occurs while extracting a nested item scope.
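
As a rough illustration of how an itemprop value depends on the element type, the sketch below follows the HTML Microdata rules for a few common cases. It is not the library's implementation: the contents of the two tag sets are assumptions taken from the Microdata specification (the sets only reuse the names of the `HREF_TAGS` and `SRC_TAGS` fields listed above), nested item scopes are deliberately left out, and the class `PropertyValueSketch` is hypothetical.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical sketch class; tag sets below are assumed, not read from Any23.
final class PropertyValueSketch {

    static final Set<String> HREF_TAGS = new HashSet<>(Arrays.asList("a", "area", "link"));
    static final Set<String> SRC_TAGS = new HashSet<>(
            Arrays.asList("audio", "embed", "iframe", "img", "source", "track", "video"));

    /** Parses well-formed XHTML into a DOM document, failing unchecked. */
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse sample markup", e);
        }
    }

    /** Picks the attribute or text that carries the property value for this element. */
    static String value(Element element) {
        String tag = element.getTagName().toLowerCase(Locale.ROOT);
        if (tag.equals("meta")) {
            return element.getAttribute("content");
        }
        if (SRC_TAGS.contains(tag)) {
            return element.getAttribute("src");
        }
        if (HREF_TAGS.contains(tag)) {
            return element.getAttribute("href");
        }
        return element.getTextContent(); // default: the element's text
    }
}
```

A fuller version would also resolve relative URLs and return a nested item when the property element carries `itemscope`, which is the case the exception above refers to.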
 
- 
getItemProps
public List<ItemProp> getItemProps(Node scopeNode, boolean skipRoot) throws MicrodataParserException
Returns all the itemprops for the given itemscope node.
- Parameters:
 - scopeNode - node representing the itemscope.
 - skipRoot - if true, the given root node will not be read as a property, even if it contains the itemprop attribute.
- Returns:
 - the list of itemprops detected within the given itemscope.
- Throws:
 - MicrodataParserException - if an error occurs while retrieving a property value.
 
- 
deferProperties
public ItemProp[] deferProperties(String... refs) throws MicrodataParserException
Given a list of itemprop references, returns the matching itemprops.
- Parameters:
 - refs - list of references.
- Returns:
 - list of retrieved itemprops.
- Throws:
 - MicrodataParserException - if a loop is detected or a property name is missing.
 
- 
getItemScope
public ItemScope getItemScope(Node node) throws MicrodataParserException
Returns the ItemScope instance described within the specified node.
- Parameters:
 - node - node describing an itemscope.
- Returns:
 - instance of ItemScope object.
- Throws:
 - MicrodataParserException - if an error occurs while dereferencing properties.
 
 - 
 
 -