Package org.apache.any23.extractor.html
Class HListingExtractor
- java.lang.Object
  - org.apache.any23.extractor.html.MicroformatExtractor
    - org.apache.any23.extractor.html.EntityBasedMicroformatExtractor
      - org.apache.any23.extractor.html.HListingExtractor

All Implemented Interfaces: Extractor<Document>, Extractor.TagSoupDOMExtractor

public class HListingExtractor extends EntityBasedMicroformatExtractor
Extractor for the hListing microformat.
- Author: Gabriele Renzi

Nested Class Summary
-
Nested classes/interfaces inherited from interface org.apache.any23.extractor.Extractor
Extractor.BlindExtractor, Extractor.ContentExtractor, Extractor.TagSoupDOMExtractor
-
-
Field Summary
Fields inherited from class org.apache.any23.extractor.html.MicroformatExtractor
BEGIN_SCRIPT, END_SCRIPT, valueFactory

Constructor Summary
- HListingExtractor()

Method Summary
- protected boolean extractEntity(Node node, ExtractionResult out): Extracts an entity from a DOM node.
- protected String getBaseClassName(): Returns the base class name for the extractor.
- ExtractorDescription getDescription(): Returns the description of this extractor.
- protected void resetExtractor(): Resets the internal status of the extractor to prepare it for a new extraction session.

Methods inherited from class org.apache.any23.extractor.html.EntityBasedMicroformatExtractor
extract, getBlankNodeFor

Methods inherited from class org.apache.any23.extractor.html.MicroformatExtractor
addBNodeProperty, addBNodeProperty, addIRIProperty, conditionallyAddLiteralProperty, conditionallyAddResourceProperty, conditionallyAddStringProperty, fixLink, fixLink, getCurrentExtractionResult, getDocumentIRI, getExtractionContext, getHTMLDocument, includes, openSubResult, run, setCurrentExtractionResult

Method Detail

getDescription
public ExtractorDescription getDescription()
Description copied from class: MicroformatExtractor
Returns the description of this extractor.
- Specified by: getDescription in interface Extractor<Document>
- Specified by: getDescription in class MicroformatExtractor
- Returns: a human-readable description.

getBaseClassName
protected String getBaseClassName()
Description copied from class: EntityBasedMicroformatExtractor
Returns the base class name for the extractor.
- Specified by: getBaseClassName in class EntityBasedMicroformatExtractor
- Returns: a string containing the base of the extractor.

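As a rough illustration of what a base class name can be used for, the sketch below collects the elements whose class attribute contains a given token. It is not Any23 code: the class BaseClassNameSketch and its methods parse and findEntityRoots are hypothetical, it uses only the JDK's DOM parser, and the value "hlisting" is an assumption based on the root class name of the hListing microformat.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Any23.
public class BaseClassNameSketch {

    // Parses well-formed markup into a DOM document (assumption: input is valid XML).
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // Returns every element whose class attribute lists baseClassName as a whole token.
    public static List<Element> findEntityRoots(Document doc, String baseClassName) {
        List<Element> roots = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            // getAttribute returns "" when the attribute is absent.
            String[] tokens = el.getAttribute("class").trim().split("\\s+");
            if (Arrays.asList(tokens).contains(baseClassName)) {
                roots.add(el);
            }
        }
        return roots;
    }

    public static void main(String[] args) {
        Document doc = parse("<body><div class='hlisting offer'/><p class='note'/></body>");
        System.out.println(findEntityRoots(doc, "hlisting").size()); // prints 1
    }
}
```

Matching on whole tokens rather than substrings keeps a class such as "hlistings" from being mistaken for the base class.
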
resetExtractor
protected void resetExtractor()
Description copied from class: EntityBasedMicroformatExtractor
Resets the internal status of the extractor to prepare it for a new extraction session.
- Specified by: resetExtractor in class EntityBasedMicroformatExtractor

extractEntity
protected boolean extractEntity(Node node, ExtractionResult out) throws ExtractionException
Description copied from class: EntityBasedMicroformatExtractor
Extracts an entity from a DOM node.
- Specified by: extractEntity in class EntityBasedMicroformatExtractor
- Parameters:
  - node: the DOM node.
  - out: the extraction result collector.
- Returns: true if the extraction has produced something, false otherwise.
- Throws: ExtractionException if there is an error during extraction.

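The three protected methods suggest a template-method arrangement, in which a base class drives the traversal and a subclass supplies the per-entity logic. The sketch below shows that general pattern only, under stated assumptions: SketchExtractor, ListingSketch and run are hypothetical names, a plain List<String> stands in for ExtractionResult, and the traversal shown is not taken from the Any23 sources.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class EntityExtractionSketch {

    // Hypothetical stand-in for an entity-based extractor; not the Any23 class.
    abstract static class SketchExtractor {
        protected abstract String getBaseClassName();
        protected abstract void resetExtractor();
        protected abstract boolean extractEntity(Node node, List<String> out) throws Exception;

        // Resets state, visits each matching element, and counts the entities produced.
        int extract(Document doc, List<String> out) throws Exception {
            resetExtractor();
            int produced = 0;
            NodeList all = doc.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Element el = (Element) all.item(i);
                List<String> classes =
                        Arrays.asList(el.getAttribute("class").trim().split("\\s+"));
                if (classes.contains(getBaseClassName()) && extractEntity(el, out)) {
                    produced++;
                }
            }
            return produced;
        }
    }

    // Hypothetical subclass: treats the element's text as the extracted entity.
    static class ListingSketch extends SketchExtractor {
        @Override
        protected String getBaseClassName() {
            return "hlisting"; // assumed root class name of the microformat
        }

        @Override
        protected void resetExtractor() {
            // No per-run state in this sketch.
        }

        @Override
        protected boolean extractEntity(Node node, List<String> out) {
            String text = node.getTextContent().trim();
            if (text.isEmpty()) {
                return false; // nothing produced for this node
            }
            out.add(text);
            return true;
        }
    }

    // Parses the markup and runs the sketch extractor, collecting results in out.
    public static int run(String xml, List<String> out) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return new ListingSketch().extract(doc, out);
        } catch (Exception e) {
            throw new IllegalStateException("sketch extraction failed", e);
        }
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        int produced = run("<body><div class='hlisting'>Bike for sale</div>"
                + "<div class='hlisting'> </div></body>", out);
        System.out.println(produced + " " + out);
    }
}
```

In this arrangement the boolean return value lets the caller distinguish nodes that yielded an entity from nodes that matched the base class name but contained nothing usable.
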