Package org.apache.any23.extractor.xpath
Class XPathExtractor
- java.lang.Object
  - org.apache.any23.extractor.xpath.XPathExtractor
- All Implemented Interfaces:
Extractor<Document>, Extractor.TagSoupDOMExtractor
public class XPathExtractor extends Object implements Extractor.TagSoupDOMExtractor
Implementation of an Extractor.TagSoupDOMExtractor able to apply XPathExtractionRules and generate quads.
- Author:
- Michele Mostarda (mostarda@fbk.eu)
- See Also:
XPathExtractionRule
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.apache.any23.extractor.Extractor
Extractor.BlindExtractor, Extractor.ContentExtractor, Extractor.TagSoupDOMExtractor
-
-
Constructor Summary
Constructors:
- XPathExtractor()
- XPathExtractor(List<XPathExtractionRule> rules)
-
Method Summary
Modifier and Type, Method, Description:
- void add(XPathExtractionRule rule)
- boolean contains(XPathExtractionRule rule)
- ExtractorDescription getDescription(): Returns an ExtractorDescription of this extractor.
- void remove(XPathExtractionRule rule)
- void run(ExtractionParameters extractionParameters, ExtractionContext extractionContext, Document in, ExtractionResult out): Executes the extractor.
-
-
-
Constructor Detail
-
XPathExtractor
public XPathExtractor()
-
XPathExtractor
public XPathExtractor(List<XPathExtractionRule> rules)
-
-
Method Detail
-
add
public void add(XPathExtractionRule rule)
-
remove
public void remove(XPathExtractionRule rule)
-
contains
public boolean contains(XPathExtractionRule rule)
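The three rule-management methods carry no description on this page. The following is a minimal sketch of the list-backed behaviour their signatures suggest. `SketchRule` and `SketchRuleHolder` are hypothetical stand-ins, not Any23 types, and the defensive copy in the constructor is an assumption rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an extraction rule; NOT the Any23 XPathExtractionRule interface. */
interface SketchRule {
    String getName();
}

/** Hypothetical holder mirroring the add/remove/contains surface listed above. */
class SketchRuleHolder {
    private final List<SketchRule> rules = new ArrayList<>();

    SketchRuleHolder() {
    }

    SketchRuleHolder(List<SketchRule> initial) {
        // Assumption: copy the list so later changes by the caller do not leak in.
        rules.addAll(initial);
    }

    void add(SketchRule rule) {
        rules.add(rule);
    }

    void remove(SketchRule rule) {
        rules.remove(rule);
    }

    boolean contains(SketchRule rule) {
        return rules.contains(rule);
    }
}
```

In this sketch membership relies on `equals`, so a rule type without an `equals` override is matched by identity. Whether XPathExtractor behaves the same way is not stated here.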
-
run
public void run(ExtractionParameters extractionParameters, ExtractionContext extractionContext, Document in, ExtractionResult out) throws IOException, ExtractionException
Description copied from interface: Extractor
Executes the extractor. It will be invoked only once; extractors are not reusable.
- Specified by:
run in interface Extractor<Document>
- Parameters:
extractionParameters - the parameters to be applied during the extraction.
extractionContext - the document context.
in - the extractor input data.
out - the collector for the extracted data.
- Throws:
IOException - on error while reading from the input stream.
ExtractionException - on other errors, such as parse errors.
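To illustrate the general idea of applying XPath rules to a DOM `Document` and collecting quads, here is a minimal sketch built only on the JDK's `javax.xml.xpath` API. It does not use any Any23 types: `Rule`, `Quad`, `apply`, and `parse` are hypothetical names introduced for illustration, and using the document IRI as both subject and graph is an assumption, not a description of what XPathExtractor does.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathRuleSketch {

    /** Hypothetical rule: an XPath expression paired with the predicate to emit. */
    static final class Rule {
        final String xpath;
        final String predicate;

        Rule(String xpath, String predicate) {
            this.xpath = xpath;
            this.predicate = predicate;
        }
    }

    /** Hypothetical quad: subject, predicate, object, graph. */
    static final class Quad {
        final String subject;
        final String predicate;
        final String object;
        final String graph;

        Quad(String subject, String predicate, String object, String graph) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.graph = graph;
        }

        @Override
        public String toString() {
            return "<" + subject + "> <" + predicate + "> \"" + object + "\" <" + graph + ">";
        }
    }

    /** Applies every rule to the document and collects one quad per matched node. */
    static List<Quad> apply(List<Rule> rules, Document in, String documentIri) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<Quad> out = new ArrayList<>();
        for (Rule rule : rules) {
            NodeList nodes = (NodeList) xpath.evaluate(rule.xpath, in, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                String text = nodes.item(i).getTextContent().trim();
                // Assumption: the document IRI serves as both subject and graph name.
                out.add(new Quad(documentIri, rule.predicate, text, documentIri));
            }
        }
        return out;
    }

    /** Parses a well-formed XML string into a DOM document. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html><head><title> Hello </title></head><body/></html>");
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("/html/head/title", "http://purl.org/dc/terms/title"));
        for (Quad q : apply(rules, doc, "http://example.org/page")) {
            System.out.println(q);
        }
    }
}
```

The sketch returns a fresh list on every call, whereas the `run` contract above writes into a caller-supplied `ExtractionResult` and is invoked only once per extractor instance.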
-
getDescription
public ExtractorDescription getDescription()
Description copied from interface: Extractor
Returns an ExtractorDescription of this extractor.
- Specified by:
getDescription in interface Extractor<Document>
- Returns:
the object representing the extractor description.