Package org.apache.any23.extractor.xpath
Class XPathExtractor
- java.lang.Object
-
- org.apache.any23.extractor.xpath.XPathExtractor
-
- All Implemented Interfaces:
Extractor<Document>, Extractor.TagSoupDOMExtractor
public class XPathExtractor extends Object implements Extractor.TagSoupDOMExtractor
Implementation of an Extractor.TagSoupDOMExtractor able to apply XPathExtractionRules and generate quads.
- Author:
- Michele Mostarda (mostarda@fbk.eu)
- See Also:
XPathExtractionRule
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.apache.any23.extractor.Extractor
Extractor.BlindExtractor, Extractor.ContentExtractor, Extractor.TagSoupDOMExtractor
-
-
Constructor Summary
Constructors
Constructor                                        Description
XPathExtractor()
XPathExtractor(List<XPathExtractionRule> rules)
-
Method Summary
Modifier and Type       Method                                  Description
void                    add(XPathExtractionRule rule)
boolean                 contains(XPathExtractionRule rule)
ExtractorDescription    getDescription()                        Returns an ExtractorDescription of this extractor.
void                    remove(XPathExtractionRule rule)
void                    run(ExtractionParameters extractionParameters, ExtractionContext extractionContext, Document in, ExtractionResult out)    Executes the extractor.
-
-
-
Constructor Detail
-
XPathExtractor
public XPathExtractor()
-
XPathExtractor
public XPathExtractor(List<XPathExtractionRule> rules)
-
-
Method Detail
-
add
public void add(XPathExtractionRule rule)
-
remove
public void remove(XPathExtractionRule rule)
-
contains
public boolean contains(XPathExtractionRule rule)
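The three rule-management methods above carry no description on this page. The sketch below shows one plausible reading, assuming the extractor simply keeps its rules in a list and that contains relies on equals. RuleHolderSketch and its type parameter R are hypothetical stand-ins written for illustration; they are not Any23 classes and may not match the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a rule-holding extractor; NOT the Any23 class. */
final class RuleHolderSketch<R> {

    // Assumption: rules are kept in insertion order in a plain list.
    private final List<R> rules = new ArrayList<>();

    /** Mirrors the no-argument constructor: starts with no rules. */
    RuleHolderSketch() { }

    /** Mirrors the list constructor: copies the given rules. */
    RuleHolderSketch(List<R> initial) {
        rules.addAll(initial);
    }

    /** Registers a rule. */
    void add(R rule) {
        rules.add(rule);
    }

    /** Unregisters a rule; does nothing if the rule is absent. */
    void remove(R rule) {
        rules.remove(rule);
    }

    /** Reports whether an equal rule is currently registered. */
    boolean contains(R rule) {
        return rules.contains(rule);
    }
}
```

Under these assumptions, a rule passed to the constructor and a rule passed to add are treated identically, and removing a rule that was never added is a no-op.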
-
run
public void run(ExtractionParameters extractionParameters, ExtractionContext extractionContext, Document in, ExtractionResult out) throws IOException, ExtractionException
Description copied from interface: Extractor
Executes the extractor. Will be invoked only once; extractors are not reusable.
- Specified by:
run in interface Extractor<Document>
- Parameters:
extractionParameters - the parameters to be applied during the extraction.
extractionContext - the document context.
in - the extractor input data.
out - the collector for the extracted data.
- Throws:
IOException - on error while reading from the input stream.
ExtractionException - on other errors, such as parse errors.
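To make the idea of applying XPath rules to a DOM and generating quads more concrete, here is a minimal sketch that uses only the JDK's javax.xml.xpath API. It is not the Any23 implementation: XPathRunSketch, Rule, apply, parse, the example IRIs, and the choice to emit each quad as a plain string with the document IRI as both subject and graph are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical illustration of rule-driven XPath extraction; NOT Any23 code. */
final class XPathRunSketch {

    /** Hypothetical rule: an XPath expression plus the predicate IRI to emit per match. */
    static final class Rule {
        final String xpath;
        final String predicate;

        Rule(String xpath, String predicate) {
            this.xpath = xpath;
            this.predicate = predicate;
        }
    }

    /** Evaluates every rule against the DOM and returns one quad string per matched node. */
    static List<String> apply(List<Rule> rules, Document in, String documentIri) {
        XPath xp = XPathFactory.newInstance().newXPath();
        List<String> quads = new ArrayList<>();
        try {
            for (Rule rule : rules) {
                NodeList hits = (NodeList) xp.evaluate(rule.xpath, in, XPathConstants.NODESET);
                for (int i = 0; i < hits.getLength(); i++) {
                    String value = hits.item(i).getTextContent().trim();
                    // Assumption: subject and graph are both the document IRI.
                    quads.add("<" + documentIri + "> <" + rule.predicate + "> \""
                            + value + "\" <" + documentIri + ">");
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
        return quads;
    }

    /** Parses a well-formed XML/XHTML string into a DOM Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><h1>Hello</h1><h1>World</h1></body></html>");
        List<Rule> rules = Arrays.asList(new Rule("//h1", "http://example.org/title"));
        apply(rules, doc, "http://example.org/page").forEach(System.out::println);
    }
}
```

In the real extractor the results go to the ExtractionResult collector passed as out rather than into a returned list, and the input Document comes from the TagSoup-based DOM parser rather than a strict XML parser.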
-
getDescription
public ExtractorDescription getDescription()
Description copied from interface: Extractor
Returns an ExtractorDescription of this extractor.
- Specified by:
getDescription in interface Extractor<Document>
- Returns:
- the object representing the extractor description.
-
-