Package org.apache.any23.extractor
Interface Extractor<Input>
- Type Parameters:
Input - the type of the input data to be processed.
- All Known Subinterfaces:
Extractor.BlindExtractor, Extractor.ContentExtractor, Extractor.TagSoupDOMExtractor
- All Known Implementing Classes:
AdrExtractor, BaseRDFExtractor, CSVExtractor, EmbeddedJSONLDExtractor, EntityBasedMicroformatExtractor, FunctionalSyntaxExtractor, GeoExtractor, HAdrExtractor, HCalendarExtractor, HCardExtractor, HCardExtractor, HeadLinkExtractor, HEntryExtractor, HEventExtractor, HGeoExtractor, HItemExtractor, HListingExtractor, HProductExtractor, HRecipeExtractor, HRecipeExtractor, HResumeExtractor, HResumeExtractor, HReviewAggregateExtractor, HReviewExtractor, HTMLMetaExtractor, ICalExtractor, ICBMExtractor, JCalExtractor, JSONLDExtractor, LicenseExtractor, ManchesterSyntaxExtractor, MicrodataExtractor, MicroformatExtractor, NQuadsExtractor, NTriplesExtractor, RDFa11Extractor, RDFaExtractor, RDFXMLExtractor, SpeciesExtractor, TitleExtractor, TriXExtractor, TurtleExtractor, TurtleHTMLExtractor, XCalExtractor, XFNExtractor, XPathExtractor, YAMLExtractor
public interface Extractor<Input>
It defines the signature of a generic Extractor.
Nested Class Summary
Modifier and Type    Interface                        Description
static interface     Extractor.BlindExtractor
static interface     Extractor.ContentExtractor       This interface specializes an Extractor able to handle InputStream as input format.
static interface     Extractor.TagSoupDOMExtractor
Method Summary
Modifier and Type       Method              Description
ExtractorDescription    getDescription()    Returns an ExtractorDescription of this extractor.
void                    run(ExtractionParameters extractionParameters, ExtractionContext context, Input in, ExtractionResult out)    Executes the extractor.
Method Detail
run
void run(ExtractionParameters extractionParameters, ExtractionContext context, Input in, ExtractionResult out) throws IOException, ExtractionException
Executes the extractor. Will be invoked only once; extractors are not reusable.
- Parameters:
extractionParameters - the parameters to be applied during the extraction.
context - the document context.
in - the extractor input data.
out - the collector for the extracted data.
- Throws:
IOException - on error while reading from the input stream.
ExtractionException - on other errors, such as parse errors.
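The contract of run can be illustrated with a minimal sketch. It does not use the Any23 classes: ExtractionException, ExtractionResult and Extractor below are hypothetical stand-ins declared locally so the example compiles on its own, the emit method is invented for illustration, and the ExtractionParameters and ExtractionContext arguments are dropped for brevity. The sketch assumes an extractor over InputStream, guards against a second invocation, lets read failures surface as IOException, and reports malformed input as ExtractionException.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ExtractorSketch {

    // --- Hypothetical stand-ins, NOT the Any23 types ---
    static class ExtractionException extends Exception {
        ExtractionException(String message) { super(message); }
    }

    interface ExtractionResult {
        void emit(String value); // invented collector method, for illustration only
    }

    interface Extractor<Input> {
        // Simplified: the real run also takes ExtractionParameters and ExtractionContext.
        void run(Input in, ExtractionResult out) throws IOException, ExtractionException;
        String getDescription(); // stand-in for ExtractorDescription
    }

    /** Emits every non-blank line of the input; treats a tab character as a parse error. */
    static class LineExtractor implements Extractor<InputStream> {
        private boolean used; // enforces the run-once contract

        @Override
        public void run(InputStream in, ExtractionResult out)
                throws IOException, ExtractionException {
            if (used) {
                throw new IllegalStateException("extractor instances are single-use");
            }
            used = true;
            // readAllBytes() throws IOException if reading the stream fails.
            String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            for (String line : text.split("\n")) {
                if (line.contains("\t")) {
                    throw new ExtractionException("unexpected tab in line: " + line);
                }
                if (!line.isBlank()) {
                    out.emit(line.trim());
                }
            }
        }

        @Override
        public String getDescription() {
            return "line-extractor";
        }
    }

    /** Creates a fresh extractor for each document and collects what it emits. */
    static List<String> extract(String document) throws IOException, ExtractionException {
        List<String> collected = new ArrayList<>();
        InputStream in = new ByteArrayInputStream(document.getBytes(StandardCharsets.UTF_8));
        new LineExtractor().run(in, collected::add);
        return collected;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extract("alpha\n\nbeta\n")); // prints [alpha, beta]
    }
}
```

Because an instance is not reusable, the helper extract builds a new LineExtractor for every document instead of caching one; a caller working with the real interface would presumably do the same through whatever factory supplies its extractors.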
getDescription
ExtractorDescription getDescription()
Returns an ExtractorDescription of this extractor.
- Returns:
the object representing the extractor description.