Package org.apache.any23.extractor
Interface Extractor<Input>
- Type Parameters:
Input - the type of the input data to be processed.
- All Known Subinterfaces:
Extractor.BlindExtractor, Extractor.ContentExtractor, Extractor.TagSoupDOMExtractor
- All Known Implementing Classes:
AdrExtractor, BaseRDFExtractor, CSVExtractor, EmbeddedJSONLDExtractor, EntityBasedMicroformatExtractor, FunctionalSyntaxExtractor, GeoExtractor, HAdrExtractor, HCalendarExtractor, HCardExtractor, HCardExtractor, HeadLinkExtractor, HEntryExtractor, HEventExtractor, HGeoExtractor, HItemExtractor, HListingExtractor, HProductExtractor, HRecipeExtractor, HRecipeExtractor, HResumeExtractor, HResumeExtractor, HReviewAggregateExtractor, HReviewExtractor, HTMLMetaExtractor, ICalExtractor, ICBMExtractor, JCalExtractor, JSONLDExtractor, LicenseExtractor, ManchesterSyntaxExtractor, MicrodataExtractor, MicroformatExtractor, NQuadsExtractor, NTriplesExtractor, RDFa11Extractor, RDFaExtractor, RDFXMLExtractor, SpeciesExtractor, TitleExtractor, TriXExtractor, TurtleExtractor, TurtleHTMLExtractor, XCalExtractor, XFNExtractor, XPathExtractor, YAMLExtractor
public interface Extractor<Input>
Defines the signature of a generic Extractor.
Nested Class Summary
- static interface Extractor.BlindExtractor
- static interface Extractor.ContentExtractor: This interface specializes an Extractor able to handle InputStream as input format.
- static interface Extractor.TagSoupDOMExtractor
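The ContentExtractor description above suggests a specialization that fixes the Input type parameter to InputStream. The following is a minimal sketch of that idea only. The Extractor interface here is a simplified, hypothetical stand-in (the documented run method takes more arguments), and CopyExtractor and extract are names invented for illustration, not part of the library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ContentExtractorSketch {

    // Simplified, hypothetical stand-in for Extractor<Input>. The documented run(...)
    // also takes extraction parameters, a context and an ExtractionResult.
    interface Extractor<Input> {
        void run(Input in, StringBuilder out) throws IOException;
    }

    // Assumption: binding the type parameter to InputStream yields the
    // stream-oriented specialization described for Extractor.ContentExtractor.
    interface ContentExtractor extends Extractor<InputStream> {}

    // Hypothetical implementation: copies the stream content, decoded as UTF-8.
    static class CopyExtractor implements ContentExtractor {
        @Override
        public void run(InputStream in, StringBuilder out) throws IOException {
            out.append(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }

    // Wraps the text in an in-memory stream and runs a fresh extractor over it.
    static String extract(String text) throws IOException {
        StringBuilder out = new StringBuilder();
        try (InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8))) {
            new CopyExtractor().run(in, out);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(extract("hello"));
    }
}
```

Because the input type is fixed in the sub-interface, implementations receive an InputStream directly and need no casts.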
Method Summary
- ExtractorDescription getDescription(): Returns an ExtractorDescription of this extractor.
- void run(ExtractionParameters extractionParameters, ExtractionContext context, Input in, ExtractionResult out): Executes the extractor.
Method Detail
run
void run(ExtractionParameters extractionParameters, ExtractionContext context, Input in, ExtractionResult out) throws IOException, ExtractionException
Executes the extractor. Will be invoked only once; extractors are not reusable.
- Parameters:
extractionParameters - the parameters to be applied during the extraction.
context - the document context.
in - the extractor input data.
out - the collector for the extracted data.
- Throws:
IOException - on error while reading from the input stream.
ExtractionException - on other errors, such as parse errors.
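As an illustration of the run contract, here is a minimal, self-contained sketch of an implementation. All types in it are hypothetical stand-ins declared locally so the example compiles without the library. Only the two method signatures mirror this page. The collect method, LineExtractor and the single-use guard are assumptions made for the example, not documented behavior.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ExtractorRunSketch {

    // Hypothetical stand-ins for the types named in the signature.
    // They are NOT the real library classes.
    static class ExtractionParameters {}
    static class ExtractionContext {}
    static class ExtractorDescription {}
    static class ExtractionException extends Exception {
        ExtractionException(String message) { super(message); }
    }
    static class ExtractionResult {
        final List<String> collected = new ArrayList<>();
        // Hypothetical collector method, invented for this sketch.
        void collect(String item) { collected.add(item); }
    }

    // Mirrors the two method signatures documented on this page.
    interface Extractor<Input> {
        void run(ExtractionParameters extractionParameters, ExtractionContext context,
                 Input in, ExtractionResult out) throws IOException, ExtractionException;
        ExtractorDescription getDescription();
    }

    // Hypothetical extractor: emits each non-blank line of a String input.
    static class LineExtractor implements Extractor<String> {
        private boolean used = false;

        @Override
        public void run(ExtractionParameters extractionParameters, ExtractionContext context,
                        String in, ExtractionResult out) throws IOException, ExtractionException {
            if (used) {
                // One possible way to make the "invoked only once" contract explicit.
                throw new IllegalStateException("extractor instances are not reusable");
            }
            used = true;
            if (in == null) {
                throw new ExtractionException("no input to parse");
            }
            for (String line : in.split("\n")) {
                if (!line.trim().isEmpty()) {
                    out.collect(line.trim());
                }
            }
        }

        @Override
        public ExtractorDescription getDescription() {
            return new ExtractorDescription();
        }
    }

    // Runs a fresh extractor over the given text and returns what it collected.
    static List<String> extractLines(String text) throws IOException, ExtractionException {
        ExtractionResult out = new ExtractionResult();
        new LineExtractor().run(new ExtractionParameters(), new ExtractionContext(), text, out);
        return out.collected;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractLines("a\n\n b \n"));
    }
}
```

Creating a new LineExtractor for every call, as extractLines does, matches the statement that extractors are not reusable.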
getDescription
ExtractorDescription getDescription()
Returns an ExtractorDescription of this extractor.
- Returns:
the object representing the extractor description.