Package org.apache.any23.extractor
This package contains classes and interfaces modeling the Extractor API.
Interface Summary
Interface                                 Description
ExtractionResult                          Interface defining the methods that a representation of an extraction result must have.
Extractor<Input>                          It defines the signature of a generic Extractor.
Extractor.BlindExtractor
Extractor.ContentExtractor                This interface specializes an Extractor able to handle InputStream as input format.
Extractor.TagSoupDOMExtractor
ExtractorDescription                      It defines a minimal signature for an Extractor description.
ExtractorFactory<T extends Extractor<?>>  Interface defining a factory for Extractor.
ExtractorRegistry                         An interface that enables a registry for extractors to be implemented by different implementors of this API.
IssueReport                               This interface models an issue reporter.
TagSoupExtractionResult                   This interface models a specific ExtractionResult able to collect property roots generated by HTML Microformat extractions.
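The relationship between an extractor, its factory, and a registry can be illustrated with a minimal sketch. Every type below (`SketchExtractor`, `SketchExtractorFactory`, `SketchExtractorRegistry`, `TitleExtractor`) is a hypothetical stand-in written for this illustration. The method names and signatures are assumptions and are not the actual Any23 interfaces listed above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for an extractor: reads one input, appends statements to `out`.
interface SketchExtractor<I> {
    void run(I input, List<String> out);
}

// Hypothetical stand-in for a factory: knows the extractor's name, the MIME types
// it accepts, and how to create a fresh instance.
final class SketchExtractorFactory<T extends SketchExtractor<?>> {
    private final String name;
    private final List<String> mimeTypes;
    private final Supplier<T> constructor;

    SketchExtractorFactory(String name, List<String> mimeTypes, Supplier<T> constructor) {
        this.name = name;
        this.mimeTypes = new ArrayList<>(mimeTypes); // defensive copy
        this.constructor = constructor;
    }

    String getName() { return name; }
    boolean supports(String mimeType) { return mimeTypes.contains(mimeType); }
    T create() { return constructor.get(); }
}

// Hypothetical registry: one factory per extractor name, kept in registration order.
final class SketchExtractorRegistry {
    private final Map<String, SketchExtractorFactory<?>> factories = new LinkedHashMap<>();

    void register(SketchExtractorFactory<?> factory) {
        if (factories.putIfAbsent(factory.getName(), factory) != null) {
            throw new IllegalArgumentException("Already registered: " + factory.getName());
        }
    }

    // Names of all registered extractors that accept the given MIME type.
    List<String> namesFor(String mimeType) {
        List<String> names = new ArrayList<>();
        for (SketchExtractorFactory<?> f : factories.values()) {
            if (f.supports(mimeType)) {
                names.add(f.getName());
            }
        }
        return names;
    }
}

// Toy extractor used only to exercise the sketch.
final class TitleExtractor implements SketchExtractor<String> {
    @Override
    public void run(String input, List<String> out) {
        out.add("title=" + input.trim());
    }
}
```

In this sketch the registry stores factories rather than extractor instances, so each extraction run can obtain a fresh extractor through `create()`.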
Class Summary
Class                                           Description
ExampleInputOutput                              A reporter for example input and output of an extractor.
ExtractionContext                               This class provides the context for the processing of a single Extractor.
ExtractionParameters                            This class models the parameters to be used to perform an extraction.
ExtractionResultImpl                            A default implementation of ExtractionResult; it receives extraction output from one Extractor working on one document, and passes the output on to a TripleHandler.
ExtractorGroup                                  It simply models a group of ExtractorFactory instances, providing simple access methods.
ExtractorRegistryImpl                           Singleton class acting as a register for the various Extractor implementations.
IssueReport.Issue                               This class defines a generic issue traced by this extraction result.
SimpleExtractorFactory<T extends Extractor<?>>  This class is a simple and default-like implementation of ExtractorFactory.
SingleDocumentExtraction                        This class acts as a facade where all extractors (for a given MIMEType) can be called on a single document.
SingleDocumentExtractionReport                  This class provides the report for a SingleDocumentExtraction run.
TagSoupExtractionResult.PropertyPath            Defines a property path object.
TagSoupExtractionResult.ResourceRoot            Defines a property root object.
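The facade role described for SingleDocumentExtraction (run every extractor that matches a MIME type on one document, pass the output to a handler, and produce a report) can be sketched as follows. All types here (`DocExtractor`, `SketchReport`, `SketchSingleDocumentExtraction`, `WordCountExtractor`, `FailingExtractor`) are hypothetical stand-ins. The real class's constructor and methods are not shown in this summary, so nothing below should be read as its actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical extractor stand-in: declares what it supports and emits strings to a handler.
interface DocExtractor {
    String name();
    boolean supports(String mimeType);
    void extract(String document, Consumer<String> handler) throws Exception;
}

// Hypothetical report: which extractors ran, and which issues were recorded.
final class SketchReport {
    final List<String> matchingExtractors = new ArrayList<>();
    final List<String> issues = new ArrayList<>();
}

// Hypothetical facade: one entry point that drives all matching extractors.
final class SketchSingleDocumentExtraction {
    private final List<DocExtractor> extractors;
    private final Consumer<String> handler;

    SketchSingleDocumentExtraction(List<DocExtractor> extractors, Consumer<String> handler) {
        this.extractors = new ArrayList<>(extractors);
        this.handler = handler;
    }

    SketchReport run(String mimeType, String document) {
        SketchReport report = new SketchReport();
        for (DocExtractor extractor : extractors) {
            if (!extractor.supports(mimeType)) {
                continue; // skip extractors that do not handle this MIME type
            }
            report.matchingExtractors.add(extractor.name());
            try {
                extractor.extract(document, handler);
            } catch (Exception e) {
                // One failing extractor does not stop the others; the failure is reported.
                report.issues.add(extractor.name() + ": " + e.getMessage());
            }
        }
        return report;
    }
}

// Toy extractors used only to exercise the sketch.
final class WordCountExtractor implements DocExtractor {
    public String name() { return "word-count"; }
    public boolean supports(String mimeType) { return "text/plain".equals(mimeType); }
    public void extract(String document, Consumer<String> handler) {
        handler.accept("words=" + document.trim().split("\\s+").length);
    }
}

final class FailingExtractor implements DocExtractor {
    public String name() { return "failing"; }
    public boolean supports(String mimeType) { return "text/plain".equals(mimeType); }
    public void extract(String document, Consumer<String> handler) throws Exception {
        throw new Exception("cannot parse document");
    }
}
```

Catching the exception per extractor is a design choice of this sketch, not a documented behavior of the real class.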
Enum Summary
Enum                                 Description
ExtractionParameters.ValidationMode  Declares the supported validation actions.
IssueReport.IssueLevel               Possible issue levels.
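How an issue reporter might combine issues with severity levels can be sketched as below. The level names, the `notifyIssue` method, and the row and column fields are all assumptions made for illustration. This summary does not list the actual constants of IssueReport.IssueLevel or the members of IssueReport.Issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical severity levels, ordered from least to most severe (names are assumptions).
enum SketchIssueLevel { WARNING, ERROR, FATAL }

// Hypothetical issue record: a level, a message, and a position in the document.
final class SketchIssue {
    final SketchIssueLevel level;
    final String message;
    final int row;
    final int col;

    SketchIssue(SketchIssueLevel level, String message, int row, int col) {
        this.level = level;
        this.message = message;
        this.row = row;
        this.col = col;
    }
}

// Hypothetical reporter: collects issues and filters them by minimum severity.
final class SketchIssueReport {
    private final List<SketchIssue> issues = new ArrayList<>();

    void notifyIssue(SketchIssueLevel level, String message, int row, int col) {
        issues.add(new SketchIssue(level, message, row, col));
    }

    // Issues whose level is at least `minimum`, in the order they were reported.
    List<SketchIssue> atLeast(SketchIssueLevel minimum) {
        List<SketchIssue> result = new ArrayList<>();
        for (SketchIssue issue : issues) {
            if (issue.level.compareTo(minimum) >= 0) {
                result.add(issue);
            }
        }
        return result;
    }
}
```

The filter relies on the declaration order of the enum constants, which is what `compareTo` compares for Java enums.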
Exception Summary
Exception            Description
ExtractionException  Defines a specific exception raised during the metadata extraction phase.