Package org.apache.any23.extractor
Class SingleDocumentExtraction
- java.lang.Object
-
- org.apache.any23.extractor.SingleDocumentExtraction
-
public class SingleDocumentExtraction extends Object
This class acts as a facade where all extractors (for a given MIMEType) can be called on a single document. Extractors are automatically filtered by MIMEType.
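The filtering described above can be illustrated with a minimal, self-contained sketch. It does not use the Any23 API: `MimeFilterSketch`, `SketchExtractor` and `matchingNames` are hypothetical stand-ins, and the extractor names and MIME types are made up for illustration. It only shows the idea of keeping the extractors whose supported MIME types include the detected one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MimeFilterSketch {

    /** Hypothetical stand-in for an extractor; not the Any23 Extractor interface. */
    static final class SketchExtractor {
        final String name;
        final List<String> supportedMimeTypes;

        SketchExtractor(String name, String... supportedMimeTypes) {
            this.name = name;
            this.supportedMimeTypes = Arrays.asList(supportedMimeTypes);
        }
    }

    /** Keeps only the extractors that declare support for the detected MIME type. */
    static List<String> matchingNames(List<SketchExtractor> all, String detectedMimeType) {
        List<String> names = new ArrayList<>();
        for (SketchExtractor extractor : all) {
            if (extractor.supportedMimeTypes.contains(detectedMimeType)) {
                names.add(extractor.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Illustrative extractor names and MIME types (assumptions, not Any23 defaults).
        List<SketchExtractor> all = Arrays.asList(
                new SketchExtractor("html-microdata", "text/html", "application/xhtml+xml"),
                new SketchExtractor("rdf-turtle", "text/turtle"),
                new SketchExtractor("html-rdfa", "text/html"));
        System.out.println(matchingNames(all, "text/html"));
    }
}
```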
-
-
Constructor Summary
Constructors
SingleDocumentExtraction(Configuration configuration, DocumentSource in, ExtractorFactory<?> factory, TripleHandler output)
Builds an extractor by the specification of document source, extractors factory and output triple handler.
SingleDocumentExtraction(Configuration configuration, DocumentSource in, ExtractorGroup extractors, TripleHandler output)
Builds an extractor by the specification of document source, list of extractors and output triple handler.
SingleDocumentExtraction(DocumentSource in, ExtractorFactory<?> factory, TripleHandler output)
Builds an extractor by the specification of document source, extractors factory and output triple handler, using the DefaultConfiguration.
-
Method Summary
All Methods / Instance Methods / Concrete Methods
String getDetectedMIMEType()
Returns the detected MIME type for the given DocumentSource.
List<Extractor> getMatchingExtractors()
String getParserEncoding()
boolean hasMatchingExtractors()
Checks whether the content of the given DocumentSource activates at least one extractor.
SingleDocumentExtractionReport run()
Triggers the execution of all the Extractors registered to this class using the default extraction parameters.
SingleDocumentExtractionReport run(ExtractionParameters extractionParameters)
Triggers the execution of all the Extractors registered to this class using the specified extraction parameters.
void setLocalCopyFactory(LocalCopyFactory copyFactory)
Sets the internal factory for generating the document local copy. If null, the MemCopyFactory will be used.
void setMIMETypeDetector(MIMETypeDetector detector)
Sets the internal MIME type detector. If null, MIME type detection will be skipped and all extractors will be activated.
void setParserEncoding(String encoding)
Sets the document parser encoding.
-
-
-
Constructor Detail
-
SingleDocumentExtraction
public SingleDocumentExtraction(Configuration configuration, DocumentSource in, ExtractorGroup extractors, TripleHandler output)
Builds an extractor by the specification of document source, list of extractors and output triple handler.
Parameters:
configuration - configuration applied during extraction.
in - input document source.
extractors - list of extractors to be applied.
output - output triple handler.
-
SingleDocumentExtraction
public SingleDocumentExtraction(Configuration configuration, DocumentSource in, ExtractorFactory<?> factory, TripleHandler output)
Builds an extractor by the specification of document source, extractors factory and output triple handler.
Parameters:
configuration - configuration applied during extraction.
in - input document source.
factory - the extractors factory.
output - output triple handler.
-
SingleDocumentExtraction
public SingleDocumentExtraction(DocumentSource in, ExtractorFactory<?> factory, TripleHandler output)
Builds an extractor by the specification of document source, extractors factory and output triple handler, using the DefaultConfiguration.
Parameters:
in - input document source.
factory - the extractors factory.
output - output triple handler.
-
-
Method Detail
-
setLocalCopyFactory
public void setLocalCopyFactory(LocalCopyFactory copyFactory)
Sets the internal factory used to generate the local copy of the document. If null, the MemCopyFactory will be used.
Parameters:
copyFactory - local copy factory.
See Also:
DocumentSource
-
setMIMETypeDetector
public void setMIMETypeDetector(MIMETypeDetector detector)
Sets the internal MIME type detector. If null, MIME type detection will be skipped and all extractors will be activated.
Parameters:
detector - detector instance.
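The null-detector behaviour can be sketched as follows. This is an illustration under stated assumptions, not the Any23 implementation: `NullDetectorSketch` and `activate` are hypothetical names, a plain `Function<String, String>` stands in for a MIME type detector, and each extractor is reduced to a name mapped to a single MIME type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class NullDetectorSketch {

    /**
     * Returns the names of the extractors to activate.
     * With a null detector, detection is skipped and every extractor is returned.
     * Otherwise only extractors registered for the detected MIME type are returned.
     */
    static List<String> activate(Map<String, String> mimeTypeByExtractor,
                                 Function<String, String> detector,
                                 String content) {
        if (detector == null) {
            return new ArrayList<>(mimeTypeByExtractor.keySet());
        }
        String detected = detector.apply(content);
        List<String> active = new ArrayList<>();
        for (Map.Entry<String, String> entry : mimeTypeByExtractor.entrySet()) {
            if (entry.getValue().equals(detected)) {
                active.add(entry.getKey());
            }
        }
        return active;
    }

    public static void main(String[] args) {
        // Illustrative registrations (assumptions, not Any23 defaults).
        Map<String, String> registered = new LinkedHashMap<>();
        registered.put("html-rdfa", "text/html");
        registered.put("rdf-turtle", "text/turtle");

        // Toy detector: treats anything starting with '<' as HTML.
        Function<String, String> detector =
                c -> c.startsWith("<") ? "text/html" : "text/turtle";

        System.out.println(activate(registered, detector, "<html></html>"));
        System.out.println(activate(registered, null, "<html></html>"));
    }
}
```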
-
run
public SingleDocumentExtractionReport run(ExtractionParameters extractionParameters) throws ExtractionException, IOException
Triggers the execution of all the Extractors registered to this class using the specified extraction parameters.
Parameters:
extractionParameters - the parameters applied to the run execution.
Returns:
the report generated by the extraction.
Throws:
ExtractionException - if an error occurred during the data extraction.
IOException - if an error occurred during the data access.
-
run
public SingleDocumentExtractionReport run() throws IOException, ExtractionException
Triggers the execution of all the Extractors registered to this class using the default extraction parameters.
Returns:
the extraction report.
Throws:
IOException - if there is an error reading input from the document source.
ExtractionException - if there is an error during extraction.
-
getDetectedMIMEType
public String getDetectedMIMEType() throws IOException
Returns the detected MIME type for the given DocumentSource.
Returns:
string containing the detected MIME type.
Throws:
IOException - if an error occurred while accessing the data.
-
hasMatchingExtractors
public boolean hasMatchingExtractors() throws IOException
Checks whether the content of the given DocumentSource activates at least one extractor.
Returns:
true if at least one extractor is activated, false otherwise.
Throws:
IOException - if there is an error locating matching extractors.
-
getMatchingExtractors
public List<Extractor> getMatchingExtractors()
Returns:
the list of all the activated extractors for the given DocumentSource.
-
getParserEncoding
public String getParserEncoding()
Returns:
the configured parsing encoding.
-
setParserEncoding
public void setParserEncoding(String encoding)
Sets the document parser encoding.
Parameters:
encoding - parser encoding.
-
-