Package org.apache.any23.extractor
Class SingleDocumentExtraction
- java.lang.Object
-
- org.apache.any23.extractor.SingleDocumentExtraction
-
public class SingleDocumentExtraction extends Object
This class acts as a facade where all extractors (for a given MIMEType) can be called on a single document. Extractors are automatically filtered by MIMEType.
-
-
Constructor Summary
SingleDocumentExtraction(Configuration configuration, DocumentSource in, ExtractorFactory<?> factory, TripleHandler output)
Builds an extractor from a document source, an extractor factory, and an output triple handler.
SingleDocumentExtraction(Configuration configuration, DocumentSource in, ExtractorGroup extractors, TripleHandler output)
Builds an extractor from a document source, a group of extractors, and an output triple handler.
SingleDocumentExtraction(DocumentSource in, ExtractorFactory<?> factory, TripleHandler output)
Builds an extractor from a document source, an extractor factory, and an output triple handler, using the DefaultConfiguration.
-
Method Summary
String getDetectedMIMEType()
Returns the detected MIME type for the given DocumentSource.
List<Extractor> getMatchingExtractors()
String getParserEncoding()
boolean hasMatchingExtractors()
Checks whether the content of the given DocumentSource activates at least one extractor.
SingleDocumentExtractionReport run()
Triggers the execution of all the Extractors registered to this class, using the default extraction parameters.
SingleDocumentExtractionReport run(ExtractionParameters extractionParameters)
Triggers the execution of all the Extractors registered to this class, using the specified extraction parameters.
void setLocalCopyFactory(LocalCopyFactory copyFactory)
Sets the internal factory for generating the local copy of the document; if null, the MemCopyFactory will be used.
void setMIMETypeDetector(MIMETypeDetector detector)
Sets the internal MIME type detector; if null, MIME type detection will be skipped and all extractors will be activated.
void setParserEncoding(String encoding)
Sets the document parser encoding.
-
-
-
Constructor Detail
-
SingleDocumentExtraction
public SingleDocumentExtraction(Configuration configuration, DocumentSource in, ExtractorGroup extractors, TripleHandler output)
Builds an extractor from a document source, a group of extractors, and an output triple handler.
Parameters:
configuration - configuration applied during extraction.
in - input document source.
extractors - list of extractors to be applied.
output - output triple handler.
-
SingleDocumentExtraction
public SingleDocumentExtraction(Configuration configuration, DocumentSource in, ExtractorFactory<?> factory, TripleHandler output)
Builds an extractor from a document source, an extractor factory, and an output triple handler.
Parameters:
configuration - configuration applied during extraction.
in - input document source.
factory - the extractors factory.
output - output triple handler.
-
SingleDocumentExtraction
public SingleDocumentExtraction(DocumentSource in, ExtractorFactory<?> factory, TripleHandler output)
Builds an extractor from a document source, an extractor factory, and an output triple handler, using the DefaultConfiguration.
Parameters:
in - input document source.
factory - the extractors factory.
output - output triple handler.
-
-
Method Detail
-
setLocalCopyFactory
public void setLocalCopyFactory(LocalCopyFactory copyFactory)
Sets the internal factory for generating the local copy of the document; if null, the MemCopyFactory will be used.
Parameters:
copyFactory - local copy factory.
See Also:
DocumentSource
-
setMIMETypeDetector
public void setMIMETypeDetector(MIMETypeDetector detector)
Sets the internal MIME type detector; if null, MIME type detection will be skipped and all extractors will be activated.
Parameters:
detector - detector instance.
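The filtering rule described for this class (extractors are selected by MIME type, and a null detector activates all of them) can be illustrated with a small self-contained sketch. This is not the Any23 implementation: `MimeFilterSketch`, `NamedExtractor`, and `matching` are hypothetical stand-ins invented for illustration, and a null `detectedMimeType` is assumed here to model the case where detection is skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative model only; none of these types are Any23 classes. */
public class MimeFilterSketch {

    /** Hypothetical stand-in for an extractor that declares the MIME types it supports. */
    public static final class NamedExtractor {
        final String name;
        final List<String> supportedMimeTypes;

        public NamedExtractor(String name, String... supportedMimeTypes) {
            this.name = name;
            this.supportedMimeTypes = Arrays.asList(supportedMimeTypes);
        }
    }

    /**
     * Returns the names of the extractors to activate, in registration order.
     * A null detectedMimeType models "detection skipped" (assumption), in which
     * case every extractor is kept.
     */
    public static List<String> matching(List<NamedExtractor> all, String detectedMimeType) {
        List<String> result = new ArrayList<>();
        for (NamedExtractor e : all) {
            if (detectedMimeType == null || e.supportedMimeTypes.contains(detectedMimeType)) {
                result.add(e.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<NamedExtractor> all = Arrays.asList(
                new NamedExtractor("html-rdfa", "text/html", "application/xhtml+xml"),
                new NamedExtractor("turtle", "text/turtle"),
                new NamedExtractor("html-microdata", "text/html"));

        System.out.println(matching(all, "text/html")); // prints [html-rdfa, html-microdata]
        System.out.println(matching(all, null));        // prints [html-rdfa, turtle, html-microdata]
    }
}
```

In this model, passing a MIME type that no extractor supports yields an empty list, which corresponds to the situation hasMatchingExtractors() reports as false.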
-
run
public SingleDocumentExtractionReport run(ExtractionParameters extractionParameters) throws ExtractionException, IOException
Triggers the execution of all the Extractors registered to this class, using the specified extraction parameters.
Parameters:
extractionParameters - the parameters applied to the run execution.
Returns:
the report generated by the extraction.
Throws:
ExtractionException - if an error occurred during the data extraction.
IOException - if an error occurred during the data access.
-
run
public SingleDocumentExtractionReport run() throws IOException, ExtractionException
Triggers the execution of all the Extractors registered to this class, using the default extraction parameters.
Returns:
the extraction report.
Throws:
IOException - if there is an error reading input from the document source.
ExtractionException - if there is an error during extraction.
-
getDetectedMIMEType
public String getDetectedMIMEType() throws IOException
Returns the detected MIME type for the given DocumentSource.
Returns:
string containing the detected MIME type.
Throws:
IOException - if an error occurred while accessing the data.
-
hasMatchingExtractors
public boolean hasMatchingExtractors() throws IOException
Checks whether the content of the given DocumentSource activates at least one extractor.
Returns:
true if at least one extractor is activated, false otherwise.
Throws:
IOException - if there is an error locating matching extractors.
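One way a caller might combine hasMatchingExtractors() with run() is to skip the run when nothing matches. The sketch below shows only that call order, using a hypothetical `Extraction` interface that mirrors the two documented method names; it is not an Any23 type. Its `run()` returns a plain string instead of a SingleDocumentExtractionReport and omits ExtractionException so that the example compiles on its own.

```java
import java.io.IOException;
import java.util.Optional;

/** Illustrative call-order sketch; the types here are stand-ins, not Any23 classes. */
public class GuardedRunSketch {

    /** Hypothetical stand-in mirroring the documented method names. */
    public interface Extraction {
        boolean hasMatchingExtractors() throws IOException;
        String run() throws IOException; // stand-in for the extraction report
    }

    /** Runs the extraction only when at least one extractor matches the document. */
    public static Optional<String> runIfMatching(Extraction extraction) throws IOException {
        if (!extraction.hasMatchingExtractors()) {
            return Optional.empty(); // nothing to run for this document
        }
        return Optional.of(extraction.run());
    }

    /** Builds a fixed-behaviour stand-in for demonstration purposes. */
    public static Extraction fixed(boolean matches, String report) {
        return new Extraction() {
            @Override public boolean hasMatchingExtractors() { return matches; }
            @Override public String run() { return report; }
        };
    }

    public static void main(String[] args) throws IOException {
        System.out.println(runIfMatching(fixed(true, "2 extractors ran"))); // prints Optional[2 extractors ran]
        System.out.println(runIfMatching(fixed(false, "unused")));          // prints Optional.empty
    }
}
```

With the real class, the same guard would also need to handle ExtractionException, which both run() overloads declare.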
-
getMatchingExtractors
public List<Extractor> getMatchingExtractors()
Returns:
the list of all the activated extractors for the given DocumentSource.
-
getParserEncoding
public String getParserEncoding()
Returns:
the configured parsing encoding.
-
setParserEncoding
public void setParserEncoding(String encoding)
Sets the document parser encoding.
Parameters:
encoding - parser encoding.
-
-