Package org.apache.any23
Class Any23
java.lang.Object
    org.apache.any23.Any23
public class Any23 extends Object
A facade with convenience methods for typical Any23 extraction operations.
- Author:
Richard Cyganiak (richard@cyganiak.de), Michele Mostarda (michele.mostarda@gmail.com)
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
static String DEFAULT_HTTP_CLIENT_USER_AGENT - Default HTTP User Agent defined in default configuration.
protected static org.slf4j.Logger logger
static String VERSION - Any23 core library version.
-
Constructor Summary
Constructors (Constructor, Description):
Any23() - Constructor with default configuration.
Any23(String... extractorNames) - Constructor that allows the specification of a list of extractor names.
Any23(Configuration configuration) - Constructor accepting a Configuration.
Any23(Configuration configuration, String... extractorNames) - Constructor that allows the specification of a custom configuration and of a list of extractor names.
Any23(Configuration configuration, ExtractorGroup extractorGroup) - Constructor that allows the specification of a custom configuration and of a list of extractors.
Any23(ExtractorGroup extractorGroup) - Constructor that allows the specification of a list of extractors.
-
Method Summary
Methods (Modifier and Type, Method, Description):
DocumentSource createDocumentSource(String documentIRI) - Returns the most appropriate DocumentSource for the given documentIRI.
ExtractionReport extract(File file, TripleHandler outputHandler) - Performs metadata extraction from the content of the given file, sending the generated events to the specified outputHandler.
ExtractionReport extract(String in, String documentIRI, String contentType, String encoding, TripleHandler outputHandler) - Performs metadata extraction on the string in, associated with the IRI documentIRI, declaring contentType and encoding.
ExtractionReport extract(String in, String documentIRI, TripleHandler outputHandler) - Performs metadata extraction on the string in, associated with the IRI documentIRI, sending the generated events to the specified outputHandler.
ExtractionReport extract(String documentIRI, TripleHandler outputHandler) - Performs metadata extraction from the content of the given documentIRI, sending the generated events to the specified outputHandler.
ExtractionReport extract(ExtractionParameters eps, String documentIRI, TripleHandler outputHandler) - Performs metadata extraction from the content of the given documentIRI, sending the generated events to the specified outputHandler.
ExtractionReport extract(ExtractionParameters eps, DocumentSource in, TripleHandler outputHandler) - Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler.
ExtractionReport extract(ExtractionParameters eps, DocumentSource in, TripleHandler outputHandler, String encoding) - Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler.
ExtractionReport extract(DocumentSource in, TripleHandler outputHandler) - Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler.
ExtractionReport extract(DocumentSource in, TripleHandler outputHandler, String encoding) - Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler.
HTTPClient getHTTPClient() - Returns the current HTTPClient implementation.
String getHTTPUserAgent() - Returns the HTTP Header User Agent, see RFC 2616, section 14.43.
void setCacheFactory(LocalCopyFactory cache) - Sets a LocalCopyFactory instance.
void setHTTPClient(HTTPClient httpClient) - Sets the HTTPClient implementation used to retrieve contents.
void setHTTPUserAgent(String userAgent) - Sets the HTTP Header User Agent, see RFC 2616, section 14.43.
void setMIMETypeDetector(MIMETypeDetector detector) - Sets an instance of MIMETypeDetector.
-
-
-
Field Detail
-
VERSION
public static final String VERSION
Any23 core library version. NOTE: pom.xml also contains a version string; the two should match.
-
DEFAULT_HTTP_CLIENT_USER_AGENT
public static final String DEFAULT_HTTP_CLIENT_USER_AGENT
Default HTTP User Agent defined in default configuration.
-
logger
protected static final org.slf4j.Logger logger
-
-
Constructor Detail
-
Any23
public Any23(Configuration configuration, ExtractorGroup extractorGroup)
Constructor that allows the specification of a custom configuration and of a list of extractors.
- Parameters:
configuration - configuration used to build the Any23 instance.
extractorGroup - the group of extractors to be applied.
-
Any23
public Any23(ExtractorGroup extractorGroup)
Constructor that allows the specification of a list of extractors.
- Parameters:
extractorGroup - the group of extractors to be applied.
-
Any23
public Any23(Configuration configuration, String... extractorNames)
Constructor that allows the specification of a custom configuration and of a list of extractor names.
- Parameters:
configuration - a Configuration object.
extractorNames - list of extractor names.
-
Any23
public Any23(String... extractorNames)
Constructor that allows the specification of a list of extractor names.
- Parameters:
extractorNames - list of extractor names.
-
Any23
public Any23(Configuration configuration)
Constructor accepting a Configuration.
- Parameters:
configuration - a Configuration object.
-
Any23
public Any23()
Constructor with default configuration.
-
-
Method Detail
-
setHTTPUserAgent
public void setHTTPUserAgent(String userAgent)
Sets the HTTP Header User Agent, see RFC 2616, section 14.43.
- Parameters:
userAgent - text describing the user agent.
-
getHTTPUserAgent
public String getHTTPUserAgent()
Returns the HTTP Header User Agent, see RFC 2616, section 14.43.
- Returns:
text describing the user agent.
-
setHTTPClient
public void setHTTPClient(HTTPClient httpClient)
Sets the HTTPClient implementation used to retrieve contents. The default instance is DefaultHTTPClient.
- Parameters:
httpClient - a valid client instance.
- Throws:
IllegalStateException - if invoked after the client has been initialized.
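The ordering constraint above (replace the client only before it is first used) can be pictured with a small stand-alone sketch. This is an illustration of the documented contract only, not Any23's code: LazyClientHolder and its methods are hypothetical names, and it is an assumption that "initialized" corresponds to the first retrieval of the client.

```java
/**
 * Illustration only; not part of Any23. Holds a client that may be
 * replaced until it is first retrieved, and rejects later replacements.
 */
final class LazyClientHolder<C> {
    private C client;
    private boolean initialized;

    LazyClientHolder(C defaultClient) {
        this.client = defaultClient;
    }

    /** Replaces the client; allowed only before the first call to get(). */
    void set(C newClient) {
        if (initialized) {
            throw new IllegalStateException("client already initialized");
        }
        if (newClient == null) {
            throw new NullPointerException("client must not be null");
        }
        client = newClient;
    }

    /** Marks the client as initialized and returns it. */
    C get() {
        initialized = true;
        return client;
    }

    public static void main(String[] args) {
        LazyClientHolder<String> holder = new LazyClientHolder<>("default-client");
        holder.set("custom-client");      // fine: nothing has used the client yet
        System.out.println(holder.get()); // first use
        try {
            holder.set("late-client");    // rejected: the client is already in use
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In practice this suggests calling setHTTPClient right after constructing the Any23 instance, before any call that retrieves remote content.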
-
getHTTPClient
public HTTPClient getHTTPClient() throws IOException
Returns the current HTTPClient implementation.
- Returns:
instance of HTTPClient.
- Throws:
IOException - if the HTTP client has not initialized.
-
setCacheFactory
public void setCacheFactory(LocalCopyFactory cache)
Sets a LocalCopyFactory instance.
- Parameters:
cache - valid cache instance.
-
setMIMETypeDetector
public void setMIMETypeDetector(MIMETypeDetector detector)
Sets an instance of MIMETypeDetector.
- Parameters:
detector - a valid detector instance; if null, all the detectors will be used.
-
createDocumentSource
public DocumentSource createDocumentSource(String documentIRI) throws URISyntaxException, IOException
Returns the most appropriate DocumentSource for the given documentIRI.
N.B. A documentIRI should contain a protocol, e.g. http:, https:, file:
- Parameters:
documentIRI - the document IRI.
- Returns:
a new instance of DocumentSource.
- Throws:
URISyntaxException - if an error occurs while parsing the documentIRI as an IRI.
IOException - if an error occurs while initializing the internal HTTPClient.
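As a rough illustration of the protocol note above, the following sketch pre-checks an IRI string with java.net.URI before it would be passed to createDocumentSource, and shows one way to obtain a file: IRI for a local file. DocumentIriCheck, requireScheme and toFileIri are hypothetical helper names, not part of Any23, and treating "contains a protocol" as "has a URI scheme" is an assumption.

```java
import java.io.File;
import java.net.URI;

/** Hypothetical helper, not part of Any23: checks that an IRI string carries a scheme. */
final class DocumentIriCheck {

    /**
     * Returns the scheme of the given IRI string (for example "http" or "file").
     * Throws IllegalArgumentException if the string is not a valid URI
     * or has no scheme at all.
     */
    static String requireScheme(String documentIRI) {
        URI uri = URI.create(documentIRI); // IllegalArgumentException on bad syntax
        String scheme = uri.getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("Missing protocol in: " + documentIRI);
        }
        return scheme;
    }

    /** Converts a local file into an absolute IRI string that starts with "file:". */
    static String toFileIri(File file) {
        return file.toURI().toString();
    }

    public static void main(String[] args) {
        System.out.println(requireScheme("https://example.org/page.html"));
        System.out.println(toFileIri(new File("page.html")));
    }
}
```

A bare name such as page.html parses as a relative URI with no scheme, so requireScheme rejects it; converting the File first yields a string that carries the file: protocol.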
-
extract
public ExtractionReport extract(ExtractionParameters eps, DocumentSource in, TripleHandler outputHandler, String encoding) throws IOException, ExtractionException
Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler.
- Parameters:
eps - the extraction parameters to be applied.
in - the input document source.
outputHandler - handler responsible for collecting the extracted metadata.
encoding - explicit encoding; see available encodings.
- Returns:
an ExtractionReport describing the outcome of the extraction.
- Throws:
IOException - if there is an error reading the DocumentSource.
ExtractionException - if there is an error during extraction.
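Because the encoding argument is passed as a plain string, it can help to validate the name before calling extract. The sketch below does this with java.nio.charset.Charset. EncodingCheck and isUsableEncoding are hypothetical names, and it is an assumption (not stated in this page) that Any23 resolves the encoding name the same way the Java platform resolves charset names.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

/** Hypothetical helper, not part of Any23: validates an encoding name up front. */
final class EncodingCheck {

    /**
     * Returns true if the running JVM knows a charset with the given name.
     * Null and syntactically illegal names are reported as unusable
     * instead of raising an exception.
     */
    static boolean isUsableEncoding(String encoding) {
        if (encoding == null) {
            return false;
        }
        try {
            return Charset.isSupported(encoding);
        } catch (IllegalCharsetNameException e) {
            return false; // name contains characters not allowed in charset names
        }
    }

    public static void main(String[] args) {
        System.out.println(isUsableEncoding("UTF-8"));
        System.out.println(isUsableEncoding("no such charset!"));
    }
}
```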
-
extract
public ExtractionReport extract(String in, String documentIRI, String contentType, String encoding, TripleHandler outputHandler) throws IOException, ExtractionException
Performs metadata extraction on the string in, associated with the IRI documentIRI, declaring contentType and encoding. The generated events are sent to the specified outputHandler.
- Parameters:
in - raw data to be analyzed.
documentIRI - IRI from which the raw data has been extracted.
contentType - declared data content type.
encoding - declared data encoding.
outputHandler - handler responsible for collecting the extracted metadata.
- Returns:
an ExtractionReport describing the outcome of the extraction.
- Throws:
IOException - if there is an error reading the DocumentSource.
ExtractionException - if there is an error during extraction.
-
extract
public ExtractionReport extract(String in, String documentIRI, TripleHandler outputHandler) throws IOException, ExtractionException
Performs metadata extraction on the string in, associated with the IRI documentIRI, sending the generated events to the specified outputHandler.
- Parameters:
in - raw data to be analyzed.
documentIRI - IRI from which the raw data has been extracted.
outputHandler - handler responsible for collecting the extracted metadata.
- Returns:
an ExtractionReport describing the outcome of the extraction.
- Throws:
IOException - if there is an error reading the DocumentSource.
ExtractionException - if there is an error during extraction.
-
extract
public ExtractionReport extract(File file, TripleHandler outputHandler) throws IOException, ExtractionException
Performs metadata extraction from the content of the given file, sending the generated events to the specified outputHandler.
- Parameters:
file - file containing raw data.
outputHandler - handler responsible for collecting the extracted metadata.
- Returns:
an ExtractionReport describing the outcome of the extraction.
- Throws:
IOException - if there is an error reading the DocumentSource.
ExtractionException - if there is an error during extraction.
-
extract
public ExtractionReport extract(ExtractionParameters eps, String documentIRI, TripleHandler outputHandler) throws IOException, ExtractionException
Performs metadata extraction from the content of the given documentIRI, sending the generated events to the specified outputHandler. If the request for the IRI is answered with a redirect, the redirect is followed.
- Parameters:
eps - the parameters to be applied to the extraction.
documentIRI - the IRI from which to retrieve the document.
outputHandler - handler responsible for collecting the extracted metadata.
- Returns:
an ExtractionReport describing the outcome of the extraction.
- Throws:
IOException - if there is an error reading the DocumentSource.
ExtractionException - if there is an error during extraction.
-
extract
public ExtractionReport extract(String documentIRI, TripleHandler outputHandler) throws IOException, ExtractionException
Performs metadata extraction from the content of the given documentIRI, sending the generated events to the specified outputHandler. If the request for the IRI is answered with a redirect, the redirect is followed.
- Parameters:
documentIRI - the IRI from which to retrieve the document.
outputHandler - handler responsible for collecting the extracted metadata.
- Returns:
an ExtractionReport describing the outcome of the extraction.
- Throws:
IOException - if there is an error reading the DocumentSource.
ExtractionException - if there is an error during extraction.
-
extract
public ExtractionReport extract(DocumentSource in, TripleHandler outputHandler, String encoding) throws IOException, ExtractionException
Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler.
- Parameters:
in - the input document source.
outputHandler - handler responsible for collecting the extracted metadata.
encoding - explicit encoding; see available encodings.
- Returns:
an ExtractionReport describing the outcome of the extraction.
- Throws:
IOException - if there is an error reading the DocumentSource.
ExtractionException - if there is an error during extraction.
-
extract
public ExtractionReport extract(DocumentSource in, TripleHandler outputHandler) throws IOException, ExtractionException
Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler.
- Parameters:
in - the input document source.
outputHandler - handler responsible for collecting the extracted metadata.
- Returns:
an ExtractionReport describing the outcome of the extraction.
- Throws:
IOException - if there is an error reading the DocumentSource.
ExtractionException - if there is an error during extraction.
-
extract
public ExtractionReport extract(ExtractionParameters eps, DocumentSource in, TripleHandler outputHandler) throws IOException, ExtractionException
Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler.
- Parameters:
eps - the parameters to be applied for the extraction phase.
in - the input document source.
outputHandler - handler responsible for collecting the extracted metadata.
- Returns:
an ExtractionReport describing the outcome of the extraction.
- Throws:
IOException - if there is an error reading the DocumentSource.
ExtractionException - if there is an error during extraction.
-
-