Configuration
Configure the Core Module
The core module contains the main library code and the command-line implementation.
The main library configuration parameters are managed by the Configuration class. The default values are declared within the default-configuration.properties file. The following sections explain how to override the default configuration.
Override Default Configuration from Command-line
The default configuration can be overriden via command-line by passing to the java command system properties with the same name of the ones declared in configuration.
For example to override the HTTP Max Client Connections parameter it is sufficient to add the following option to the java command-line invocation:
-Dany23.http.client.max.connections=10
any23 and any23server scripts accept the variable ANY23_OPTS to specify custom options. It is possible to customize the HTTP Max Client Connections for the any23 script simply using:
cli/target/appassembler/bin/$ ANY23_OPTS="-Dany23.http.client.max.connections=10" any23 http://path/to/resource
Override Default Configuration Programmatically
The Configuration properties can be accessed in read-only mode just retrieving the configuration singleton instance.
Such instance is immutable:
final Configuration immutableConf = DefaultConfiguration.singleton(); final String propertyValue = immutableConf.getProperty("propertyName", "default value"); ...
To obtain a modifiable Configuration instead it is possible to use the copy() method.
One of the Apache Any23 constructors accepts a Configuration object that allows to customize the behavior of the Apache Any23 instance for its entire life-cycle.
final ModifiableConfiguration modifiableConf = DefaultConfiguration.copy(); final String oldPropertyValue = modifiableConf.setProperty("propertyName", "new property value"); final Apache Any23 any23 = new Apache Any23(modifiableConf, "extractor1", ...); ...
Use of ExtractionParameters
It is possible to customize the behavior of a single data extraction by providing an ExtractionParameters instance to one the Apache Any23#extract() methods accepting it. ExtractionParameters allows to customize any property and flag other then the specific extraction options.
If no custom parameters are specified the default configuration values are used.
final Any23 any23 = ... final TripleHandler tripleHandler = ... final ExtractionParameters extractionParameters = ExtractionParameters.getDefault(); extractionParameters.setFlag("any23.microdata.strict", true); any23.extract(extractionParameters, "http://path/to/doc", tripleHandler);
Apache Any23 Core Module Default Configuration
Property Name | Default Property Value | Description |
any23.core.version | current any23 core version | String declaring the Apache Any23 Core module version. |
any23.http.user.agent.default | Apache Any23-CLI | User Agent Name used for HTTP requests. |
any23.http.client.timeout | 10000 (10 secs) | Timeout in milliseconds for a HTTP request. |
any23.http.client.max.connections | 5 | Max number of concurrent HTTP connections allowed by the internal Apache Any23 HTTP client. |
any23.rdfa.extractor.xslt | rdfa.xslt | XSLT Stylesheet to be used to perform HTML to RDF extraction of RDFa. |
any23.extraction.metadata.timesize | off (possible values: on/off) | Activates/deactivates the generation of time and size metadata triples. |
any23.extraction.metadata.nesting | on (possible values: on/off) | Activates/deactivates the generation of nesting triples for Microformat entities. |
any23.extraction.metadata.domain.per.entity | on (possible values: on/off) | Activates/deactivates the generation of domain triple per entity. |
any23.extraction.rdfa.programmatic | on (possible values: on/off) | Switches between the programmatic RDFa 1.1 Extractor and the RDFa 1.0 XSLT base one. |
any23.extraction.context.iri | ?(means current document IRI) | Default value for extraction content IRI. |
any23.plugin.dirs | ./plugins | Directory containing Apache Any23 plugins. |
any23.microdata.strict | on (possible values: on/off) | Activates/deactivates the microdata strict validation. |
any23.microdata.ns.default | http://schema.org/ | Microdata default namespace. |
any23.extraction.head.meta | on (possible values: on/off) | Activates/deactivates the HTMLMetaExtractor. |
any23.extraction.csv.field | , | CSVExtractor field separator. |
any23.extraction.csv.comment | # | CSVExtractor line comment marker. |