Apache Any23 Plugins
Introduction
This section describes the plugin support in Apache Any23.
Apache Any23 comes with a set of predefined plugins, located under the $ANY23_HOME/plugins directory.
A plugin is a standard Maven3 module containing any implementation of the ExtractorPlugin or Tool interfaces described below.
How to Register a Plugin
A plugin can be added to the Apache Any23 CLI in any of the following ways:
- adding its JAR to the Apache Any23 JVM classpath;
- adding its JAR to the CLASSPATH_PREFIX environment variable as:
export CLASSPATH_PREFIX=../../../plugins/basic-crawler/target/any23-basic-crawler-VERSION.jar
- adding its JAR to the $HOME/.any23/plugins directory.
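Each of the options above makes the plugin JAR visible to a class loader. As a rough illustration of that mechanism only, the following sketch collects the *.jar files found in a directory and exposes them through a standard URLClassLoader. The names PluginDirSketch, jarUrls and loaderFor are hypothetical, and the code is not taken from Any23; it assumes nothing beyond the JDK.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, NOT part of the Any23 API. */
final class PluginDirSketch {

    private PluginDirSketch() { }

    /** Returns the URLs of the *.jar files directly inside dir, sorted by path. */
    static List<URL> jarUrls(File dir) {
        File[] jars = dir.listFiles((parent, name) -> name.endsWith(".jar"));
        List<URL> urls = new ArrayList<>();
        if (jars == null) {
            // dir does not exist, is not a directory, or cannot be read.
            return urls;
        }
        Arrays.sort(jars);
        for (File jar : jars) {
            try {
                urls.add(jar.toURI().toURL());
            } catch (MalformedURLException e) {
                throw new IllegalStateException("Cannot convert to URL: " + jar, e);
            }
        }
        return urls;
    }

    /** Builds a class loader that can see every JAR found in dir. */
    static URLClassLoader loaderFor(File dir) {
        URL[] urls = jarUrls(dir).toArray(new URL[0]);
        return new URLClassLoader(urls, PluginDirSketch.class.getClassLoader());
    }
}
```

For instance, loaderFor(new File(System.getProperty("user.home"), ".any23/plugins")) would give a loader over the JARs in the per-user plugin directory, assuming that directory exists; for a missing directory the loader simply has no extra URLs.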
A plugin can be added through the Apache Any23 library API by first obtaining the singleton instance with Any23PluginManager#getInstance(). Once this is done, there are a variety of options to configure and register plugins. An example of dynamic plugin loading is the way OpenIE toggling is implemented within the Any23 Webservice:
    if (openie) {
        Any23PluginManager pManager = Any23PluginManager.getInstance();
        // Dynamically adding JARs to the classpath via the following logic
        // is absolutely dependent on the 'apache-any23-openie' directory being
        // present within the webapp /lib directory. This is specified within
        // the maven-dependency-plugin.
        File webappClasspath = new File(getClass().getClassLoader().getResource("").getPath());
        File openIEJarPath = new File(webappClasspath.getParentFile().getPath() + "/lib/apache-any23-openie");
        boolean loadedJars = pManager.loadJARDir(openIEJarPath);
        if (loadedJars) {
            ExtractorRegistry r = ExtractorRegistryImpl.getInstance();
            try {
                pManager.getExtractors().forEachRemaining(r::register);
            } catch (IOException e) {
                LOG.error("Error during dynamic classloading of JARs from OpenIE runtime directory {}", openIEJarPath.toString(), e);
            }
            LOG.info("Successful dynamic classloading of JARs from OpenIE runtime directory {}", openIEJarPath.toString());
        }
    }
Any implementation of ExtractorPlugin will automatically be registered with the ExtractorRegistry.
Any detected implementation of Tool will be listed by the ToolRunner command-line tool in any23-root/cli/bin/any23.
How to Build a Plugin
Apache Any23 takes care of testing and packaging plugins when they are distributed from its reactor POM. It is always possible to rebuild a plugin by running the following command from the plugin directory:
<plugin-dir>$ mvn clean assembly:assembly
How to Write an Extractor Plugin
An Extractor Plugin is a class:
- implementing one of the Extractor subinterfaces;
- packaged under org.apache.any23.plugin.
An example plugin is shown below.
    @Author(name="Michele Mostarda (mostarda@fbk.eu)")
    public class HTMLScraperExtractor implements Extractor.ContentExtractor {

        private static final Logger logger = LoggerFactory.getLogger(HTMLScraperExtractor.class);

        @Override
        public void run(
                ExtractionParameters extractionParameters,
                ExtractionContext extractionContext,
                InputStream inputStream,
                ExtractionResult extractionResult
        ) throws IOException, ExtractionException {
            ...
        }

        @Override
        public ExtractorDescription getDescription() {
            return HTMLScraperExtractorFactory.getDescriptionInstance();
        }

        @Override
        public void setStopAtFirstError(boolean b) {
            // Ignored.
        }
    }
How to Write a Tool Plugin
A Tool Plugin is a Java class that:
- implements the Tool interface;
- declares its CLI parameters by annotating class members with JCommander annotations;
- can be found using the ServiceLoader (we usually plug in Kohsuke's generator).
An example plugin is shown below.
    @Parameters(commandNames = { "myexec" }, commandDescription = "Prints out XXX used by Any23.")
    public class MyExecutableTool implements Tool {

        @Parameter(names = { "-u", "--urls" }, description = "URLs to process")
        private List<URL> urls;

        @Override
        public void run() throws Exception {
            // Tool logic goes here.
        }
    }
So when executing any23, myexec will be available in the commands list.
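The ServiceLoader requirement means that the plugin JAR has to carry a META-INF/services file, named after the service interface, that lists the implementing class; generators such as Kohsuke's produce that file at compile time. The sketch below shows the lookup side using only JDK classes. DemoTool and MyExecDemoTool are hypothetical stand-ins for Tool and a tool implementation, not Any23 types, and the services file is written to a temporary directory purely to keep the example self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical sketch of ServiceLoader discovery; NOT the Any23 implementation. */
final class ToolDiscoverySketch {

    private ToolDiscoverySketch() { }

    /** Hypothetical stand-in for the Tool interface. */
    public interface DemoTool {
        String commandName();
    }

    /** Hypothetical provider: public, with a public no-argument constructor. */
    public static final class MyExecDemoTool implements DemoTool {
        public MyExecDemoTool() { }

        @Override
        public String commandName() {
            return "myexec";
        }
    }

    /** Registers MyExecDemoTool in a services file, then discovers it by interface. */
    static List<String> discoverToolNames() {
        try {
            // A real plugin JAR ships this file; it is generated here only so
            // that the sketch runs without any build step.
            Path root = Files.createTempDirectory("demo-plugin");
            Path services = Files.createDirectories(root.resolve("META-INF/services"));
            Files.write(services.resolve(DemoTool.class.getName()),
                    MyExecDemoTool.class.getName().getBytes(StandardCharsets.UTF_8));

            List<String> names = new ArrayList<>();
            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] { root.toUri().toURL() },
                    ToolDiscoverySketch.class.getClassLoader())) {
                // ServiceLoader is lazy: providers are instantiated while iterating,
                // so the loop must finish before the loader is closed.
                for (DemoTool tool : ServiceLoader.load(DemoTool.class, loader)) {
                    names.add(tool.commandName());
                }
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The design point is that the caller never names the implementing class: it asks for the interface, and whatever the services files on the classpath declare is what gets instantiated.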
Available Extractor Plugins
- HTML Scraper Plugin
The HTMLScraperPlugin is able to scrape plain text content from any HTML page and transform it into statement literals.
This plugin is documented here.
- Office Scraper Plugins
The Office Scraper Plugins extract semantic content from several Microsoft Office document formats.
These plugins are documented here.
- OpenIE Extractor Plugin
As of version 2.1, Any23 provides functionality to extract triples using the Open Information Extraction (Open IE) system. The Open IE system runs over input sentences and creates extractions that represent relations in text; in the case of Any23, these extractions are emitted as triples. See the example under How to Register a Plugin above for how the OpenIE Extractor plugin is currently used within the Any23 Service.
Available CLI Tool Plugins
- Crawler CLI Tool
The Crawler CLI Tool is an extension of the Rover CLI Tool that adds basic site crawling capabilities. More information about the CLI can be found in the Getting Started - Crawler Tool section.