Data Conversion

/*1*/ Any23 runner = new Any23();
/*2*/ final String content = "@prefix foo: <> .   " +
                             "@prefix : <> ." +
                             "foo:bar foo: : .                          " +
                             ":bar : foo:bar .                           ";
//    The second argument of StringDocumentSource() must be a valid IRI.
/*3*/ DocumentSource source = new StringDocumentSource(content, "");
/*4*/ ByteArrayOutputStream out = new ByteArrayOutputStream();
/*5*/ TripleHandler handler = new NTriplesWriter(out);
      try {
/*6*/     runner.extract(source, handler);
      } finally {
/*7*/     handler.close();
      }
/*8*/ String nt = out.toString("UTF-8");

This example demonstrates how to use Apache Any23 to convert RDF data. The code takes input data expressed as Turtle and converts it to N-Triples format.

Line 1 creates a new instance of the Apache Any23 facade, which provides all the methods needed for the transformation. The facade constructor accepts a list of extractor names. If the list is specified, extraction is performed only with those extractors; otherwise the MIME type of the data is detected and all the compatible extractors declared in the ExtractorRegistry are applied.

Line 2 defines the input string containing some Turtle data.

Line 3 instantiates a StringDocumentSource, specifying the content and the source IRI. The IRI should identify the source of the content data, and must be valid. Besides StringDocumentSource, you can also provide input from other sources, such as HTTP requests and local files. See the classes in the sources package.

Line 4 defines an in-memory output stream that stores the data produced by the writer declared at line 5.
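The in-memory stream used at line 4 and decoded at line 8 is plain java.io and can be tried without Any23. The following is a minimal sketch under that assumption: the class name BufferSketch and the method roundTrip are hypothetical, and the writer is replaced by a direct write of UTF-8 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Any23: shows how bytes collected in a
// ByteArrayOutputStream are read back as a UTF-8 string.
final class BufferSketch {

    // Writes the text into an in-memory buffer as UTF-8 bytes,
    // then decodes the buffer contents the same way line 8 does.
    static String roundTrip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
        try {
            return out.toString("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<s> <p> \"caf\u00e9\" ."));
    }
}
```

Naming the charset when decoding keeps the result independent of the platform default encoding.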

A writer stores the extracted triples in some destination. Here we use an NTriplesWriter that writes into a ByteArrayOutputStream. Writers for the main RDF formats are available, and it is also possible to store the triples directly in an RDF4J repository and query them via SPARQL. See RepositoryWriter and the writer package.

The extract method invoked at line 6 performs the metadata extraction. It accepts a DocumentSource as its first argument and a TripleHandler as its second; the handler receives the sequence of parsing events generated by the applied extractors. The extract method also has another signature that lets you specify a charset encoding for the input data. If the charset is null, it is detected automatically.

The TripleHandler needs to be closed explicitly; this is done safely in a finally block at line 7.
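The reason for the finally block can be illustrated without Any23. The sketch below is only an assumption-laden stand-in: RecordingHandler is a hypothetical class, not the Any23 TripleHandler interface, and the failure is simulated. It shows that close() runs whether or not the extraction step throws.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the try/finally pattern used at lines 6 and 7.
final class CloseSketch {

    // Stand-in handler that records what happens to it.
    // This is NOT the Any23 TripleHandler interface.
    static final class RecordingHandler {
        final List<String> events = new ArrayList<>();

        void receive(String event) {
            events.add(event);
        }

        void close() {
            events.add("close");
        }
    }

    // Runs a step that may fail and closes the handler either way.
    static List<String> run(boolean fail) {
        RecordingHandler handler = new RecordingHandler();
        try {
            try {
                handler.receive("triple");
                if (fail) {
                    throw new IllegalStateException("simulated extraction failure");
                }
            } finally {
                // Executed on both the normal and the failing path.
                handler.close();
            }
        } catch (IllegalStateException e) {
            handler.events.add("error");
        }
        return handler.events;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [triple, close]
        System.out.println(run(true));  // prints [triple, close, error]
    }
}
```

In the failing case the handler is closed before the exception reaches the caller, so buffered output is not left in an unfinished state.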

Line 8 decodes the buffered output as UTF-8. The expected output is:

<> <> <> .
<> <> <> .