java.lang.Object
  org.apache.any23.extractor.rdfa.RDFaExtractor
public class RDFaExtractor
Extractor for RDFa in HTML, based on Fabien Gandon's XSLT transform. It first parses the HTML with a TagSoup parser, then applies the XSLT to the resulting DOM tree, and finally parses the RDF/XML that the transform produces.
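The three stages described above (parse into a DOM, apply the stylesheet, parse the RDF/XML output) can be sketched with the JDK's built-in javax.xml APIs. This is a minimal sketch under stated assumptions: the input is well-formed XHTML, so the JDK DOM parser stands in for TagSoup; TOY_XSLT is a hypothetical toy stylesheet that emits one triple element per element carrying a property attribute, not the real RDFa transform; and the final RDF/XML parsing step is only indicated in a comment. RdfaPipelineSketch and extract are illustrative names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class RdfaPipelineSketch {
    // Hypothetical toy stylesheet: one <triple> per element with a "property"
    // attribute. It only stands in for the real RDFa-to-RDF/XML transform.
    static final String TOY_XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'><triples>"
        + "<xsl:for-each select='//*[@property]'>"
        + "<triple p='{@property}'><xsl:value-of select='.'/></triple>"
        + "</xsl:for-each>"
        + "</triples></xsl:template>"
        + "</xsl:stylesheet>";

    static String extract(String xhtml) throws Exception {
        // Stage 1: build a DOM tree. The real extractor uses TagSoup for messy
        // HTML; this sketch assumes well-formed XHTML so the JDK parser suffices.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document dom = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));

        // Stage 2: compile the stylesheet and apply it to the DOM tree.
        Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(TOY_XSLT)));
        Transformer transformer = templates.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(dom), new StreamResult(out));

        // Stage 3 (not shown): the transform output would be handed to an
        // RDF/XML parser that emits the extracted triples.
        return out.toString();
    }
}
```

Compiling the stylesheet into a Templates object keeps it reusable across documents; for a one-off call, TransformerFactory.newTransformer(Source) would do just as well.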
| Nested Class Summary |
|---|
| Nested classes/interfaces inherited from interface org.apache.any23.extractor.Extractor |
| Extractor.BlindExtractor, Extractor.ContentExtractor, Extractor.TagSoupDOMExtractor |
| Field Summary | |
|---|---|
| static ExtractorFactory<RDFaExtractor> | factory |
| static String | NAME |
| static String | xsltFilename |
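The static NAME and factory fields suggest a name-based registry that asks a factory for a fresh extractor whenever one is needed. The sketch below illustrates that general pattern only; SketchFactory, SketchRegistry, ToyExtractor and the "example-rdfa" name are hypothetical and do not reproduce Any23's ExtractorFactory API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a typed extractor factory (not Any23's ExtractorFactory).
final class SketchFactory<T> {
    private final String name;
    private final Supplier<T> constructor;

    SketchFactory(String name, Supplier<T> constructor) {
        this.name = name;
        this.constructor = constructor;
    }

    String getName() { return name; }

    // Every call builds a new instance, so per-instance state is never shared.
    T create() { return constructor.get(); }
}

// Hypothetical extractor exposing the same static NAME / factory pair.
final class ToyExtractor {
    static final String NAME = "example-rdfa";
    static final SketchFactory<ToyExtractor> factory =
            new SketchFactory<>(NAME, ToyExtractor::new);
}

// Hypothetical registry that looks factories up by extractor name.
final class SketchRegistry {
    private final Map<String, SketchFactory<?>> byName = new HashMap<>();

    void register(SketchFactory<?> f) { byName.put(f.getName(), f); }

    Object create(String name) {
        SketchFactory<?> f = byName.get(name);
        if (f == null) {
            throw new IllegalArgumentException("unknown extractor: " + name);
        }
        return f.create();
    }
}
```

Handing out a new instance per call means mutable per-instance settings, such as the two policy flags with their setters, never leak between concurrent extractions.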
| Constructor Summary | |
|---|---|
| RDFaExtractor() | Default constructor: data types are not verified, and parsing does not stop at the first error. |
| RDFaExtractor(boolean verifyDataType, boolean stopAtFirstError) | Constructor that lets the caller specify the validation and error-handling policies. |
| Method Summary | |
|---|---|
| ExtractorDescription | getDescription(): Returns an ExtractorDescription of this extractor. |
| static XSLTStylesheet | getXSLT(): Returns an XSLTStylesheet able to distill RDFa from HTML pages. |
| boolean | isStopAtFirstError() |
| boolean | isVerifyDataType() |
| void | run(ExtractionParameters extractionParameters, ExtractionContext extractionContext, Document in, ExtractionResult out): Executes the extractor. |
| void | setStopAtFirstError(boolean stopAtFirstError) |
| void | setVerifyDataType(boolean verifyDataType) |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final String NAME
public static final String xsltFilename
public static final ExtractorFactory<RDFaExtractor> factory
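Given the static xsltFilename field and the static getXSLT() accessor, one plausible design is to compile the stylesheet once and share it. The sketch below assumes exactly that, using the JDK's Templates object (documented as thread-safe, unlike Transformer) with the initialization-on-demand holder idiom. StylesheetCache and IDENTITY_XSLT are hypothetical: the inline identity stylesheet stands in for loading the file named by xsltFilename, and this is not Any23's actual loading code.

```java
import java.io.StringReader;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

final class StylesheetCache {
    // Hypothetical stand-in for the stylesheet file named by xsltFilename.
    static final String IDENTITY_XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='@*|node()'><xsl:copy>"
        + "<xsl:apply-templates select='@*|node()'/>"
        + "</xsl:copy></xsl:template></xsl:stylesheet>";

    private StylesheetCache() {}

    // Initialization-on-demand holder: the JVM runs compile() once, lazily and
    // thread-safely, the first time Holder.TEMPLATES is read.
    private static final class Holder {
        static final Templates TEMPLATES = compile();

        private static Templates compile() {
            try {
                return TransformerFactory.newInstance()
                        .newTemplates(new StreamSource(new StringReader(IDENTITY_XSLT)));
            } catch (TransformerConfigurationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }
    }

    // Shared compiled stylesheet; never null once loaded.
    static Templates getTemplates() { return Holder.TEMPLATES; }

    // Transformer objects are not thread-safe, so create one per use.
    static Transformer newTransformer() throws TransformerConfigurationException {
        return getTemplates().newTransformer();
    }
}
```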
| Constructor Detail |
|---|
public RDFaExtractor(boolean verifyDataType,
boolean stopAtFirstError)
Parameters:
verifyDataType - if true, data types are verified; if false, they are ignored.
stopAtFirstError - if true, the parser stops at the first parsing error; if false, non-blocking errors are ignored.

public RDFaExtractor()
Default constructor: data types are not verified, and parsing does not stop at the first error.
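The interplay of the two constructor flags can be illustrated with a toy model. This is a hypothetical sketch, not Any23's behaviour: typed literals are written as lexical^^type, only xsd:integer is checked, "stopping" is modelled as throwing an exception, and ignoring a non-blocking error means skipping the offending literal. PolicySketch, accept and isValid are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the verifyDataType / stopAtFirstError policies.
final class PolicySketch {
    private final boolean verifyDataType;
    private final boolean stopAtFirstError;

    // Mirrors the documented defaults: no datatype checks, no early stop.
    PolicySketch() { this(false, false); }

    PolicySketch(boolean verifyDataType, boolean stopAtFirstError) {
        this.verifyDataType = verifyDataType;
        this.stopAtFirstError = stopAtFirstError;
    }

    // Returns the literals that survive the configured policy.
    List<String> accept(List<String> literals) {
        List<String> kept = new ArrayList<>();
        for (String lit : literals) {
            if (!verifyDataType || isValid(lit)) {
                kept.add(lit);
            } else if (stopAtFirstError) {
                throw new IllegalArgumentException("invalid literal: " + lit);
            }
            // Otherwise the error is non-blocking: skip the literal and continue.
        }
        return kept;
    }

    // Toy check: only literals typed "^^xsd:integer" are validated.
    static boolean isValid(String lit) {
        int sep = lit.indexOf("^^");
        if (sep < 0 || !lit.substring(sep + 2).equals("xsd:integer")) {
            return true;
        }
        return lit.substring(0, sep).matches("[+-]?\\d+");
    }
}
```

With both flags off every literal passes; with verification on and early stop off the bad literal is dropped; with both on the first bad literal aborts processing.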
| Method Detail |
|---|
public static XSLTStylesheet getXSLT()
Returns an XSLTStylesheet able to distill RDFa from HTML pages.
Returns:
a non-null XSLT instance.

public boolean isVerifyDataType()
public void setVerifyDataType(boolean verifyDataType)
public boolean isStopAtFirstError()
public void setStopAtFirstError(boolean stopAtFirstError)
public void run(ExtractionParameters extractionParameters,
ExtractionContext extractionContext,
Document in,
ExtractionResult out)
throws IOException,
ExtractionException
Description copied from interface: Extractor
Executes the extractor.
Specified by:
run in interface Extractor<Document>
Parameters:
extractionParameters - the parameters to be applied during the extraction.
extractionContext - the document context.
in - the extractor input data.
out - the collector for the extracted data.
Throws:
IOException - on error while reading from the input stream.
ExtractionException - on other errors, such as parse errors.

public ExtractorDescription getDescription()
Description copied from interface: Extractor
Returns an ExtractorDescription of this extractor.
Specified by:
getDescription in interface Extractor<Document>
Returns:
the ExtractorDescription of this extractor.
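The run(...) contract boils down to: read the input Document, write whatever is found to an output collector, and signal I/O problems and extraction problems through distinct checked exceptions. The sketch below models that shape with hypothetical, simplified types (MiniExtractor, MiniResult, MiniExtractionException, PropertyAttributeExtractor); they are stand-ins, not Any23's ExtractionParameters, ExtractionContext, ExtractionResult or ExtractorDescription, and getDescription returns a plain String here. Only the W3C DOM Document input matches the real signature.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical checked exception standing in for ExtractionException.
class MiniExtractionException extends Exception {
    MiniExtractionException(String msg) { super(msg); }
}

// Hypothetical output collector standing in for ExtractionResult.
class MiniResult {
    final List<String[]> pairs = new ArrayList<>();
    void write(String property, String value) { pairs.add(new String[] {property, value}); }
}

// Hypothetical, simplified extractor interface.
interface MiniExtractor {
    String getDescription();
    void run(Document in, MiniResult out) throws IOException, MiniExtractionException;
}

class PropertyAttributeExtractor implements MiniExtractor {
    public String getDescription() { return "toy extractor for @property attributes"; }

    public void run(Document in, MiniResult out) throws IOException, MiniExtractionException {
        // Visit every element in document order.
        NodeList all = in.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (!e.hasAttribute("property")) {
                continue;
            }
            String property = e.getAttribute("property");
            if (property.isEmpty()) {
                // A malformed annotation is reported as an extraction error.
                throw new MiniExtractionException("empty property attribute");
            }
            out.write(property, e.getTextContent());
        }
    }

    // Helper for the usage example: parse well-formed XHTML into a DOM.
    static Document parse(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }
}
```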