public class RDFaExtractor extends Object implements Extractor.TagSoupDOMExtractor

Nested classes/interfaces inherited from interface Extractor: Extractor.BlindExtractor, Extractor.ContentExtractor, Extractor.TagSoupDOMExtractor

| Modifier and Type | Field and Description |
|---|---|
| static ExtractorFactory<RDFaExtractor> | factory |
| static String | NAME |
| static String | xsltFilename |

| Constructor and Description |
|---|
| RDFaExtractor()<br>Default constructor: data types are not verified and parsing does not stop at the first error. |
| RDFaExtractor(boolean verifyDataType, boolean stopAtFirstError)<br>Constructor that allows specifying the validation and error-handling policies. |

| Modifier and Type | Method and Description |
|---|---|
| ExtractorDescription | getDescription()<br>Returns an ExtractorDescription of this extractor. |
| static XSLTStylesheet | getXSLT()<br>Returns an XSLTStylesheet able to distill RDFa from HTML pages. |
| boolean | isStopAtFirstError() |
| boolean | isVerifyDataType() |
| void | run(ExtractionParameters extractionParameters, ExtractionContext extractionContext, Document in, ExtractionResult out)<br>Executes the extractor. |
| void | setStopAtFirstError(boolean stopAtFirstError) |
| void | setVerifyDataType(boolean verifyDataType) |

public static final String NAME
public static final String xsltFilename
public static final ExtractorFactory<RDFaExtractor> factory

public RDFaExtractor(boolean verifyDataType, boolean stopAtFirstError)
Parameters:
verifyDataType - if true, the data types are verified; if false, they are ignored.
stopAtFirstError - if true, the parser stops at the first parsing error; if false, it ignores non-blocking errors.

public RDFaExtractor()
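
The interaction of the two constructor flags can be illustrated with a small, self-contained sketch. `PolicySketch` is a hypothetical stand-in written for this page, not the Any23 class, and the way it validates literals (a regular-expression check for integers) and records ignored errors is an assumption made only to show the policy idea described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT the Any23 RDFaExtractor: it only models the two policy flags.
final class PolicySketch {
    private final boolean verifyDataType;
    private final boolean stopAtFirstError;

    // Messages for errors that were ignored because stopAtFirstError is false.
    final List<String> ignoredErrors = new ArrayList<>();

    PolicySketch(boolean verifyDataType, boolean stopAtFirstError) {
        this.verifyDataType = verifyDataType;
        this.stopAtFirstError = stopAtFirstError;
    }

    // Takes lexical forms that claim to be integers and returns the accepted ones.
    List<String> accept(List<String> integerLiterals) {
        List<String> accepted = new ArrayList<>();
        for (String literal : integerLiterals) {
            // Assumed check: verification is skipped entirely when verifyDataType is false.
            if (verifyDataType && !literal.matches("[+-]?\\d+")) {
                String message = "not a valid integer: " + literal;
                if (stopAtFirstError) {
                    throw new IllegalArgumentException(message);
                }
                ignoredErrors.add(message); // non-blocking: remember it and move on
                continue;
            }
            accepted.add(literal);
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> input = List.of("1", "two", "3");
        System.out.println(new PolicySketch(false, false).accept(input)); // [1, two, 3]
        System.out.println(new PolicySketch(true, false).accept(input));  // [1, 3]
        try {
            new PolicySketch(true, true).accept(input);
        } catch (IllegalArgumentException e) {
            System.out.println("stopped: " + e.getMessage()); // stopped: not a valid integer: two
        }
    }
}
```

With both flags false, which the summary describes as the behaviour of the default constructor, nothing is checked and every value passes through; the real extractor may treat invalid data differently from this sketch.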

public static XSLTStylesheet getXSLT()
Returns an XSLTStylesheet able to distill RDFa from HTML pages.
Returns: not null XSLT instance.

public boolean isVerifyDataType()
public void setVerifyDataType(boolean verifyDataType)
public boolean isStopAtFirstError()
public void setStopAtFirstError(boolean stopAtFirstError)
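
getXSLT() is described as returning a stylesheet able to distill RDFa from HTML pages. The general technique, applying an XSLT stylesheet to a DOM document, can be sketched with the JDK's `javax.xml.transform` API alone. The class `XsltSketch` and its toy stylesheet are assumptions made for illustration: the stylesheet only lists elements carrying a `property` attribute and is far simpler than a real RDFa distiller, and the code does not use Any23's XSLTStylesheet class.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration; not part of Any23.
final class XsltSketch {
    // Toy stylesheet (an assumption): emits "property=text" for each element
    // that has a property attribute, one per line.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='//*[@property]'>"
        + "<xsl:value-of select='@property'/><xsl:text>=</xsl:text>"
        + "<xsl:value-of select='normalize-space(.)'/><xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Parses well-formed XHTML into a DOM and runs the stylesheet over it.
    static String apply(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // XSLT processors expect namespace-aware DOMs
            Document dom = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><span property='dc:title'>Hello</span></body></html>";
        System.out.print(apply(page)); // prints: dc:title=Hello
    }
}
```

The sketch parses strict XHTML; the TagSoupDOMExtractor interface name suggests the real extractor receives a DOM built from tolerant HTML parsing, which is outside the scope of this example.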

public void run(ExtractionParameters extractionParameters, ExtractionContext extractionContext, Document in, ExtractionResult out) throws IOException, ExtractionException
Description copied from interface: Extractor
Executes the extractor.
Specified by: run in interface Extractor<Document>
Parameters:
extractionParameters - the parameters to be applied during the extraction.
extractionContext - the document context.
in - the extractor input data.
out - the collector for the extracted data.
Throws:
IOException - on error while reading from the input stream.
ExtractionException - on other errors, such as parse errors.

public ExtractorDescription getDescription()
Description copied from interface: Extractor
Returns an ExtractorDescription of this extractor.
Specified by: getDescription in interface Extractor<Document>
Returns: the ExtractorDescription of this extractor.

Copyright © 2010-2012 The Apache Software Foundation. All Rights Reserved.