Package org.apache.any23.extractor.rdf
Class BaseRDFExtractor
- java.lang.Object
-
- org.apache.any23.extractor.rdf.BaseRDFExtractor
-
- All Implemented Interfaces:
Extractor<InputStream>, Extractor.ContentExtractor
- Direct Known Subclasses:
FunctionalSyntaxExtractor, JSONLDExtractor, ManchesterSyntaxExtractor, NQuadsExtractor, NTriplesExtractor, RDFa11Extractor, RDFaExtractor, RDFXMLExtractor, TriXExtractor, TurtleExtractor
public abstract class BaseRDFExtractor extends Object implements Extractor.ContentExtractor
Base class for a generic RDF Extractor.ContentExtractor.
- Author:
- Michele Mostarda (mostarda@fbk.eu), Hans Brende (hansbrende@apache.org)
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.apache.any23.extractor.Extractor
Extractor.BlindExtractor, Extractor.ContentExtractor, Extractor.TagSoupDOMExtractor
-
-
Constructor Summary
Constructors
Constructor - Description
BaseRDFExtractor()
BaseRDFExtractor(boolean verifyDataType, boolean stopAtFirstError) - Constructor that allows the validation and error handling policies to be specified.
-
Method Summary
Modifier and Type, Method - Description
protected abstract org.eclipse.rdf4j.rio.RDFParser getParser(ExtractionContext extractionContext, ExtractionResult extractionResult)
boolean isStopAtFirstError()
boolean isVerifyDataType()
void run(ExtractionParameters extractionParameters, ExtractionContext extractionContext, InputStream in, ExtractionResult extractionResult) - Executes the extractor.
void setStopAtFirstError(boolean b) - If true, the extractor will stop at the first parsing error; if false, the extractor will attempt to ignore all parsing errors.
void setVerifyDataType(boolean verifyDataType)
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.any23.extractor.Extractor
getDescription
-
-
-
-
Constructor Detail
-
BaseRDFExtractor
public BaseRDFExtractor()
-
BaseRDFExtractor
public BaseRDFExtractor(boolean verifyDataType, boolean stopAtFirstError)
Constructor that allows the validation and error handling policies to be specified.
- Parameters:
verifyDataType - if true, the data types will be verified; if false, they will be ignored.
stopAtFirstError - if true, the parser will stop at the first parsing error; if false, it will ignore non-blocking errors.
-
-
Method Detail
-
getParser
protected abstract org.eclipse.rdf4j.rio.RDFParser getParser(ExtractionContext extractionContext, ExtractionResult extractionResult)
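The abstract getParser hook suggests a template-method arrangement, where each subclass supplies the parser for its own syntax and the base class drives it. The following is a minimal sketch of that shape only, assuming run delegates to whatever getParser returns (this page does not state that). Every type in it (ParserHookSketch, SketchParser, SketchBaseExtractor, LineCountingExtractor) is a hypothetical stand-in; none of it is Any23 or RDF4J code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Illustrative only: stand-in types, not the Any23 or RDF4J classes. */
final class ParserHookSketch {

    /** Hypothetical stand-in for a parser; returns how many records it read. */
    interface SketchParser {
        int parse(InputStream in) throws IOException;
    }

    /** Hypothetical base class: holds the two policy flags and an abstract parser hook. */
    abstract static class SketchBaseExtractor {
        private final boolean verifyDataType;
        private final boolean stopAtFirstError;

        protected SketchBaseExtractor(boolean verifyDataType, boolean stopAtFirstError) {
            this.verifyDataType = verifyDataType;
            this.stopAtFirstError = stopAtFirstError;
        }

        public boolean isVerifyDataType() {
            return verifyDataType;
        }

        public boolean isStopAtFirstError() {
            return stopAtFirstError;
        }

        /** Subclasses decide which parser handles their syntax. */
        protected abstract SketchParser getParser();

        /** Assumption of this sketch: run simply hands the stream to the parser. */
        public int run(InputStream in) throws IOException {
            return getParser().parse(in);
        }
    }

    /** Toy subclass: its "parser" counts the non-blank lines of the input. */
    static final class LineCountingExtractor extends SketchBaseExtractor {
        LineCountingExtractor() {
            super(false, false);
        }

        @Override
        protected SketchParser getParser() {
            return in -> {
                String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                int count = 0;
                for (String line : text.split("\n")) {
                    if (!line.isBlank()) {
                        count++;
                    }
                }
                return count;
            };
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "first\n\nsecond\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(new LineCountingExtractor().run(new ByteArrayInputStream(data)));
    }
}
```

In this arrangement a new syntax only needs a subclass that overrides the hook; the flag handling and the run entry point stay in one place.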
-
isVerifyDataType
public boolean isVerifyDataType()
-
setVerifyDataType
public void setVerifyDataType(boolean verifyDataType)
-
isStopAtFirstError
public boolean isStopAtFirstError()
-
setStopAtFirstError
public void setStopAtFirstError(boolean b)
Description copied from interface: Extractor.ContentExtractor
If true, the extractor will stop at the first parsing error; if false, the extractor will attempt to ignore all parsing errors.
- Specified by:
setStopAtFirstError in interface Extractor.ContentExtractor
- Parameters:
b - tolerance flag.
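The difference between the two settings of the tolerance flag can be illustrated with a toy model. The sketch below is not Any23 code: ToleranceSketch, MalformedLineException, and the "three tokens per line" rule are all hypothetical, chosen only to show a strict mode that aborts on the first bad input versus a tolerant mode that skips it and continues.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in, not Any23 code: models only the stop-at-first-error policy. */
final class ToleranceSketch {

    /** Hypothetical exception type used only by this sketch. */
    static final class MalformedLineException extends RuntimeException {
        MalformedLineException(String message) {
            super(message);
        }
    }

    /**
     * Accepts lines made of exactly three whitespace-separated tokens.
     * With stopAtFirstError set, the first bad line aborts the run;
     * otherwise bad lines are skipped and parsing continues.
     */
    static List<String> parse(List<String> lines, boolean stopAtFirstError) {
        List<String> accepted = new ArrayList<>();
        for (String line : lines) {
            boolean wellFormed = line.trim().split("\\s+").length == 3;
            if (wellFormed) {
                accepted.add(line);
            } else if (stopAtFirstError) {
                throw new MalformedLineException("Malformed line: " + line);
            }
            // Tolerant mode: the bad line is dropped and the loop moves on.
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> input = List.of("s1 p1 o1", "broken", "s2 p2 o2");
        // Tolerant mode keeps the two well-formed lines.
        System.out.println(parse(input, false).size());
        try {
            parse(input, true);
        } catch (MalformedLineException e) {
            // Strict mode fails on the first malformed line.
            System.out.println(e.getMessage());
        }
    }
}
```

Tolerant mode suits crawled or hand-written input where partial results are still useful; strict mode suits validation, where any error should surface immediately.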
-
run
public void run(ExtractionParameters extractionParameters, ExtractionContext extractionContext, InputStream in, ExtractionResult extractionResult) throws IOException, ExtractionException
Description copied from interface: Extractor
Executes the extractor. Will be invoked only once; extractors are not reusable.
- Specified by:
run in interface Extractor<InputStream>
- Parameters:
extractionParameters - the parameters to be applied during the extraction.
extractionContext - the document context.
in - the extractor input data.
extractionResult - the collector for the extracted data.
- Throws:
IOException - on error while reading from the input stream.
ExtractionException - on other errors, such as parse errors.
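Because an extractor is run only once and is not reusable, callers that process several documents need a fresh instance per document. One way to arrange that is to pass around a factory rather than an instance. The sketch below is a hedged illustration with hypothetical stand-ins (FreshInstanceSketch, OneShotExtractor); the reuse guard that throws IllegalStateException is this sketch's own choice and is not documented Any23 behaviour.

```java
import java.util.List;
import java.util.function.Supplier;

/** Illustrative only: shows the "one instance per input" usage pattern. */
final class FreshInstanceSketch {

    /** Hypothetical one-shot worker; the reuse guard is an assumption of this sketch. */
    static final class OneShotExtractor {
        private boolean used;

        /** Returns the document length as a stand-in for real extraction work. */
        int run(String document) {
            if (used) {
                throw new IllegalStateException("extractor already used");
            }
            used = true;
            return document.length();
        }
    }

    /** Processes each document with a new instance obtained from the factory. */
    static int runAll(Supplier<OneShotExtractor> factory, List<String> documents) {
        int total = 0;
        for (String document : documents) {
            total += factory.get().run(document);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(runAll(OneShotExtractor::new, List.of("abc", "de")));
    }
}
```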
-
-