Class EntityBasedMicroformatExtractor

    • Constructor Detail

      • EntityBasedMicroformatExtractor

        public EntityBasedMicroformatExtractor()
    • Method Detail

      • getBaseClassName

        protected abstract String getBaseClassName()
        Returns the base class name for the extractor.
        Returns:
        a string containing the base class name of the extractor.
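In microformats, an entity is usually marked by a token in the HTML `class` attribute (for example `vcard` for hCard). The following is a minimal sketch of how a base class name like the one returned by `getBaseClassName()` might be matched against a `class` attribute. It assumes the attribute holds whitespace-separated tokens. `ClassTokenMatcher` and `hasClassToken` are hypothetical names, not part of this API.

```java
import java.util.Arrays;

public class ClassTokenMatcher {

    // Hypothetical helper (not Any23 API): true if the whitespace-separated
    // class attribute contains baseClassName as a whole token.
    static boolean hasClassToken(String classAttribute, String baseClassName) {
        if (classAttribute == null || baseClassName == null || baseClassName.isEmpty()) {
            return false;
        }
        String trimmed = classAttribute.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        // Split on any run of whitespace, so "vcards" does not match "vcard".
        return Arrays.asList(trimmed.split("\\s+")).contains(baseClassName);
    }

    public static void main(String[] args) {
        System.out.println(hasClassToken("vcard author", "vcard")); // true
        System.out.println(hasClassToken("vcards", "vcard"));       // false
    }
}
```

Matching whole tokens rather than substrings avoids false positives such as `vcards` or `hreview-aggregate` when looking for `vcard` or `hreview`.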
      • resetExtractor

        protected abstract void resetExtractor()
        Resets the internal state of the extractor to prepare it for a new extraction session.
      • extractEntity

        protected abstract boolean extractEntity(Node node,
                                                 ExtractionResult out)
                                          throws ExtractionException
        Extracts an entity from a DOM node.
        Parameters:
        node - the DOM node.
        out - the extraction result collector.
        Returns:
        true if the extraction has produced something, false otherwise.
        Throws:
        ExtractionException - if there is an error during extraction
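A subclass's `extractEntity` reads properties from the entity node, writes statements to the collector, and reports whether it wrote anything. The sketch below shows that contract under several assumptions. The DOM `Node` is replaced by a hypothetical `SimpleNode` record. `ExtractionResult` is replaced by a hypothetical `Collector` of string triples. The subject is passed in explicitly, whereas a real subclass would presumably derive it from the node (for example with `getBlankNodeFor`). The `fn` property follows hCard, but the code is illustrative, not Any23's implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class EntityExtractionSketch {

    // Hypothetical stand-in for a DOM element: class attribute, text, children.
    record SimpleNode(String classAttr, String text, List<SimpleNode> children) {}

    // Hypothetical stand-in for ExtractionResult: collects (subject, property, value).
    static final class Collector {
        final List<String[]> triples = new ArrayList<>();

        void write(String subject, String property, String value) {
            triples.add(new String[] {subject, property, value});
        }
    }

    // Depth-first search for the first descendant whose class attribute has the token.
    static SimpleNode findFirstByClass(SimpleNode node, String token) {
        for (SimpleNode child : node.children()) {
            if (child.classAttr() != null
                    && List.of(child.classAttr().trim().split("\\s+")).contains(token)) {
                return child;
            }
            SimpleNode deeper = findFirstByClass(child, token);
            if (deeper != null) {
                return deeper;
            }
        }
        return null;
    }

    // extractEntity-style method: writes an "fn" triple if the entity has a
    // non-blank fn descendant, and returns whether anything was written.
    static boolean extractEntity(SimpleNode entity, String subject, Collector out) {
        SimpleNode fn = findFirstByClass(entity, "fn");
        if (fn == null || fn.text() == null || fn.text().isBlank()) {
            return false;
        }
        out.write(subject, "fn", fn.text().trim());
        return true;
    }

    public static void main(String[] args) {
        SimpleNode card = new SimpleNode("vcard", null,
                List.of(new SimpleNode("fn", "Jane Doe", List.of())));
        Collector out = new Collector();
        System.out.println(extractEntity(card, "_:card1", out)); // true
        System.out.println(out.triples.size());                  // 1
    }
}
```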
      • extract

        public boolean extract()
                        throws ExtractionException
        Description copied from class: MicroformatExtractor
        Performs the extraction of the data and writes them to the model. The nodes generated in the model can have any name or implicit label but if possible they SHOULD have names (either URIs or AnonId) that are uniquely derivable from their position in the DOM tree, so that multiple extractors can merge information.
        Specified by:
        extract in class MicroformatExtractor
        Returns:
        true if extraction is successful
        Throws:
        ExtractionException - if there is an error during extraction
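The abstract methods above suggest a template-method flow for `extract()`. First it resets per-run state, then it selects the nodes carrying the base class name, and then it calls `extractEntity` on each one. The following is a minimal sketch of that flow, not Any23's actual source. DOM nodes are replaced by strings and `ExtractionResult` by a `List<String>`. `findNodesByClassName` is a hypothetical lookup hook, and `ToyExtractor` is a hypothetical subclass. Here "successful" is interpreted as "at least one entity produced output".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class EntityExtractionTemplate {

    // Hypothetical sketch of the extract() flow; not Any23's actual source.
    abstract static class EntityExtractorSketch {
        protected abstract String getBaseClassName();
        protected abstract void resetExtractor();
        protected abstract boolean extractEntity(String node, List<String> out);
        // Hypothetical hook standing in for a DOM lookup by class name.
        protected abstract List<String> findNodesByClassName(String className);

        public boolean extract(List<String> out) {
            resetExtractor(); // clear state left over from a previous run
            boolean producedAny = false;
            for (String node : findNodesByClassName(getBaseClassName())) {
                // |= (not ||) so every matching node is processed.
                producedAny |= extractEntity(node, out);
            }
            return producedAny;
        }
    }

    // Toy subclass: "nodes" are strings grouped by class name in a map.
    static final class ToyExtractor extends EntityExtractorSketch {
        private final Map<String, List<String>> nodesByClass;
        int entitiesSeen; // per-run state that resetExtractor() clears

        ToyExtractor(Map<String, List<String>> nodesByClass) {
            this.nodesByClass = nodesByClass;
        }

        @Override protected String getBaseClassName() { return "vcard"; }

        @Override protected void resetExtractor() { entitiesSeen = 0; }

        @Override protected List<String> findNodesByClassName(String className) {
            return nodesByClass.getOrDefault(className, List.of());
        }

        @Override protected boolean extractEntity(String node, List<String> out) {
            entitiesSeen++;
            if (node.isEmpty()) {
                return false; // nothing to extract from an empty node
            }
            out.add(node);
            return true;
        }
    }

    public static void main(String[] args) {
        ToyExtractor ex = new ToyExtractor(Map.of("vcard", List.of("Jane", "", "John")));
        List<String> out = new ArrayList<>();
        System.out.println(ex.extract(out)); // true
        System.out.println(out);             // [Jane, John]
        System.out.println(ex.entitiesSeen); // 3
    }
}
```

Calling `resetExtractor()` at the start of every `extract()` call lets one extractor instance be reused across documents without leaking state between runs.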
      • getBlankNodeFor

        protected org.eclipse.rdf4j.model.BNode getBlankNodeFor(Node node)
        Parameters:
        node - a DOM node representing a blank node
        Returns:
        an RDF blank node corresponding to that DOM node, by using a blank node ID like "MD5 of http://doc-uri/#xpath/to/node"
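Deriving the ID from the node's position makes it deterministic. Different extractors that see the same DOM node therefore produce the same blank node and can merge their statements. The following is a minimal sketch of the ID computation using only the JDK. It assumes the hashed string is the document URI, `#`, and the node's XPath concatenated and encoded as UTF-8, with the digest rendered as lowercase hex. The exact string format Any23 hashes may differ. `BlankNodeIds` and `blankNodeIdFor` are hypothetical names, and creating the actual RDF4J `BNode` from the ID is left out.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class BlankNodeIds {

    // Hypothetical helper: derives a stable blank node ID from the document URI
    // and the node's XPath, so the same DOM position always yields the same ID.
    static String blankNodeIdFor(String documentUri, String xpath) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest((documentUri + "#" + xpath).getBytes(StandardCharsets.UTF_8));
            // Render as 32 lowercase hex characters, keeping leading zeros.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String a = blankNodeIdFor("http://example.org/doc", "/HTML[1]/BODY[1]/DIV[1]");
        String b = blankNodeIdFor("http://example.org/doc", "/HTML[1]/BODY[1]/DIV[1]");
        String c = blankNodeIdFor("http://example.org/doc", "/HTML[1]/BODY[1]/DIV[2]");
        System.out.println(a.equals(b)); // true: same position, same ID
        System.out.println(a.equals(c)); // false: different position
    }
}
```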