Package org.apache.any23.extractor.html
This package contains the Extractor implementations needed to distill RDF from Microformats in HTML pages.
Class Summary

Class                              Description
AdrExtractor                       Extractor for the adr microformat.
AdrExtractorFactory
DocumentReport                     Represents the validationReportBuilder generated by the TagSoupParser when a document is retrieved and validated.
DomUtils                           This class provides utility methods for DOM manipulation.
EmbeddedJSONLDExtractor            This extractor represents the HTML script tags used to embed blocks of data in documents.
EmbeddedJSONLDExtractorFactory
EntityBasedMicroformatExtractor    Base class for microformat extractors based on entities.
GeoExtractor                       Extractor for the Geo microformat.
GeoExtractorFactory
HCalendarExtractor                 Extractor for the hCalendar microformat.
HCalendarExtractorFactory
HCardExtractor                     Extractor for the hCard microformat.
HCardExtractorFactory
HCardName                          An HCard name, consisting of various parts.
HeadLinkExtractor                  This Extractor.TagSoupDOMExtractor implementation retrieves the LINKs declared within the HTML/HEAD page header.
HeadLinkExtractorFactory
HListingExtractor                  Extractor for the hListing microformat.
HListingExtractorFactory
HRecipeExtractor                   Extractor for the hRecipe microformat.
HRecipeExtractorFactory
HResumeExtractor                   Extractor for the hResume microformat.
HResumeExtractorFactory
HReviewAggregateExtractor          Extractor for the hReview-aggregate microformat.
HReviewAggregateExtractorFactory
HReviewExtractor                   Extractor for the hReview microformat.
HReviewExtractorFactory
HTMLDocument                       A wrapper around the DOM representation of an HTML document.
HTMLDocument.TextField             This class represents a text extracted from the HTML DOM, related to the node from which such text has been retrieved.
HTMLMetaExtractor                  This extractor represents the HTML META tag values according to the HTML4 specification.
HTMLMetaExtractorFactory
ICBMExtractor                      Extractor for "ICBM coordinates" provided as META headers in the head of an HTML page.
ICBMExtractorFactory
JsoupUtils
LicenseExtractor                   Extractor for the rel-license microformat.
LicenseExtractorFactory
MicroformatExtractor               The abstract base class for any Microformat specification extractor.
SpanCloserInputStream              Extension of InputStream meant to detect and replace any occurrence of inline span:
SpeciesExtractor                   Extractor able to extract the Species Microformat.
SpeciesExtractorFactory
TagSoupParser                      Parses an InputStream into an HTML DOM tree.
TagSoupParser.ElementLocation      Describes a DOM Element location.
TitleExtractor                     Extracts the value of the <title> element of an HTML or XHTML page.
TitleExtractorFactory
TurtleHTMLExtractor                Extractor for Turtle/N3 format embedded within HTML script tags.
TurtleHTMLExtractorFactory
XFNExtractor                       Extractor for the XFN microformat.
XFNExtractorFactory
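
As a rough illustration of the kind of work the simplest extractor listed here does, the sketch below pulls the text of the <title> element out of an HTML string. It is not the Any23 implementation: the class TitleSketch and the method extractTitle are hypothetical names, the sketch uses only the Java standard library, and it stops at the plain string rather than producing RDF. It assumes a regular expression is good enough for the example; the real TitleExtractor works on a parsed DOM.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, stdlib-only sketch; not part of org.apache.any23.extractor.html.
public class TitleSketch {

    // Matches the first <title>...</title> pair, ignoring case and spanning line breaks.
    private static final Pattern TITLE = Pattern.compile(
            "<title\\b[^>]*>(.*?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the whitespace-normalized title text, or empty if there is no usable title.
    public static Optional<String> extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        if (!m.find()) {
            return Optional.empty();
        }
        // Collapse runs of whitespace (including newlines) and trim the ends.
        String text = m.group(1).replaceAll("\\s+", " ").trim();
        return text.isEmpty() ? Optional.empty() : Optional.of(text);
    }

    public static void main(String[] args) {
        String html = "<html><head><title>\n  Example page </title></head><body/></html>";
        System.out.println(extractTitle(html).orElse("(no title)"));
    }
}
```

A regular expression is used only to keep the sketch short. Malformed markup, which TagSoupParser is listed above as handling by building a DOM tree, is outside what this example covers.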