Package org.apache.any23.extractor.html
Class EncodingTest
- java.lang.Object
  - org.apache.any23.AbstractAny23TestBase
    - org.apache.any23.extractor.html.EncodingTest
public class EncodingTest extends AbstractAny23TestBase
Test class that checks the behavior of the HTMLDocument parser in encoding corner cases.
Field Summary
Fields inherited from class org.apache.any23.AbstractAny23TestBase
tempDirectory, testFolder
Constructor Summary
Constructor | Description
EncodingTest() |
Method Summary
Modifier and Type | Method | Description
void | testEncodingHTML_ISO_8859_1() |
void | testEncodingHTML_UTF_8() |
void | testEncodingHTML_UTF_8_DeclarationAfterTitle() | Known issue: NekoHTML does not auto-detect the encoding; it relies on the explicitly specified encoding (via an XML declaration or an HTTP-Equiv meta header).
void | testEncodingXHTML_ISO_8859_1() |
void | testEncodingXHTML_UTF_8() |
Methods inherited from class org.apache.any23.AbstractAny23TestBase
copyResourceToTempFile, getDocumentSourceFromResource, getDocumentSourceFromResource, setUp
Method Detail
testEncodingHTML_ISO_8859_1
public void testEncodingHTML_ISO_8859_1() throws Exception
Throws: Exception
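Encoding tests of this kind guard against a document being decoded with a charset other than the one it was written in. The sketch below is illustrative only: the class `CharsetMismatchDemo` and its method `roundTrip` are hypothetical names, not part of Any23, and the code uses only the standard `String`/`Charset` APIs to show the two typical failure modes for an accented title.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo class, not part of Any23: decodes bytes with a possibly wrong charset. */
public class CharsetMismatchDemo {

    /** Encodes {@code text} with {@code written}, then decodes the bytes with {@code read}. */
    static String roundTrip(String text, Charset written, Charset read) {
        byte[] bytes = text.getBytes(written);
        return new String(bytes, read);
    }

    public static void main(String[] args) {
        String title = "Caf\u00E9"; // "Cafe" with an acute accent on the final e

        // Declared and actual charset agree: the title is preserved.
        System.out.println(roundTrip(title, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1));

        // ISO-8859-1 bytes read as UTF-8: the lone 0xE9 byte is malformed UTF-8
        // and is replaced by U+FFFD (the replacement character).
        System.out.println(roundTrip(title, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));

        // UTF-8 bytes read as ISO-8859-1: the two bytes 0xC3 0xA9 become two separate characters.
        System.out.println(roundTrip(title, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

Only the first call reproduces the original title; the other two show the corruption a parser produces when it ignores or misreads the declared encoding.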
testEncodingHTML_UTF_8_DeclarationAfterTitle
public void testEncodingHTML_UTF_8_DeclarationAfterTitle() throws Exception
Known issue: NekoHTML does not auto-detect the encoding; it relies on the explicitly specified encoding (via an XML declaration or an HTTP-Equiv meta header). If the meta header comes *after* the title element, NekoHTML does not use the declared encoding for the title. This test therefore expects the title not to be recognized.
Throws: Exception - if there is an error asserting the test data.
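The known issue can be pictured with a simplified model in which a parser decodes bytes with a default charset until it reaches the meta header, and only then switches to the declared charset. This is a hypothetical sketch, not NekoHTML's actual algorithm: the class `LateMetaCharsetModel`, its method `extractTitle`, the regex-based tag matching, and the assumed default of ISO-8859-1 are all illustrative choices.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified model of a parser that only switches to a declared
 * charset once it reaches the meta header. Not NekoHTML's actual algorithm.
 */
public class LateMetaCharsetModel {

    // Assumption: charset used for everything read before a declaration is seen.
    static final Charset DEFAULT_CHARSET = StandardCharsets.ISO_8859_1;

    // e.g. <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta\\s+http-equiv=[\"']?Content-Type[\"']?\\s+content=[\"'][^\"']*charset=([\\w-]+)",
            Pattern.CASE_INSENSITIVE);

    private static final Pattern TITLE = Pattern.compile(
            "<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String extractTitle(byte[] html) {
        // ISO-8859-1 maps each byte to exactly one char, so string offsets equal byte offsets.
        String raw = new String(html, StandardCharsets.ISO_8859_1);
        Matcher meta = META_CHARSET.matcher(raw);
        Charset declared = DEFAULT_CHARSET;
        int switchAt = html.length; // no declaration: the default applies to the whole document
        if (meta.find()) {
            declared = Charset.forName(meta.group(1));
            switchAt = meta.end();
        }
        // Bytes before the declaration keep the default; only later bytes use the declared charset.
        String decoded = new String(html, 0, switchAt, DEFAULT_CHARSET)
                + new String(html, switchAt, html.length - switchAt, declared);
        Matcher title = TITLE.matcher(decoded);
        return title.find() ? title.group(1) : null;
    }

    public static void main(String[] args) {
        String meta = "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">";
        String title = "<title>Caf\u00E9</title>";
        byte[] metaFirst = ("<html><head>" + meta + title + "</head></html>").getBytes(StandardCharsets.UTF_8);
        byte[] titleFirst = ("<html><head>" + title + meta + "</head></html>").getBytes(StandardCharsets.UTF_8);

        // Declaration precedes the title: the accented title is decoded correctly.
        System.out.println(extractTitle(metaFirst));
        // Declaration follows the title: the title was already decoded with the default charset.
        System.out.println(extractTitle(titleFirst));
    }
}
```

Both inputs are the same UTF-8 bytes apart from tag order, yet only the meta-first document yields the original title, which mirrors the expectation stated for this test.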
testEncodingXHTML_ISO_8859_1
public void testEncodingXHTML_ISO_8859_1() throws Exception
Throws: Exception
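For XHTML input, the encoding is typically declared in the XML declaration at the top of the document. The sketch below, using the hypothetical class `XmlDeclarationCharset`, reads that declaration and falls back to UTF-8 when none is present, which is what XML 1.0 prescribes in the absence of a byte order mark or external encoding information. It is an illustrative assumption, not Any23's or NekoHTML's detection code, and it deliberately ignores byte order marks and HTTP headers.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Any23: reads the encoding from an XML declaration. */
public class XmlDeclarationCharset {

    // Matches a declaration at the very start of the document, e.g.
    // <?xml version="1.0" encoding="ISO-8859-1"?>
    private static final Pattern XML_DECL = Pattern.compile(
            "^<\\?xml\\s+version=[\"'][^\"']*[\"']\\s+encoding=[\"']([A-Za-z][\\w.-]*)[\"']");

    static Charset declaredCharset(byte[] doc) {
        // The declaration is plain ASCII, so decoding a short prefix as ISO-8859-1 is safe.
        int n = Math.min(doc.length, 200);
        String head = new String(doc, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = XML_DECL.matcher(head);
        // XML 1.0: with no BOM, no external information and no encoding declaration, use UTF-8.
        return m.find() ? Charset.forName(m.group(1)) : StandardCharsets.UTF_8;
    }

    static String decode(byte[] doc) {
        return new String(doc, declaredCharset(doc));
    }

    public static void main(String[] args) {
        String xhtml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>Caf\u00E9</title></head></html>";
        byte[] bytes = xhtml.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(declaredCharset(bytes).name()); // ISO-8859-1
        System.out.println(decode(bytes).contains("Caf\u00E9")); // true
    }
}
```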