javax.swing.text.ElementIterator weird behavior

Views: 54 Posted: 2021/5/15 18:39:43 Tags: java parsing swing html-parsing

Problem description

I'm getting weird behavior from javax.swing.text.ElementIterator. It never shows all elements, and the number of elements it shows depends on which type of ParserCallback I use. The test below was run against the website in my profile, but it can be reproduced with any other large HTML file.

// some imports shown in case it's an import mixup
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.ChangedCharSetException;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTML.Tag;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.HTMLEditorKit.Parser;
import javax.swing.text.html.HTMLEditorKit.ParserCallback;
import javax.swing.text.html.parser.ParserDelegator;

// Shows what's in an element, recursively
public void printElement(HTMLDocument htmlDoc, Element element)
        throws BadLocationException
{
    AttributeSet attributes = element.getAttributes();
    System.out.println("element: '" + element.toString().trim() + "', name: '" + element.getName() + "', children: " + element.getElementCount() + ", attributes: " + attributes.getAttributeCount() + ", leaf: " + element.isLeaf());
    Enumeration attrEnum = attributes.getAttributeNames();
    while (attrEnum.hasMoreElements())
    {
        Object attr = attrEnum.nextElement();
        System.out.println("\tAttribute: '" + attr + "', Val: '" + attributes.getAttribute(attr) + "'");
        if (attr == StyleConstants.NameAttribute
                && attributes.getAttribute(StyleConstants.NameAttribute) == HTML.Tag.CONTENT)
        {
            int startOffset = element.getStartOffset();
            int endOffset = element.getEndOffset();
            int length = endOffset - startOffset;
            System.out.printf("\t\tContent (%d-%d): '%s'\n", startOffset, endOffset, htmlDoc.getText(startOffset, length).trim());
        }
    }
    for (int i = 0; i < element.getElementCount(); i++)
    {
        Element child = element.getElement(i);
        printElement(htmlDoc, child);
    }
}

public void tryParse(String filename) 
        throws FileNotFoundException, IOException, BadLocationException
{
    BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(filename)));

    Parser parser = new ParserDelegator();
    HTMLEditorKit htmlKit = new HTMLEditorKit();
    HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
    ParserCallback callback2 = htmlDoc.getReader(0);
    ParserCallback callback1 =
            new HTMLEditorKit.ParserCallback()
            {
            };

    parser.parse(in, callback2, true);
    ElementIterator iterator = new ElementIterator(htmlDoc);
    Element element;
    while ((element = iterator.next()) != null)
        printElement(htmlDoc, element);
    in.close();
}

In the test above, the results differ depending on whether I use callback1 or callback2. Even weirder, if I fill in the callbacks with the appropriate methods and have them print something, they show that the parser does process the whole website, yet the ElementIterator still doesn't have everything.

I've also tried htmlKit.read() instead of parser.parse(), but it still doesn't work.

Although I'm now getting the results I want by using the parser callback methods (not shown here), I'd still like to know why ElementIterator doesn't work as expected, in case I need it later. Does anyone here have experience with ElementIterator who can answer?

Update: Complete Java source uploaded here: http://home.snafu.de/tilman/tmp/Main.java

Recommended answer

Using the approach seen here, I haven't noticed the problem you describe. I added a println(), and all the elements seem to be there.
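
The kind of check described above can be sketched as follows. This is only an illustration, not the answerer's actual test: the class name ElementNames, the helper elementNames() and the inline HTML string are made up here, and the sketch assumes the document is loaded with HTMLEditorKit.read() from a StringReader instead of a file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, not part of the original thread.
class ElementNames {

    // Parses the HTML and returns the name of every element in iteration order.
    static List<String> elementNames(String html) throws Exception {
        HTMLEditorKit htmlKit = new HTMLEditorKit();
        HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
        htmlKit.read(new StringReader(html), htmlDoc, 0);

        List<String> names = new ArrayList<>();
        // ElementIterator walks the whole tree itself, starting at the root,
        // so no recursion is needed inside the loop.
        ElementIterator iterator = new ElementIterator(htmlDoc);
        Element element;
        while ((element = iterator.next()) != null) {
            names.add(element.getName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><p>Hello</p><p>World</p></body></html>";
        for (String name : elementNames(html)) {
            System.out.println(name);
        }
    }
}
```

Because the iterator already visits every descendant, each element name is collected exactly once.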

Addendum: I'm not sure how your tryParse() fails, but your printElement() seems to work from my main():

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Enumeration;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

/** @see https://stackoverflow.com/questions/2882782 */
public class NewMain {

    public static void main(String args[]) throws Exception {
        HTMLEditorKit htmlKit = new HTMLEditorKit();
        HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
        htmlKit.read(new BufferedReader(new FileReader("test.html")), htmlDoc, 0);
        ElementIterator iterator = new ElementIterator(htmlDoc);
        Element element;
        while ((element = iterator.next()) != null) {
            printElement(htmlDoc, element);
        }
    }
    private static void printElement(HTMLDocument htmlDoc, Element element)
        throws BadLocationException {
        AttributeSet attrSet = element.getAttributes();
        System.out.println(""
            + "Element: '" + element.toString().trim()
            + "', name: '" + element.getName()
            + "', children: " + element.getElementCount()
            + ", attributes: " + attrSet.getAttributeCount()
            + ", leaf: " + element.isLeaf());
        Enumeration attrNames = attrSet.getAttributeNames();
        while (attrNames.hasMoreElements()) {
            Object attr = attrNames.nextElement();
            System.out.println("  Attribute: '" + attr + "', Value: '"
                + attrSet.getAttribute(attr) + "'");
            Object tag = attrSet.getAttribute(StyleConstants.NameAttribute);
            if (attr == StyleConstants.NameAttribute
                && tag == HTML.Tag.CONTENT) {
                int startOffset = element.getStartOffset();
                int endOffset = element.getEndOffset();
                int length = endOffset - startOffset;
                System.out.printf("    Content (%d-%d): '%s'\n", startOffset,
                    endOffset, htmlDoc.getText(startOffset, length).trim());
            }
        }
    }
}
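
One difference that may be worth checking, offered as an assumption and not something this thread confirms: the question's tryParse() drives ParserDelegator directly and never calls flush() on the callback, while HTMLEditorKit.read() flushes the document's reader once parsing is done. The reader returned by htmlDoc.getReader(0) buffers what it parses, so without a final flush() some content might never reach the document. An empty ParserCallback, like callback1, never touches the document at all. The sketch below compares these cases on a small inline HTML string; the class FlushDemo and its parse() helper are hypothetical names.

```java
import java.io.StringReader;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.HTMLEditorKit.ParserCallback;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class, not part of the original thread.
class FlushDemo {

    // Parses html with ParserDelegator and returns the resulting document text.
    // useDocumentReader: true uses htmlDoc.getReader(0) (like callback2),
    //                    false uses an empty callback (like callback1).
    // flush: whether to call callback.flush() after parsing.
    static String parse(String html, boolean useDocumentReader, boolean flush)
            throws Exception {
        HTMLEditorKit htmlKit = new HTMLEditorKit();
        HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
        ParserCallback callback = useDocumentReader
                ? htmlDoc.getReader(0)
                : new ParserCallback() { };
        new ParserDelegator().parse(new StringReader(html), callback, true);
        if (flush) {
            // Pushes any buffered content into the document.
            callback.flush();
        }
        return htmlDoc.getText(0, htmlDoc.getLength());
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><p>Hello</p></body></html>";
        System.out.println("reader, flushed:   '" + parse(html, true, true).trim() + "'");
        System.out.println("reader, unflushed: '" + parse(html, true, false).trim() + "'");
        System.out.println("empty callback:    '" + parse(html, false, true).trim() + "'");
    }
}
```

Separately, the question's printElement() recurses into child elements while ElementIterator already visits every descendant, so the question's loop prints nested elements more than once. The printElement() in the answer's main() drops that recursion.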
