How to extract info from HTML with Java's own Parser?

Question

I don't want to download any other libraries. I'm talking about this one: javax.swing.text.html.HTMLEditorKit.Parser

How can I extract repeated information within a page using this parser?

Say, for example, I have this code repeated in a page:

    <tr>
      <td class="info1">get this info</td>
      <td class="info2">get this info</td>
      <td class="info3">get this info</td>
    </tr>

Could I have some example code, please?

Thanks.

Recommended answer

It's a stream parser, so as it parses, it tells you what it hits. You should extend HTMLEditorKit.ParserCallback with some class (I'll call it Parser), then override the methods you care about.

I believe it only works with "the html dtd in swing" (see here). If you're doing anything more complicated, I recommend using an external Java HTML parsing library instead, such as one of the ones I linked to before.

Here is the basic code (demo):

import javax.swing.text.html.parser.*;
import javax.swing.text.html.*;
import javax.swing.text.*;
import java.io.*;

// Callback that prints the text found inside every <td> element.
class Parser extends HTMLEditorKit.ParserCallback
{
        // True while the parser is between <td> and </td>.
        private boolean inTD = false;

        @Override
        public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos)
        {
                if(t.equals(HTML.Tag.TD))
                {
                        inTD = true;
                }
        }

        @Override
        public void handleEndTag(HTML.Tag t, int pos)
        {
                if(t.equals(HTML.Tag.TD))
                {
                        inTD = false;
                }
        }

        @Override
        public void handleText(char[] data, int pos)
        {
                // Only text inside a <td> is of interest.
                if(inTD)
                {
                        doSomethingWith(data);
                }
        }

        public void doSomethingWith(char[] data)
        {
                // println(char[]) prints the characters, not the array reference.
                System.out.println(data);
        }

}

class HtmlTester
{
        public static void main (String[] args) throws java.lang.Exception
        {
                // Read HTML from standard input and feed it to the callback.
                ParserDelegator pd = new ParserDelegator();
                pd.parse(new BufferedReader(new InputStreamReader(System.in)), new Parser(), false);
        }
}
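
The demo above prints the text of every td. To pick out only the cells with a particular class (info1, info2, info3 in the question), one option is to inspect the attribute set passed to handleStartTag. The following is a minimal sketch, not part of the original answer: the class name ClassCellExtractor and its extract method are hypothetical names chosen for illustration. It assumes each class attribute holds exactly one class name, and that the text of a cell may arrive in more than one handleText call.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper (not from the answer): collects the text of every
// <td> whose class attribute equals a given value.
class ClassCellExtractor extends HTMLEditorKit.ParserCallback {
    private final String wantedClass;
    private final List<String> cells = new ArrayList<>();
    // Non-null only while the parser is inside a matching <td>.
    private StringBuilder current;

    ClassCellExtractor(String wantedClass) {
        this.wantedClass = wantedClass;
    }

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        // Assumption: exact match on the class attribute; a value such as
        // "info1 other" would not match.
        if (t.equals(HTML.Tag.TD)
                && wantedClass.equals(a.getAttribute(HTML.Attribute.CLASS))) {
            current = new StringBuilder();
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        // Accumulate, because one cell may produce several text events.
        if (current != null) {
            current.append(data);
        }
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (t.equals(HTML.Tag.TD) && current != null) {
            cells.add(current.toString().trim());
            current = null;
        }
    }

    // Parses the given HTML and returns the text of the matching cells
    // in document order.
    static List<String> extract(String html, String wantedClass) throws IOException {
        ClassCellExtractor callback = new ClassCellExtractor(wantedClass);
        // ignoreCharSet = true: the input is already a Reader, so any
        // charset declared inside the document is ignored.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return callback.cells;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><table>"
                + "<tr><td class=\"info1\">first</td><td class=\"info2\">second</td></tr>"
                + "<tr><td class=\"info1\">third</td><td class=\"info2\">fourth</td></tr>"
                + "</table></body></html>";
        System.out.println(extract(html, "info1"));
    }
}
```

Calling extract once per class name (or keeping one list per class inside the callback) would give the repeated values grouped by column. Nested tables are not handled in this sketch.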
