Parsing/extracting an HTML table from a website in Java

Views: 112 Published: 2020/4/24 9:55:14 Tags: html html-parsing jsoup html-table html-tableextract

Problem description

I want to parse the contents of this HTML table:

Here is the full website with source code:

http://www.kantschule-falkensee.de/uploads/dmiadgspahw/klassen/A_Klasse_11.htm

I want to parse the data in each cell, for example all 5 cells under "Montag" (Monday). I tried several ways of parsing this website with jsoup, but without success. My main goal is to show the contents in a ListView in an Android app. For now I am just trying to print the contents to a Java console. Both languages are accepted :). Any help is appreciated.

Recommended answer

Here are the steps you would need to follow:

1) You could use any of the following Java libraries for HTML scraping:

Tag Soup
HtmlUnit
Web-Harvest
jARVEST
jsoup
Jericho HTML Parser

2) Use XPath Helper to try out XPath queries against the page:

Eg 1: Enter "//tr[1]//td[1]" as the query and it will return all table cells at position (1,1).

Eg 2: "/html/body[@class='tt']/center/table[1]/tbody/tr[4]/td[3]/table/tbody/tr/td" will give you all 15 values under Montag.

Eg 3: "/html/body[@class='tt']/center/table[1]/tbody/tr/td/table/tbody/tr/td" will give you all 380 entries of the table.
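To see how queries of this shape behave outside the browser, here is a minimal sketch that evaluates XPath with the JDK's built-in javax.xml.xpath API. The SAMPLE table, the class name XPathTableDemo and the helper select are made up for illustration; the sample is a tiny well-formed stand-in, not the real timetable. The JDK XML parser only accepts well-formed markup, so the live page would most likely need to be cleaned up by one of the libraries above first.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathTableDemo {

    // Hypothetical, well-formed stand-in for the timetable (assumption, not the real page).
    static final String SAMPLE =
          "<table>"
        + "<tr><td>Montag</td><td>Dienstag</td></tr>"
        + "<tr><td>Mathe</td><td>Deutsch</td></tr>"
        + "<tr><td>Physik</td><td>Englisch</td></tr>"
        + "</table>";

    // Evaluates an XPath expression and returns the trimmed text of each matching node.
    static List<String> select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent().trim());
            }
            return values;
        } catch (Exception e) {
            // Malformed XML or an invalid expression ends up here.
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // First cell of the first row.
        System.out.println(select(SAMPLE, "//tr[1]//td[1]"));
        // First column, skipping the header row.
        System.out.println(select(SAMPLE, "//tr[position() > 1]/td[1]"));
    }
}
```

Note that the tbody steps in Eg 2 and Eg 3 come from the browser's DOM; a parser that does not insert tbody elements would need the paths without them.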

Using jsoup:

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) throws IOException {
        // Fetch and parse the timetable page.
        Document doc = Jsoup.connect("http://www.kantschule-falkensee.de/uploads/dmiadgspahw/klassen/A_Klasse_11.htm").get();

        // Every <tr> in the document, including rows of nested tables.
        Elements rows = doc.select("tr");
        for (Element row : rows) {
            // Every <td> below this row, at any depth.
            Elements columns = row.select("td");
            for (Element column : columns) {
                // Tab-separate the cells so adjacent values do not run together.
                System.out.print(column.text() + "\t");
            }
            System.out.println();
        }
    }
}
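
For the original goal of collecting the cells under one heading (for example "Montag") into a list that could later back a ListView, a sketch along the following lines may help. It assumes a flat table whose first row holds the day names and that has no nested tables or rowspans, which may not hold for the real timetable. The SAMPLE markup, the class name ColumnExtractor and the helper column are invented for illustration, and jsoup must be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ColumnExtractor {

    // Hypothetical flat table used instead of the live page (assumption).
    static final String SAMPLE =
          "<table>"
        + "<tr><td>Montag</td><td>Dienstag</td></tr>"
        + "<tr><td>Mathe</td><td>Deutsch</td></tr>"
        + "<tr><td>Physik</td><td>Englisch</td></tr>"
        + "</table>";

    // Returns the text of every cell below the header cell whose text equals `header`.
    static List<String> column(String html, String header) {
        Document doc = Jsoup.parse(html);
        Elements rows = doc.select("tr");
        List<String> values = new ArrayList<>();
        if (rows.isEmpty()) {
            return values;
        }

        // Locate the column index of the requested header in the first row.
        Elements headerCells = rows.first().select("td, th");
        int index = -1;
        for (int i = 0; i < headerCells.size(); i++) {
            if (headerCells.get(i).text().equals(header)) {
                index = i;
                break;
            }
        }
        if (index < 0) {
            return values; // header not found
        }

        // Collect the cell at that index from each remaining row.
        for (Element row : rows.subList(1, rows.size())) {
            Elements cells = row.select("td");
            if (index < cells.size()) {
                values.add(cells.get(index).text());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(column(SAMPLE, "Montag"));
    }
}
```

On Android, the returned list could be handed to an adapter such as ArrayAdapter, with the network fetch done off the main thread.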
