How to convert HTML <table> to a 2D array
Question
Let's say I copy a complete HTML table (in which every tr and td has extra attributes) into a String. How can I take all the contents (what is between the tags) and create a 2D array that is organized like the original table?
For example, for this table:
<table border="1">
  <tr align= "center">
    <td align="char">TD1</td>
    <td>td1</td>
    <td align="char">TD1</td>
    <td>td1</td>
  </tr>
  <tr>
    <td>TD2</td>
    <td>tD2</td>
    <td class="bold>Td2</td>
    <td>td2</td>
  </tr>
</table>
I want this array:
{{"TD1", "td1", "TD1", "td1"},
 {"TD2", "tD2", "Td2", "td2"}}
PS: I know I could use regex, but it would be extremely complicated. I want a tool like JSoup that can do all the work automatically without my writing much code.
Answer
This is how it could be done using JSoup (srsly, don't use regexp for HTML).
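import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// html is the String holding the copied table markup
Document doc = Jsoup.parse(html);
Elements tables = doc.select("table");
for (Element table : tables) {
    Elements trs = table.select("tr");
    // one inner array per row; rows may have different cell counts
    String[][] trtd = new String[trs.size()][];
    for (int i = 0; i < trs.size(); i++) {
        Elements tds = trs.get(i).select("td");
        trtd[i] = new String[tds.size()];
        for (int j = 0; j < tds.size(); j++) {
            // text() returns the cell's text with all tags stripped
            trtd[i][j] = tds.get(j).text();
        }
    }
    // trtd now contains the desired array for this table
}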
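If you want something you can compile and run directly, the loop can be wrapped in a method. The sketch below assumes jsoup is on the classpath; the class name `TableTo2DArray` and the method `tableToArray` are hypothetical names chosen for illustration. Unlike the loop above, it also collects `<th>` header cells, and the sample HTML is the question's table with the `class` quote closed.

```java
import java.util.Arrays;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TableTo2DArray {

    // Hypothetical helper: converts one <table> element into a jagged 2D array.
    // Note: select("tr") also matches rows of any table nested inside this one.
    static String[][] tableToArray(Element table) {
        Elements rows = table.select("tr");
        String[][] result = new String[rows.size()][];
        for (int i = 0; i < rows.size(); i++) {
            // both data cells and header cells, in document order
            Elements cells = rows.get(i).select("td, th");
            result[i] = new String[cells.size()];
            for (int j = 0; j < cells.size(); j++) {
                result[i][j] = cells.get(j).text();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<table border=\"1\">"
                + "<tr align=\"center\"><td align=\"char\">TD1</td><td>td1</td>"
                + "<td align=\"char\">TD1</td><td>td1</td></tr>"
                + "<tr><td>TD2</td><td>tD2</td><td class=\"bold\">Td2</td><td>td2</td></tr>"
                + "</table>";
        Document doc = Jsoup.parse(html);
        for (Element table : doc.select("table")) {
            // prints [[TD1, td1, TD1, td1], [TD2, tD2, Td2, td2]]
            System.out.println(Arrays.deepToString(tableToArray(table)));
        }
    }
}
```

Rows with different numbers of cells are kept as-is, which is why the result is a jagged `String[][]` rather than a fixed rectangle; `colspan` and `rowspan` are not expanded.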
Also, the class attribute value is not closed properly here in your example:
<td class="bold>Td2</td>
It should be:
<td class="bold">Td2</td>