Google Apps Script: parse a table from messy HTML
Problem description
I want to create a script that downloads an HTML page, parses a table from it, and saves the table to a Spreadsheet. I am stuck on the downloading and parsing.
The XPath to the table is:
/html/body/table/tbody/tr[5]/td/table/tbody/tr/td[2]/table
Currently I am stuck at following this XPath in the parsed document.
function fetchIt() {
  var fetchString = "http://www.zbranebrymova.com/index.php?s_lev=22&type=nabku*signa";
  var response = UrlFetchApp.fetch(fetchString);
  var xmlDoc = Xml.parse(response.getBlob().getDataAsString(), true);
  var b = xmlDoc.getElement().getElement("body").getElement("table");
  Logger.log(b);
}
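The XPath in the question can be followed one step at a time. The sketch below is only an illustration under stated assumptions: `parseXPathSteps`, `walk`, and the `getChildren` adapter are hypothetical helper names, not part of any Apps Script service, and only the plain `/name` and `/name[n]` forms are handled. As a simplification, a step without `[n]` is treated as "first match", which is narrower than real XPath.

```javascript
// Split a simple absolute XPath into {name, index} steps.
// Only the /name and /name[n] forms from the question are supported.
function parseXPathSteps(xpath) {
  return xpath.split('/').filter(function (part) {
    return part.length > 0;
  }).map(function (part) {
    var m = /^([A-Za-z][\w-]*)(?:\[(\d+)\])?$/.exec(part);
    if (!m) {
      throw new Error('Unsupported XPath step: ' + part);
    }
    // XPath positions are 1-based; a missing [n] is treated as the first match.
    return { name: m[1], index: m[2] ? parseInt(m[2], 10) : 1 };
  });
}

// Follow the steps down from the root element.
// getChildren(element, name) is a hypothetical adapter that returns the
// child elements of `element` with the given tag name, in document order.
function walk(root, steps, getChildren) {
  var node = root;
  // The first step names the root element itself, so start at the second.
  for (var i = 1; i < steps.length; i++) {
    var matches = getChildren(node, steps[i].name);
    var wanted = matches[steps[i].index - 1];
    if (!wanted) {
      throw new Error('No <' + steps[i].name + '> number ' +
                      steps[i].index + ' at step ' + i);
    }
    node = wanted;
  }
  return node;
}

var steps = parseXPathSteps(
  '/html/body/table/tbody/tr[5]/td/table/tbody/tr/td[2]/table');
```

With the element objects used in the question, the adapter could plausibly be `function (el, name) { return el.getElements(name); }`, so that `tr[5]` becomes `getElements("tr")[4]`. Failing with the step number makes it easier to see where the real page differs from the expected path.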
Solution
I don't know if this will be helpful, but here is a snippet of my table parsing code.
HTML file FOO.HTM:
<html>
  <head> </head>
  <body style="margin-left:10px">
    <table title="">
      <tbody>
        <tr>
          <th align="center" abbr="Sunday">Sun</th>
          <th align="center" abbr="Monday">Mon</th>
        </tr>
        <tr>
          <td align="left"><a title="January 01">1</a>
            <div>Joe,Doe</div>
            <div>Murphy,Jack</div>
          </td>
          <td align="left"><a title="January 02">2</a>
            <div>Carlson,Carl</div>
            <div>Guy,Girl</div>
            <div>Lenin,Vladimir</div>
          </td>
        </tr>
      </tbody>
    </table>
  </body>
</html>
And this is how I parse it:
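function foo() {
  // 'foo.htm' stands in for the full URL of the page to fetch.
  var page = UrlFetchApp.fetch('foo.htm');
  // Lenient parsing (second argument true) tolerates HTML that is not well-formed XML.
  var rows = Xml.parse(page, true).getElement()
      .getElement("html")
      .getElement("body")
      .getElement("table")
      .getElement("tbody")
      .getElements("tr");
  // Walk every row, then every cell in the row, then every <div> in the cell.
  for (var ii = 0; ii < rows.length; ii++) {
    var cols = rows[ii].getElements("td");
    for (var jj = 0; jj < cols.length; jj++) {
      var divs = cols[jj].getElements("div");
      for (var kk = 0; kk < divs.length; kk++) {
        // div is one entry inside the cell; process it here.
        var div = divs[kk];
      }
    }
  }
}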
cheers, sean
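As a complement to the element-walking approach above, here is a minimal sketch that works on the raw HTML text instead, which may help if lenient XML parsing is not available or the page refuses to parse. It is not a real HTML parser: it assumes every row and cell has an explicit closing tag, and it only looks at innermost tables (tables that contain no other table), which matches the nested layout described in the question. The names `cellText`, `extractInnermostTables`, `toRectangle`, and `fetchAndStore` are hypothetical helpers introduced for illustration.

```javascript
// Reduce a cell's inner HTML to plain text.
function cellText(html) {
  return html
    .replace(/<[^>]+>/g, ' ')   // drop tags, leaving a space so words do not fuse
    .replace(/&nbsp;/gi, ' ')
    .replace(/&amp;/gi, '&')
    .replace(/\s+/g, ' ')       // collapse runs of whitespace
    .trim();
}

// Return one array of rows per innermost table; each row is an array of cell texts.
function extractInnermostTables(html) {
  var tables = [];
  // The negative lookahead rejects any table body that opens another table.
  var tableRe = /<table\b[^>]*>((?:(?!<table\b)[\s\S])*?)<\/table>/gi;
  var tableMatch;
  while ((tableMatch = tableRe.exec(html)) !== null) {
    var rows = [];
    var rowRe = /<tr\b[^>]*>([\s\S]*?)<\/tr>/gi;
    var rowMatch;
    while ((rowMatch = rowRe.exec(tableMatch[1])) !== null) {
      var cells = [];
      var cellRe = /<t[dh]\b[^>]*>([\s\S]*?)<\/t[dh]>/gi;
      var cellMatch;
      while ((cellMatch = cellRe.exec(rowMatch[1])) !== null) {
        cells.push(cellText(cellMatch[1]));
      }
      rows.push(cells);
    }
    tables.push(rows);
  }
  return tables;
}

// Pad short rows with empty strings so every row has the same length,
// because Range.setValues expects a rectangular two-dimensional array.
function toRectangle(rows) {
  var width = rows.reduce(function (max, row) {
    return Math.max(max, row.length);
  }, 0);
  return rows.map(function (row) {
    var padded = row.slice();
    while (padded.length < width) {
      padded.push('');
    }
    return padded;
  });
}

// Apps Script only: fetch a page and write its first innermost table to the
// active sheet. Choosing the first table is an assumption; adjust the index
// for the real page.
function fetchAndStore(url) {
  var html = UrlFetchApp.fetch(url).getContentText();
  var tables = extractInnermostTables(html);
  if (tables.length === 0) {
    return;
  }
  var grid = toRectangle(tables[0]);
  if (grid.length === 0 || grid[0].length === 0) {
    return;
  }
  SpreadsheetApp.getActiveSheet()
      .getRange(1, 1, grid.length, grid[0].length)
      .setValues(grid);
}
```

Applied to the FOO.HTM sample, each day's cell comes out as a single string such as `1 Joe,Doe Murphy,Jack`. If the names are needed in separate cells, splitting on the `<div>` boundaries before stripping tags would be the place to change.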