Xpath Table Within Table
Problem description
I am having a problem scraping a table-heavy page with DOMXPath.
The layout is really ugly, meaning I am trying to get content out of a table within a table within a table. Using Firebug's FirePath, I get the following path for the table element:
html/body/table/tbody/tr[3]/td/table[1]/tbody/tr[2]/td[1]/table[1]/tbody/tr[3]/td[4]
After endless experimenting I found out that, with a standalone table, I need to remove the "tbody" step from the path to make it work. But this does not seem to be enough for tables within tables. So my question is: how do I best get content out of tables within tables within tables?
I uploaded the file I am trying to scrape here: 1
Solution
I ran into the same problem as you while scraping a complicated, poorly formatted HTML source, where I wanted to get the values from a table nested inside other tables.
My approach was to target only the part I wanted, using a series of functions like this:
function parse_html() {
    // gets a specific part of the table I chose to extract the contents from
    $query = $xpath->query('//tr[@data-eventid]/@data-eventid');
    // gets the table I want
    $this->parse_table();
}

function parse_table() {
    $query = $xpath->query('//tr[@data-eventid="405412"]/td[@class="impact"]/span[@title]/@title'); // ...etc
    // extracts the content of the table
    $this->parseEvaluate();
}

function parseEvaluate() {
    // ...verifying values if correct
}
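The outline above leaves out how $xpath is created and how the results are read. A minimal runnable sketch of the same idea might look like the following. The data-eventid attribute and the "impact" class come from the queries above; the sample markup, the variable names, and the choice to collect results into an array are assumptions made only for illustration.

```php
<?php
// Hypothetical markup standing in for the real page: a table nested in a table.
$html = <<<'HTML'
<html><body>
<table><tr><td>
  <table>
    <tr data-eventid="405412">
      <td class="impact"><span title="High">!</span></td>
    </tr>
    <tr data-eventid="405413">
      <td class="impact"><span title="Low">.</span></td>
    </tr>
  </table>
</td></tr></table>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Step 1: find every row that carries a data-eventid attribute,
// no matter how deeply its table is nested.
$impactById = [];
foreach ($xpath->query('//tr[@data-eventid]') as $row) {
    $id = $row->getAttribute('data-eventid');

    // Step 2: query relative to that row (note the leading "./").
    $titles = $xpath->query('./td[@class="impact"]/span[@title]/@title', $row);

    // Step 3: verify the result before using it.
    if ($titles->length === 1) {
        $impactById[$id] = $titles->item(0)->nodeValue;
    }
}

print_r($impactById);
```

Anchoring on an attribute with // avoids spelling out the whole chain of nested tables, and passing the row as the context node keeps the second query from matching cells in other rows.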
This is just to give the idea.
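On the "tbody" part of the question: browser tools such as FirePath report paths against the browser's DOM, where a tbody element is added to every table, while DOMDocument works from the raw source. If the source contains no tbody tags (an assumption about the page in question), the tbody step has to be dropped at every nesting level, not only the first. A small sketch with hypothetical markup and a shortened path:

```php
<?php
// Hypothetical source markup: nested tables written without <tbody> tags.
$html = <<<'HTML'
<html><body>
<table>
  <tr><td>header</td></tr>
  <tr><td>
    <table>
      <tr><td>a</td><td>b</td></tr>
      <tr><td>c</td><td>target</td></tr>
    </table>
  </td></tr>
</table>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Path as a browser tool would report it, with tbody at each table level.
$browserPath = '/html/body/table/tbody/tr[2]/td/table[1]/tbody/tr[2]/td[2]';

// The same path with every tbody step removed.
$sourcePath = str_replace('/tbody', '', $browserPath);

$withTbody = $xpath->query($browserPath);
$withoutTbody = $xpath->query($sourcePath);

echo 'matches with tbody: ', $withTbody->length, "\n";
echo 'matches without tbody: ', $withoutTbody->length, "\n";
if ($withoutTbody->length > 0) {
    echo 'cell text: ', $withoutTbody->item(0)->textContent, "\n";
}
```

If the real source does contain tbody tags in some tables, the path has to keep them at exactly those levels, so it is worth checking the raw HTML rather than the browser's rendering.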