TreeBuilder Get embedded nodes
Basically, I need to get the names and emails from all of these people in the HTML code.
<thead>
<tr>
<th scope="col" class="rgHeader" style="text-align:center;">Name</th><th scope="col" class="rgHeader" style="text-align:center;">Email Address</th><th scope="col" class="rgHeader" style="text-align:center;">School Phone</th>
</tr>
</thead><tbody>
<tr class="rgRow" id="ctl00_ContentPlaceHolder1_rg_People_ctl00__0">
<td>
Michael Bowen
</td><td>mbowen@cpcisd.net</td><td>903-488-3671 ext3200</td>
</tr><tr class="rgAltRow" id="ctl00_ContentPlaceHolder1_rg_People_ctl00__1">
<td>
Christian Calixto
</td><td>calixtoc@cpcisd.net</td><td>903-488-3671 x 3430</td>
</tr><tr class="rgRow" id="ctl00_ContentPlaceHolder1_rg_People_ctl00__2">
<td>
Rachel Claxton
</td><td>claxtonr@cpcisd.net</td><td>903-488-3671 x 3450</td>
</tr>
</tbody>
</table><input id="ctl00_ContentPlaceHolder1_rg_People_ClientState" name="ctl00_ContentPlaceHolder1_rg_People_ClientState" type="hidden" autocomplete="off"> </div>
<br>
I know how to use HTML::TreeBuilder with nodes and such, and I'm already using this code elsewhere in my script.
my ($file) = @_;
my $html = path($file)->slurp;
my $tree = HTML::TreeBuilder->new_from_content($html);
my @nodes = $tree->look_down(_tag => 'input');
my $val;
foreach my $node (@nodes) {
    $val = $node->look_down('name', qr/\$txt_Website/)->attr('value');
}
return $val;
I was going to use the same code for this function, but I realized that I don't have much to search for, since the <td> tag is in so many other places in the script. I'm sure there's a better way to approach this problem, but I can't seem to find it.
LINK TO HTML CODE: http://pastebin.com/qLwu80ZW
MY CODE: https://pastebin.com/wGb0eXmM
Note: I did search Google as much as I could, but I'm not quite sure what I should search for.
The table element that encloses the data you need has a unique class, rgMasterTable, so you can search for that in look_down. I've written this to demonstrate. It pulls the HTML directly from your pastebin.
use strict;
use warnings 'all';

use LWP::Simple 'get';
use HTML::TreeBuilder;

use constant URL => 'http://pastebin.com/raw/qLwu80ZW';

my $tree = HTML::TreeBuilder->new_from_content(get URL);

my ($table) = $tree->look_down(_tag => 'table', class => 'rgMasterTable');

for my $tr ( $table->look_down(_tag => 'tr') ) {
    next unless my @td = $tr->look_down(_tag => 'td');
    my ($name, $email) = map { $_->as_trimmed_text } @td[0,1];
    printf "%-17s %s\n", $name, $email;
}
output
Michael Bowen     mbowen@cpcisd.net
Christian Calixto calixtoc@cpcisd.net
Rachel Claxton    claxtonr@cpcisd.net
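
If you would rather not depend on the network while testing, the same look_down approach can be tried against an inline copy of the markup. The following is only a sketch under a few assumptions: the sample HTML is trimmed from the question, the opening <table class="rgMasterTable"> tag is reconstructed from the description above, and extract_people is a hypothetical helper name, not something HTML::TreeBuilder provides. It also guards against the table not being found, which the short demo above does not do.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Inline sample trimmed from the question's HTML. The enclosing <table> tag
# with class "rgMasterTable" is an assumption based on the answer's description.
my $html = <<'HTML';
<div>
<table class="rgMasterTable">
<thead>
<tr><th scope="col" class="rgHeader">Name</th><th scope="col" class="rgHeader">Email Address</th><th scope="col" class="rgHeader">School Phone</th></tr>
</thead><tbody>
<tr class="rgRow"><td>
    Michael Bowen
</td><td>mbowen@cpcisd.net</td><td>903-488-3671 ext3200</td></tr>
<tr class="rgAltRow"><td>
    Christian Calixto
</td><td>calixtoc@cpcisd.net</td><td>903-488-3671 x 3430</td></tr>
</tbody>
</table>
</div>
HTML

# extract_people is a hypothetical helper, not part of HTML::TreeBuilder.
# It returns a list of hashrefs with the keys "name" and "email".
sub extract_people {
    my ($content) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($content);

    my @people;

    # In scalar context look_down returns the first match, or undef if none.
    my $table = $tree->look_down(_tag => 'table', class => 'rgMasterTable');
    if ($table) {
        for my $tr ($table->look_down(_tag => 'tr')) {
            my @td = $tr->look_down(_tag => 'td');
            next if @td < 2;    # skips the header row, which holds only <th> cells
            my ($name, $email) = map { $_->as_trimmed_text } @td[0, 1];
            push @people, { name => $name, email => $email };
        }
    }

    $tree->delete;    # free the parse tree once the text has been copied out
    return @people;
}

my @people = extract_people($html);
printf "%-17s %s\n", $_->{name}, $_->{email} for @people;
```

Returning hashrefs instead of printing inside the loop keeps the extraction separate from the output, so the same helper could be fed the content fetched with get URL or read from a file.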