How can I extract data from HTML tables in Perl?
Problem description
Possible duplicate:
Can you provide an example of parsing HTML with your favorite parser?
How can I extract content from HTML files using Perl?
I'm trying to use regular expressions in Perl to parse a table with the following structure. The first line is as follows:
<tr class="Highlight"><td>Time Played</a></td><td></td><td>Artist</td><td width="1%"></td><td>Title</td><td>Label</td></tr>
Here I wish to take out "Time Played", "Artist", "Title", and "Label", and print them to an output file.
Any help would be greatly appreciated!
Ok sorry... I've tried many regular expressions such as:
$lines =~ /(<td>)/
OR
$lines =~ /<td>(.*)</
OR
$lines =~ />(.*)</
My current program looks like so:
#!perl -w
open INPUT_FILE, "<", "FIRST_LINE_OF_OUTPUT.txt" or die $!;
open OUTPUT_FILE, ">>", "PLAYLIST_TABLE.txt" or die $!;
my $lines = join '', <INPUT_FILE>;
print "Hello 2\n";
if ($lines =~ /(\S.*\S)/) {
    print "this is 1: \n";
    print $1;
    if ($lines =~ /<td>(.*)</) {
        print "this is the 2nd 1: \n";
        print $1;
        print "the word was: $1.\n";
        $Time = $1;
        print $Time;
        print OUTPUT_FILE $Time;
    } else {
        print "2ND IF FAILED\n";
    }
} else {
    print "THIS FAILED\n";
}
close(INPUT_FILE);
close(OUTPUT_FILE);
Do NOT use regexps to parse HTML. A large number of CPAN modules do this for you much more effectively. See these related questions and modules:
- Can you provide some examples of why it is hard to parse XML and HTML with a regex?
- Can you provide an example of parsing HTML with your favorite parser?
- HTML::Parser
- HTML::TreeBuilder
- HTML::TableExtract
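As a rough illustration of the module-based approach, here is a minimal sketch using HTML::TableExtract on the row from the question. It rests on a few assumptions that are not in the original post: the row is wrapped in <table> tags (the question shows only the <tr>), the helper name extract_rows is made up for this example, and the result is printed to standard output rather than appended to PLAYLIST_TABLE.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;    # CPAN module, not part of core Perl

# Row from the question, wrapped in <table> tags. The wrapper is an
# assumption: the question shows only the <tr> element.
my $html = <<'HTML';
<table>
<tr class="Highlight"><td>Time Played</a></td><td></td><td>Artist</td><td width="1%"></td><td>Title</td><td>Label</td></tr>
</table>
HTML

# Hypothetical helper: return one array ref per table row, holding
# only the cells that contain some non-whitespace text.
sub extract_rows {
    my ($content) = @_;

    # No constraints given, so every table in the document is matched.
    my $te = HTML::TableExtract->new;
    $te->parse($content);
    $te->eof;

    my @rows;
    for my $table ($te->tables) {
        for my $row ($table->rows) {
            # Empty cells may be undefined or blank; skip them.
            my @cells = grep { defined && /\S/ } @$row;
            s/^\s+|\s+$//g for @cells;    # trim surrounding whitespace
            push @rows, \@cells if @cells;
        }
    }
    return @rows;
}

my @rows = extract_rows($html);
print join("\t", @$_), "\n" for @rows;
```

To write to a file as in the question, open an output handle and print to it instead of standard output. If the real page contains several tables, the constructor accepts constraints such as headers to select the one you want; check the module documentation for the exact options.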