HTML::PullParser splits up text element randomly

Views: 99 | Published: 2020/5/25 1:16:44 | Tags: html, perl, parsing, perl-module

Problem description

I'm using the Perl module HTML::PullParser. I noticed that it sometimes splits up a text element randomly (as far as I can tell).

For example, if I have an HTML file test.html with the content

<html>
...
<FONT STYLE="font-family:Times New Roman" SIZE="2">THE QUICK BROWN FOX</FONT>
...
</html>

And my Perl code looks something like

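use strict; use warnings;
use HTML::PullParser;
my $html = HTML::PullParser->new(file => 'test.html', text => '"T", text')
    or die "Can't open test.html: $!";
while (my $token = $html->get_token) {
    print "$token->[1]\n";    # each token is ["T", text]
}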

Then sometimes I get back

THE QUICK BROWN FOX    # correctly parsed

But at other times I get

THE QUICK
 BROWN FOX

where the text element is parsed into two separate tokens. Yet at other times, depending on the other content of the html file, I get

THE QUICK BROWN
 FOX

where the breaking point is different. This behavior is extremely annoying. I tried my best to isolate the problem. It looks like it depends on the entirety of the file (i.e. if I delete the rest of the file so that only that element is left, it parses fine). However, I'm not able to identify which part of the rest of the file causes this. Has anyone had a similar experience and knows how to get around the issue? Thanks!

UPDATE: the occurrence of this errant behavior also does NOT depend on a single section of HTML code elsewhere in the file. I was able to isolate two sections of HTML code before that text element: when both of them are present, the error occurs, but when either one is present without the other, the problem goes away... I'm absolutely confused and annoyed.
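
A possible explanation, going by the HTML::Parser documentation (HTML::PullParser is built on it): text is handed to the text handler as soon as possible, so a single run of text can arrive in several pieces depending on where the parser's input chunks happen to end. That would fit the breaks always falling at whitespace and moving when earlier parts of the file change. HTML::Parser documents an `unbroken_text` option for reporting text in one piece, which may be the simplest thing to try. As an alternative, the sketch below joins consecutive text tokens by hand. The helper name `merged_text_tokens` and the sample token list are made up for illustration, and it assumes start and end events are requested as well (for example with "S" and "E" markers), since with only text events every token is text and everything would be joined together.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::PullParser): wraps a token source
# and joins runs of consecutive ["T", text] tokens into one token.
sub merged_text_tokens {
    my ($next) = @_;    # code ref returning the next token, or undef at the end
    my $pending;        # look-ahead token that has not been handed out yet
    return sub {
        my $token = defined $pending ? $pending : $next->();
        undef $pending;
        return undef unless defined $token;
        return $token unless $token->[0] eq 'T';
        my $text = $token->[1];
        while (defined(my $peek = $next->())) {
            if ($peek->[0] eq 'T') { $text .= $peek->[1]; next; }
            $pending = $peek;    # non-text token: keep it for the next call
            last;
        }
        return ['T', $text];
    };
}

# Stand-in for the split output described in the question. With the real
# parser the source would be something like sub { $html->get_token }.
my @tokens = (['S', 'font'], ['T', 'THE QUICK'], ['T', ' BROWN FOX'], ['E', 'font']);
my $next = merged_text_tokens(sub { shift @tokens });
while (my $token = $next->()) {
    print "$token->[1]\n" if $token->[0] eq 'T';    # prints "THE QUICK BROWN FOX"
}
```

The wrapper only looks one token ahead, so tokens still come out in their original order and text on either side of a tag is never merged.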
