What are some good ways to parse HTML and CSS in Perl?

Views: 185 | Posted: 2017/2/9 16:54:49 | Tags: html, css, perl

Question

I have a project where my input files used to be XML. I'm now being asked to start processing HTML with embedded CSS instead, and I'd like to accomplish this as cleanly and with as few code changes as possible. I was using XML::LibXML to parse the XML files, but now that we're moving to HTML with CSS, I'm thinking I'll need to move to something else. That said, before I dig myself knee deep into silly decisions I'll likely regret, I wanted to ask here: what do you guys use for this kind of task?

The old XML and the new HTML input files are structured very similarly, and both hold the same information. The HTML uses div elements in place of the XML's <text> elements, and keeps its style information in <style> tags and style attributes instead of in separate XML attributes.
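
Because an inline style attribute is just a semicolon-separated list of property:value pairs, it can be read into a hash with core Perl alone. This is a minimal sketch, assuming no value contains a literal ';' (true for the converter output shown here, but not for CSS in general, e.g. some url(...) values); parse_style_decl is a hypothetical helper name, not a CPAN function.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a CSS declaration list such as
# "position:absolute;top:145;left:89" into a hashref of property => value.
# Assumes no ';' appears inside a value (quoted strings, url(...)).
sub parse_style_decl {
    my ($decl) = @_;
    my %props;
    for my $pair (split /;/, $decl // '') {
        next unless $pair =~ /\S/;
        # Split on the first ':' only, so values may still contain ':'
        my ($name, $value) = split /:/, $pair, 2;
        next unless defined $value;
        # Trim whitespace; CSS property names are case-insensitive
        s/^\s+|\s+$//g for $name, $value;
        $props{lc $name} = $value;
    }
    return \%props;
}

my $style = 'position:absolute;top:145;left:89;x-pdf-top:744;x-pdf-left:60;'
          . 'x-pdf-bottom:732;x-pdf-right:536;';
my $props = parse_style_decl($style);
print "top=$props->{top} x-pdf-right=$props->{'x-pdf-right'}\n";
```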

An example of the old XML:

<text font="TimesNewRoman,BoldItalic" size="11.04" x="59" y="405" w="52"
      h="12" bold="yes" italic="yes" cs="4.6" o_bbox="59,405;52,12"
      o_size="11.04" o_cs="4.6">
Some text
</text>

An example of the new HTML:

<div o="9ka" style="position:absolute;top:145;left:89;x-pdf-top:744;x-pdf-left:60;x-pdf-bottom:732;x-pdf-right:536;">
  <span class="ft19" >
    Some text
  </span></nobr>
</div>

where "ft19" refers to a CSS rule in a <style> block at the top of the page, of the form:

.ft19{ vertical-align:top;font-size:14px;x-pdf-font-size:14px;
       font-family:Times;color:#000000;x-pdf-color:#000000;font-style:italic;
       x-pdf-letter-spacing:0.83px;}
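
The class rules in the <style> block can be collected in a similar way: match each "selector { declarations }" pair and key the resulting property hash by selector. This is a minimal core-Perl sketch, assuming the stylesheet holds only flat rules like .ft19 above (no @media blocks, nested braces, or braces inside strings); parse_css_rules is a hypothetical helper. For stylesheets outside that subset, a dedicated CSS module from CPAN is the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: parse simple "selector { prop:value; ... }" rules into
# { selector => { prop => value, ... } }. Grouped selectors ("h1, h2 { ... }")
# each get the same declarations. Assumes flat rules only: no @media blocks,
# nested braces, or ';' inside values.
sub parse_css_rules {
    my ($css) = @_;
    my %rules;
    $css =~ s{/\*.*?\*/}{}gs;    # drop /* ... */ comments
    while ($css =~ /([^{}]+)\{([^}]*)\}/g) {
        my ($selectors, $body) = ($1, $2);
        for my $sel (split /,/, $selectors) {
            $sel =~ s/^\s+|\s+$//g;
            next unless length $sel;
            for my $pair (split /;/, $body) {
                my ($name, $value) = split /:/, $pair, 2;
                next unless defined $value;
                s/^\s+|\s+$//g for $name, $value;
                $rules{$sel}{lc $name} = $value;
            }
        }
    }
    return \%rules;
}

my $css = <<'CSS';
.ft19{ vertical-align:top;font-size:14px;x-pdf-font-size:14px;
       font-family:Times;color:#000000;x-pdf-color:#000000;font-style:italic;
       x-pdf-letter-spacing:0.83px;}
CSS

my $rules = parse_css_rules($css);
print "font-style=$rules->{'.ft19'}{'font-style'}\n";    # font-style=italic
```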

Basically, all I want is a parser that exposes the style properties of each node as attributes, so I could do something like:

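# $page_node is an XML::LibXML element for one page of the old XML input
my @texts_arr = $page_node->findnodes('text');
my $text_node = $texts_arr[1];    # second <text> element on the page
print "node's bold value is: " . $text_node->getAttribute('bold') . "\n";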

as I'm able to do with the XML. Does anything like that exist for parsing HTML? I'd really like to make sure I start this the right way instead of finding something that sort of does what I want on CPAN and realizing two months later that there was another module that was way better for what I'm trying to do.
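
XML::LibXML, already used for the XML input, can also parse HTML through load_html, so one low-churn option is to keep it: parse the HTML, resolve each element's class against the <style> rules plus its inline style, and write the results back as ordinary attributes so getAttribute calls like the one above keep working. This is a sketch under stated assumptions: only ".name" class selectors are resolved, the only cascade rule applied is that inline style overrides class rules, and deriving bold/italic "yes"/"no" from font-weight/font-style is a hypothetical mapping chosen to mirror the old XML attributes.

```perl
use strict;
use warnings;
use XML::LibXML;

# Parse "prop:value;prop:value" into a flat list of lowercase-name => value.
# Assumes no ';' appears inside a value.
sub parse_decls {
    my ($decl) = @_;
    my %props;
    for my $pair (split /;/, $decl // '') {
        my ($name, $value) = split /:/, $pair, 2;
        next unless defined $value;
        s/^\s+|\s+$//g for $name, $value;
        $props{lc $name} = $value;
    }
    return %props;
}

# Sample input shaped like the HTML in the question (trimmed).
my $html = <<'HTML';
<html><head><style>
.ft19{ vertical-align:top;font-size:14px;font-family:Times;font-style:italic;}
</style></head><body>
<div o="9ka" style="position:absolute;top:145;left:89;">
  <span class="ft19">Some text</span></nobr>
</div>
</body></html>
HTML

# recover => 1 lets libxml2 tolerate sloppy markup such as the stray </nobr>;
# suppress_errors => 1 keeps the parser's complaints off STDERR.
my $doc = XML::LibXML->load_html(
    string          => $html,
    recover         => 1,
    suppress_errors => 1,
);

# Collect ".class { ... }" rules from every <style> element.
my %class_rules;
for my $style ($doc->findnodes('//style')) {
    my $css = $style->textContent;
    while ($css =~ /\.([\w-]+)\s*\{([^}]*)\}/g) {
        my ($class, $decls) = ($1, $2);
        $class_rules{$class} = { parse_decls($decls) };
    }
}

# Merge class rules, then inline style (inline wins), and copy each property
# onto the element as an attribute, plus old-XML-style bold/italic flags.
for my $el ($doc->findnodes('//*[@class or @style]')) {
    my %props;
    for my $class (split ' ', $el->getAttribute('class') // '') {
        %props = (%props, %{ $class_rules{$class} || {} });
    }
    %props = (%props, parse_decls($el->getAttribute('style')));
    $el->setAttribute($_ => $props{$_}) for sort keys %props;

    # bold: font-weight "bold" or a numeric weight of 600-900
    my $italic = ($props{'font-style'}  // '') eq 'italic';
    my $bold   = ($props{'font-weight'} // '') =~ /^(?:bold|[6-9]00)$/;
    $el->setAttribute(italic => $italic ? 'yes' : 'no');
    $el->setAttribute(bold   => $bold   ? 'yes' : 'no');
}

my ($span) = $doc->findnodes('//span[@class="ft19"]');
print "italic: ", $span->getAttribute('italic'), "\n";
print "font-family: ", $span->getAttribute('font-family'), "\n";
```

Because the attributes are set on the DOM itself, XPath queries such as //span[@italic="yes"] also work afterwards. If the real documents use descendant or compound selectors, the class lookup here would need a proper CSS parser in its place.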

Thoughts?
