How can I decode HTML entities?
Problem description
Here's a quick Perl question:
How can I convert HTML special characters like &uuml; or &#39; to normal ASCII text?
I started with something like this:
s/\&#(\d+);/chr($1)/eg;
and could extend it to cover all HTML entities, but a function like this probably already exists?
Note that I don't need a full HTML->Text converter. I already parse the HTML with HTML::Parser. I just need to convert the text containing the special characters I'm getting.
Take a look at HTML::Entities (http://search.cpan.org/perldoc?HTML::Entities):
use HTML::Entities;
my $html = "Snoopy &amp; Charlie Brown";
print decode_entities($html), "\n";
You can guess the output.
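To make the difference from the hand-rolled substitution concrete, here is a minimal sketch that runs both approaches on the same string. The sample text and the variable names ($text, $regex_only, $decoded) are made up for illustration. It assumes HTML::Entities is installed, which should already be the case if HTML::Parser is in use, since both ship in the same distribution.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Print decoded (possibly non-ASCII) characters as UTF-8.
binmode STDOUT, ':encoding(UTF-8)';

# Illustrative sample mixing named, decimal, and hexadecimal entities.
my $text = 'M&uuml;ller &amp; Sons: it&#39;s &#x41;';

# The substitution from the question: decodes decimal references only.
(my $regex_only = $text) =~ s/&#(\d+);/chr($1)/eg;

# decode_entities returns a decoded copy when its result is used;
# called in void context it would modify $text in place instead.
my $decoded = decode_entities($text);

print "$regex_only\n";   # M&uuml;ller &amp; Sons: it's &#x41;
print "$decoded\n";      # Müller & Sons: it's A
```

Note that the result is a Perl character string, not ASCII: &uuml; becomes the single character U+00FC, so an output encoding layer such as the binmode call above is needed when printing it.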