How can I replace all the HTML-encoded accents in Perl?


Problem description

I have the following situation:

There is a tool that gets an XSLT from a web interface and embeds the XSLT in an XML file (someone should have been fired). "Unfortunately" I work in a French-speaking country, so the XSLT contains a number of words with accents. When the XSLT is embedded in the XML, the tool converts all the accents to their HTML entity codes (Iacute, igrave, etc.).

My Perl code retrieves the XSLT from the XML and executes it against another XML file using the Xalan command-line tool. Every time there is an accent in the XSLT, the Xalan tool throws an exception.

I initially thought of using a regexp to change all the accents in the XSLT, such as:

# the & is omitted in the codes because it will be rendered in the page
$xslt =~s/Aacute;/Á/gso;
$xslt =~s/aacute;/á/gso;
$xslt =~s/Agrave;/À/gso;
$xslt =~s/Acirc;/Â/gso;
$xslt =~s/agrave;/à/gso;

but doing so means that I have to write a regexp for each of the accent codes....

My question is: is there any way to do this without writing a regexp per code? (Thinking that this is the only solution makes me want to vomit.)

By the way the tool is TeamSite, and it sucks.....

Edited: I forgot to mention that I need to have a Perl only solution, security does not let me install any type of libs they have not checked for a week or so :(

Solution

You can try something like HTML::Entities. From the POD:

use HTML::Entities;
$a = "V&aring;re norske tegn b&oslash;r &#230res";
decode_entities($a);
#encode_entities($a, "\200-\377");  ## not needed for what you are doing

In response to your edit, HTML::Entities is not in the Perl core. It might still be installed on your system because a lot of other libraries use it. You can check by running this command:

perl -MHTML::Entities -le 'print "If this prints, then it is installed"'
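
If the module turns out not to be installed, a core-only fallback is possible. The following is a minimal sketch, not a complete implementation: the table `%codepoint_for` and the function names `decode_accents_fallback` and `decode_accents` are made up for illustration, and the table lists only a handful of entities, so it would need to be extended with whatever names actually appear in the XSLT. It does the same check as the one-liner above, but at runtime, and replaces every known entity with a single substitution instead of one regexp per code.

```perl
use strict;
use warnings;

# Hypothetical, partial table: entity name => Unicode code point.
# Extend it with the entities that actually occur in your XSLT.
my %codepoint_for = (
    Aacute => 193, aacute => 225,
    Agrave => 192, agrave => 224,
    Acirc  => 194, acirc  => 226,
    Eacute => 201, eacute => 233,
    Egrave => 200, egrave => 232,
    Ecirc  => 202, ecirc  => 234,
    Ccedil => 199, ccedil => 231,
    Iacute => 205, igrave => 236,
    ocirc  => 244, ugrave => 249,
);

# Core-only fallback: one pass over the text, one hash lookup per entity.
# Entities missing from the table (including &amp; and &lt;) are left as-is.
sub decode_accents_fallback {
    my ($text) = @_;
    $text =~ s/&([A-Za-z][A-Za-z0-9]*);/exists $codepoint_for{$1} ? chr($codepoint_for{$1}) : "&$1;"/ge;
    return $text;
}

# Try to load HTML::Entities at runtime; eval keeps the script alive if it is missing.
my $have_module = eval { require HTML::Entities; 1 };

# Use the module when available, otherwise the fallback table.
# Note: decode_entities also decodes &amp;, &lt; and friends; the fallback does not.
sub decode_accents {
    my ($text) = @_;
    if ($have_module) {
        my $decoded = HTML::Entities::decode_entities($text);
        return $decoded;
    }
    return decode_accents_fallback($text);
}

binmode(STDOUT, ':encoding(UTF-8)');
my $xslt = 'R&eacute;sum&eacute; &agrave; valider';
print decode_accents($xslt), "\n";
```

Because XML itself predefines only `&amp;`, `&lt;`, `&gt;`, `&apos;` and `&quot;`, it may be safer to leave those five alone, which the fallback does. If the output encoding is a concern, replacing `chr($codepoint_for{$1})` with `"&#$codepoint_for{$1};"` would emit numeric character references instead of literal characters; XML parsers accept these without a DTD.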
