Unable to encode some characters to ISO-8859-1 using the Perl Encode module

Published: 2017/8/17 2:04:08 | Tags: perl, encoding

Problem description

I have an HTML string in ISO-8859-1 encoding. I need to pass this string to HTML::Entities::decode_entities() to convert some of the HTML entities to the corresponding characters. To do so I am using HTML::Entities from the HTML::Parser 3.65 distribution, but after the decode_entities() call my whole string changes to a UTF-8 string. This behavior seems consistent with the HTML::Parser documentation. Because I need this string back in ISO-8859-1 for further processing, I use Encode::encode("iso-8859-1", $str) to convert it back. The results are fine except for some characters, where a question mark appears instead. One example is the right single quotation mark, &rsquo; (’).
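
The question mark is produced by Encode itself: when encode() is called without a CHECK argument, it silently substitutes "?" for any character the target encoding cannot represent. A minimal sketch reproducing this with only the core Encode module (the short test string is illustrative, not the asker's data):

```perl
use strict;
use warnings;
use Encode qw(encode);

# After decode_entities(), "&eacute;" and "&rsquo;" are the characters
# U+00E9 and U+2019 in a Perl character string.
my $str = "caf\x{e9} \x{2019}";

# No CHECK argument: characters with no ISO-8859-1 byte become "?".
my $latin1 = encode('iso-8859-1', $str);
printf "%vX\n", $latin1;    # prints 63.61.66.E9.20.3F

# Encode::FB_CROAK() reports the unmappable character instead.
my $ok = eval { encode('iso-8859-1', $str, Encode::FB_CROAK()); 1 };
print $ok ? "encoded\n" : "failed: $@";
```

Passing Encode::FB_CROAK() turns the silent substitution into an error, which makes it easier to see exactly which characters are being lost.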

Can anybody tell me whether the Encode module has a limitation here? Any other pointers that would help solve the problem are also welcome. Here is sample text containing the characters that cause the issue:

my $str = "This is a test string to test the encoding of some chars like &rsquo; &ldquo; &rdquo; etc these are failing to encode; some of them which encode correctly are &eacute; &laquo; etc.";

Thanks

Recommended answer

The fundamental problem is that the characters represented by &rsquo;, &ldquo;, and &rdquo; do not exist in ISO-8859-1. You'll have to decide what you want to do with them.
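
To see which characters are affected, compare code points: ISO-8859-1 covers only U+0000 through U+00FF. A small sketch (the hash of entity names is just for illustration):

```perl
use strict;
use warnings;

# The characters from the question, as decode_entities() produces them.
my %char_for = (
    eacute => "\x{e9}",
    laquo  => "\x{ab}",
    rsquo  => "\x{2019}",
    ldquo  => "\x{201c}",
    rdquo  => "\x{201d}",
);

# ISO-8859-1 has exactly 256 code points: U+0000 through U+00FF.
my @missing;
for my $name (sort keys %char_for) {
    my $cp   = ord $char_for{$name};
    my $fits = $cp <= 0xFF;
    printf "%-6s U+%04X %s\n", $name, $cp,
        $fits ? 'in ISO-8859-1' : 'NOT in ISO-8859-1';
    push @missing, $name unless $fits;
}
print "missing: @missing\n";    # prints missing: ldquo rdquo rsquo
```

This matches the asker's observation: &eacute; and &laquo; survive, while the three curly quotes do not.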

Some possibilities:

Use cp1252, Microsoft's "extended" version of ISO-8859-1, instead of the real thing. It does include those characters.
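
A minimal sketch of the cp1252 option, using the cp1252 table that ships with Encode. Note that the result is then Windows-1252 rather than ISO-8859-1: in real ISO-8859-1 the bytes 0x80 to 0x9F are control characters, so whatever consumes these bytes later has to treat them as cp1252.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Right single quote, left/right double quotes, e-acute.
my $str = "\x{2019}\x{201c}\x{201d}\x{e9}";

# cp1252 has bytes for the curly quotes; FB_CROAK() dies if anything is unmappable.
my $bytes = encode('cp1252', $str, Encode::FB_CROAK());
printf "%vX\n", $bytes;    # prints 92.93.94.E9
```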

Re-encode the characters outside the ISO-8859-1 range (plus &) as entities before converting from UTF-8 to ISO-8859-1:

use HTML::Entities;
# "&" plus every code point above U+00FF is turned back into an entity.
my $toEncode = do { no warnings 'utf8'; "&\x{0100}-\x{10FFFF}" };
$string = HTML::Entities::encode_entities($string, $toEncode);

(The no warnings bit is there because U+10FFFF is a Unicode noncharacter, which will never be assigned, and some Perl versions warn about it in a string literal.)
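
Combining this option with the asker's steps gives roughly the following pipeline. It is a sketch that assumes HTML::Entities (from the HTML::Parser distribution) is installed; the shortened input string is made up for the example:

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities encode_entities);
use Encode qw(encode);

# Shortened version of the sample string from the question.
my $html = "like &rsquo; &ldquo; &rdquo; and &eacute; &laquo; &amp; etc.";

# 1. Turn every entity into a character (a Perl character string).
my $string = decode_entities($html);

# 2. Turn "&" and everything above U+00FF back into entities.
my $toEncode = do { no warnings 'utf8'; "&\x{0100}-\x{10FFFF}" };
$string = encode_entities($string, $toEncode);

# 3. Every remaining character now fits in ISO-8859-1, so nothing becomes "?".
#    FB_CROAK() would die if anything unmappable had slipped through.
my $latin1 = encode('iso-8859-1', $string, Encode::FB_CROAK());
print $latin1, "\n";
```

The curly quotes come out as entities again, é and « end up as the single bytes 0xE9 and 0xAB, and the literal &amp; from the input is escaped again. That last point is why "&" is part of the character class: otherwise a decoded "&" would go out bare and could be misread by a later decode step.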

There are other possibilities. It really depends on what you're trying to accomplish.
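
One such possibility, not mentioned in the answer itself, is to let Encode do the replacement: the Encode::FB_HTMLCREF() check value writes each unmappable character as a decimal HTML character reference instead of "?". A minimal sketch:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $string = "caf\x{e9} \x{2019}quoted\x{201d}";

# FB_HTMLCREF() replaces each character that has no ISO-8859-1 byte
# with a decimal character reference (&#NNNN;).
my $latin1 = encode('iso-8859-1', $string, Encode::FB_HTMLCREF());
print $latin1, "\n";
```

Unlike the encode_entities() approach, this does not escape a literal "&" already present in the text, and it produces numeric references (&#8217;) rather than named ones (&rsquo;).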
