Perl: Why do I need to set the latin1 flag explicitly since JSON 2.xx?


Question

Since JSON 2.xx I need to set the latin1 flag in order to get umlauts safely into the HTML document:

my $obj_with_umlauts = {
    title  => 'geändert',
};

my $json = JSON->new()->latin1(1)->encode($obj_with_umlauts);

This was not necessary using JSON 1.xx:

my $json = JSON->new()->objToJson($obj_with_umlauts);

The HTML document is in iso-8859-1 (meta tag).

Can anybody explain to me why?

Answer

What are you talking about?

$ perl -MJSON -E'
   say $JSON::VERSION;
   my $json = JSON->new()->objToJson(["\xE4"]);
   say sprintf "%v02X", $json;
'
1.15
5B.22.E4.22.5D         # Unicode code points for ["ä"]

$ perl -MJSON -E'
   say $JSON::VERSION;
   my $json = JSON->new()->encode(["\xE4"]);
   say sprintf "%v02X", $json;
'
2.59
5B.22.E4.22.5D         # Unicode code points for ["ä"]

Those two strings are identical! In fact, adding ->latin1() doesn't change anything here, because the iso-8859-1 encoding of Unicode code point U+00E4 is the byte E4.

$ perl -MJSON -E'
   say $JSON::VERSION;
   my $json = JSON->new()->latin1()->encode(["\xE4"]);
   say sprintf "%v02X", $json;
'
2.59
5B.22.E4.22.5D         # iso-8859-1 encoding of ["ä"]

There is one difference between the last two: the string is stored differently in the scalar. That should make absolutely no difference. If code treats them differently, then that code is incorrectly reading the data in the scalar, and that code is buggy.
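The storage difference can be made visible with Perl's built-in utf8:: functions. The following is a minimal sketch, with variable names chosen only for illustration: two scalars hold the same one-character string in the two internal formats, and ordinary Perl code still sees them as equal.

```perl
use strict;
use warnings;

# Two scalars holding the same single character, U+00E4.
my $downgraded = "\xE4";
my $upgraded   = "\xE4";

# Force the two different internal representations of that string.
utf8::downgrade($downgraded);   # buffer holds the single byte E4
utf8::upgrade($upgraded);       # buffer holds the two bytes C3 A4

# The internal storage flag differs ...
printf "downgraded: is_utf8=%d\n", utf8::is_utf8($downgraded) ? 1 : 0;
printf "upgraded:   is_utf8=%d\n", utf8::is_utf8($upgraded)   ? 1 : 0;

# ... but the strings are identical as far as Perl code is concerned.
printf "equal:   %s\n", $downgraded eq $upgraded ? "yes" : "no";
printf "lengths: %d and %d\n", length($downgraded), length($upgraded);
```

Code that inspects the raw buffer instead of going through Perl's string API would see one byte in the first case and two in the second, which is the kind of bug described above.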

$string_with_umlauts definitely is a string in winLatin

Well, that's the first error.

JSON expects strings of decoded text (strings of Unicode code points), not encoded text.

That said, for characters that exist in iso-8859-1, there happens to be no difference between a string encoded using iso-8859-1 and a string of Unicode code points. For example, when encoded using iso-8859-1, "ä" is byte E4, and it is Unicode code point U+00E4: two different notations for the same number.

If the string is encoded using cp1252, though, you'll have problems with the characters €‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ (the characters that are in cp1252 but not in iso-8859-1). For example, when encoded using cp1252, "€" is byte 80, but it is Unicode code point U+20AC. 0x80 != 0x20AC.
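A short sketch of the mismatch, using the core Encode module. The input bytes are a made-up example ("€ä" as it would arrive in cp1252), not data from the question.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the bytes for "€ä" as encoded in cp1252.
my $bytes = "\x80\xE4";

# Decoding as cp1252 yields the intended code points.
my $from_cp1252 = decode('cp1252', $bytes);

# Treating the same bytes as iso-8859-1 maps byte 80 to U+0080,
# a control character, instead of the euro sign.
my $from_latin1 = decode('iso-8859-1', $bytes);

printf "cp1252:     %vX\n", $from_cp1252;   # 20AC.E4
printf "iso-8859-1: %vX\n", $from_latin1;   # 80.E4
```

Only the first result is the decoded text that a JSON encoder should be given. The "ä" comes out the same either way, which is why the problem is easy to miss with umlauts alone.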

The HTML document is in iso-8859-1 (meta tag).

Then at some point, you'll have to encode the output into iso-8859-1. You can do it using an :encoding layer, using Encode's encode, or using JSON's ->latin1 directive. The advantage of the last option is that it causes JSON to escape any character outside of the iso-8859-1 character set before attempting to encode it.
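The difference between these options shows up once the data contains a character that iso-8859-1 cannot represent. The sketch below assumes JSON::PP (a core module that offers the same latin1 and encode methods) as a stand-in for the JSON module used in the question, and the sample data is invented for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);
use JSON::PP;

# Decoded text: U+00E4 fits in iso-8859-1, U+20AC does not.
my $data = ["\x{E4}", "\x{20AC}"];

# Option A: let the JSON encoder produce iso-8859-1 bytes directly.
# Characters outside iso-8859-1 are written as \uXXXX escapes.
my $latin1_json = JSON::PP->new->latin1->encode($data);

# Option B: produce a string of code points, then encode it yourself.
# Without a CHECK argument, Encode replaces unrepresentable characters
# with a substitution character, so the euro sign is lost.
my $text_json    = JSON::PP->new->encode($data);
my $encoded_json = encode('iso-8859-1', $text_json);

# Print both results as hex bytes to avoid terminal encoding issues.
printf "latin1(): %s\n", join ' ', map { sprintf '%02X', ord } split //, $latin1_json;
printf "encode(): %s\n", join ' ', map { sprintf '%02X', ord } split //, $encoded_json;
```

An :encoding(iso-8859-1) layer set with binmode on the output handle behaves like option B in that the encoding happens after the JSON text has already been built, so the JSON encoder gets no chance to escape anything.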

Can anybody explain to me why?

You have some code (an XS module) that reads the underlying string buffer of the scalar and incorrectly treats that as the content of the string. There is a bug in that module.
