Newline control characters in multi-byte character sets

Question

I have some Perl code that normalizes line endings (carriage returns and line feeds) to a single form. The input text is Japanese, so it will contain multi-byte characters.

Can this transformation still be done byte by byte (which I think is what the code currently does), or do I have to detect the character set and enable Unicode support? In other words, do the popular encodings (Shift-JIS, EUC-JP, UTF-8, ISO-2022-JP) use bytes inside their multi-byte characters that could be mistaken for ASCII control characters?

I only need CR and LF to work.

Update: Added ISO-2022-JP, which looks like the most troublesome one because of its funky escape sequences ...

Recommended answer

None of the four encodings you mention (Shift-JIS, UTF-8, EUC-JP, ISO-2022-JP) uses the CR or LF byte inside a Japanese character. For UTF-8 and EUC-JP, there is no overlap at all between ASCII bytes and the bytes that make up Japanese characters. For Shift-JIS and ISO-2022-JP there is some overlap with ASCII, but not in the range where CR and LF are found.
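As a quick sanity check of that claim, the following minimal Python sketch (Python rather than Perl, purely for illustration) encodes a sample string that contains no line breaks and looks for CR or LF bytes in the result. The sample text and the helper name `contains_cr_or_lf` are assumptions made for this example; a single sample demonstrates the idea but does not prove it for every character.

```python
# Encode a Japanese sample (no line breaks in it) with each encoding and
# check whether a CR (0x0D) or LF (0x0A) byte shows up in the output.
ENCODINGS = ["shift_jis", "euc_jp", "utf-8", "iso2022_jp"]
SAMPLE = "日本語のテキスト、カタカナ、ひらがな"  # hypothetical sample text


def contains_cr_or_lf(data: bytes) -> bool:
    """Return True if the byte string contains a CR or LF byte."""
    return b"\r" in data or b"\n" in data


results = {enc: contains_cr_or_lf(SAMPLE.encode(enc)) for enc in ENCODINGS}

for enc, found in results.items():
    print(f"{enc}: CR/LF byte present = {found}")
```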

For ISO-2022-JP,
First-byte range: 0x21 - 0x7E
Second-byte range: 0x21 - 0x7E

The escape sequence bytes used to switch back and forth between the various character sets are:

0x1B, 0x28, 0x24, 0x40, 0x42, and 0x4A
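To see those escape bytes in practice, here is a small Python sketch that encodes mixed ASCII and Japanese text with Python's built-in `iso2022_jp` codec. The sample string is an assumption chosen for illustration, and other encoders may pick different (but equivalent) escape sequences.

```python
# Encode mixed ASCII/Japanese text and inspect the ISO-2022-JP byte stream.
sample = "abc日本語abc"  # hypothetical sample text
encoded = sample.encode("iso2022_jp")

# ESC $ B switches to JIS X 0208, ESC ( B switches back to ASCII.
print(encoded)

# Every byte is either ESC (0x1B) or falls in the printable range 0x21-0x7E,
# so neither CR (0x0D) nor LF (0x0A) can appear inside a character.
only_esc_or_printable = all(b == 0x1B or 0x21 <= b <= 0x7E for b in encoded)
print(only_esc_or_printable)
```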

For Shift-JIS,
First-byte range: 0x81 - 0x9F, 0xE0 - 0xEF
Second-byte range: 0x40 - 0x7E, 0x80 - 0xFC
Half-width katakana: 0xA1 - 0xDF
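The overlap with ASCII in Shift-JIS comes from the second-byte range 0x40 - 0x7E. The sketch below, written in Python for illustration, lists characters from a small sample whose Shift-JIS encoding contains a byte below 0x80. The helper name `ascii_range_bytes` and the sample characters are assumptions made for this example.

```python
def ascii_range_bytes(text: str, encoding: str = "shift_jis"):
    """Return (character, encoded bytes) pairs for multi-byte characters
    whose encoding contains at least one byte in the ASCII range (< 0x80)."""
    hits = []
    for ch in text:
        raw = ch.encode(encoding)
        if len(raw) > 1 and any(b < 0x80 for b in raw):
            hits.append((ch, raw))
    return hits


# The first three characters have 0x5C (ASCII backslash) as their second byte;
# the last one does not use any ASCII-range byte.
hits = ascii_range_bytes("ソ表能あ")

for ch, raw in hits:
    print(ch, raw.hex())
```

Even in these cases the overlapping byte is 0x40 or higher, well above CR (0x0D) and LF (0x0A).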

Again, there is no overlap with CR (0x0D) and LF (0x0A).
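Since no CR or LF byte occurs inside a multi-byte character in any of these encodings, a byte-level normalization should be safe. The following is a minimal sketch in Python (not the asker's Perl code), assuming the goal is to convert CRLF and bare CR to LF; the function name `normalize_newlines` and the sample text are hypothetical.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and bare CR to LF, working purely on bytes."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


ENCODINGS = ["shift_jis", "euc_jp", "utf-8", "iso2022_jp"]
text = "一行目\r\n二行目\r三行目\n"      # mixed line endings
expected = "一行目\n二行目\n三行目\n"    # the same text with LF only

# Normalize the raw bytes without decoding, then decode to verify that the
# Japanese characters survived the byte-level replacement intact.
roundtrip_ok = {
    enc: normalize_newlines(text.encode(enc)).decode(enc) == expected
    for enc in ENCODINGS
}

for enc, ok in roundtrip_ok.items():
    print(f"{enc}: round trip intact = {ok}")
```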
