为什么sed失败与国际字符和如何解决？ [英] Why does sed fail with International characters and how to fix?

查看：205 发布时间：2016/11/18 15:33:25 linux internationalization sed character

本文介绍了为什么sed失败与国际字符和如何解决？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

GNU sed版本4.1.5似乎失败了国际字符。这是我的输入文件：

Gras Och Stenar Trad - 从Moja到Minneapolis DVD [G2007DVD] 7812 | X

Gras Och Stenar Trad - 从Möja到Minneapolis DVD [G2007DVD] 7812 | Y

（请注意第二行的变音符号）

$ b

sed's /.* | //'<在

我希望只看到X和Y，因为我要求删除所有字符到'|'和空格。相反，我得到：

X

Gras Och Stenar Trad - 从M？ Y

我知道我可以使用tr删除国际字符。首先，但是有办法只使用sed？

解决方案

我认为如果文件的输入编码是不同于您的环境的首选编码。

示例：在中是UTF-8

  $ LANG = de_DE.UTF-8 sed's /.* | //'< in 
 X 
 Y 
 $ LANG = de_DE.iso88591 sed's /.* | //'< in 
 X 
 Y

UTF-8可以安全地解释为ISO- 8859-1，你会得到奇怪的字符，但除了一切都很好。

例如：在 ISO-8859-1

  $ LANG = de_DE.UTF-8 sed's /.* | //'< in 
 X 
 Gras Och Stenar Trad  - 从Möy
 $ LANG = de_DE.iso88591 sed's /.* | //'< in 
 X 
 Y

ISO-8859-1不能解释为UTF -8，解码输入文件失败。奇怪的匹配可能是由于sed尝试恢复而不是完全失败。

答案是基于Debian Lenny / Sid和sed 4.1.5。 / p>

GNU sed version 4.1.5 seems to fail with International chars. Here is my input file:

Gras Och Stenar Trad - From Moja to Minneapolis DVD [G2007DVD] 7812 | X
Gras Och Stenar Trad - From Möja to Minneapolis DVD [G2007DVD] 7812 | Y

(Note the umlaut in second line.)

And when I do

sed 's/.*| //' < in

I would expect to see only the X and Y, as I've asked to remove ALL chars up to the '|' and space beyond it. Instead, I get:

X
Gras Och Stenar Trad - From M? Y

I know I can use tr to remove the International chars. first, but is there a way to just use sed?

解决方案

I think the error occurs if the input encoding of the file is different from the preferred encoding of your environment.

Example: in is UTF-8

$ LANG=de_DE.UTF-8 sed 's/.*| //' < in
X
Y
$ LANG=de_DE.iso88591 sed 's/.*| //' < in
X 
Y

UTF-8 can safely be interpreted as ISO-8859-1, you'll get strange characters but apart from that everything is fine.

Example: in is ISO-8859-1

$ LANG=de_DE.UTF-8 sed 's/.*| //' < in
X
Gras Och Stenar Trad - From MöY
$ LANG=de_DE.iso88591 sed 's/.*| //' < in
X 
Y

ISO-8859-1 cannot be interpreted as UTF-8, decoding the input file fails. The strange match is probably due to the fact that sed tries to recover rather than fail completely.

The answer is based on Debian Lenny/Sid and sed 4.1.5.

这篇关于为什么sed失败与国际字符和如何解决？的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

为什么sed失败与国际字符和如何解决？ [英] Why does sed fail with International characters and how to fix?

问题描述

相关文章

服务器开发最新文章

热门教程

热门工具

登录关闭

为什么sed失败与国际字符和如何解决？ [英] Why does sed fail with International characters and how to fix?

问题描述

相关文章

服务器开发最新文章

热门教程

热门工具

登录 关闭

登录关闭