How can I render UTF-16BE on the command line?
Question
I often come across a string representing UTF-16BE, such as \u0444\u0430\u0439\u043b, which would be properly rendered as файл.
I wonder: is there a simple way to "render" a text file in UTF-16BE (or simply an input string in UTF-16BE) such as the one above by using sed or another command-line tool?
See this related question.
Recommended answer
Assuming the text is actually encoded in UTF-16BE (and not, as you show in your question, as an ASCII string containing backslash and 'u' characters), you can use the iconv command.
Assuming your locale is set to handle UTF-8 output:
iconv -f utf-16be -t utf-8 [input-file]
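To see what the iconv route looks like with real UTF-16BE data, here is a small sketch. It writes the eight bytes that encode файл in UTF-16BE to a file and converts them; the file name sample-utf16be.txt is made up for illustration, and the bytes are written as octal escapes so that any POSIX printf can produce them.

```shell
# Write the UTF-16BE bytes for U+0444 U+0430 U+0439 U+043B ("файл"):
# 04 44  04 30  04 39  04 3b, given here as octal escapes.
# sample-utf16be.txt is a hypothetical file name.
printf '\004\104\004\060\004\071\004\073' > sample-utf16be.txt

# The file holds 8 bytes, two per character.
wc -c < sample-utf16be.txt

# Convert to UTF-8; on a UTF-8 terminal this should display файл.
iconv -f UTF-16BE -t UTF-8 sample-utf16be.txt
printf '\n'

# Remove the sample file again.
rm -f sample-utf16be.txt
```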
EDIT:
Based on your comments, what you have is not UTF-16BE at all; it's apparently plain ASCII, encoding Unicode code points using the \u.... syntax. This is not a format that iconv recognizes (as far as I know).
You should edit your question, removing any references to UTF-16BE and explaining more accurately what data you actually have, and what you want to do with it. Where did these strings come from? Are they stored in a text file, or did they come from some other source (say, the output of some program)? Does the input consist entirely of \u.... escapes, or is it mixed with other data? And are your locale settings configured to display UTF-8 properly?
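One way to answer the "what data do you actually have" question is to look at the raw bytes. The sketch below is only an illustration: has_u_escapes is a hypothetical helper name, and the sample string is the one from the question.

```shell
# Hypothetical helper: succeed if stdin contains at least one
# literal \uXXXX escape (backslash, 'u', four hex digits).
has_u_escapes() {
    grep -Eq '\\u[0-9a-fA-F]{4}'
}

s='\u0444\u0430\u0439\u043b'

# Plain ASCII: 4 escapes of 6 characters each.
printf '%s' "$s" | wc -c        # 24

# Hex dump of the string. Real UTF-16BE for the same word would be
# only 8 bytes: 04 44 04 30 04 39 04 3b.
printf '%s' "$s" | od -An -tx1

if printf '%s\n' "$s" | has_u_escapes; then
    printf '%s\n' 'input contains literal \uXXXX escapes'
fi
```

If the hex dump shows 5c 75 (backslash, u) followed by ASCII digits, the data is escaped text rather than UTF-16BE.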
If you have a string containing "\u0444\u0430\u0439\u043b" (that's 24 ASCII characters), then the printf command should work, provided you use a sufficiently recent version of printf.
printf is both a shell built-in and an external command, /usr/bin/printf, part of the GNU coreutils package.
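To check which printf a given command line actually runs, something like the following may help. It is a sketch, not a complete diagnostic: the --version option is a GNU coreutils feature, so treating the external printf as GNU is an assumption here.

```shell
# How the current shell resolves the name "printf"
# (in bash this typically reports a shell builtin).
type printf

# env knows nothing about shell builtins, so it runs the first
# external printf found on PATH (often /usr/bin/printf).
env printf '%s\n' 'printed by the external printf'

# Version of the external command. --version is a GNU coreutils
# option (an assumption about your system); other implementations
# may reject it, so errors are discarded.
env printf --version 2>/dev/null | head -n 1
```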
The following works on my system:
$ s='\u0444\u0430\u0439\u043b'
$ printf "$s\n"
файл
Or you can use the %b format (this is specific to the printf command; C's printf() function doesn't do this), which interprets backslash escapes in argument strings (normally they're only interpreted in the format string):
$ printf "%b\n" "$s"
файл
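The difference between %s and %b, and the reason to prefer %b over passing data as the format string, can be seen with an escape that every printf understands. This sketch uses \t rather than \u so that it does not depend on the printf version; the sample string is made up.

```shell
s='tab:\there, 100% literal'

# %s prints the argument verbatim: \t stays as two characters.
printf '%s\n' "$s"

# %b expands backslash escapes in the argument: \t becomes a real tab.
# The % sign is left alone, because only the format string is scanned
# for conversion specifications.
printf '%b\n' "$s"

# By contrast, using "$s" itself as the format string would also try
# to parse the % sign as the start of a conversion specification.
```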
On another system, with an older version of bash, the printf builtin doesn't recognize \u escapes, but /usr/bin/printf does. It appears that the coreutils printf command gained support for \u escapes earlier than bash did.
$ s='\u0444\u0430\u0439\u043b'
$ printf "$s\n"
\u0444\u0430\u0439\u043b
$ printf "%b\n" "$s"
\u0444\u0430\u0439\u043b
$ /usr/bin/printf "$s\n"
файл
$ /usr/bin/printf "%b\n" "$s"
файл
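If a script has to run on both kinds of system, one option is to try the builtin first and fall back to the external command when the escapes come back untouched. The function below is a rough sketch under that assumption; render_u is a hypothetical name, and as far as I know bash gained \u support in version 4.2. Whether the external printf converts \u escapes also depends on it being a recent GNU coreutils build and on a UTF-8 locale.

```shell
# Hypothetical helper: expand \uXXXX escapes in "$1", preferring the
# shell's own printf and falling back to the external one.
render_u() {
    out=$(printf '%b' "$1")
    case $out in
        # A literal backslash-u is still present, so the builtin did
        # not understand the escape; try the external printf via env.
        *'\u'*) out=$(env printf '%b' "$1") ;;
    esac
    printf '%s\n' "$out"
}

render_u '\u0444\u0430\u0439\u043b'
```

If neither implementation supports \u, the input is printed unchanged. Note that command substitution strips trailing newlines from the expanded text.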
All of this assumes you have the '\u0444\u0430\u0439\u043b' string in a variable. If it's in a file, you could slurp the file contents into a shell variable, probably a line at a time, but it's not the best solution. In that case, this Perl script should do the job; it copies its input to stdout, replacing \u.... sequences with the corresponding Unicode character, encoded in UTF-8; the input can be either one or more files named on the command line, or standard input if it's invoked with no arguments.
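#!/usr/bin/perl
# Copy input to stdout, replacing each \uXXXX escape with the
# Unicode character it names.
use strict;
use warnings;
use utf8;

# Encode everything written to stdout as UTF-8.
binmode(STDOUT, ":utf8");

# <> reads the files named on the command line, or stdin if none.
while (<>) {
    # \u plus four hex digits -> the character with that code point
    s/\\u([\da-fA-F]{4})/chr(hex($1))/eg;
    print;
}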
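For comparison, the line-at-a-time shell approach mentioned above might look roughly like this. It is a sketch only: render_lines and input.txt are hypothetical names, and it relies on a printf whose %b understands \u escapes. Unlike the Perl script, %b expands every backslash escape, so a literal \n or \t in the data would be changed as well, which is one reason the script is the better tool for files.

```shell
# Hypothetical helper: expand backslash escapes on each line of stdin.
render_lines() {
    # IFS= and -r keep leading whitespace and backslashes intact;
    # the extra test handles a final line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%b\n' "$line"
    done
}

# Usage with a file (input.txt is a hypothetical name):
#   render_lines < input.txt
printf '%s\n' 'name: \u0444\u0430\u0439\u043b' | render_lines
```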
Again, please edit your question so it reflects your actual problem and drops any references to UTF-16BE.