How can I render UTF-16BE in command line?

Question

I often come across a string representing UTF-16BE, such as \u0444\u0430\u0439\u043b, which would be properly rendered as файл.

I wonder: is there a simple way to "render" a text file in UTF-16BE (or simply an input string in UTF-16BE), such as the one above, using sed or another command-line tool?

See this related question.

Recommended answer

Assuming the text is actually encoded in UTF-16BE (and not, as you show in your question, as an ASCII string containing backslash and 'u' characters), you can use the iconv command.

Assuming your locale is set to handle UTF-8 output:

iconv -f utf-16be -t utf-8 [input-file]
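
For comparison, here is a minimal Python sketch of what genuinely UTF-16BE-encoded input looks like at the byte level, and of the same conversion that the iconv command performs. Python is used only for illustration; the names utf16be_to_utf8 and SAMPLE_UTF16BE are made up for this example, and the sample bytes are simply the code units of the word from the question.

```python
def utf16be_to_utf8(data: bytes) -> bytes:
    """Decode UTF-16BE bytes and re-encode the text as UTF-8."""
    return data.decode("utf-16-be").encode("utf-8")


# The word from the question as real UTF-16BE: two bytes per character,
# most significant byte first, eight bytes in total.
SAMPLE_UTF16BE = bytes.fromhex("0444 0430 0439 043b")
```

Writing utf16be_to_utf8(SAMPLE_UTF16BE) to a UTF-8 terminal should display файл. Note that the input here is eight raw bytes, not a string of backslashes and hex digits.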

EDIT:

Based on your comments, what you have is not UTF-16BE at all; it's apparently plain ASCII, encoding Unicode code points using the \u.... syntax. This is not a format that iconv recognizes (as far as I know).

You should edit your question, removing any references to UTF-16BE and explaining more accurately what data you actually have, and what you want to do with it. Where did these strings come from? Are they stored in a text file, or did they come from some other source (say, the output of some program)? Does the input consist entirely of \u...., or is it mixed with other data? And are your locale settings configured to display UTF-8 properly?

If you have a string containing "\u0444\u0430\u0439\u043b" (that's 24 ASCII characters), then the printf command should work -- if you use a sufficiently recent version of printf.
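
The "24 ASCII characters" claim is easy to check. The following is a small Python sketch (Python only for illustration; describe and ESCAPED are hypothetical names introduced here) that contrasts the escaped form with the rendered word.

```python
# Raw string literal, so the backslashes stay literal as in the question.
ESCAPED = r"\u0444\u0430\u0439\u043b"


def describe(s: str) -> tuple:
    """Return (character count, whether every character is ASCII)."""
    return len(s), s.isascii()
```

Here describe(ESCAPED) gives (24, True): four sequences of six ASCII characters each. The rendered word, describe("файл"), gives (4, False).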

printf is both a shell built-in and an external command, /usr/bin/printf, part of the GNU coreutils package.

The following works on my system:

$ s='\u0444\u0430\u0439\u043b'
$ printf "$s\n"
файл

Or you can use the %b format (this is specific to the printf command; C's printf() function doesn't do this), which interprets backslash escapes in argument strings (normally they're only interpreted in the format string):

$ printf "%b\n" "$s"
файл

On another system, with an older version of bash, the printf builtin doesn't recognize \u escapes -- but /usr/bin/printf does. It appears that the coreutils printf command gained support for \u escapes earlier than bash did.

$ s='\u0444\u0430\u0439\u043b'
$ printf "$s\n"
\u0444\u0430\u0439\u043b
$ printf "%b\n" "$s"
\u0444\u0430\u0439\u043b
$ /usr/bin/printf "$s\n"
файл
$ /usr/bin/printf "%b\n" "$s"
файл

All of this assumes you have the '\u0444\u0430\u0439\u043b' string in a variable. If it's in a file, you could slurp the file contents into a shell variable, probably a line at a time, but it's not the best solution. In that case, this Perl script should do the job; it copies its input to stdout, replacing \u.... sequences with the corresponding Unicode character, encoded in UTF-8; the input can be either one or more files named on the command line, or standard input if it's invoked with no arguments.

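#!/usr/bin/perl

use strict;
use warnings;

use utf8;
# Encode everything printed to standard output as UTF-8.
binmode(STDOUT, ":utf8");

# Read line by line from the files named on the command line,
# or from standard input if no files are given.
while (<>) {
    # Replace each \uXXXX (exactly four hex digits) with that character.
    s/\\u([\da-fA-F]{4})/chr(hex($1))/eg;
    print;
}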
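
If Perl is not at hand, the same substitution can be sketched in Python. This is an illustrative counterpart, not part of the original answer: the names unescape_line and main are invented here, and it assumes the input files are ASCII or UTF-8 text. Like the Perl script, it only handles \u followed by exactly four hex digits.

```python
import re
import sys

# A literal backslash, the letter u, then exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_line(line: str) -> str:
    """Replace each escape sequence with the character it denotes."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), line)


def main(paths):
    """Copy the named files (or stdin if there are none) to stdout."""
    if not paths:
        for line in sys.stdin:
            sys.stdout.write(unescape_line(line))
        return
    for path in paths:
        # Assumption: the input is ASCII or UTF-8 text.
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                sys.stdout.write(unescape_line(line))
```

To use it as a script, add a final line that calls main(sys.argv[1:]) and run it with file names as arguments, or with no arguments to read standard input. As with the Perl version, the terminal needs to be set up for UTF-8 output.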

Again, please edit your question so it reflects your actual problem and drops any references to UTF-16BE.
