In Win7, Unicode/UTF-8 text file: gibberish on Windows console (trying to display Hebrew)


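Problem description

I have a wide-char file (with Hebrew text) that looks fine in Notepad (saved as "UTF-8 encoding"), reads fine in Notepad++, and also looks fine when I copy and paste it into MS Word. But when I open a "DOS box" (Windows console) and enter "type file.txt", it displays gibberish.
And yes, I've followed all the recommendations for Unicode on the Windows console: I opened the console using "cmd /u", I changed the font to Lucida, and I've entered "chcp 65001".

The problem is identical on a PC running Windows 7 and on another PC running Windows XP SP3.

Solution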

The Courier New font supports Hebrew and can be added to the command prompt. The default fonts (Consolas, Lucida Console, Raster Fonts) don't support Hebrew. So add Courier New to the command prompt.

Doing that takes a registry hack:

http://www.howtogeek.com/howto/windows-vista/stupid-geek-tricks-enable-more-fonts-for-the-windows-command-prompt/

http://www.techrepublic.com/blog/windows-and-office/quick-tip-add-fonts-to-the-command-prompt/

This is a good example of how to install fonts, but I should remove a lot of these entries, because most of them never got added to cmd: cmd doesn't support them.

Lucida Console and Consolas are defaults.
Raster Fonts is also a default; it's not listed here, probably because it isn't a TrueType font.
Of all the fonts I tried to add, only 3 were added (are supported by cmd):
Courier New, DejaVu Sans Mono, Droid Sans Mono

DejaVu Sans Mono and Droid Sans Mono are downloadable, are supported by cmd, and might have good Unicode coverage, but they don't include Hebrew.

The fonts I now have in cmd are:

Consolas <-- default
Courier New  <--- added
DejaVu Sans Mono  <-- added
Droid Sans Mono  <-- added
Lucida Console <-- default
Raster Fonts <-- default
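
As a rough sketch of the registry hack above, the following Python builds the text of a .reg file for the extra fonts. The key path and the all-zeros value names follow the linked how-to articles; the function name `console_font_reg` and the assumption that "0" and "00" are already taken by the default fonts are mine, so check the key in regedit before importing anything.

```python
# Sketch only: generate .reg text that registers extra console fonts.
# Assumption: the existing value names under the key are "0" and "00"
# (the defaults); verify this in regedit on your own machine.

KEY = (r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT"
       r"\CurrentVersion\Console\TrueTypeFont")


def console_font_reg(fonts, used_names=("0", "00")):
    """Return .reg text giving each font the next unused all-zeros name."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{KEY}]"]
    taken = set(used_names)
    width = 1
    for font in fonts:
        # Value names are runs of zeros: "0", "00", "000", ...
        while "0" * width in taken:
            width += 1
        name = "0" * width
        taken.add(name)
        lines.append(f'"{name}"="{font}"')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(console_font_reg(["Courier New", "DejaVu Sans Mono", "Droid Sans Mono"]))
```

cmd will still refuse fonts that don't meet its criteria, so generating the entry doesn't guarantee the font shows up in the font list.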

Common Hebrew fonts are Miriam and David, but they can't be added to the command prompt.

For the record, BabelMap can list all fonts on your system that support Hebrew: in BabelMap, click Fonts, then Font Coverage, then enter 05D0 (that's aleph). I think all of these fonts exist on a default Windows 7 installation:

Aharoni, Arial, Courier New, David, FrankRuehl, Gisha, Levenim MT, Lucida Sans Unicode, Microsoft Sans Serif, Miriam, Miriam Fixed, Narkisim, Rod, Segoe WP, Tahoma, Times New Roman

But most or all of those fonts with Hebrew aren't supported in the command prompt, except Courier New. In fact, most fonts, full stop, aren't supported in the command prompt, not even Times New Roman (because Times New Roman is not monospaced / fixed width, and that's one of a number of criteria for a font to be supported; the other criteria seem to be more obscure).

So now you can have Courier New added and selected for use in the command prompt.

And so you can paste Unicode characters into cmd, provided the selected font supports them.

To copy a character, click the Copy button in Character Map (charmap).

Now it's in the clipboard.

To paste it into the command prompt: in Win7, pasting into the command prompt isn't Ctrl-V. You right-click and choose Paste (or, in QuickEdit mode, just right-click).

That's the main thing.

Additionally

In Windows one often uses Notepad and Character Map, but one should be aware of some of their limitations.

Character Map shows the first 65536 Unicode characters (when the selected font supports them), and the code it shows you is the UTF-16 code. That's OK, you can still paste from Character Map into a cmd.exe window, but you should know that commands run in cmd.exe, and pipes, don't support UTF-16. So you can use Character Map to find a character, e.g. aleph, 05D0, but it's worth looking the character up on http://www.fileformat.info/info/unicode/char/05d0/index.htm and seeing that while the UTF-16 code is 05D0, the UTF-8 encoding is the two bytes D7 90. The xxd and file commands are useful for seeing the real contents of a file and determining the file's type.
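
To make the difference between those codes concrete, here is a small Python sketch that prints the bytes aleph becomes under the encodings discussed in this answer. The helper name `hex_bytes` is just for illustration; the codec names are Python's own.

```python
# Show how one character, aleph (U+05D0), is stored in several encodings.

ALEPH = "\u05d0"  # HEBREW LETTER ALEF, listed as 05D0 in Character Map


def hex_bytes(text, encoding):
    """Encode text and return its bytes as space-separated hex pairs."""
    return text.encode(encoding).hex(" ")


if __name__ == "__main__":
    for encoding in ("utf-8", "utf-16-le", "utf-16-be", "cp862"):
        print(f"{encoding:10} {hex_bytes(ALEPH, encoding)}")
```

UTF-8 gives d7 90, UTF-16LE gives d0 05 (the same 05D0, byte-swapped), and the old code page 862 uses the single byte 80.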

Notepad is a bit limited when it comes to Unicode, i.e. any character in the Unicode character set whose UTF-16 code is > FF. And cmd is a bit limited with regard to some commands like 'type', and with regard to pipes and redirection.

If you're using cmd.exe, you really need pipes to work, because pipes are important.

Pipes are limited to the encodings that can be specified by the CHCP Command.

(Note that if CHCP tells you that you are on a particular code page, e.g. 850, it's telling you the input encoding. If you run the command chcp 850, it changes both the input and the output encoding. Usually they are the same, and it's simpler when they are. But if some other program has changed the encoding of cmd, e.g. the C# compiler has a switch that changes it, then it's best to set it again with chcp so you know both encodings are set.)

There are code pages 1200 (UTF-16LE) and 1201 (UTF-16BE), but CHCP supports neither: if you try one, it says invalid code page (tested in Win7). So CHCP doesn't support UTF-16 at all. There is CHCP 65001 (that's UTF-8 without BOM). And there is CHCP 862 (the old-fashioned, MS-DOS-era way of encoding Hebrew).

The type command supports UTF-16LE, as does Notepad (what Notepad calls "Unicode" is UTF-16LE), but pipes and redirection don't support it. The type command also supports any code page specified/supported by CHCP, so type supports 862 or 65001.

So you could use Notepad, save the file as UTF-8 (which Notepad writes with a BOM), then fiddle around to remove the BOM (that's a bit overkill). Or you could use Notepad and save it as "Unicode" (UTF-16LE), but then you can't use pipes (that's bad). The easiest thing to do is use a text editor like Notepad2 or Notepad++ that supports UTF-8 without BOM.
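
If you do end up with a Notepad-style UTF-8 file and want the BOM gone, one way to "fiddle around" is a few lines of Python rather than xxd and sed. This is only a sketch; the function names are made up for illustration.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_file(path):
    """Rewrite the file at path in place, dropping a leading UTF-8 BOM."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(strip_utf8_bom(data))
```

Only a BOM at the very start of the file is removed; the rest of the bytes are left untouched.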

Or, if you're doing everything from cmd, you could use 862 or 65001. Many text editors might not support 862 well, though, so you might prefer 65001.
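
If you have an old file in code page 862 and would rather work in 65001, the conversion is a decode followed by an encode. A minimal Python sketch, where the function name `transcode` is hypothetical and `cp862` is Python's name for that code page:

```python
def transcode(data: bytes, source: str = "cp862", target: str = "utf-8") -> bytes:
    """Reinterpret bytes written in one encoding as bytes in another."""
    return data.decode(source).encode(target)


if __name__ == "__main__":
    shalom_862 = "\u05e9\u05dc\u05d5\u05dd".encode("cp862")  # 4 bytes, one per letter
    print(shalom_862.hex(" "))
    print(transcode(shalom_862).hex(" "))  # 8 bytes, two per letter in UTF-8
```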

If you write a file in Notepad that has a character greater than what UTF-16 refers to as \uFF, and you want to run commands in cmd.exe on that file, then some commands (e.g. the type command) will have problems unless you take into account what is supported by what.

Notepad supports UTF-16BE, UTF-16LE and UTF-8 with BOM, but it can't save UTF-8 without a BOM. That's not good. There's no need to fiddle around with xxd and sed or other commands to remove the BOM, though: if you have a file with a so-called Unicode character, i.e. a character outside the regular ASCII range (a character above UTF-16's \uFF, as Character Map shows it), then use Notepad2 or Notepad++.
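
To check which of those forms a given file is in, you can look at its first bytes. The following Python sketch does a crude version of what the file command reports; the function name `sniff_bom` and the label strings are mine, and it only recognises the three BOMs mentioned here (a UTF-32LE file, for example, would be mislabelled as UTF-16LE).

```python
import codecs

# BOMs for the three encodings Notepad can save, checked in order.
BOMS = [
    (codecs.BOM_UTF8, "utf-8 with BOM"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> str:
    """Guess the encoding of data from its byte order mark, if any."""
    for bom, label in BOMS:
        if data.startswith(bom):
            return label
    return "no BOM"


if __name__ == "__main__":
    aleph = "\u05d0"
    # Roughly what Notepad's "Unicode" save option writes: FF FE, then UTF-16LE.
    print(sniff_bom(codecs.BOM_UTF16_LE + aleph.encode("utf-16-le")))
    # What Notepad++ writes for "UTF-8 without BOM".
    print(sniff_bom(aleph.encode("utf-8")))
```

"no BOM" doesn't tell you the encoding: the file could be UTF-8 without BOM or a legacy code page such as 862.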

To summarise: type supports UTF-16LE, plus any code page set by CHCP, e.g. 65001 or 862.

Pipes and redirection go by whatever is set by CHCP.

Code page 862 is old, so code page 65001 is a good way to go.
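
The gibberish in the question is what you get when the bytes in the file and the code page set by CHCP disagree. As a simplified model (it ignores fonts and the console itself), this Python sketch decodes the same UTF-8 bytes under a matching and a non-matching code page; `view_as` is a hypothetical helper and cp850 stands in for a typical default code page.

```python
TEXT = "\u05e9\u05dc\u05d5\u05dd"  # the Hebrew word "shalom"
UTF8_BYTES = TEXT.encode("utf-8")


def view_as(data: bytes, code_page: str) -> str:
    """Decode bytes the way a consumer set to code_page would read them."""
    return data.decode(code_page, errors="replace")


if __name__ == "__main__":
    # ascii() keeps the output printable on any console.
    print("as utf-8:", ascii(view_as(UTF8_BYTES, "utf-8")))
    print("as cp850:", ascii(view_as(UTF8_BYTES, "cp850")))
```

Under cp850 every byte becomes its own unrelated character, so four Hebrew letters turn into eight characters of gibberish.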

xxd and file are useful for seeing how a file is encoded, which can be helpful if you have issues, but they aren't absolutely necessary.
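
If xxd isn't installed, a few lines of Python give a similar view of the raw bytes. This is a sketch loosely modelled on a hex dump, not a reimplementation of xxd; the function name and the column layout are my own.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format data as rows of offset, hex bytes, and printable ASCII."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Non-printable and non-ASCII bytes are shown as dots.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(rows)


if __name__ == "__main__":
    # A UTF-8 BOM, aleph, then CR LF, as Notepad might save a one-letter file.
    print(hexdump(b"\xef\xbb\xbf\xd7\x90\r\n"))
```

A leading ef bb bf in the dump is the UTF-8 BOM; ff fe at the start would indicate UTF-16LE.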

So if you want to write a file for use in cmd and it has some Unicode characters: there are commands like xxd and sed that could be used to remove a BOM, but the easiest way to make such a file is to use a text editor like Notepad2 or Notepad++ that supports UTF-8 without BOM.

Getting Hebrew to display might be the most important thing to do first, as described above. The next thing is being able to save files in a text editor in a form that you can display with e.g. 'type'.

And if you ever want to copy from the command prompt when not in QuickEdit mode, right-click, choose Mark, select the text, then hit ENTER. To paste, right-click and choose Paste.

A further point:

Apparently there are bugs in chcp 65001 where some batch files won't run, and maybe some C programs won't work either (see "How to use unicode characters in Windows command line?"). And I've even seen the C# compiler crash when cmd is in code page 65001, though one could blame either the C# compiler or 65001 for that (see "Why is csc.exe crashing when I last left the output encoding as UTF8?").
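
If the file is generated by a script rather than typed in an editor, the same choice shows up as the codec name. A minimal Python sketch, assuming Windows-style CR LF line endings are wanted; `write_text` is a hypothetical helper, while `utf-8` and `utf-8-sig` are Python's names for UTF-8 without and with a BOM.

```python
import os
import tempfile


def write_text(path, text, with_bom=False):
    """Write text as UTF-8, with or without a BOM, using CR LF newlines."""
    encoding = "utf-8-sig" if with_bom else "utf-8"
    with open(path, "w", encoding=encoding, newline="\r\n") as handle:
        handle.write(text)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "file.txt")
    write_text(path, "\u05d0\n")  # no BOM, like Notepad++ "UTF-8 without BOM"
    with open(path, "rb") as handle:
        print(handle.read().hex(" "))
```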

Note- an earlier revision of this answer had some command line examples but they were unnecessarily complex. I might at some point add some commands that demonstrate what I have been describing but it's fairly trivial.
