How to use unicode characters in Windows command line?


Problem description


We have a project in Team Foundation Server (TFS) that has a non-English character (š) in it. When trying to script a few build-related things, we stumbled upon a problem: we can't pass the š letter to the command-line tools. The command prompt (or something else) messes it up, and the tf.exe utility can't find the specified project.

I've tried different formats for the .bat file (ANSI, UTF-8 with and without BOM) as well as scripting it in JavaScript (which is Unicode inherently) - but no luck. How do I execute a program and pass it a Unicode command line?

Solution

My background: I have used Unicode input/output in a console for years (and do it a lot daily; moreover, I develop support tools for exactly this task). There are very few problems, as long as you understand the following facts/limitations:

  • CMD and "console" are unrelated factors. CMD.exe is just one of the programs which are ready to "work inside" a console ("console applications").
  • AFAIK, CMD has perfect support for Unicode; you can enter/output all Unicode chars when any codepage is active.
  • Windows’ console has A LOT of support for Unicode — but it is not perfect (just "good enough"; see below).
  • chcp 65001 is very dangerous. Unless a program was specially designed to work around defects in the Windows’ API (or uses a C runtime library which has these workarounds), it would not work reliably. Win8 fixes ½ of these problems with cp65001, but the rest is still applicable to Win10.
  • I work in cp1252. As I already said: To input/output Unicode in a console, one does not need to set the codepage.

The details

  • To read/write Unicode to a console, an application (or its C runtime library) should be smart enough to use the Console-I/O API rather than the File-I/O API. (For an example, see how Python does it.)
  • Likewise, to read Unicode command-line arguments, an application (or its C runtime library) should be smart enough to use the corresponding API.
  • Console font rendering supports only Unicode characters in the BMP (in other words: below U+10000). Only simple text rendering is supported, so European (and some East Asian) languages should work fine as long as one uses precomposed forms. [There is some minor fine print here for East Asian and for characters U+0000, U+0001, U+30FB.]
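
The BMP and precomposed-form limits can be checked before text is sent to a console. The following is a minimal Python sketch, not something from the original answer; `prepare_for_console` is a hypothetical helper name. It normalizes to NFC, which composes sequences such as s + U+030C into š where a precomposed form exists, and reports the characters a console with only simple text rendering would probably not show correctly.

```python
import unicodedata


def prepare_for_console(text):
    """Return (composed, outside_bmp, leftover_combining) for the given text.

    composed           -- the text normalized to NFC (precomposed forms)
    outside_bmp        -- characters at U+10000 or above
    leftover_combining -- combining marks that NFC could not fold into a base
    """
    composed = unicodedata.normalize("NFC", text)
    outside_bmp = [c for c in composed if ord(c) > 0xFFFF]
    leftover_combining = [c for c in composed if unicodedata.combining(c)]
    return composed, outside_bmp, leftover_combining


if __name__ == "__main__":
    composed, outside, leftover = prepare_for_console("s\u030c")
    # Print code points rather than the characters themselves,
    # so the demo does not depend on the console it runs in.
    print([hex(ord(c)) for c in composed], outside, leftover)
```

Text for which both lists come back empty stays within the limits described in the last item of the list.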

Practical considerations

  • The defaults on Windows are not very helpful. For the best experience, one should tune up 3 pieces of configuration:

    • For output: a comprehensive console font. For best results, I recommend my builds. (The installation instructions are present there — and also listed in other answers on this page.)
    • For input: a capable keyboard layout. For best results, I recommend my layouts.
    • For input: allow HEX input of Unicode.
  • One more gotcha with "Pasting" into a console application (very technical):

    • HEX input delivers a character on KeyUp of Alt; all the other ways to deliver a character happen on KeyDown. As a result, many applications are not ready to see a character on KeyUp. (Only applicable to applications using Console-I/O API.)
    • Conclusion: many applications would not react to HEX input events.
    • Moreover, what happens with a "Pasted" character depends on the current keyboard layout: if the character can be typed without using prefix keys (but with an arbitrarily complicated combination of modifiers, as in Ctrl-Alt-AltGr-Kana-Shift-Gray*), then it is delivered as an emulated keypress. This is what any application expects, so pasting anything which contains only such characters is fine.
    • However, the "other" characters are delivered by emulating HEX input.

    Conclusion: unless your keyboard layout supports input of A LOT of characters without prefix keys, some buggy applications may skip characters when you Paste via Console’s UI: Alt-Space E P. (This is why I recommend using my keyboard layouts!)
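
The KeyDown/KeyUp gotcha can be illustrated with a small model. This is a sketch in Python, not real console code: `KeyEvent` is a hypothetical stand-in for the three fields of the Win32 `KEY_EVENT_RECORD` that matter here (`bKeyDown`, `wVirtualKeyCode`, `uChar`), and both reader functions are made up for illustration. It assumes that ordinary KeyUp events also carry a character, which is why a reader cannot simply accept every event.

```python
from dataclasses import dataclass

VK_MENU = 0x12  # Win32 virtual-key code of the Alt key


@dataclass
class KeyEvent:
    key_down: bool     # models bKeyDown
    virtual_key: int   # models wVirtualKeyCode
    char: str          # models uChar; "" when the event carries no character


def read_keydown_only(events):
    """The common approach: take characters from KeyDown events only."""
    return "".join(e.char for e in events if e.key_down and e.char)


def read_with_alt_keyup(events):
    """Also accept a character that arrives on the KeyUp of Alt (HEX input)."""
    out = []
    for e in events:
        if e.char and (e.key_down or e.virtual_key == VK_MENU):
            out.append(e.char)
    return "".join(out)


if __name__ == "__main__":
    events = [
        KeyEvent(True, 0x41, "a"),           # 'a' pressed
        KeyEvent(False, 0x41, "a"),          # 'a' released (must be ignored)
        KeyEvent(True, VK_MENU, ""),         # Alt pressed, HEX digits follow
        KeyEvent(False, VK_MENU, "\u0161"),  # Alt released: the character arrives here
    ]
    print(len(read_keydown_only(events)), len(read_with_alt_keyup(events)))
```

In this model the first reader drops the character delivered on Alt release, which matches the skipped-character behavior described for pasted text.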

One should also keep in mind that the "alternative, ‘more capable’ consoles" for Windows are not consoles at all. They do not support Console-I/O APIs, so the programs which rely on these APIs to work would not function. (The programs which use only "File-I/O APIs to the console filehandles" would work fine, though.)

One example of such a non-console is a part of Microsoft’s PowerShell. I do not use it; to experiment, press and release WinKey, then type powershell.


(On the other hand, there are programs such as ConEmu or ANSICON which try to do more: they "attempt" to intercept Console-I/O APIs to make "true console applications" work too. This definitely works for toy example programs; in real life, this may or may not solve your particular problems. Experiment.)
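
A program that wants to work both in a real console and in such replacements can probe the handle and fall back to file I/O. The sketch below is one way to do that from Python with `ctypes`; it is an illustration under assumptions, not the method of any particular tool. `write_unicode` and its `stream` parameter are hypothetical, UTF-8 is an arbitrary choice for the fallback encoding, and the Windows branch has not been tried against the alternative consoles mentioned above. (Recent Python 3 versions already do something similar internally for `print`.)

```python
import ctypes
import sys

STD_OUTPUT_HANDLE = 0xFFFFFFF5  # (DWORD)-11 in the Win32 headers


def write_unicode(text, stream=None):
    """Write text; return "console" or "file" to show which path was taken.

    stream -- optional binary file object; when given, the console path is skipped.
    """
    if stream is None and sys.platform == "win32":
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.GetStdHandle.restype = wintypes.HANDLE
        handle = wintypes.HANDLE(
            kernel32.GetStdHandle(wintypes.DWORD(STD_OUTPUT_HANDLE)))
        mode = wintypes.DWORD()
        # GetConsoleMode succeeds only when the handle is a real console.
        if kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
            sys.stdout.flush()  # keep ordering with earlier buffered output
            units = len(text.encode("utf-16-le")) // 2  # UTF-16 code units
            written = wintypes.DWORD()
            ok = kernel32.WriteConsoleW(
                handle, ctypes.c_wchar_p(text), wintypes.DWORD(units),
                ctypes.byref(written), None)
            if ok:
                return "console"
    # Redirected output, a non-console terminal, or a non-Windows system:
    # send encoded bytes through ordinary file I/O instead.
    target = stream if stream is not None else sys.stdout.buffer
    target.write(text.encode("utf-8"))
    target.flush()
    return "file"
```

Calling `write_unicode("š\n")` in a real Windows console should take the console path regardless of the active code page; when output is redirected to a file or pipe, the same call takes the file path.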

Summary

  • Set the font and keyboard layout (and optionally, allow HEX input).

  • Use only programs which go through Console-I/O APIs and accept Unicode command-line arguments. For example, any cygwin-compiled program should be fine. As I already said, CMD is fine too.
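
For the original question (passing š on a command line from a script), one way to sidestep .bat file encodings is to launch the program from a language whose process-creation call is Unicode-aware. A minimal sketch, assuming Python 3; `echo_arg_codepoints` is a hypothetical helper, and the child here is just another Python process used to show what arrives in its arguments. Whether a given tool such as tf.exe reads its arguments through a Unicode-aware API is a separate matter that this sketch cannot check.

```python
import subprocess
import sys


def echo_arg_codepoints(arg):
    """Start a child process with `arg` as its argument and return the
    code points the child actually received."""
    # The child prints decimal code points, so its output is plain ASCII
    # and cannot be damaged by the console or pipe encoding on the way back.
    child_code = "import sys; print(' '.join(str(ord(c)) for c in sys.argv[1]))"
    result = subprocess.run(
        [sys.executable, "-c", child_code, arg],
        capture_output=True, text=True, check=True)
    return [int(token) for token in result.stdout.split()]


if __name__ == "__main__":
    received = echo_arg_codepoints("Pro\u0161ek")
    print(received)  # 353 in the list is U+0161, the letter from the question
```

If the list that comes back matches the code points that were sent, the argument survived the trip; the same `subprocess.run` call with a different first element is how one would invoke a real tool.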

UPD: Initially, for a bug in cp65001, I was mixing up Kernel and CRTL layers (UPD²: and Windows user-mode API!). Also: Win8 fixes one half of this bug; I clarified the section about "better console" application, and added a reference to how Python does it.
