How to use Unicode characters in the Windows command line?


Problem description

We have a project in Team Foundation Server (TFS) that has a non-English character (š) in it. When trying to script a few build-related things we've stumbled upon a problem: we can't pass the š letter to the command-line tools. The command prompt, or something else along the way, garbles it, and the tf.exe utility can't find the specified project.

I've tried different formats for the .bat file (ANSI, UTF-8 with and without BOM) as well as scripting it in JavaScript (which is Unicode inherently) - but no luck. How do I execute a program and pass it a Unicode command line?

Solution

My background: I have used Unicode input/output in a console for years (and do it a lot daily; moreover, I develop support tools for exactly this task). There are very few problems, as long as you understand the following facts/limitations:

  • CMD and "console" are unrelated factors. CMD.exe is just one of the programs that are ready to "work inside" a console ("console applications").
  • AFAIK, CMD has perfect support for Unicode; you can enter/output all Unicode chars whichever codepage is active.
  • The Windows console has A LOT of support for Unicode, but it is not perfect (just "good enough"; see below).
  • chcp 65001 is very dangerous. Unless a program was specially designed to work around defects in the Windows API (or uses a C runtime library which has these workarounds), it will not work reliably. Win8 fixes ½ of these problems with cp65001, but the rest still applies to Win10.
  • I work in cp1252. As I already said: To input/output Unicode in a console, one does not need to set the codepage.
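
As a rough illustration of why the code page only matters for byte-oriented output, here is a small Python sketch (not from the original answer; the helper name `representable` is made up for this example). It checks whether a character survives being encoded to a given code page. Python's codec names `cp1252`, `cp437` and `utf-8` are used as stand-ins for the corresponding Windows code pages.

```python
def representable(ch: str, codepage: str) -> bool:
    """Return True if `ch` can be encoded in the given code page."""
    try:
        ch.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


# U+0161 is the letter š from the question.
for cp in ("cp1252", "cp437", "utf-8"):
    print(cp, representable("\u0161", cp))
```

A program that hands UTF-16 text to the console through the Console-I/O API never goes through this lossy byte-encoding step, which is presumably why the active code page stops mattering in that case.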

The details

  • To read/write Unicode to a console, an application (or its C runtime library) should be smart enough to use the Console-I/O API rather than the File-I/O API. (For an example, see how Python does it.)
  • Likewise, to read Unicode command-line arguments, an application (or its C runtime library) should be smart enough to use the corresponding API.
  • Console font rendering supports only Unicode characters in the BMP (in other words: below U+10000). Only simple text rendering is supported, so European (and some East Asian) languages should work fine as long as one uses precomposed forms. [There is some minor fine print here for East Asian languages and for the characters U+0000, U+0001, U+30FB.]
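
The rendering limits in the last point can be checked before sending text to the console. The following Python sketch is only an illustration under those stated limits; the function name `console_check` is hypothetical. It composes the text to NFC (precomposed forms where they exist) and reports characters outside the BMP as well as combining marks that had no precomposed form.

```python
import unicodedata


def console_check(text: str):
    """Compose `text` to NFC and list characters the console may not render.

    Returns (composed_text, chars_outside_bmp, leftover_combining_marks).
    """
    composed = unicodedata.normalize("NFC", text)
    outside_bmp = [ch for ch in composed if ord(ch) > 0xFFFF]
    combining = [ch for ch in composed if unicodedata.combining(ch) != 0]
    return composed, outside_bmp, combining


# "s" followed by U+030C (combining caron) composes to the single letter š.
composed, outside_bmp, combining = console_check("s\u030c")
print(ascii(composed), outside_bmp, combining)
```

If both lists come back empty, the text uses only precomposed BMP characters, which is the case the answer describes as working fine (given a font that has the glyphs).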

Practical considerations

  • The defaults on Windows are not very helpful. For the best experience, one should tune up 3 pieces of configuration:

    • For output: a comprehensive console font. For best results, I recommend my builds. (The installation instructions are present there, and are also listed in other answers on this page.)
    • For input: a capable keyboard layout. For best results, I recommend my layouts.
    • For input: allow HEX input of Unicode.
  • One more gotcha with "Pasting" into a console application (very technical):

    • HEX input delivers a character on KeyUp of Alt; all the other ways to deliver a character happen on KeyDown, so many applications are not ready to see a character on KeyUp. (Only applicable to applications using the Console-I/O API.)
    • Conclusion: many applications do not react to HEX input events.
    • Moreover, what happens with a "Pasted" character depends on the current keyboard layout: if the character can be typed without using prefix keys (but with an arbitrarily complicated combination of modifiers, as in Ctrl-Alt-AltGr-Kana-Shift-Gray*), then it is delivered as an emulated keypress. This is what any application expects, so pasting anything which contains only such characters is fine.
    • However, the "other" characters are delivered by emulating HEX input.

    Conclusion: unless your keyboard layout supports input of A LOT of characters without prefix keys, some buggy applications may skip characters when you Paste via the console's UI: Alt-Space E P. (This is why I recommend using my keyboard layouts!)

One should also keep in mind that the "alternative, ‘more capable’ consoles" for Windows are not consoles at all. They do not support the Console-I/O APIs, so programs which rely on these APIs will not function in them. (Programs which use only "File-I/O APIs to the console filehandles" will work fine, though.)
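
The difference between a real console handle and a redirected or emulated one can be sketched in Python with `ctypes`. This is a minimal illustration, not the author's tooling: `write_text` and `utf16_units` are names invented here, and the fallback branch simply relies on whatever encoding `sys.stdout` already uses. `GetStdHandle`, `GetConsoleMode` and `WriteConsoleW` are the Win32 calls involved; `GetConsoleMode` succeeds only for genuine console handles.

```python
import ctypes
import sys

STD_OUTPUT_HANDLE = -11  # Win32 constant for the standard output handle


def utf16_units(text: str) -> int:
    """Number of UTF-16 code units, which is what WriteConsoleW counts."""
    return len(text.encode("utf-16-le")) // 2


def write_text(text: str) -> str:
    """Write `text` to stdout and return which path was used."""
    if sys.platform == "win32":
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.GetStdHandle.restype = wintypes.HANDLE
        kernel32.GetConsoleMode.argtypes = [wintypes.HANDLE, wintypes.LPDWORD]
        kernel32.GetConsoleMode.restype = wintypes.BOOL
        kernel32.WriteConsoleW.argtypes = [
            wintypes.HANDLE, wintypes.LPCWSTR, wintypes.DWORD,
            wintypes.LPDWORD, wintypes.LPVOID,
        ]
        kernel32.WriteConsoleW.restype = wintypes.BOOL

        handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
        mode = wintypes.DWORD()
        # GetConsoleMode fails for files, pipes and non-console terminals.
        if kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
            sys.stdout.flush()  # keep ordering with earlier buffered output
            written = wintypes.DWORD()
            kernel32.WriteConsoleW(
                handle, text, utf16_units(text), ctypes.byref(written), None
            )
            return "console"
    # File-I/O path: the stream's own encoding decides what survives.
    sys.stdout.write(text)
    sys.stdout.flush()
    return "file"
```

Note that Python 3.6 and later already does something similar internally for `sys.stdout` on Windows, so this sketch is only meant to show the mechanism, not to replace `print`.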

One example of such a non-console is a part of Microsoft's PowerShell. I do not use it; to experiment, press and release WinKey, then type powershell.


(On the other hand, there are programs such as ConEmu or ANSICON which try to do more: they "attempt" to intercept Console-I/O APIs to make "true console applications" work too. This definitely works for toy example programs; in real life, this may or may not solve your particular problems. Experiment.)

Summary

  • Set the font and keyboard layout (and optionally, allow HEX input).

  • Use only programs which go through the Console-I/O APIs and accept Unicode command-line arguments. For example, any cygwin-compiled program should be fine. As I already said, CMD is fine too.
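
One way to test whether a Unicode argument reaches a child process intact is to launch it from a program that builds the command line as Unicode. The Python sketch below is an assumption-laden illustration, not something from the original answer: `echo_argument` is a made-up helper, and the child here is just another Python interpreter that echoes what it received. On Windows, `subprocess` with an argument list creates the process through the wide-character API, so the argument is not squeezed through the console code page.

```python
import subprocess
import sys


def echo_argument(arg: str) -> str:
    """Run a child Python process and return ascii() of the argument it saw."""
    # The child prints an ASCII-escaped form, so the result does not depend
    # on the encoding of the pipe between parent and child.
    child_code = "import sys; print(ascii(sys.argv[1]))"
    result = subprocess.run(
        [sys.executable, "-c", child_code, arg],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(echo_argument("\u0161"))  # \u0161 is the letter š from the question
```

Replacing the child with the real tool (for example tf.exe) would show whether that tool reads its arguments through the Unicode API; whether it does is not something this sketch can guarantee.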

UPD: Initially, for a bug in cp65001, I was mixing up the Kernel and CRTL layers (UPD²: and the Windows user-mode API!). Also: Win8 fixes one half of this bug; I clarified the section about "better console" applications, and added a reference to how Python does it.
