What events occur when I enter text into a field? What text encoding is my input in?


Problem description

I'm using the keyboard to enter multi-lingual text into a field in a form displayed by a Web browser. At an O/S-agnostic and browser-agnostic level, I think the following events take place (please correct me if I'm wrong, because I think I am):

  1. On each keypress, there is an interrupt indicating a key was pressed
  2. The O/S (or the keyboard driver?) determines the keycode and converts that to some sort of keyboard event (character, modifiers, etc).
  3. The O/S' window manager looks for the currently-focused window (the browser) and passes the keyboard event to it
  4. The browser's GUI toolkit looks for the currently-focused element (in this case, the field I'm entering into) and passes the keyboard event to it
  5. The field updates itself to include the new character
  6. When the form is sent, the browser encodes the entered text before sending it to the form target (what encoding?)

Before I go on, is this what actually happens? Have I missed or glossed over anything important?

Next, I'd like to ask: how is the character represented at each of the above steps? At step 1, the keycode could be a device-specific magic number. At step 2, the keyboard driver could convert that to something the O/S understands (for example, the USB HID spec: http://en.wikipedia.org/wiki/USB_human_interface_device_class). What about at subsequent steps? I think the encodings at steps 3 and 4 are OS-dependent and application-dependent (browser), respectively. Can they ever be different, and if yes, how is that problem resolved?

The reason I'm asking is I've run into a problem that is specific to a site that I started to use recently:

Things appear to be working until step 6 above, where the form with the entered text gets submitted, after which the text is mangled beyond recognition. While it's pretty obvious the site isn't handling Unicode input correctly, the incident has led me to question my own understanding of how things work, and now I'm here.

Solution

Anatomy of a character from key press to application:

1 - The PC Keyboard:

PC keyboards are not the only type of keyboard, but I'll restrict myself to them.
PC keyboards, surprisingly enough, do not understand characters; they understand keyboard buttons. This allows the same hardware used for a US keyboard to be used for QWERTY or Dvorak, and for English or any other language that uses the US 101/104-key format (some languages have extra keys).

Keyboards use standard scan codes to identify the keys, and, to make matters more interesting, keyboards can be configured to use a specific set of codes:

Set 1 - used in the old XT keyboards
Set 2 - used currently, and
Set 3 - used by PS/2 keyboards, which no one uses today.

Sets 1 and 2 use make and break codes (i.e. press-down and release codes). Set 3 uses make and break codes just for some keys (like Shift) and only make codes for letters; this allows the keyboard itself to handle key repeat when a key is held down. This is good for offloading key-repeat processing from the PS/2's 8086 or 80286 processor, but rather bad for gaming.
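To make the make/break distinction concrete, here is a small illustrative sketch; the byte values are taken from commonly published Scan Code Set 2 tables, and the struct and program are my own, purely for demonstration:

/* Illustrative only: a few Scan Code Set 2 make codes and the
 * corresponding break sequences (0xF0 followed by the make code;
 * extended keys additionally carry an 0xE0 prefix). */
#include <stdio.h>

struct set2_key {
    const char   *label;  /* the key cap, not a character */
    unsigned char make;   /* byte sent when the key goes down */
};

static const struct set2_key keys[] = {
    { "A",          0x1C },
    { "Left Shift", 0x12 },
    { "Enter",      0x5A },
};

int main(void)
{
    for (size_t i = 0; i < sizeof keys / sizeof keys[0]; ++i)
        printf("%-10s make: 0x%02X  break: 0xF0 0x%02X\n",
               keys[i].label, keys[i].make, keys[i].make);
    return 0;
}

Note that nothing here says anything about characters: the same 0x1C arrives no matter which layout the operating system has active.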

You can read more about all this here, and I also found a Microsoft specification for scan codes in case you want to build and certify your own 104-key Windows keyboard.

In any case we can assume a PC keyboard using Set 2, which means it sends one code to the computer when a key is pressed and another when the key is released.
By the way, the USB HID spec does not specify the scan codes sent by the keyboard; it only specifies the structures used to send those scan codes.
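For illustration, this is roughly what the 8-byte boot-protocol keyboard input report defined by the HID spec looks like; the layout (one modifier byte, one reserved byte, six key slots) comes from the boot protocol, while the struct and field names below are mine:

/* Sketch of the HID boot-protocol keyboard input report. */
#include <stdint.h>

struct hid_boot_kbd_report {
    uint8_t modifiers;  /* bit 0 = Left Ctrl, bit 1 = Left Shift, ... bit 7 = Right GUI */
    uint8_t reserved;
    uint8_t keys[6];    /* up to six simultaneously pressed keys, as HID usage IDs */
};

/* Holding Left Shift and the 'A' key would typically arrive as
 * modifiers = 0x02 and keys[0] = 0x04 (the usage "Keyboard a and A").
 * Again, the report identifies keys, not characters. */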
Now, since we're talking about hardware, this is true for all operating systems; but how each operating system handles these codes may differ. I'll restrict myself to what happens in Windows, but I assume other operating systems should follow roughly the same path.

2 - The Operating System

I don't know exactly how Windows handles the keyboard, which parts are handled by drivers, which by the kernel and which in user mode; but suffice to say the keyboard is periodically polled for changes to key state, and the scan codes are translated and converted to WM_KEYDOWN/WM_KEYUP messages which contain virtual key codes. To be precise, Windows also generates WM_SYSKEYUP/WM_SYSKEYDOWN messages, and you can read more about them here.
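For concreteness, this is roughly what those messages look like by the time they reach an application's window procedure; a minimal sketch rather than production code, with the bit positions taken from the documented WM_KEYDOWN lParam layout:

#include <windows.h>

/* Sketch: inspecting a WM_KEYDOWN message delivered by the system. */
LRESULT CALLBACK SketchWndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    switch (msg)
    {
    case WM_KEYDOWN:
    {
        UINT vk       = (UINT)wParam;           /* virtual key code, e.g. 'A' == 0x41 */
        UINT scanCode = (lParam >> 16) & 0xFF;  /* hardware scan code */
        BOOL extended = (lParam >> 24) & 0x1;   /* E0-prefixed (extended) key? */
        (void)vk; (void)scanCode; (void)extended;
        return 0;
    }
    case WM_DESTROY:
        PostQuitMessage(0);
        return 0;
    }
    return DefWindowProc(hwnd, msg, wParam, lParam);
}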

3 - The Application

For Windows that is it: the application gets the raw virtual key codes, and it is up to it to decide whether to use them as is or translate them to a character code.
Nowadays nobody writes good honest C Windows programs, but once upon a time programmers used to roll their own message-pump handling code, and most message pumps would contain code similar to:

while (GetMessage( &msg, NULL, 0, 0 ) != 0)
{ 
        TranslateMessage(&msg); 
        DispatchMessage(&msg); 
} 

TranslateMessage is where the magic happens. The code in TranslateMessage would keep track of the WM_KEYDOWN (and WM_SYSKEYDOWN) messages and generate WM_CHAR messages (and WM_DEADCHAR, WM_SYSCHAR, WM_SYSDEADCHAR.)
WM_CHAR messages contain the UTF-16 (actually UCS-2, but let's not split hairs) code for the character translated from the WM_KEYDOWN message, taking into account the active keyboard layout at the time.
What about applications written before Unicode? Those applications used the ANSI version of RegisterClassEx (i.e. RegisterClassExA) to register their windows. In this case TranslateMessage generates WM_CHAR messages with an 8-bit character code based on the keyboard layout and the active culture.
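To show what that looks like on the receiving end, here is a minimal sketch of a window procedure for a window registered with RegisterClassExW; it only illustrates where the character code arrives, and is not a complete text-input implementation:

#include <windows.h>

/* Sketch: receiving the WM_CHAR messages generated by TranslateMessage. */
LRESULT CALLBACK SketchCharProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    if (msg == WM_CHAR)
    {
        WCHAR codeUnit = (WCHAR)wParam;  /* UTF-16 code unit for the typed character */
        /* An edit control would insert this into its text buffer here.
         * For a window registered with RegisterClassExA the same message
         * carries an 8-bit code in the active ANSI code page instead. */
        (void)codeUnit;
        return 0;
    }
    return DefWindowProcW(hwnd, msg, wParam, lParam);
}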

4 - 5 - Dispatching and displaying characters.

In modern code using UI libraries it is entirely possible (though unlikely) not to use TranslateMessage and to have custom translation of WM_KEYDOWN events. Standard Windows controls (widgets) understand and handle WM_CHAR messages dispatched to them, but UI libraries/VMs running under Windows can implement their own dispatch mechanism, and many do.
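As an illustration of that kind of custom translation, a framework could call ToUnicode itself when it sees WM_KEYDOWN instead of relying on TranslateMessage. This is only a sketch under that assumption; real input handling (dead keys, IMEs, surrogate pairs) is considerably more involved:

#include <windows.h>

/* Sketch: translating a WM_KEYDOWN ourselves with ToUnicode.
 * Returns the number of UTF-16 code units written to buf,
 * 0 if the key produces no character, or -1 for a dead key. */
static int TranslateKeyDown(WPARAM wParam, LPARAM lParam, WCHAR *buf, int bufLen)
{
    BYTE keyState[256];
    if (!GetKeyboardState(keyState))
        return 0;

    UINT scanCode = (lParam >> 16) & 0xFF;
    return ToUnicode((UINT)wParam, scanCode, keyState, buf, bufLen, 0);
}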

Hope this answers your question.

