Emacs, Unicode, xterm mouse escape sequences, and wide terminals


Question

Short version: when using Emacs' xterm-mouse-mode, Somebody (Emacs? bash? xterm?) intercepts xterm's control sequences and replaces them with \0. This is a pain on wide monitors because the mouse only works in the first 223 columns.

What is the culprit, and how can I work around it?

From what I can tell, this has something to do with Unicode/UTF-8 support, because it wasn't a problem 5-6 years ago when I last had a big monitor.

Gory details follow...

Thanks!

Emacs' xterm-mouse-mode has a well-known weakness in handling mouse clicks starting around x=95. A workaround, adopted by recent versions of Emacs, pushes the problem off to x=223.

Several years ago I figured out that xterm encodes positions in 7-bit octets. Given a position 'x' to encode, with X=x-96, it sends (values in octal):

\40+x (x < 96)  
\300+X/64 \200+X%64 (otherwise)  

We have to add one to the x position given by Emacs, because positions in xterm start at one, not zero. Hence the magic number x=95 pops up: it is encoded as "\300\200", the first escaped value. Somebody (Emacs? bash? xterm?) treats those like "C0" control sequences from ISO 2022. Starting at x=159, we change to "C1" sequences (\301\200), which are also part of ISO 2022.
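
As a rough illustration, here is the scheme above transcribed literally into C. This is only a sketch of the encoding as the question describes it, not code taken from xterm (the answer further down suggests xterm's own logic is simpler). The name `described_encode` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode a 1-based position the way the question
 * describes it. Writes 1 or 2 bytes to out and returns how many. */
static size_t described_encode(int x, unsigned char out[2])
{
    /* Only the \300..\303 prefixes are discussed in the text. */
    assert(x >= 1 && x < 96 + 4 * 64);

    if (x < 96) {
        out[0] = (unsigned char)(040 + x);      /* \40 + x */
        return 1;
    }

    int X = x - 96;
    out[0] = (unsigned char)(0300 + X / 64);    /* \300 + X/64 */
    out[1] = (unsigned char)(0200 + X % 64);    /* \200 + X%64 */
    return 2;
}
```

With this transcription, Emacs column 95 (xterm position 96) comes out as \300\200, column 159 as \301\200, and column 223 as \302\200, which lines up with the thresholds mentioned in the question.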

Trouble hits with the \302 sequences, which correspond to the current x=223 limit. Several years ago I was able to extend the hack to intercept \302 and \303 sequences manually, which got past the problem. Fast forward a few years, and today I find that I'm stuck back at x=223 because Somebody is replacing those sequences with \0.

So, where I'd expect clicking at line 1, col 250 to produce

ESC [ M SPC \303\207 ! ESC [ M # \303\207 !

Instead, Emacs reports (for any col > 223)

ESC [ M SPC C-@ ! ESC [ M # C-@ !

I suspect that Unicode/UTF-8 support is the culprit. Some digging shows that the Unicode standard allowed C0 and C1 sequences as part of UTF-8 until Nov 2000, and I guess Somebody didn't get the memo (fortunately). However, \302\200 - \302\237 are Unicode control sequences, so Somebody slurps them up (doing who-knows-what with them!) and returns \0 instead.

Some more detailed questions:
- Who is this Somebody that intercepts the codes before they reach Emacs' lossage buffer?
- If it's really just about control sequences, how come characters after \302\237, which are UTF-8 encodings of printable Unicode, also come back as \0?
- What makes Emacs decide whether to display lossage as Unicode characters or octal escape sequences, and why don't the two match? For example, my self-built cygwin emacs 23.2.1 (xterm 229) reports \301\202 for column 161, but my rhel5.5-supplied emacs 22.3.1 (xterm 215) reports "Â" (Latin A with circumflex), which is actually \303\202 in UTF-8!

Update:

Here's a patch against xterm-261 which makes it emit mouse positions in UTF-8 format:

diff -r button.c button.utf-8-fix.c
--- a/button.c  Sat Aug 14 08:23:00 2010 +0200
+++ b/button.c  Thu Aug 26 16:16:48 2010 +0200
@@ -3994,1 +3994,27 @@
-#define MOUSE_LIMIT (255 - 32)
+#define MOUSE_LIMIT (2047 - 32)
+#define MOUSE_UTF_8_START (127 - 32)
+
+static unsigned
+EmitMousePosition(Char line[], unsigned count, int value)
+{
+    /* Add pointer position to key sequence
+     * 
+     * Encode large positions as two-byte UTF-8 
+     *
+     * NOTE: historically, it was possible to emit 256, which became
+     * zero by truncation to 8 bits. While this was arguably a bug,
+     * it's also somewhat useful as a past-end marker so we keep it.
+     */
+    if(value == MOUSE_LIMIT) {
+       line[count++] = CharOf(0);
+    }
+    else if(value < MOUSE_UTF_8_START) {
+       line[count++] = CharOf(' ' + value + 1);
+    }
+    else {
+       value += ' ' + 1;
+       line[count++] = CharOf(0xC0 + (value >> 6));
+       line[count++] = CharOf(0x80 + (value & 0x3F));
+    }
+    return count;
+}
@@ -4001,1 +4027,1 @@
-    Char line[6];
+    Char line[9]; /* \e [ > M Pb Pxh Pxl Pyh Pyl */
@@ -4021,2 +4047,0 @@
-    else if (row > MOUSE_LIMIT)
-       row = MOUSE_LIMIT;
@@ -4028,1 +4052,5 @@
-    else if (col > MOUSE_LIMIT)
+
+    /* Limit to representable mouse dimensions */
+    if (row > MOUSE_LIMIT)
+       row = MOUSE_LIMIT;
+    if (col > MOUSE_LIMIT)
@@ -4090,2 +4118,2 @@
-       line[count++] = CharOf(' ' + col + 1);
-       line[count++] = CharOf(' ' + row + 1);
+       count = EmitMousePosition(line, count, col);
+       count = EmitMousePosition(line, count, row);

Hopefully this (or something like it) will appear in a future version of xterm. The patch makes xterm work out of the box with emacs-23 (which assumes UTF-8 input) and also fixes the existing problems with xt-mouse.el. Using it with emacs-22 requires redefining the function that decodes mouse positions (the new definition works fine with emacs-23 too):

(defadvice xterm-mouse-event-read (around utf-8 compile activate)
  (setq ad-return-value
        (let ((c (read-char)))
          (cond
           ;; mouse clicks outside the encodable range produce 0
           ((= c 0) #x800)
           ;; must convert UTF-8 to unicode ourselves
           ((and (>= c #xC2) (< emacs-major-version 23))
            (logior (lsh (logand c #x1F) 6) (logand (read-char) #x3F)))
           ;; normal case
           (t c)))))

Distribute this advice as part of the .emacs on all machines you log into, and patch the xterm on any machines you work from. Voila!
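
To make the pairing between the patch and the Emacs advice easier to follow, here is a small standalone C sketch of the same round trip. It mirrors `EmitMousePosition` on a plain byte buffer and adds a reader that follows the logic of the advice. The names `sketch_emit` and `sketch_read` are invented for this example, and `CharOf` is assumed to be a plain cast to an 8-bit character. Unlike the advice, the reader always decodes the two-byte form itself, which corresponds to the emacs-22 case.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MOUSE_LIMIT (2047 - 32)   /* 2015, same value as the patch */
#define SKETCH_UTF8_START  (127 - 32)    /* 95, same value as the patch */

/* Mirrors EmitMousePosition: appends one position to buf, returns new count. */
static size_t sketch_emit(unsigned char *buf, size_t count, int value)
{
    assert(value >= 0 && value <= SKETCH_MOUSE_LIMIT);

    if (value == SKETCH_MOUSE_LIMIT) {
        buf[count++] = 0;                              /* past-end marker */
    } else if (value < SKETCH_UTF8_START) {
        buf[count++] = (unsigned char)(' ' + value + 1);
    } else {
        value += ' ' + 1;
        buf[count++] = (unsigned char)(0xC0 + (value >> 6));   /* lead byte */
        buf[count++] = (unsigned char)(0x80 + (value & 0x3F)); /* continuation */
    }
    return count;
}

/* Mirrors the advice: returns the decoded code and advances *pos. */
static int sketch_read(const unsigned char *buf, size_t *pos)
{
    int c = buf[(*pos)++];

    if (c == 0)
        return 0x800;                    /* click outside the encodable range */
    if (c >= 0xC2)                       /* two-byte UTF-8: decode it by hand */
        return ((c & 0x1F) << 6) | (buf[(*pos)++] & 0x3F);
    return c;                            /* normal single-byte case */
}
```

For instance, position 250 becomes 250 + 33 = 283, which is emitted as the bytes 0xC4 0x9B and read back as 283. The reader returns the offset code (position plus 33), just as the advised function does, so the caller is still expected to subtract the offset.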

WARNING: Applications which use xterm's mouse modes but do not treat their input as UTF-8 will get confused by this patch, because the mouse escape sequences get longer. However, those applications already break horribly with the current xterm, because mouse positions with x > 95 look like UTF-8 codes but aren't. I'd create a new mouse mode for xterm, but certain applications (GNU screen!) filter out unknown escape sequences. Emacs is the only terminal-mouse app I use, so I consider the patch a net win, but YMMV.

Answer

OK, figured it out. There are actually two issues.

First, some source diving shows that xterm clips the mouse-enabled region of the window to 223x223 chars, and sends 0x0 for all other positions.
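
A minimal sketch of that clipping, reconstructed from the lines the patch in the question removes (`MOUSE_LIMIT (255 - 32)` and `CharOf(' ' + col + 1)`). It assumes that `CharOf` truncates to 8 bits and that negative positions are handled before this point. The name `legacy_mouse_byte` is made up for this example.

```c
#include <assert.h>

#define LEGACY_MOUSE_LIMIT (255 - 32)    /* 223, as in the unpatched code */

/* Hypothetical helper: the single byte the unpatched code would emit for a
 * 0-based row or column. */
static unsigned char legacy_mouse_byte(int pos)
{
    assert(pos >= 0);    /* assumption: negative values are clamped earlier */

    if (pos > LEGACY_MOUSE_LIMIT)
        pos = LEGACY_MOUSE_LIMIT;

    /* ' ' + 223 + 1 == 256, which wraps to 0 when stored in 8 bits. */
    return (unsigned char)(' ' + pos + 1);
}
```

So position 222 still yields 255, while 223 and everything beyond it collapses to 0, which is the C-@ that shows up in the lossage buffer.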

Second, emacs-23 is UTF-8 aware and gets confused by mouse events having x>160 and y>94; in those cases xterm's encoding for x and y looks like a two-byte UTF-8 character (e.g. 0xC2 0x80), and as a result the mouse sequence seems one character short.
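
The thresholds can be checked with a short C sketch. It assumes 0-based coordinates and the unpatched single-byte encoding `' ' + pos + 1`, and it only tests the two-byte UTF-8 shape (lead byte 0xC2..0xDF followed by a continuation byte 0x80..0xBF). Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: byte emitted for a 0-based coordinate below the
 * 223 limit in the unpatched encoding. */
static unsigned char raw_mouse_byte(int pos)
{
    assert(pos >= 0 && pos < 223);
    return (unsigned char)(' ' + pos + 1);
}

/* True when the column byte is a two-byte UTF-8 lead (0xC2..0xDF) and the
 * row byte is a continuation byte (0x80..0xBF). */
static bool pair_looks_like_utf8(int col, int row)
{
    unsigned char cx = raw_mouse_byte(col);
    unsigned char cy = raw_mouse_byte(row);

    return cx >= 0xC2 && cx <= 0xDF && (cy & 0xC0) == 0x80;
}
```

Column 161 with row 95 gives exactly 0xC2 0x80, the example above, whereas column 160 or row 94 falls just short of the pattern.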

I'm working on a patch for xterm to make mouse events emit UTF-8 (which would both unconfuse emacs-23 and allow terminals up to 2047x2047), but I'm not sure yet how it will turn out.
