Emacs, unicode, xterm mouse escape sequences, and wide terminals


Problem description

Short version: When using emacs' xterm-mouse-mode, Somebody (emacs? bash? xterm?) intercepts xterm's control sequences and replaces them with \0. This is a pain on wide monitors because only the first 223 columns have mouse support.

What is the culprit, and how can I work around it?

From what I can tell this has something to do with Unicode/UTF-8 support, because it wasn't a problem 5-6 years ago when I last had a big monitor.

Gory details follow...

Thanks!

Emacs xterm-mouse-mode has a well-known weakness handling mouse clicks starting around x=95. A workaround, adopted by recent versions of emacs, pushes the problem off to x=223.

Several years ago I figured out that xterm encodes positions in 7-bit octets. Given position 'x' to encode, with X=x-96, send:

\40+x (x < 96)  
\300+X/64 \200+X%64 (otherwise)  

We have to add one to given x position from emacs, because positions in xterm start at one, not zero. Hence the magic x=95 number pops up because it's coded as "\300\200" -- the first escaped number. Somebody (emacs? bash? xterm?) treats those like "C0" control sequences from ISO 2022. Starting at x=159, we change to "C1" sequences (\301\200), which are also part of ISO 2022.
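The thresholds above can be checked with a minimal C sketch of the encoding as this question models it. This is the reverse-engineered model from the text, not xterm's actual source, and the function name `encode_position_model` is hypothetical.

```c
#include <stddef.h>

/* Encode a 1-based xterm position using the model described in the
 * question (an assumption, not code taken from xterm).
 * Writes 1 or 2 bytes to out and returns how many were written. */
static size_t encode_position_model(unsigned x, unsigned char out[2])
{
    if (x < 96) {
        out[0] = (unsigned char)(040 + x);       /* single byte: \40 + x */
        return 1;
    }
    unsigned X = x - 96;
    out[0] = (unsigned char)(0300 + X / 64);     /* lead byte: \300 + X/64 */
    out[1] = (unsigned char)(0200 + X % 64);     /* trail byte: \200 + X%64 */
    return 2;
}
```

Passing the emacs position plus one reproduces the numbers in the text: emacs x=95 gives \300\200, x=159 gives \301\200, and x=223 gives \302\200.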

Trouble hits with \302 sequences, which correspond to the current x=223 limit. Several years ago I was able to extend the hack to intercept \302 and \303 sequences manually, which got past the problem. Fast forward a few years, and today I find that I'm stuck back at x=223 because Somebody is replacing those sequences with \0.

So, where I'd expect clicking at line 1, col 250 to produce

ESC [ M SPC \303\207 ! ESC [ M # \303\207 !

Instead emacs reports (for any col > 223)

ESC [ M SPC C-@ ! ESC [ M # C-@ !

I suspect that Unicode/UTF-8 support is the culprit. Some digging shows that the Unicode standard allowed C0 and C1 sequences as part of UTF-8 until Nov 2000, and I guess Somebody didn't get the memo (fortunately). However, \302\200 - \302\237 are Unicode control sequences, so Somebody slurps them up (doing who-knows-what with them!) and returns \0 instead.

Some more detailed questions:
- Who is this Somebody that intercepts the codes before they reach emacs' lossage buffer?
- If it's really just about control sequences, how come characters after \302\237, which are UTF-8 encodings of printable Unicode, also come back as \0 ?
- What makes emacs decide whether to display lossage as unicode characters or octal escape sequences, and why don't the two match? For example, my self-built cygwin emacs 23.2.1 (xterm 229) reports \301\202 for column 161, but my rhel5.5-supplied emacs 22.3.1 (xterm 215) reports "Â" (latin A with circumflex), which is actually \303\202 in UTF-8!

Update:

Here's a patch against xterm-261 which makes it emit mouse positions in utf-8 format:

diff -r button.c button.utf-8-fix.c
--- a/button.c  Sat Aug 14 08:23:00 2010 +0200
+++ b/button.c  Thu Aug 26 16:16:48 2010 +0200
@@ -3994,1 +3994,27 @@
-#define MOUSE_LIMIT (255 - 32)
+#define MOUSE_LIMIT (2047 - 32)
+#define MOUSE_UTF_8_START (127 - 32)
+
+static unsigned
+EmitMousePosition(Char line[], unsigned count, int value)
+{
+    /* Add pointer position to key sequence
+     * 
+     * Encode large positions as two-byte UTF-8 
+     *
+     * NOTE: historically, it was possible to emit 256, which became
+     * zero by truncation to 8 bits. While this was arguably a bug,
+     * it's also somewhat useful as a past-end marker so we keep it.
+     */
+    if(value == MOUSE_LIMIT) {
+       line[count++] = CharOf(0);
+    }
+    else if(value < MOUSE_UTF_8_START) {
+       line[count++] = CharOf(' ' + value + 1);
+    }
+    else {
+       value += ' ' + 1;
+       line[count++] = CharOf(0xC0 + (value >> 6));
+       line[count++] = CharOf(0x80 + (value & 0x3F));
+    }
+    return count;
+}
@@ -4001,1 +4027,1 @@
-    Char line[6];
+    Char line[9]; /* \e [ > M Pb Pxh Pxl Pyh Pyl */
@@ -4021,2 +4047,0 @@
-    else if (row > MOUSE_LIMIT)
-       row = MOUSE_LIMIT;
@@ -4028,1 +4052,5 @@
-    else if (col > MOUSE_LIMIT)
+
+    /* Limit to representable mouse dimensions */
+    if (row > MOUSE_LIMIT)
+       row = MOUSE_LIMIT;
+    if (col > MOUSE_LIMIT)
@@ -4090,2 +4118,2 @@
-       line[count++] = CharOf(' ' + col + 1);
-       line[count++] = CharOf(' ' + row + 1);
+       count = EmitMousePosition(line, count, col);
+       count = EmitMousePosition(line, count, row);
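
For experimenting outside the xterm tree, here is a standalone sketch of the same encoding logic. It replaces xterm's `Char` and `CharOf` with plain `unsigned char` casts (an assumption made only so the snippet compiles on its own), and the name `emit_mouse_position` is hypothetical. As in the patch, the caller is expected to have clipped `value` to the range 0..MOUSE_LIMIT first.

```c
#include <stddef.h>

#define MOUSE_LIMIT       (2047 - 32)
#define MOUSE_UTF_8_START (127 - 32)

/* Append one 0-based mouse coordinate to buf starting at index count.
 * Returns the new count. Assumes 0 <= value <= MOUSE_LIMIT. */
static size_t emit_mouse_position(unsigned char buf[], size_t count, int value)
{
    if (value == MOUSE_LIMIT) {
        /* past-end marker, kept for compatibility with the old behaviour */
        buf[count++] = 0;
    } else if (value < MOUSE_UTF_8_START) {
        /* fits in 7 bits: same single byte the unpatched code sends */
        buf[count++] = (unsigned char)(' ' + value + 1);
    } else {
        /* two-byte UTF-8 encoding of the code point ' ' + value + 1 */
        value += ' ' + 1;
        buf[count++] = (unsigned char)(0xC0 + (value >> 6));
        buf[count++] = (unsigned char)(0x80 + (value & 0x3F));
    }
    return count;
}
```

The first coordinate that needs two bytes is value 95, which becomes 0xC2 0x80; the largest encodable coordinate, 2014, becomes 0xDF 0xBF.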

Hopefully this (or something like it) will appear in a future version of xterm... the patch makes xterm work out of the box with emacs-23 (which assumes utf-8 input) and fixes the existing problems with xt-mouse.el also. To use it with emacs-22 requires a redefinition of the function it uses to decode mouse positions (the new definition works fine with emacs-23 also):

(defadvice xterm-mouse-event-read (around utf-8 compile activate)
  (setq ad-return-value
        (let ((c (read-char)))
          (cond
           ;; mouse clicks outside the encodable range produce 0
           ((= c 0) #x800)
           ;; must convert UTF-8 to unicode ourselves
           ((and (>= c #xC2) (< emacs-major-version 23))
            (logior (lsh (logand c #x1F) 6) (logand (read-char) #x3F)))
           ;; normal case
           (t c)))))

Distribute the defadvice as part of the .emacs on all machines you log into, and patch the xterm on any machines you work from. Voila!
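
For applications other than emacs that read the raw byte stream, the decoding step might look like the following C sketch. It mirrors the logic of the advice above but always decodes two-byte sequences itself, since it works on bytes rather than on already-decoded characters. The names `read_mouse_code` and `MOUSE_OUT_OF_RANGE` are hypothetical, and the caller is assumed to subtract the `' ' + 1` offset from the result to recover the 0-based coordinate.

```c
#include <stddef.h>

#define MOUSE_OUT_OF_RANGE 0x800  /* same sentinel the advice returns for 0 */

/* Read one mouse coordinate code from buf, starting at *pos.
 * Advances *pos past the bytes consumed.
 * Returns the decoded code, MOUSE_OUT_OF_RANGE for the 0 marker,
 * or -1 if the buffer ends too early. */
static int read_mouse_code(const unsigned char *buf, size_t len, size_t *pos)
{
    if (*pos >= len)
        return -1;
    unsigned char c = buf[(*pos)++];
    if (c == 0)
        return MOUSE_OUT_OF_RANGE;          /* click outside encodable range */
    if (c >= 0xC2) {
        if (*pos >= len)
            return -1;                      /* truncated two-byte sequence */
        unsigned char c2 = buf[(*pos)++];
        return ((c & 0x1F) << 6) | (c2 & 0x3F);
    }
    return c;                               /* plain single-byte code */
}
```

For example, the bytes 0xC2 0x80 decode to 128, which corresponds to the 0-based coordinate 95 after the offset is removed.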

WARNING: Applications which use xterm's mouse modes but do not treat their input as utf-8 will get confused by this patch because the mouse escape sequences get longer. However, those applications break horribly with the current xterm because mouse positions with x > 95 look like utf-8 codes but aren't. I'd create a new mouse mode for xterm, but certain applications (gnu screen!) filter out unknown escape sequences. Emacs is the only terminal-mouse app I use, so I consider the patch a net win, but YMMV.

Solution

OK, figured it out. There are actually two issues.

First, some source diving shows that xterm clips the mouse-enabled region of the window to 223x223 chars, and sends 0x0 for all other positions.
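
A small C sketch of that clipping, reconstructed from the lines the patch in the question removes (the old `MOUSE_LIMIT` of `255 - 32` and the single-byte `' ' + col + 1` encoding). It is an illustration under those assumptions rather than verbatim xterm code, and the names are hypothetical.

```c
#define LEGACY_MOUSE_LIMIT (255 - 32)

/* Byte sent by the unpatched encoding for a 0-based coordinate.
 * Values beyond the limit are clipped to it; the clipped value encodes
 * as 256, which truncates to 0 when stored in 8 bits. */
static unsigned char legacy_mouse_byte(int value)
{
    if (value > LEGACY_MOUSE_LIMIT)
        value = LEGACY_MOUSE_LIMIT;
    return (unsigned char)(' ' + value + 1);
}
```

Coordinates 0 through 222 map to the bytes 33 through 255, and everything from 223 upward collapses to 0, which is why only the first 223 columns and rows respond to the mouse.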

Second, emacs-23 is UTF-8 aware and gets confused by mouse events having x>160 and y>94; in those cases xterm's encoding for x and y looks like a two-byte UTF-8 character (e.g. 0xC2 0x80) and as a result the mouse sequence seems one character short.
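
One way to read the x>160 and y>94 condition is sketched below in C: with the single-byte encoding `' ' + value + 1`, the x byte lands in the UTF-8 two-byte lead range (0xC2..0xDF) and the y byte lands in the continuation range (0x80..0xBF), so a UTF-8 decoder merges them into one character. The function names are hypothetical, coordinates are assumed to be 0-based and below 223, and only the two-byte case is checked.

```c
#include <stdbool.h>

/* Single byte the unpatched xterm sends for a 0-based coordinate < 223. */
static unsigned char legacy_byte(int value)
{
    return (unsigned char)(' ' + value + 1);
}

/* True if the x byte followed by the y byte forms a well-formed
 * two-byte UTF-8 sequence, so a UTF-8 decoder would consume both
 * as one character. */
static bool looks_like_two_byte_utf8(int x, int y)
{
    unsigned char bx = legacy_byte(x);
    unsigned char by = legacy_byte(y);
    bool is_lead = bx >= 0xC2 && bx <= 0xDF;
    bool is_continuation = by >= 0x80 && by <= 0xBF;
    return is_lead && is_continuation;
}
```

The smallest colliding position is x=161, y=95, which is sent as 0xC2 0x80, the example given above.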

I'm working on a patch for xterm to make mouse events emit UTF-8 (which would both unconfuse emacs-23 and allow terminals up to 2047x2047), but I'm not sure yet how it will turn out.
