Parsing ANSI escape codes?


Problem description


I'm building a telnet app in C# (for scripting door games on oldschool BBS systems e.g. Wildcat) and can't seem to build a working parser for ANSI escape codes (e.g. cursor movement, colorizing, etc) - almost all systems I've tested send undefined sequences which defy any "standards". There also seem to be very few resources on the matter, Wikipedia has the most in-depth list I've found so far but even they say it's incomplete - and most other sites I've encountered just copy/pasted Wikipedia's article.

My question: Is there a library out there? If not, how about some parsing code/Regex? At the very least some proper documentation for things like ESC[!_ would be incredibly helpful.

I really feel like I'm reinventing the wheel on this, especially seeing as Telnet is more or less the Internet's equivalent of the wheel (at least age-wise ;)

EDIT: Added an example of weirdness:

00000075h: 1B 5B 73 1B 5B 32 35 35 42 1B 5B 32 35 35 43 08 ; .[s.[255B.[255C.
00000085h: 5F 1B 5B 36 6E 1B 5B 75 1B 5B 21 5F 02 02 3F 48 ; _.[6n.[u.[!_..?H
00000095h: 54 4D 4C 3F 1B 5B 30 6D 5F 1B 5B 32 4A 1B 5B 48 ; TML?.[0m_.[2J.[H
000000a5h: 0C 0D 0A                                        ; ...
The mysterious part is the 21 in line 2 (the '!' in the bytes 1B 5B 21 5F).

Solution

A proper answer depends on how one intends to use the library. Any terminal emulator will read those sequences and perform actions based on them. But even a simple terminal emulator will understand about a hundred sequences.

Your example, in a perhaps more readable form, looks like this:

\E[s
\E[255B
\E[255C\b_
\E[6n
\E[u
\E[!_^B^B?HTML?
\E[0m_
\E[2J
\E[H\f\r
\n

That listing was produced with unmap, which writes the escape character as \E, shows every character in printable form, and begins a new line at each escape character.
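If unmap is not at hand, a rough equivalent is easy to sketch. The helper below is hypothetical (the name readable and the exact notation are my own choices, not unmap's); it only imitates the layout of the listing: \E for the escape character, a new line before each escape, and caret or backslash notation for other control characters.

```python
# Hypothetical helper: imitates the layout of the listing, not the unmap tool itself.
NAMED = {0x08: "\\b", 0x09: "\\t", 0x0A: "\\n", 0x0C: "\\f", 0x0D: "\\r"}


def readable(data: bytes) -> str:
    """Render raw terminal traffic in a printable, one-escape-per-line form."""
    out = []
    for byte in data:
        if byte == 0x1B:
            # Begin a new line at every escape character (except at the very start).
            if out:
                out.append("\n")
            out.append("\\E")
        elif byte in NAMED:
            out.append(NAMED[byte])
        elif byte < 0x20:
            # Remaining C0 controls in caret notation, e.g. 0x02 becomes ^B.
            out.append("^" + chr(byte + 0x40))
        elif byte < 0x7F:
            out.append(chr(byte))
        else:
            # DEL and 8-bit bytes as hex escapes.
            out.append("\\x%02x" % byte)
    return "".join(out)


if __name__ == "__main__":
    print(readable(bytes.fromhex("1B5B731B5B323535421B5B32353543085F")))
```

Running it on the first 17 bytes of the dump prints three lines, one per escape sequence, with the 08 byte shown as \b.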

ECMA-48 describes the format for

  • single-byte control characters, and
  • multibyte control sequences (beginning with the escape character).

Control sequences have content (parameters) which are limited to certain characters such as digits and separators, e.g., ';'. Control sequences also have a definite ending, called the final character. The sequence \E[!_^B^B? does not follow those rules. As suggested in a comment, perhaps your recording was confused by the terminal's response to the cursor position request \E[6n.
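Since the question asks about a regex: the control sequence format in ECMA-48 is regular, so it can be written as one. The sketch below assumes a clean 7-bit stream with no control characters embedded inside sequences, which real BBS traffic does not guarantee; split_csi is a hypothetical name. The byte ranges are those of the standard: parameter bytes 0x30 to 0x3F, intermediate bytes 0x20 to 0x2F, and a single final byte 0x40 to 0x7E.

```python
import re

# ESC [ , then parameter bytes (0x30-0x3F), then intermediate bytes (0x20-0x2F),
# then exactly one final byte (0x40-0x7E).
CSI = re.compile(rb"\x1b\[([0-?]*)([ -/]*)([@-~])")


def split_csi(data: bytes):
    """Yield (params, intermediates, final) for each well-formed CSI sequence."""
    for match in CSI.finditer(data):
        raw = match.group(1)
        # Parameters are separated by ';'. An empty parameter string means "defaults".
        params = [p.decode("ascii") for p in raw.split(b";")] if raw else []
        yield params, match.group(2).decode("ascii"), match.group(3).decode("ascii")


if __name__ == "__main__":
    for item in split_csi(b"\x1b[1;31mhi\x1b[0m\x1b[2J\x1b[H"):
        print(item)
```

A regex like this is fine for stripping or logging sequences, but it only says whether a sequence is well formed, not what a terminal would do with it.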

With that much context:

  • some of the actions performed by a terminal emulator modify the display (\E[2J clears the display)
  • some of the actions performed by a terminal emulator tell the host about the display (\E[6n asks the terminal where the cursor is)
  • some of the actions performed by a terminal emulator modify the terminal's behavior (\E[s and \E[u save the cursor position and restore it later)
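The three kinds of action can be sketched with a toy model. Everything here is hypothetical (the class TinyTerminal, its fields, the 25x80 default size); it handles only the sequences from the example and keeps no screen buffer. The cursor position report it queues has the form ESC [ row ; column R.

```python
class TinyTerminal:
    """Toy model: a cursor, a saved cursor, and a queue of replies to the host."""

    def __init__(self, rows=25, cols=80):
        self.rows, self.cols = rows, cols
        self.row = self.col = 1        # 1-based, as in cursor position reports
        self.saved = (1, 1)
        self.cleared = False
        self.replies = []              # bytes the client must send back to the host

    def csi(self, params, final):
        # First numeric parameter, or None when it is omitted.
        n = int(params[0]) if params and params[0] else None
        if final == "J" and n == 2:    # modifies the display: clear it
            self.cleared = True
        elif final == "n" and n == 6:  # tells the host about the display
            self.replies.append(b"\x1b[%d;%dR" % (self.row, self.col))
        elif final == "s":             # modifies terminal state: save cursor
            self.saved = (self.row, self.col)
        elif final == "u":             # restore the saved cursor
            self.row, self.col = self.saved
        elif final == "B":             # cursor down, clamped to the last row
            self.row = min(self.rows, self.row + (n or 1))
        elif final == "C":             # cursor right, clamped to the last column
            self.col = min(self.cols, self.col + (n or 1))


if __name__ == "__main__":
    term = TinyTerminal()
    for params, final in [([], "s"), (["255"], "B"), (["255"], "C"), (["6"], "n"), ([], "u")]:
        term.csi(params, final)
    print(term.replies)
```

Fed the opening of the example (save, move 255 down and right, query, restore), the model queues a report of row 25, column 80. That pattern looks like a probe for the screen size, though that reading is my assumption, not something the host documents. It also shows why a pure parser is not enough: the host is waiting for a reply.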

In short, you may see that to process the control sequences received by a terminal, you really need a terminal program to do all of this. Not all terminal emulators are the same, however. Some use a series of case-statements, to handle the successive stages of escape, bracket, digits, etc. But your program should keep in mind that single-byte controls can appear in the middle of multi-byte control sequences. Since they are encoded differently, there is no conflict. But it makes the program more complicated than you might suppose for just reading one sequence at a time.
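A minimal case-statement style decoder might look like the sketch below. It is an illustration under simplifying assumptions, not any emulator's actual code: only CSI sequences are recognised, other escape sequences are dropped, and the names parse, GROUND, ESCAPE and CSI are mine. The point it demonstrates is the one above: a single-byte control arriving in the middle of a sequence is acted on immediately, and the sequence then continues.

```python
GROUND, ESCAPE, CSI = range(3)


def parse(data: bytes):
    """Return events: ('print', ch), ('execute', byte) or ('csi', params, final)."""
    events, state, buf = [], GROUND, bytearray()
    for byte in data:
        if byte == 0x1B:                  # ESC starts a new sequence from any state
            state, buf = ESCAPE, bytearray()
        elif byte in (0x18, 0x1A):        # CAN and SUB abort a sequence in progress
            state = GROUND
        elif byte < 0x20:                 # other C0 controls act at once, in any state
            events.append(("execute", byte))
        elif state == GROUND:
            events.append(("print", chr(byte)))
        elif state == ESCAPE:
            # Only CSI is handled here; any other escape sequence is dropped.
            state = CSI if byte == ord("[") else GROUND
        elif 0x20 <= byte <= 0x3F:        # CSI: parameter and intermediate bytes
            buf.append(byte)
        elif 0x40 <= byte <= 0x7E:        # CSI: final byte ends the sequence
            events.append(("csi", bytes(buf).decode("ascii"), chr(byte)))
            state = GROUND
        # Anything else inside a CSI (DEL, 8-bit bytes) is ignored in this sketch.
    return events


if __name__ == "__main__":
    # A carriage return lands between the digits of "cursor right 25".
    print(parse(b"\x1b[2\r5CA"))
```

With that input the carriage return is reported first, then the complete sequence with parameter 25, then the printable A. A parser that reads one whole sequence at a time would mangle that input.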

xterm uses some case-statements (for the final character, basically), but most of the state transitions in decoding a control sequence are done using a set of tables. They are very repetitive, but not obvious to construct: Paul Williams pointed out that for a VT100, those should be symmetric (essentially treating the input as 7-bit ASCII). Some of the states are treated as errors, and ignored; well-formatted sequences are all that matters anyway. In theory, you could reuse the state-tables and add a "little" parsing. The tables are 8500 lines (one state per line).
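To give a feel for the table-driven approach, here is a heavily reduced sketch: three states instead of xterm's many, 7-bit input only, and invented action names. It is not xterm's table layout. Each state has a 128-entry table mapping an input byte to an (action, next state) pair, which also shows why the real tables are repetitive: the rows for control characters are the same in every state.

```python
GROUND, ESCAPE, CSI = range(3)


def fill(table, lo, hi, entry):
    """Set table[lo..hi] (inclusive) to the same (action, next_state) entry."""
    for code in range(lo, hi + 1):
        table[code] = entry


TABLES = {}
for state in (GROUND, ESCAPE, CSI):
    table = [("ignore", state)] * 128
    fill(table, 0x00, 0x1F, ("execute", state))     # C0 controls: identical in every state
    table[0x18] = table[0x1A] = ("ignore", GROUND)  # CAN, SUB abort the sequence
    table[0x1B] = ("clear", ESCAPE)                 # ESC restarts
    TABLES[state] = table

fill(TABLES[GROUND], 0x20, 0x7E, ("print", GROUND))
fill(TABLES[ESCAPE], 0x20, 0x7E, ("ignore", GROUND))  # unsupported escapes dropped
TABLES[ESCAPE][ord("[")] = ("clear", CSI)
fill(TABLES[CSI], 0x20, 0x3F, ("collect", CSI))       # parameter/intermediate bytes
fill(TABLES[CSI], 0x40, 0x7E, ("dispatch", GROUND))   # final byte


def step(state, byte):
    """Look up one transition; 8-bit input is simply ignored in this sketch."""
    if byte >= 0x80:
        return ("ignore", state)
    return TABLES[state][byte]


if __name__ == "__main__":
    state = GROUND
    for byte in b"\x1b[2J":
        action, state = step(state, byte)
        print(hex(byte), action, state)
```

The case-statements mentioned above would then live behind the dispatch action, keyed on the final character.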

Aside from (a) reading existing terminal emulators and imitating them on a smaller scale, or (b) modifying a terminal emulator ... you could investigate libvterm:

An abstract C99 library which implements a VT220 or xterm-like terminal emulator. It doesn't use any particular graphics toolkit or output system, instead it invokes callback function pointers that its embedding program should provide it to draw on its behalf. It avoids calling malloc() during normal running state, allowing it to be used in embedded kernel situations.

However, that is not in C# (and the source is the documentation). Still, it is only 5500 lines of code.

