Attempting to parse free-form ANSI text.


Problem description



Alright... I am attempting to find a way to parse ANSI text from a
telnet application. However, I am experiencing a bit of trouble.

What I want to do is have all ANSI sequences _removed_ from the output,
save for those that manage color codes or text presentation (in short,
the ones that are ESC[#m, with additional #s separated by ; characters).
The ones that are left, the color codes, I want to act on, remove from
the text stream, and then display the text.
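A minimal sketch of that filtering step, using only the standard `re` module rather than the Tmud code or pyparsing. It assumes every escape of interest is a CSI sequence of the form `ESC [ digits/semicolons letter`; `split_ansi` and its token format are hypothetical names chosen for illustration.

```python
import re

# CSI sequences: ESC [ <digits and semicolons> <final letter>.
# Assumption: other escape forms (ESC ( B, OSC strings, ...) do not occur.
CSI_RE = re.compile(r"\x1b\[([0-9;]*)([A-Za-z])")


def split_ansi(text):
    """Hypothetical helper: split text into ("text", str) and ("sgr", [int]) tokens.

    SGR sequences (final letter 'm') are kept as parameter lists; every
    other CSI sequence is dropped. ESC[m becomes [0], i.e. a reset.
    """
    tokens = []
    pos = 0
    for match in CSI_RE.finditer(text):
        if match.start() > pos:
            tokens.append(("text", text[pos:match.start()]))
        params, final = match.groups()
        if final == "m":
            # Empty fields (e.g. "1;;4" or "") default to 0.
            tokens.append(("sgr", [int(p) if p else 0 for p in params.split(";")]))
        pos = match.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens
```

Because the pattern needs a complete sequence, a trailing fragment such as `\x1b[7` is not matched and would leak through as text, so on a socket the data has to be cut at a sequence boundary before it is passed in.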

I am using wxPython's TextCtrl as output, so when I "get" an ANSI color
control sequence, I want to basically turn it into a call to wxWidgets'
TextCtrl.SetDefaultStyle method for the control, adding the appropriate
color/brightness/italic/bold/etc. settings to the TextCtrl until the
next ANSI code comes in to alter it.
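One way to keep the "until the next ANSI code comes in" state is to fold each SGR parameter list into a running style. The sketch below uses a hypothetical, wx-free representation (a plain dict) and covers only the common SGR codes (0, 1, 3, 4, 22-24, 30-37, 39, 40-47, 49); turning the dict into a `wx.TextAttr` for `SetDefaultStyle` is deliberately left out.

```python
# Hypothetical style record; a wx front end would map it to a wx.TextAttr.
DEFAULT_STYLE = {"fg": None, "bg": None, "bold": False,
                 "italic": False, "underline": False}

# The 8 basic colors, in the order used by SGR 30-37 (fg) and 40-47 (bg).
COLORS = ["black", "red", "green", "yellow", "blue", "magenta", "cyan", "white"]

# Simple on/off attributes: SGR code -> (key, value).
FLAGS = {1: ("bold", True), 22: ("bold", False),
         3: ("italic", True), 23: ("italic", False),
         4: ("underline", True), 24: ("underline", False)}


def apply_sgr(style, params):
    """Hypothetical helper: return a new style with SGR params applied left to right."""
    style = dict(style)          # never mutate the caller's dict
    it = iter(params or [0])     # ESC[m means the same as ESC[0m
    for p in it:
        if p == 0:
            style = dict(DEFAULT_STYLE)
        elif p in FLAGS:
            key, value = FLAGS[p]
            style[key] = value
        elif 30 <= p <= 37:
            style["fg"] = COLORS[p - 30]
        elif p == 39:
            style["fg"] = None
        elif 40 <= p <= 47:
            style["bg"] = COLORS[p - 40]
        elif p == 49:
            style["bg"] = None
        elif p in (38, 48):
            # Extended colors (38;5;n or 38;2;r;g;b) are not mapped here, but
            # their arguments are consumed so they are not misread as codes.
            mode = next(it, None)
            for _ in range(1 if mode == 5 else 3 if mode == 2 else 0):
                next(it, None)
        # Any other code (blink, reverse video, ...) is ignored in this sketch.
    return style
```

Each call returns a fresh dict, so the previous style stays in effect until the next color sequence produces a new one.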

It would *seem* easy, but I cannot seem to wrap my mind around the idea.
:-/

I have a source tarball up at http://fd0man.theunixplace.com/Tmud.tar
which contains the code in question. In short, the information is
coming in over a TCP/IP socket that is traditionally connected to with a
telnet client, so things can be broken mid-line (or even mid-control
sequence). If anyone has any ideas as to what I am doing, expecting, or
assuming that is wrong, I would be delighted to hear it. The code that
is not behaving as I would expect it to is in src/AnsiTextCtrl.py, but I
have included the entire project as it stands for completeness.

Any help would be appreciated! Thanks!

-- Mike

Solution


I have no experience with reading from TCP/IP. But looking at your
program with an open mind, I'd say that it is written to process a chunk
of data in memory. If, as you say, the chunks you get from TCP/IP may
start and end anywhere, and you presumably pass each chunk through
AppendText, then you have a synchronization problem: each call resets
your escape flag, even if the new chunk starts in the middle of an
escape sequence. Perhaps you should cut off incomplete escapes at the
end of each chunk and prepend them to the next chunk.
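Frederic's suggestion can be sketched as a small buffering class. This is a hypothetical illustration, not code from Tmud: it assumes the same CSI-only escape format, treats a trailing lone `ESC`, or `ESC [` followed only by digits and semicolons, as incomplete, and lets any other stray `ESC` through as text.

```python
import re

CSI_RE = re.compile(r"\x1b\[([0-9;]*)([A-Za-z])")
# A possibly incomplete escape at the very end of the buffer: a lone ESC,
# or ESC [ followed only by parameter characters.
PARTIAL_RE = re.compile(r"\x1b(\[[0-9;]*)?\Z")


class AnsiStream:
    """Hypothetical class: feed socket chunks, get ("text", s) / ("sgr", [int]) tokens.

    An escape sequence cut off at the end of a chunk is held back in
    self.pending and prepended to the next chunk, as Frederic suggests.
    """

    def __init__(self):
        self.pending = ""

    def feed(self, chunk):
        data = self.pending + chunk
        partial = PARTIAL_RE.search(data)
        if partial:
            self.pending = data[partial.start():]
            data = data[:partial.start()]
        else:
            self.pending = ""
        tokens = []
        pos = 0
        for m in CSI_RE.finditer(data):
            if m.start() > pos:
                tokens.append(("text", data[pos:m.start()]))
            if m.group(2) == "m":  # keep SGR, drop every other CSI command
                tokens.append(("sgr", [int(p) if p else 0 for p in m.group(1).split(";")]))
            pos = m.end()
        if pos < len(data):
            tokens.append(("text", data[pos:]))
        return tokens
```

With this design, the state that survives between socket reads is just the unfinished tail, so no escape flag needs to be reset on each call.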

And:

    if(len(buffer) > 0):
        wx.TextCtrl.AppendText(self, buffer)      <<< Are you sure text goes into the same place as the controls?

    if(len(AnsiBuffer) > 0):
        wx.TextCtrl.AppendText(self, AnsiBuffer)  <<< You say you want to strip the control sequences.

Frederic
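Frederic's two remarks amount to keeping text and control sequences on separate paths. A hypothetical sketch of that dispatch follows; `FakeTextCtrl` only records calls so the example runs without wx, and unlike the real `wx.TextCtrl.SetDefaultStyle` (which takes a `wx.TextAttr`), its `SetDefaultStyle` just receives the SGR parameter list.

```python
class FakeTextCtrl:
    """Hypothetical stand-in for wx.TextCtrl that records calls instead of drawing."""

    def __init__(self):
        self.calls = []

    def AppendText(self, text):
        self.calls.append(("append", text))

    def SetDefaultStyle(self, params):
        # The real method takes a wx.TextAttr; here it just gets SGR numbers.
        self.calls.append(("style", params))


def render(ctrl, tokens):
    """Hypothetical helper: text tokens go to AppendText, SGR tokens to SetDefaultStyle.

    Control sequences never reach AppendText, and each style change is made
    before the text that follows it is appended.
    """
    for kind, value in tokens:
        if kind == "sgr":
            ctrl.SetDefaultStyle(value)
        elif kind == "text":
            ctrl.AppendText(value)
```

In a real TextCtrl the default style applies to text added after the call, which is why the style change has to be issued before the following text is appended.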

"Michael B. Trausch" <"mike#at^&nospam!%trauschus"> wrote in message
news:Gs******************************@comcast.com...

> Alright... I am attempting to find a way to parse ANSI text from a
> telnet application. However, I am experiencing a bit of trouble.
>
> What I want to do is have all ANSI sequences _removed_ from the output,
> save for those that manage color codes or text presentation (in short,
> the ones that are ESC[#m (with additional #s separated by ; characters).
> The ones that are left, the ones that are the color codes, I want to
> act on, and remove from the text stream, and display the text.

Here is a pyparsing-based scanner/converter, along with some test code at
the end. It takes care of partial escape sequences, and strips any
sequences of the form "<ESC>[##;##;...<alpha>", unless the trailing alpha is 'm'.
The pyparsing project wiki is at http://pyparsing.wikispaces.com.

-- Paul

```python
from pyparsing import *

ESC = chr(27)
escIntro = Literal(ESC + '[').suppress()
integer = Word(nums)

# note: Combine joins the matched integers, so ESC[10;12m yields '1012'
colorCode = Combine(escIntro +
                    Optional(delimitedList(integer, delim=';')) +
                    Suppress('m')).setResultsName("colorCode")

# define search pattern that will match non-color ANSI command
# codes - these will just get dropped on the floor
otherAnsiCode = Suppress(Combine(escIntro +
                                 Optional(delimitedList(integer, delim=';')) +
                                 oneOf(list(alphas))))

# an escape sequence cut off at the very end of the input
partialAnsiCode = Combine(Literal(ESC) +
                          Optional('[') +
                          Optional(delimitedList(integer, delim=';') +
                                   Optional(';')) +
                          StringEnd()).setResultsName("partialCode")
ansiSearchPattern = colorCode | otherAnsiCode | partialAnsiCode
# preserve tabs in incoming text
ansiSearchPattern.parseWithTabs()


def processInputString(inputString):
    lastEnd = 0
    for t, start, end in ansiSearchPattern.scanString(inputString):
        # pass inputString[lastEnd:start] to wxTextControl - font styles
        # were set in parse action
        print(inputString[lastEnd:start])

        # process color codes, if any:
        if t.getName() == "colorCode":
            if t:
                print("<change color attributes to %s>" % t.asList())
            else:
                print("<empty color sequence detected>")
        elif t.getName() == "partialCode":
            print("<found partial escape sequence %s, tack it on front of next>" % t)
            # return partial code, to be prepended to the next string
            # sent to processInputString
            return t[0]
        else:
            # other kind of ANSI code found, do nothing
            pass

        lastEnd = end

    # pass inputString[lastEnd:] to wxTextControl - this is the last bit
    # of the input string after the last escape sequence
    print(inputString[lastEnd:])


test = """\
This is a test string containing some ANSI sequences.
Sequence 1: ~[10;12m
Sequence 2: ~[3;4h
Sequence 3: ~[4;5m
Sequence 4; ~[m
Sequence 5; ~[24HNo more escape sequences.
~[7""".replace('~', chr(27))

leftOver = processInputString(test)
```
Prints:

```
This is a test string containing some ANSI sequences.
Sequence 1:
<change color attributes to ['1012']>

Sequence 2:

Sequence 3:
<change color attributes to ['45']>

Sequence 4;
<change color attributes to ['']>

Sequence 5;
No more escape sequences.

<found partial escape sequence ['\x1b[7'], tack it on front of next>
```

