Apply control characters to a string - Python

Problem description

I'm trying to apply control characters to a string, such as '\x08 \x08', which should remove the preceding character (move backward, write a space, move backward).
For example, when I type this into the Python console:

s = "test\x08 \x08"
print(s)
print(repr(s))

I get this in my terminal:

tes
'test\x08 \x08'

I'm looking for a function, let's call it "function", that will 'apply' the control characters to my string:

import sys

v = function("test\x08 \x08")
sys.stdout.write(v + "\n")
sys.stdout.write(repr(v) + "\n")

so that I get a "clean" string, free of control characters:

tes
'tes'

I understand that in a terminal this part is handled by the client (the terminal emulator), so maybe there is a way to get the displayed string using core Unix functions:

echo -e 'test\x08 \x08' > file.out
cat file.out # control char are here handled by the client
>> tes
cat -v file.out # which prints the "actual" content of the file
>> test^H ^H

Recommended answer

Actually, the answer is a bit more complicated than simple string formatting.

Every character sent by the process to the terminal can be seen as a transition in a finite state machine (FSM). The FSM's state roughly corresponds to the text displayed and the cursor position, but there are many other variables, such as the dimensions of the terminal, the control sequence currently being received*, the terminal mode (e.g. vi mode vs. the classic Bash console), etc.

pexpect source code.

To answer my question: there is no core Unix "function" that turns a string into what the terminal displays, because that rendering is specific to the terminal that displays the process's output, and you would have to rewrite a full terminal to handle every possible character and control sequence.
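For contrast, the opposite direction, showing the raw characters the way cat -v does in the question, needs no terminal state at all: each control character maps to a fixed visible form. A minimal sketch, assuming ASCII input; caret_notation is a hypothetical helper name, and the real cat -v also rewrites bytes above 0x7F with an M- prefix, which is not reproduced here:

```python
def caret_notation(text):
    """Rough imitation of `cat -v` for ASCII text: control characters are
    shown as ^X instead of being interpreted by the terminal."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "\t\n":
            out.append(ch)  # cat -v leaves tab and newline alone
        elif code < 0x20:
            out.append("^" + chr(code + 0x40))  # e.g. \x08 -> ^H, \x1b -> ^[
        elif code == 0x7F:
            out.append("^?")  # DEL
        else:
            out.append(ch)
    return "".join(out)
```

caret_notation("test\x08 \x08") returns 'test^H ^H', matching the cat -v output in the question. This direction is easy because every character is rendered independently; the displayed string, by contrast, depends on everything that came before it.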

However, we can implement a simple one ourselves. We need to define an FSM with an initial state:

  • displayed string: "" (empty string)
  • cursor position: 0

and transitions (input characters):

  • any alphanumeric or whitespace character: replaces the character at the cursor position with itself (or appends it if there is none) and increments the cursor position
  • \x08 (backspace): decrements the cursor position, never going below 0

Then we feed the input string to it:

def decode(input_string):
    """Return the text a terminal would display for input_string.

    Only alphanumeric/whitespace characters and backspace are handled;
    any other character is reported and skipped.
    """

    # Initial state
    # The displayed string is stored as a list because
    # Python strings are immutable
    displayed_string = []
    cursor_position = 0

    # Loop on our input (sequence of transitions)
    for character in input_string:

        # Alphanumeric/whitespace transition
        if character.isalnum() or character.isspace():
            # Overwrite the character under the cursor (or append it at the end)
            displayed_string[cursor_position:cursor_position + 1] = [character]
            # Move the cursor forward
            cursor_position += 1

        # Backspace transition
        elif character == "\x08":
            # Move the cursor backward, but never past the start of the line
            cursor_position = max(cursor_position - 1, 0)

        else:
            print("{!r} is not handled by this function".format(character))

    # Turn the list back into a real string
    return "".join(displayed_string)

Example:

>>> decode("test\x08 \x08")
'tes '
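The loop in decode can also be written to mirror the FSM description literally: the state is an immutable (text, cursor) pair, each character is fed to a transition function, and running the machine is just a fold over the input with functools.reduce. This is a sketch under the same assumptions as decode (only alphanumeric/whitespace characters and backspace are transitions); TerminalState, step and decode_fsm are hypothetical names, and unhandled characters are silently skipped instead of reported:

```python
from functools import reduce
from typing import NamedTuple


class TerminalState(NamedTuple):
    """Hypothetical FSM state: the displayed line and the cursor column."""
    text: str
    cursor: int


def step(state, character):
    """Transition function: return the state reached after one character."""
    if character == "\x08":
        # Backspace moves the cursor left, but never past column 0
        return state._replace(cursor=max(state.cursor - 1, 0))
    if character.isalnum() or character.isspace():
        # Overwrite the character under the cursor (or append at the end)
        text = state.text[:state.cursor] + character + state.text[state.cursor + 1:]
        return TerminalState(text, state.cursor + 1)
    # Any other character leaves the state unchanged in this sketch
    return state


def decode_fsm(input_string):
    # Folding the transition function over the input runs the FSM
    return reduce(step, input_string, TerminalState("", 0)).text
```

decode_fsm("test\x08 \x08") returns 'tes ', the same result as decode. Keeping step pure makes each transition testable on its own: step(TerminalState("test", 4), "\x08") gives TerminalState(text='test', cursor=3).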

Note about control sequences

An ANSI control sequence is a group of characters that together act as a single transition on the terminal's state (display, cursor, terminal mode, ...). It can be seen as a refinement of our FSM, with additional sub-states and sub-transitions.

For example, when you press the Up arrow key in a classic Unix terminal such as the VT100, the terminal actually sends the control sequence ESC O A (with the letter O, in application cursor-key mode; normal mode sends ESC [ A), where ESC is hex code \x1b. ESC switches to ESCAPE mode, and the terminal returns to normal mode after the final A.

Some programs interpret this sequence as moving the cursor up one line (vi), others as moving backward in the command history (Bash): it depends entirely on the program that handles the input.

However, a process can also write such a sequence to its output, where it will most likely move the cursor up on the screen: the exact effect depends on the terminal implementation.
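The ESCAPE-mode idea above can be sketched by adding sub-states to the same FSM. This is a hypothetical extension of decode, not a complete terminal: it assumes a single-line display, follows the general ECMA-48 layout of a CSI sequence (ESC [, optional parameter bytes, one final byte), applies only cursor back (D), cursor forward (C) and erase to end of line (K), and silently drops every other sequence, such as color changes or the Up key's ESC O A. decode_with_escapes is an illustrative name:

```python
def decode_with_escapes(input_string):
    """Hypothetical single-line FSM with sub-states for escape sequences."""
    NORMAL, ESCAPE, CSI, SS3 = "normal", "escape", "csi", "ss3"
    state = NORMAL
    displayed = []
    cursor = 0
    params = ""  # parameter bytes collected inside a CSI sequence

    for ch in input_string:
        if state == NORMAL:
            if ch == "\x1b":
                state = ESCAPE  # ESC starts a control sequence
            elif ch == "\x08":
                cursor = max(cursor - 1, 0)  # backspace, as in decode()
            elif ch.isprintable():
                # Pad with spaces if the cursor was moved past the end
                displayed.extend(" " * (cursor - len(displayed)))
                displayed[cursor:cursor + 1] = [ch]
                cursor += 1
            # other control characters are ignored in this sketch
        elif state == ESCAPE:
            if ch == "[":
                state, params = CSI, ""  # Control Sequence Introducer
            elif ch == "O":
                state = SS3  # e.g. ESC O A sent by the Up key
            else:
                state = NORMAL  # other two-character escapes are dropped
        elif state == SS3:
            state = NORMAL  # consume the single character after ESC O
        elif state == CSI:
            if "\x30" <= ch <= "\x3f":
                params += ch  # parameter bytes such as "3" or "1;2"
            elif "\x40" <= ch <= "\x7e":
                # Final byte: apply the command, then leave the sequence
                n = max(int(params), 1) if params.isdigit() else 1
                if ch == "D":  # CUB: cursor back n columns
                    cursor = max(cursor - n, 0)
                elif ch == "C":  # CUF: cursor forward n columns
                    cursor += n
                elif ch == "K" and params in ("", "0"):  # EL: erase to end of line
                    del displayed[cursor:]
                state = NORMAL
            # intermediate bytes (0x20-0x2f) are skipped
    return "".join(displayed)
```

decode_with_escapes("hello\x1b[3DXY") returns 'heXYo': the sequence ESC [ 3 D moves the cursor back three columns before X and Y overwrite "ll", while decode_with_escapes("\x1b[31mred\x1b[0m") simply drops the color sequences and returns 'red'.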

A good list of ANSI control sequences is available here.
