Get str repr with double quotes Python

Problem description

I'm using a small Python script to generate some binary data that will be used in a C header.

This data should be declared as a char[], and it would be nice if it could be encoded as a string (with the appropriate escape sequences for bytes outside the range of printable ASCII characters) to keep the header more compact than a decimal or hexadecimal array encoding would.

The problem is that when I print the repr of a Python string, it is delimited by single quotes, and C doesn't like that. The naive solution is to do:

'"%s"'%repr(data)[1:-1]

but that doesn't work when one of the bytes in the data happens to be a double quote, so I'd need those to be escaped too.

I think a simple replace('"', '\\"') could do the job, but maybe there's a better, more Pythonic solution out there.
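To make the failure mode concrete, here is a minimal sketch (Python 3, operating on str) of the naive approach next to the replace() variant. The function names naive_c_literal and replaced_c_literal are made up for this illustration. It also shows that repr() does not always pick single quotes, and it does not address a separate caveat: repr() emits \x escapes, which in C absorb any hex digits that follow them.

```python
def naive_c_literal(data):
    # The approach from the question: strip repr()'s delimiters, add double quotes.
    return '"%s"' % repr(data)[1:-1]

def replaced_c_literal(data):
    # Same, but escape the double quotes that repr() left bare.
    return '"%s"' % repr(data)[1:-1].replace('"', '\\"')

print(naive_c_literal('plain'))        # no quotes in the data, so this is fine
print(naive_c_literal('say "hi"'))     # inner double quotes are not escaped
print(replaced_c_literal('say "hi"'))  # inner double quotes are escaped
print(repr("it's"))                    # repr() switches to double-quote delimiters
```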

Extra credit

It would also be convenient to split the data into lines of approximately 80 characters, but again the simple approach of splitting the source string into chunks of size 80 (http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python) won't work, as each non-printable character takes 2 or 3 characters in its escape sequence. Splitting into chunks of 80 after getting the repr won't help either, as it could split an escape sequence in two.
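One way around the splitting problem, sketched here under assumptions rather than taken from the thread, is to escape each byte first and only then group the resulting tokens into lines, so an escape sequence is never cut in half. The helper names c_escape_byte and bytes_to_c_lines are hypothetical. The sketch emits one string literal per line and relies on C concatenating adjacent string literals.

```python
def c_escape_byte(b):
    """Return the C source text for one byte value (0-255)."""
    if b in (0x22, 0x5C):                # double quote or backslash
        return "\\" + chr(b)
    if 0x20 <= b < 0x7F:                 # printable ASCII
        return chr(b)
    # Octal escapes stop after three digits, so the following character
    # can never be absorbed into the escape (unlike \x escapes).
    return "\\%03o" % b

def bytes_to_c_lines(data, width=80):
    """Split data into adjacent C string literals of at most `width` columns."""
    lines, current = [], ""
    for b in bytearray(data):
        token = c_escape_byte(b)
        # +2 accounts for the opening and closing quotes of the literal.
        if current and len(current) + len(token) + 2 > width:
            lines.append('"%s"' % current)
            current = ""
        current += token
    lines.append('"%s"' % current)
    return "\n".join(lines)

print(bytes_to_c_lines(bytes(range(64)), width=40))
```

Because the width check happens per token, lines come out slightly shorter than the limit instead of exceeding it.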

Any suggestions?

Recommended answer

repr() isn't what you want. There's a fundamental problem: repr() can use any representation of the string that can be evaluated as Python to produce the string. That means, in theory, that it might decide to use any number of other constructs which wouldn't be valid in C, such as """long strings""".

This code is probably the right direction. I've used a default wrap width of 140, which is a sensible value for 2009, but if you really want to wrap your code to 80 columns, just change it.

If unicode=True, it outputs an L"wide" string, which can store Unicode escapes meaningfully. Alternatively, you might want to convert Unicode characters to UTF-8 and output them escaped, depending on the program you're using them in.

def string_to_c(s, max_length = 140, unicode=False):
    ret = []

    # Try to split on whitespace, not in the middle of a word.
    split_at_space_pos = max_length - 10
    if split_at_space_pos < 10:
        split_at_space_pos = None

    position = 0
    if unicode:
        position += 1
        ret.append('L')

    ret.append('"')
    position += 1
    for c in s:
        newline = False
        if c == "\n":
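            # Emit the \n escape, then a backslash-newline so the literal
            # continues on the next source line without losing the newline.
            to_add = "\\n\\\n"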
            newline = True
        elif ord(c) < 32 or 0x80 <= ord(c) <= 0xff:
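            # Octal escapes end after three digits; a C \x escape would
            # swallow any hex digits that happen to follow it.
            to_add = "\\%03o" % ord(c)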
        elif ord(c) > 0xff:
            if not unicode:
                raise ValueError("string contains unicode character but unicode=False")
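            # \u takes exactly four hex digits; code points above 0xffff need \U with eight.
            to_add = ("\\u%04x" if ord(c) <= 0xffff else "\\U%08x") % ord(c)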
        elif "\\\"".find(c) != -1:
            to_add = "\\%c" % c
        else:
            to_add = c

        ret.append(to_add)
        position += len(to_add)
        if newline:
            position = 0

        if split_at_space_pos is not None and position >= split_at_space_pos and " \t".find(c) != -1:
            ret.append("\\\n")
            position = 0
        elif position >= max_length:
            ret.append("\\\n")
            position = 0

    ret.append('"')

    return "".join(ret)

print(string_to_c("testing testing testing testing testing testing testing testing testing testing testing testing testing testing testing testing testing", max_length = 20))
print(string_to_c("Escapes: \"quote\" \\backslash\\ \x00 \x1f testing \x80 \xff"))
print(string_to_c(u"Unicode: \u1234", unicode=True))
print(string_to_c("""New
lines"""))
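
The answer mentions converting Unicode characters to UTF-8 as an alternative to wide strings. A minimal sketch of that idea follows. The function name utf8_to_c is hypothetical, the code is not part of the original answer, and it does no line wrapping. It encodes the text to UTF-8 and writes every non-printable or non-ASCII byte as a three-digit octal escape in a narrow string literal.

```python
def utf8_to_c(text):
    """Encode text as UTF-8 and return it as a narrow C string literal."""
    out = []
    for b in bytearray(text.encode("utf-8")):
        if b in (0x22, 0x5C):            # double quote or backslash
            out.append("\\" + chr(b))
        elif 0x20 <= b < 0x7F:           # printable ASCII
            out.append(chr(b))
        else:                            # control and non-ASCII bytes
            out.append("\\%03o" % b)
    return '"%s"' % "".join(out)

print(utf8_to_c(u"Unicode: \u1234"))
```

Whether this or the L"wide" form is the better fit depends on how the consuming C program interprets its strings.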
