Python string literal concatenation


Question


I can create a multi-line string using this syntax:

string = str("Some chars "
         "Some more chars")

This will produce the following string:

"Some chars Some more chars"

Is Python joining these two separate strings or is the editor/compiler treating them as a single string?

P.s: I just want to understand the internals. I know there are other ways to declare or create multi-line strings.

Answer

Read the reference manual; it's in there. Specifically:

Multiple adjacent string or bytes literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation. Thus, "hello" 'world' is equivalent to "helloworld". This feature can be used to reduce the number of backslashes needed, to split long strings conveniently across long lines, or even to add comments to parts of strings [...]

(emphasis mine)
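A quick way to check the rule quoted above is to compare a few adjacent literals against the single literal they should equal. The snippet below is a small sketch that uses only plain Python 3. It also shows mixed quote styles, comments between the pieces, and the one combination the rule does not allow: a bytes literal next to a str literal. The variable names are arbitrary.

```python
# Adjacent literals, even with different quote styles, mean their concatenation.
assert "hello" 'world' == "helloworld"

# Inside brackets the pieces may span lines and carry their own comments.
pattern = ("[A-Za-z_]"       # first character of an identifier
           "[A-Za-z0-9_]*"   # remaining characters
           )
print(pattern)  # [A-Za-z_][A-Za-z0-9_]*

# Bytes literals concatenate with other bytes literals...
assert b"abc" b"def" == b"abcdef"

# ...but a bytes literal next to a str literal is rejected at compile time.
try:
    compile('x = b"abc" "def"', "<demo>", "exec")
    mixing_allowed = True
except SyntaxError as exc:
    mixing_allowed = False
    print("SyntaxError:", exc.msg)
```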

This is why:

string = str("Some chars "
         "Some more chars")

is exactly the same as: str("Some chars Some more chars").

This happens wherever a string literal may appear: list initializations, function calls (as with str above), and so on.
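Because the joining applies wherever literals sit next to each other, it also happens inside list displays. There, a forgotten comma silently merges two items. The rule only covers literals, though. A name followed by a literal is a syntax error, because no operator connects them. Here is a small illustration; the names `colors`, `message` and `prefix` are arbitrary.

```python
# A missing comma in a list display joins two items into one.
colors = [
    "red",
    "green"   # <- comma forgotten here
    "blue",
]
print(colors)       # ['red', 'greenblue']
print(len(colors))  # 2

# Function-call arguments behave the same way.
message = str("Some chars "
              "Some more chars")
assert message == "Some chars Some more chars"

# Only literals are joined: a name followed by a literal does not parse.
try:
    compile('prefix = "Some chars "\nmessage = prefix "Some more chars"',
            "<demo>", "exec")
    names_allowed = True
except SyntaxError:
    names_allowed = False
print(names_allowed)  # False
```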

The only caveat is when the literals are not enclosed in one of the grouping delimiters (), {} or [] and instead span two separate physical lines. In that case we can use the backslash character to join the lines explicitly and get the same result:

string = "Some chars " \
         "Some more chars"

Of course, concatenating literals on the same physical line does not require a backslash (string = "Hello " "World" is just fine).
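The three spellings described above produce the same value. It is also worth seeing what happens when you use neither brackets nor a backslash. If the second line is indented, compilation fails. If it is not indented, the code compiles, but the second literal becomes a separate expression statement whose value is discarded. The sketch below checks both cases by passing source strings to compile() and exec().

```python
# Inside parentheses: implicit line joining.
a = ("Some chars "
     "Some more chars")

# Outside brackets: explicit line joining with a backslash.
b = "Some chars " \
    "Some more chars"

# On one physical line no joining is needed at all.
c = "Some chars " "Some more chars"

assert a == b == c == "Some chars Some more chars"

# Without brackets or a backslash, an indented second line is an error...
try:
    compile('s = "Some chars "\n    "Some more chars"\n', "<demo>", "exec")
    indented_ok = True
except SyntaxError:          # IndentationError is a subclass of SyntaxError
    indented_ok = False

# ...and an unindented one compiles, but only the first literal is assigned.
namespace = {}
exec('s = "Some chars "\n"Some more chars"\n', namespace)
print(repr(namespace["s"]))  # 'Some chars '
```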


Is Python joining these two separate strings or is the editor/compiler treating them as a single string?

Python is. Now, exactly when Python does this is where things get interesting.

From what I could gather (take this with a pinch of salt; I'm not a parsing expert), this happens when Python transforms the parse tree produced by its LL(1) parser for a given expression into the corresponding AST (Abstract Syntax Tree).
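On a current interpreter you can confirm that the joining is already done once the AST exists, without touching any internals. ast.parse() returns a single constant node for the two adjacent literals. This sketch assumes Python 3.8 or later, where string literals are represented as ast.Constant. Python 3.5, the version discussed here, used ast.Str instead.

```python
import ast

source = 'str("Hello " "World")'
tree = ast.parse(source, mode="eval")   # 'eval' mode: a single expression

call = tree.body                        # the Call node for str(...)
(arg,) = call.args                      # exactly one argument survives
print(type(arg).__name__, repr(arg.value))   # Constant 'Hello World'
```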

You can get a view of the parse tree via the parser module (removed in Python 3.10):

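import parser    # only available up to Python 3.9; removed in 3.10
import textwrap

expr = """
       str("Hello "
           "World")
"""
# Remove the common indentation so the expression starts at column 0.
pexpr = parser.expr(textwrap.dedent(expr))
parser.st2list(pexpr)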

This dumps a big, fairly confusing nested list representing the concrete syntax tree parsed from the expression in expr:

-- rest snipped for brevity --

          [322,
             [323,
                [3, '"Hello "'],
                [3, '"World"']]]]]]]]]]]]]]]]]],

-- rest snipped for brevity --

The numbers correspond to either symbols or tokens in the parse tree. The mappings from symbol number to grammar rule and from token number to constant are in Lib/symbol.py and Lib/token.py respectively (3, for instance, is token.STRING).

As the snipped excerpt shows, there are two separate entries, one for each of the two string literals in the parsed expression.
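On interpreters without the parser module, the standard tokenize module gives a similar view at the token level. The two pieces are still separate STRING tokens at that stage. The sketch compares against token.STRING rather than a hard-coded number, and the source string is written out directly so it does not depend on expr.

```python
import io
import token
import tokenize

source = 'str("Hello "\n    "World")\n'

# Tokenize the source and keep only the string-literal tokens.
strings = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == token.STRING
]
print(strings)                        # ['"Hello "', '"World"']
print(token.tok_name[token.STRING])   # STRING
```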

Next, we can view the AST produced from the same expression via the ast module in the standard library:

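import ast
import textwrap

# ast.parse() compiles in 'exec' mode, which rejects the leading
# indentation in expr, so dedent it first.
p = ast.parse(textwrap.dedent(expr))
ast.dump(p)

# On Python 3.5 this evaluates to:
"Module(body=[Expr(value=Call(func=Name(id='str', ctx=Load()), args=[Str(s='Hello World')], keywords=[]))])"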

The output is more user-friendly in this case: the args of the function call consist of a single, already-concatenated string, 'Hello World'.

In addition, I also stumbled upon a cool module that generates a visualization of the tree for ast nodes. Using it, the output of the expression expr is visualized like this:

[Image: AST visualization of expr, cropped to show only the part relevant to the expression.]

As you can see, the terminal leaf node holds a single str object: the joined string for "Hello " and "World", i.e. "Hello World".
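The joined string is also what ends up in the compiled code. The sketch below compiles the expression and inspects the constants stored in the resulting code object. It uses only the built-in compile() and the standard dis module. The exact disassembly layout varies between Python versions, so only the constants are checked.

```python
import dis

source = 'str("Hello " "World")'
code = compile(source, "<demo>", "eval")

# The code object stores the already-joined string as one constant.
print("Hello World" in code.co_consts)   # True
print("Hello " in code.co_consts)        # False
dis.dis(code)   # the string is loaded with a single LOAD_CONST
```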


If you are feeling brave enough, dig into the source. The LL(1) parser that turns source into a parse tree is in Parser/parser.c, and its tables are generated by the parser generator in Parser/pgen.c. The code transforming the parse tree into an Abstract Syntax Tree is in Python/ast.c.

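This information is for Python 3.5. I'm pretty sure that unless you're using a really old version (< 2.5), the functionality and locations should be similar. Newer releases differ: Python 3.9 introduced a new PEG-based parser, and Python 3.10 removed the old LL(1) parser along with the parser module.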

Additionally, if you are interested in the whole compilation pipeline Python follows, one of the core contributors, Brett Cannon, gives a good gentle introduction in the video From Source to Code: How CPython's Compiler Works.
