Why and where does Python intern strings when executing `a = 'python'`, when the source code does not show it?


Question

I am trying to learn the interning mechanism Python uses in its string object implementation. But in both `PyObject *PyString_FromString(const char *str)` and `PyObject *PyString_FromStringAndSize(const char *str, Py_ssize_t size)`, Python interns a string only when its size is 0 or 1.

PyObject *
PyString_FromString(const char *str)
{
    fprintf(stdout, "creating %s\n", str);  /* [1] */
    //...
    //creating...
    /* share short strings */
    if (size == 0) {
        PyObject *t = (PyObject *)op;
        PyString_InternInPlace(&t);
        op = (PyStringObject *)t;
        nullstring = op;
        Py_INCREF(op);
    } else if (size == 1) {
        PyObject *t = (PyObject *)op;
        PyString_InternInPlace(&t);
        op = (PyStringObject *)t;
        characters[*str & UCHAR_MAX] = op;
        Py_INCREF(op);
    }
    return (PyObject *) op;
}

But for longer strings like a = 'python', if I modify string_print to print the address, it is identical to that of another string variable b = 'python'. At the line marked [1] above, I print a log message whenever Python creates a string object. The log shows that several strings are created when executing a = 'python', but 'python' itself is not among them.

>>> a = 'python'
creating stdin
creating stdin
string and size creating (null)
string and size creating a = 'python'
?
creating a
string and size creating (null)
string and size creating (null)
creating __main__
string and size creating (null)
string and size creating (null)
creating <stdin>
string and size creating d
creating __lltrace__
creating stdout
[26691 refs]
creating ps1
creating ps2

So where is the string 'python' created and interned?

Update 1

Please refer to the comment by @Daniel Darabos for a better interpretation. It is a more understandable way to ask this question.

The following is the output after adding a log statement to PyString_InternInPlace.

void
PyString_InternInPlace(PyObject **p)
{
    register PyStringObject *s = (PyStringObject *)(*p);
    fprintf(stdout, "Interning ");
    PyObject_Print((PyObject *)s, stdout, 0);
    fprintf(stdout, "\n");
    //...
}

>>> x = 'python'
Interning 'cp936'
Interning 'x'
Interning 'cp936'
Interning 'x'
Interning 'python'
[26706 refs]

Answer

The string literal is turned into a string object by the compiler. The function that does that is PyString_DecodeEscape, at least in Python 2.7; you haven't said which version you are working with.
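One way to check that the literal becomes a string object at compile time is to compile the statement without executing it and look at the constants stored on the resulting code object. This is only a minimal sketch using the built-in `compile`; the variable names are illustrative, and the exact contents of `co_consts` differ between CPython versions.

```python
# Compile the statement without running it; "single" mimics the interactive prompt.
source = "a = 'python'"
code = compile(source, "<stdin>", "single")

# The string object already exists in the code object's constant table,
# even though the assignment has not been executed yet.
has_literal = "python" in code.co_consts
print(code.co_consts)
print(has_literal)
```

Because the object is built while compiling, a log line placed only in one constructor function can miss it if the compiler goes through a different creation path.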

Update:

The compiler interns some strings during compilation, but the rules for when it does so are confusing. The string must contain only characters that are valid in an identifier:

>>> a = 'python'
>>> b = 'python'
>>> a is b
True
>>> a = 'python!'
>>> b = 'python!'
>>> a is b
False
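The "identifier-ok" rule can be approximated in plain Python. The sketch below assumes the rule means "ASCII letters, digits, and underscore only"; the helper name `is_identifier_like` and the `NAME_CHARS` set are hypothetical and are not taken from the interpreter's source.

```python
import string

# Assumed rule: only ASCII letters, digits, and underscore qualify.
NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")

def is_identifier_like(text):
    """Return True if every character of text could appear in an identifier."""
    return all(ch in NAME_CHARS for ch in text)

print(is_identifier_like("python"))   # True
print(is_identifier_like("python!"))  # False
```

This only predicts which literals are candidates for interning under the assumed rule; whether a given interpreter actually interns them is still up to the implementation.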

Even inside functions, string literals can be interned:

>>> def f():
...   return 'python'
...
>>> def g():
...   return 'python'
...
>>> f() is g()
True

But not if they contain non-identifier characters:

>>> def f():
...   return 'python!'
...
>>> def g():
...   return 'python!'
...
>>> f() is g()
False

And if I return a pair of strings, neither of them is interned. I don't know why:

>>> def f():
...   return 'python', 'python!'
...
>>> def g():
...   return 'python', 'python!'
...
>>> a, b = f()
>>> c, d = g()
>>> a is c
False
>>> a == c
True
>>> b is d
False
>>> b == d
True
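A way to investigate this case is to look at what the compiler stored on each function's code object. On the CPython builds I am aware of, a returned pair of literals tends to be folded into a single tuple constant, which may be related to the behaviour above. That is a guess, not something established here, and the output varies by version.

```python
def f():
    return 'python', 'python!'

# Collect any tuple constants stored on the function's code object.
tuple_consts = [c for c in f.__code__.co_consts if isinstance(c, tuple)]
print(f.__code__.co_consts)
print(tuple_consts)
```

If a tuple constant shows up, the strings returned by `f()` come from inside that tuple and not directly from the top-level constant table.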

Moral of the story: interning is an implementation-dependent optimization that depends on many factors. It can be interesting to understand how it works, but never depend on it working in any particular way.
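In practice this means comparing strings with `==` and requesting interning explicitly when identity really matters. The sketch below is written for Python 3, where the function is `sys.intern` (in Python 2 it is the built-in `intern`); the helper `build` is hypothetical and exists only to create the strings at run time.

```python
import sys

def build(parts):
    # Join at run time so the compiler cannot share one constant object.
    return "".join(parts)

a = build(["py", "thon!"])
b = build(["py", "thon!"])

# Equality is always reliable; whether `a is b` holds is an implementation detail.
print(a == b)

# Explicit interning returns the same object for equal strings.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)
```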
