How can a non-assigned string in Python have an address in memory?

Question

Can someone explain this to me? I've been playing with the id() function in Python and came across this:

>>> id('cat')
5181152
>>> a = 'cat'
>>> b = 'cat'
>>> id(a)
5181152
>>> id(b)
5181152

This makes some sense to me except for one part: the string 'cat' has an address in memory before I assign it to a variable. I probably just don't understand how memory addressing works, but can someone explain this to me or at least tell me that I should read up on memory addressing?

So that is all well and good but this confused me further:

>>> a = a[0:2]+'t'
>>> a
'cat'
>>> id(a)
39964224
>>> id('cat')
5181152

This struck me as weird because 'cat' is a string with an address of 5181152, but the new a has a different address. So if there are two 'cat' strings in memory, why aren't two addresses printed for id('cat')? My last thought was that the concatenation had something to do with the change in address, so I tried this:

>>> id(b[0:2]+'t')
39921024
>>> b = b[0:2]+'t'
>>> b
'cat'
>>> id(b)
40000896

I would have predicted the IDs to be the same but that was not the case. Thoughts?

Recommended Answer

Python reuses string literals fairly aggressively. The rules by which it does so are implementation-dependent, but CPython uses two that I'm aware of:

  • Strings that contain only characters valid in Python identifiers are interned, which means they are stored in a big table and reused wherever they occur. So, no matter where you use "cat", it always refers to the same string object.
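  • String literals in the same code block are reused regardless of their content and length. If you put a string literal of the entire Gettysburg Address in a function, twice, it's the same string object both times. In separate functions, they are different objects:

      def foo(): return "pack my box with five dozen liquor jugs"
      def bar(): return "pack my box with five dozen liquor jugs"
      assert foo() is bar()  # AssertionError

    (This holds when the functions are compiled separately, for example when entered one at a time at the interactive prompt. Newer CPython versions may merge equal constants across functions compiled together in one module, in which case the assertion can pass.)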
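Both rules can be checked directly in CPython. This is a minimal sketch that assumes CPython specifically (the language itself guarantees neither behavior); the names `LONG`, `same_block`, and `identifier_like_shared`, and the `compile()` filenames, are hypothetical illustrations.

```python
# Sketch of the two CPython reuse rules above (CPython-specific; other
# implementations are free to behave differently).

LONG = "pack my box with five dozen liquor jugs"


def same_block():
    # Both literals are in one code object, so the compiler stores the
    # constant once and both names are bound to that single object.
    first = "pack my box with five dozen liquor jugs"
    second = "pack my box with five dozen liquor jugs"
    return first is second


def identifier_like_shared():
    # Two independent compilations of an identifier-like literal:
    # CPython interns such constants, so both yield the same object.
    x = eval(compile("'cat'", "<first>", "eval"))
    y = eval(compile("'cat'", "<second>", "eval"))
    return x is y


print(same_block())                                # True
print(identifier_like_shared())                    # True
print(same_block.__code__.co_consts.count(LONG))   # 1
```

The last line shows the deduplication from the compiler's side: the long literal appears only once in the function's constant table, even though the source spells it out twice.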

Both optimizations are done at compile time (that is, when the bytecode is generated).
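This is also why 'cat' already has an address before it is assigned: the string object is created as a constant of the compiled code, before any statement runs. A minimal sketch using the built-in `compile()` and `exec()`; the source text `"a = 'cat'"` and the `<demo>` filename are arbitrary examples.

```python
# The literal 'cat' is stored in the code object's constants at compile
# time, before the assignment statement has executed.
code = compile("a = 'cat'", "<demo>", "exec")
print("cat" in code.co_consts)  # True: the constant already exists

namespace = {}
exec(code, namespace)
# Running the code binds the name to that pre-existing constant object.
stored = [c for c in code.co_consts if c == "cat"][0]
print(namespace["a"] is stored)  # True
```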

On the other hand, something like chr(99) + chr(97) + chr(116) is a string expression that evaluates to the string "cat". In a dynamic language like Python, its value can't be known at compile time (chr() is a built-in function, but you might have reassigned it), so it normally isn't interned. Thus its id() is different from that of "cat". However, you can force a string to be interned using the intern() function (a built-in in Python 2; in Python 3 it is sys.intern()). Thus:

import sys
id(sys.intern(chr(99) + chr(97) + chr(116))) == id("cat")   # True; in Python 2, use the built-in intern()
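To see the runtime-built string and the literal side by side, here is a short sketch using Python 3's `sys.intern()` (the Python 2 spelling is the built-in `intern()`). It assumes CPython, and the names `literal` and `built` are illustrative only.

```python
import sys

literal = "cat"                          # compile-time constant, interned
built = chr(99) + chr(97) + chr(116)     # string built at run time

print(built == literal)   # True: same characters
print(built is literal)   # False in CPython: a distinct object
# sys.intern() returns the canonical interned object for this value.
print(sys.intern(built) is literal)  # True
```

Comparing with `is` against a name bound to the literal, rather than against the literal itself, avoids the SyntaxWarning that newer Python versions emit for `x is "cat"`.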

As others have mentioned, interning is possible because strings are immutable. It isn't possible to change "cat" to "dog", in other words. You have to generate a new string object, which means that there's no danger that other names pointing to the same string will be affected.
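Immutability is what makes this sharing safe: an operation that looks like it changes a string actually builds a new one, so other names bound to the original are unaffected. A short sketch; the names `a` and `b` mirror the question.

```python
a = "cat"
b = a                          # both names refer to the same object
a = a.replace("cat", "dog")    # builds a new string; rebinds only `a`

print(a, b)                    # dog cat
print(b is a)                  # False

try:
    b[0] = "d"                 # in-place modification is not allowed
except TypeError as exc:
    print("TypeError:", exc)
```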

Just as an aside, Python also converts expressions containing only constants (like "c" + "a" + "t") to constants at compile time, as the disassembly below shows (the output is from an older CPython version; newer versions produce different bytecode). These will be optimized to point to identical string objects per the rules above.

>>> def foo(): "c" + "a" + "t"
...
>>> from dis import dis; dis(foo)
  1           0 LOAD_CONST               5 ('cat')
              3 POP_TOP
              4 LOAD_CONST               0 (None)
              7 RETURN_VALUE
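The same folding can be observed without reading version-specific bytecode by inspecting the constants of a compiled statement. A minimal sketch assuming CPython, whose compiler folds constant expressions (the stage that does the folding has moved between versions); the `<fold>` filename is arbitrary.

```python
import sys

# Constant folding: "c" + "a" + "t" is computed by the compiler,
# so the folded result appears among the code object's constants.
code = compile('x = "c" + "a" + "t"', "<fold>", "exec")
print("cat" in code.co_consts)  # True in CPython

namespace = {}
exec(code, namespace)
# The folded constant is identifier-like, so CPython interns it and it
# is the same object as the canonical interned 'cat'.
print(namespace["x"] is sys.intern("cat"))  # True in CPython
```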
