Why is concatenating strings running faster than joining them?

Question

As I understand it, "".join(iterable_of_strings) is the preferred way to concatenate strings because it allows optimizations that avoid rewriting the immutable object to memory more times than necessary.

For me, however, adding strings inside an expression reliably runs faster than joining them for a moderately large number of operations.

Running this code with Python 3.3 on my laptop, I get about 2.9-3.2 seconds for joined and 2.3-2.7 seconds for added. I couldn't find a good answer by Googling this. Could someone explain what might be going on or direct me to a good resource?

import uuid
import time

class mock:
    def __init__(self):
        self.name = "foo"
        self.address = "address"
        self.age = "age"
        self.primarykey = uuid.uuid4()

data_list = [mock() for x in range(2000000)]

def added():
    # Build each value with the + operator: no temporary container is needed.
    t = time.time()
    new_dict = { item.primarykey: item.name + item.address + item.age for item in data_list }
    print(time.time() - t)

def joined():
    # Build each value with str.join: a new three-element list is created per item.
    t = time.time()
    new_dict = { item.primarykey: ''.join([item.name, item.address, item.age]) for item in data_list }
    print(time.time() - t)

joined()
added()

Recommended answer

The time difference you're seeing comes from creating the list that is passed to join. You can get a small speedup by using a tuple instead, but it will still be slower than just concatenating with + when there are only a few short strings.
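One rough way to check this claim is to time the list construction separately from the join itself. The sketch below is only an illustration: the helper names (build_only, join_prebuilt, join_fresh_list, plus, measure) and the iteration counts are assumptions chosen here, not part of the original post, and it needs Python 3.6 or later for the f-string. The absolute numbers will vary by machine and Python version.

```python
import timeit

# Three short strings, matching the attribute values in the question.
a, b, c = "foo", "address", "age"
prebuilt = [a, b, c]  # built once, outside the timed code


def build_only():
    # Only the cost of creating the temporary list.
    return [a, b, c]


def join_prebuilt():
    # join on a list that already exists.
    return "".join(prebuilt)


def join_fresh_list():
    # join plus the cost of building a new list on every call.
    return "".join([a, b, c])


def plus():
    # Plain concatenation with the + operator.
    return a + b + c


def measure(func, number=100_000, repeat=3):
    # Take the best of several runs to reduce noise (hypothetical helper).
    return min(timeit.repeat(func, number=number, repeat=repeat))


timings = {f.__name__: measure(f)
           for f in (build_only, join_prebuilt, join_fresh_list, plus)}

for name, seconds in timings.items():
    print(f"{name:18}{seconds:.4f}")
```

If the explanation holds on a given machine, join_fresh_list should cost noticeably more than join_prebuilt, with build_only accounting for much of the gap.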

It would be different if you had an iterable of strings to start with, rather than an object with strings as attributes. Then you could call join directly on the iterable, rather than needing to build a new one for each call.
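As a minimal sketch of that situation, suppose each record already keeps its strings together in one tuple. The Record class and its fields attribute below are hypothetical and only stand in for the question's mock class; join can then be called on the stored tuple without building anything new per item.

```python
import uuid


class Record:
    """Hypothetical variant of the question's mock class."""

    def __init__(self, name, address, age):
        # The strings are stored together, so they already form an iterable.
        self.fields = (name, address, age)
        self.primarykey = uuid.uuid4()


records = [Record("foo", "address", "age") for _ in range(1000)]

# join consumes the existing tuple directly; no list is created per item.
combined = {r.primarykey: "".join(r.fields) for r in records}

print(len(combined))
```

Whether this layout is worth adopting depends on how the rest of the program accesses the individual fields; it is shown here only to make the "iterable to start with" case concrete.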

Here's some testing I did with the timeit module:

import timeit

short_strings = ["foo", "bar", "baz"]
long_strings = [s*1000 for s in short_strings]

def concat(a, b, c):
    return a + b + c

def concat_from_list(lst):
    return lst[0] + lst[1] + lst[2]

def join(a, b, c):
    return "".join([a, b, c])

def join_tuple(a, b, c):
    return "".join((a, b, c))

def join_from_list(lst):
    return "".join(lst)

def test():
    print("Short strings")
    print("{:20}{}".format("concat:",
                           timeit.timeit(lambda: concat(*short_strings))))
    print("{:20}{}".format("concat_from_list:",
                           timeit.timeit(lambda: concat_from_list(short_strings))))
    print("{:20}{}".format("join:",
                           timeit.timeit(lambda: join(*short_strings))))
    print("{:20}{}".format("join_tuple:",
                           timeit.timeit(lambda: join_tuple(*short_strings))))
    print("{:20}{}\n".format("join_from_list:",
                             timeit.timeit(lambda: join_from_list(short_strings))))
    print("Long Strings")
    print("{:20}{}".format("concat:",
                           timeit.timeit(lambda: concat(*long_strings))))
    print("{:20}{}".format("concat_from_list:",
                           timeit.timeit(lambda: concat_from_list(long_strings))))
    print("{:20}{}".format("join:",
                           timeit.timeit(lambda: join(*long_strings))))
    print("{:20}{}".format("join_tuple:",
                           timeit.timeit(lambda: join_tuple(*long_strings))))
    print("{:20}{}".format("join_from_list:",
                           timeit.timeit(lambda: join_from_list(long_strings))))

Output:

Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>> 
>>> test()
Short strings
concat:             0.5453461176251436
concat_from_list:   0.5185697357936024
join:               0.7099379456477868
join_tuple:         0.5900842397209949
join_from_list:     0.4177281794285359

Long Strings
concat:             2.002303591571888
concat_from_list:   1.8898819841869416
join:               1.5672863477837913
join_tuple:         1.4343144915087596
join_from_list:     1.231374639083505

So, in these tests, joining an already existing list is always fastest. Concatenating with + is faster than building a new list or tuple to join when the individual items are short, but for long strings it is always slowest. I suspect the difference shown between concat and concat_from_list comes from the unpacking of the lists in the function call in the test code.
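To see roughly where + stops winning, the same two list-based functions can be timed over a range of string lengths. This is a sketch under assumptions: the factors (1, 10, 100, 1000), the best_time helper, and the iteration counts are choices made here for illustration and do not come from the original tests. No particular crossover point is implied, since it depends on the interpreter and hardware.

```python
import timeit

base = ["foo", "bar", "baz"]


def concat_from_list(lst):
    # Concatenate three items with the + operator.
    return lst[0] + lst[1] + lst[2]


def join_from_list(lst):
    # Join an already existing list.
    return "".join(lst)


def best_time(func, arg, number=20_000, repeat=3):
    # Best of several runs, to reduce timing noise (hypothetical helper).
    return min(timeit.repeat(lambda: func(arg), number=number, repeat=repeat))


results = []
for factor in (1, 10, 100, 1000):
    # Each string is `factor` times longer than the base strings.
    strings = [s * factor for s in base]
    results.append((factor,
                    best_time(concat_from_list, strings),
                    best_time(join_from_list, strings)))

print("{:>8}{:>12}{:>12}".format("factor", "concat", "join"))
for factor, t_concat, t_join in results:
    print("{:>8}{:>12.4f}{:>12.4f}".format(factor, t_concat, t_join))
```

Both functions receive the list as a single argument, which avoids the argument unpacking suspected above of skewing the concat measurement.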
