Python string formatting: is '%' more efficient than the 'format' function?


Question


I wanted to compare different ways to build a string in Python from several variables:

  • using + to concatenate (referred to as 'plus')
  • using %
  • using "".join(list)
  • using format function
  • using "{0.<attribute>}".format(object)

I compared three scenarios:

  • string with 2 variables
  • string with 4 variables
  • string with 4 variables, each used twice
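As a quick illustration, the five approaches can be sketched for the 2-variable scenario as follows. The `Values` class is a hypothetical stand-in for the `values` class in the benchmark code further down; every variant should build the same string.

```python
# Hypothetical holder class, mirroring the `values` class used in the benchmark.
class Values:
    def __init__(self, a, b):
        self.a = a
        self.b = b


a, b = "123", "456"
val = Values(a, b)

# One entry per approach from the list above.
results = {
    "plus": a + "-" + b,                       # concatenation with +
    "percent": "%s-%s" % (a, b),               # % formatting
    "join": "".join([a, "-", b]),              # "".join(list)
    "format": "{}-{}".format(a, b),            # format function
    "format_attr": "{0.a}-{0.b}".format(val),  # attribute access inside format
}

for name, text in results.items():
    print(name, text)
```

Only the cost of producing the string differs between the variants, which is what the measurements compare.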

I measured 1 million operations each time and averaged over 6 measurements. I came up with the following timings:

In each scenario, I came to the following conclusions:

  • Concatenation seems to be one of the fastest methods
  • Formatting with % is much faster than formatting with the format function

I believed format was much better than % (see, e.g., this question) and that % was almost deprecated.

I therefore have several questions:

  1. Is % really faster than format?
  2. If so, why is that?
  3. Why is "{} {}".format(var1, var2) more efficient than "{0.attribute1} {0.attribute2}".format(object)?


For reference, I used the following code to measure the different timings.

import time
def timing(f, n, show, *args):
    if show: print f.__name__ + ":\t",
    r = range(n/10)
    t1 = time.clock()
    for i in r:
        f(*args); f(*args); f(*args); f(*args); f(*args); f(*args); f(*args); f(*args); f(*args); f(*args)
    t2 = time.clock()
    timing = round(t2-t1, 3)
    if show: print timing
    return timing
    

class values(object):
    def __init__(self, a, b, c="", d=""):
        self.a = a
        self.b = b
        self.c = c
        self.d = d

    
def test_plus(a, b):
    return a + "-" + b

def test_percent(a, b):
    return "%s-%s" % (a, b)

def test_join(a, b):
    return ''.join([a, '-', b])
        
def test_format(a, b):
    return "{}-{}".format(a, b)

def test_formatC(val):
    return "{0.a}-{0.b}".format(val)

    
def test_plus_long(a, b, c, d):
    return a + "-" + b + "-" + c + "-" + d

def test_percent_long(a, b, c, d):
    return "%s-%s-%s-%s" % (a, b, c, d)
        
def test_join_long(a, b, c, d):
    return ''.join([a, '-', b, '-', c, '-', d])
    
def test_format_long(a, b, c, d):
    return "{0}-{1}-{2}-{3}".format(a, b, c, d)

def test_formatC_long(val):
    return "{0.a}-{0.b}-{0.c}-{0.d}".format(val)

    
def test_plus_long2(a, b, c, d):
    return a + "-" + b + "-" + c + "-" + d + "-" + a + "-" + b + "-" + c + "-" + d

def test_percent_long2(a, b, c, d):
    return "%s-%s-%s-%s-%s-%s-%s-%s" % (a, b, c, d, a, b, c, d)
    
def test_join_long2(a, b, c, d):
    return ''.join([a, '-', b, '-', c, '-', d, '-', a, '-', b, '-', c, '-', d])
            
def test_format_long2(a, b, c, d):
    return "{0}-{1}-{2}-{3}-{0}-{1}-{2}-{3}".format(a, b, c, d)

def test_formatC_long2(val):
    return "{0.a}-{0.b}-{0.c}-{0.d}-{0.a}-{0.b}-{0.c}-{0.d}".format(val)


def test_plus_superlong(lst):
    string = ""
    for i in lst:
        string += str(i)
    return string
    

def test_join_superlong(lst):
    return "".join([str(i) for i in lst])
    

def mean(numbers):
    return float(sum(numbers)) / max(len(numbers), 1)
        

nb_times = int(1e6)
n = xrange(5)
lst_numbers = xrange(1000)
from collections import defaultdict
metrics = defaultdict(list)
list_functions = [
    test_plus, test_percent, test_join, test_format, test_formatC,
    test_plus_long, test_percent_long, test_join_long, test_format_long, test_formatC_long,
    test_plus_long2, test_percent_long2, test_join_long2, test_format_long2, test_formatC_long2,
    # test_plus_superlong, test_join_superlong,
]
val = values("123", "456", "789", "0ab")
for i in n:
    for f in list_functions:
        print ".",
        name = f.__name__
        if "formatC" in name:
            t = timing(f, nb_times, False, val)
        elif '_long' in name:
            t = timing(f, nb_times, False, "123", "456", "789", "0ab")
        elif '_superlong' in name:
            t = timing(f, nb_times, False, lst_numbers)
        else:
            t = timing(f, nb_times, False, "123", "456")
        metrics[name].append(t) 

# Get Average
print "\n===AVERAGE OF TIMINGS==="
for f in list_functions:
    name = f.__name__
    timings = metrics[name]
    print "{:>20}:\t{:0.5f}".format(name, mean(timings))

Solution

  1. Yes, % string formatting is faster than the .format method.
  2. Most likely (there may be a much better explanation) because % is an operator handled directly by the interpreter, whereas .format involves at least one extra method lookup and call.
  3. Because each attribute reference in the format string (such as {0.a}) adds an attribute lookup on the argument at run time.
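One way to look into point 2 is to compare the bytecode of the two forms with the dis module. This is only a sketch: the function names are made up for the example, the exact opcode names vary between CPython versions, and some newer releases may compile simple % formatting into the same instructions as an f-string.

```python
import dis


def with_percent(a, b):
    return "%s-%s" % (a, b)


def with_format(a, b):
    return "{}-{}".format(a, b)


def opnames(func):
    """Return the names of the bytecode instructions of func."""
    return [ins.opname for ins in dis.get_instructions(func)]


percent_ops = opnames(with_percent)
format_ops = opnames(with_format)

print("percent:", percent_ops)
print("format: ", format_ops)
```

On the CPython versions I am aware of, the .format variant contains an attribute or method load followed by a call instruction, while the % variant contains no call instruction at all.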

I ran a slightly more thorough analysis (on Python 3.8.2) of the various formatting methods using timeit. The results are below, pretty-printed with BeautifulTable; each value is the time in seconds for timeit's default of 1,000,000 executions:

+-----------------+-------+-------+-------+-------+-------+--------+
| Type \ num_vars |   1   |   2   |   5   |  10   |  50   |  250   |
+-----------------+-------+-------+-------+-------+-------+--------+
|    f_str_str    | 0.056 | 0.063 | 0.115 | 0.173 | 0.754 | 3.717  |
+-----------------+-------+-------+-------+-------+-------+--------+
|    f_str_int    | 0.055 | 0.148 | 0.354 | 0.656 | 3.186 | 15.747 |
+-----------------+-------+-------+-------+-------+-------+--------+
|   concat_str    | 0.012 | 0.044 | 0.169 | 0.333 | 1.888 | 10.231 |
+-----------------+-------+-------+-------+-------+-------+--------+
|    pct_s_str    | 0.091 | 0.114 | 0.182 | 0.313 | 1.213 | 6.019  |
+-----------------+-------+-------+-------+-------+-------+--------+
|    pct_s_int    | 0.09  | 0.141 | 0.248 | 0.479 | 2.179 | 10.768 |
+-----------------+-------+-------+-------+-------+-------+--------+
| dot_format_str  | 0.143 | 0.157 | 0.251 | 0.461 | 1.745 | 8.259  |
+-----------------+-------+-------+-------+-------+-------+--------+
| dot_format_int  | 0.141 | 0.192 | 0.333 | 0.62  | 2.735 | 13.298 |
+-----------------+-------+-------+-------+-------+-------+--------+
| dot_format2_str | 0.159 | 0.195 | 0.33  | 0.634 | 3.494 | 18.975 |
+-----------------+-------+-------+-------+-------+-------+--------+
| dot_format2_int | 0.158 | 0.227 | 0.422 | 0.762 | 4.337 | 25.498 |
+-----------------+-------+-------+-------+-------+-------+--------+

The _str and _int suffixes indicate the type of the values being formatted.

Note that the concat_str result for a single variable is essentially just the string itself, so it shouldn't really be considered.

My setup for arriving at these results:

from timeit import timeit
from beautifultable import BeautifulTable  # pip install beautifultable

times = {}

for num_vars in (250, 50, 10, 5, 2, 1):
    f_str = "f'{" + '}{'.join([f'x{i}' for i in range(num_vars)]) + "}'"
    # "f'{x0}{x1}'"
    concat = '+'.join([f'x{i}' for i in range(num_vars)])
    # 'x0+x1'
    pct_s = '"' + '%s'*num_vars + '" % (' + ','.join([f'x{i}' for i in range(num_vars)]) + ')'
    # '"%s%s" % (x0,x1)'
    dot_format = '"' + '{}'*num_vars + '".format(' + ','.join([f'x{i}' for i in range(num_vars)]) + ')'
    # '"{}{}".format(x0,x1)'
    dot_format2 = '"{' + '}{'.join([f'{i}' for i in range(num_vars)]) + '}".format(' + ','.join([f'x{i}' for i in range(num_vars)]) + ')'
    # '"{0}{1}".format(x0,x1)'

    vars = ','.join([f'x{i}' for i in range(num_vars)])
    vals_str = tuple(map(str, range(num_vars))) if num_vars > 1 else '0'
    setup_str = f'{vars} = {vals_str}'
    # "x0,x1 = ('0', '1')"
    vals_int = tuple(range(num_vars)) if num_vars > 1 else 0
    setup_int = f'{vars} = {vals_int}'
    # 'x0,x1 = (0, 1)'

    times[num_vars] = {
        'f_str_str': timeit(f_str, setup_str),
        'f_str_int': timeit(f_str, setup_int),
        'concat_str': timeit(concat, setup_str),
        # 'concat_int': timeit(concat, setup_int), # this will be summation, not concat
        'pct_s_str': timeit(pct_s, setup_str),
        'pct_s_int': timeit(pct_s, setup_int),
        'dot_format_str': timeit(dot_format, setup_str),
        'dot_format_int': timeit(dot_format, setup_int),
        'dot_format2_str': timeit(dot_format2, setup_str),
        'dot_format2_int': timeit(dot_format2, setup_int),
    }

table = BeautifulTable()
# `times` was filled from 250 down to 1, so build the header and the rows
# from the same sorted order to keep the columns aligned.
num_vars_order = sorted(times)
table.column_headers = ['Type \\ num_vars'] + list(map(str, num_vars_order))
for key in ('f_str_str', 'f_str_int', 'concat_str', 'pct_s_str', 'pct_s_int', 'dot_format_str', 'dot_format_int', 'dot_format2_str', 'dot_format2_int'):
    table.append_row([key] + [times[num_vars][key] for num_vars in num_vars_order])
print(table)
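To make the generated statements concrete, here is a small sketch of what the script above effectively times for num_vars=2 with string values. The statement strings are taken from the comments in the setup code; the `namespace` check and the reduced `number` are assumptions added so the example runs quickly.

```python
from timeit import timeit

# Setup and statements as generated for num_vars=2 (string values).
setup = "x0,x1 = ('0', '1')"
statements = {
    "f_str": "f'{x0}{x1}'",
    "concat": "x0+x1",
    "pct_s": '"%s%s" % (x0,x1)',
    "dot_format": '"{}{}".format(x0,x1)',
    "dot_format2": '"{0}{1}".format(x0,x1)',
}

# Evaluate each statement once to confirm they all build the same string.
namespace = {}
exec(setup, namespace)
outputs = {name: eval(stmt, namespace) for name, stmt in statements.items()}

# Time each statement; the original run used timeit's default number=1_000_000.
timings = {name: timeit(stmt, setup, number=100_000) for name, stmt in statements.items()}

for name in statements:
    print(f"{name:>12}: {outputs[name]!r}  {timings[name]:.4f}s")
```

Because the statements are passed to timeit as source strings, each one is compiled once and then executed in a loop, so no extra function call wraps the expression being measured.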

I couldn't go beyond num_vars=250 because of the max arguments (255) limit with timeit.

tl;dr: f-strings are the fastest and most elegant option for Python string formatting, but at times (due to some implementation restrictions, and because they require Python 3.6+) you may have to fall back on the other formatting options.
