Fastest way to generate a delimited string from a 1d numpy array

Question

I have a program which needs to turn many large one-dimensional numpy arrays of floats into delimited strings. I am finding this operation quite slow relative to the mathematical operations in my program and am wondering if there is a way to speed it up. For example, consider the following loop, which creates a numpy array of 100,000 random numbers and joins it into a comma-delimited string 100 times.

import numpy as np

x = np.random.randn(100000)  # 100,000 standard-normal floats
for i in range(100):
    # str() every element, then join the pieces with commas
    s = ",".join(map(str, x))

This loop takes about 20 seconds to complete (total, not per cycle). In contrast, 100 cycles of something like elementwise multiplication (x*x) would take less than 1/10 of a second. Clearly the string join operation creates a large performance bottleneck; in my actual application it will dominate total runtime. This makes me wonder: is there a faster way than ",".join(map(str, x))? Since map() is where almost all the processing time occurs, this comes down to whether there is a faster way to convert a very large number of numbers to strings.
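To reproduce the comparison described above, a minimal timing sketch using the standard-library `timeit` module might look like this. The helper names `join_floats` and `compare` are hypothetical, and absolute timings will depend on your machine and NumPy version.

```python
import timeit

import numpy as np


def join_floats(values):
    """Comma-join an array of floats, exactly as in the question."""
    return ",".join(map(str, values))


def compare(n=100_000, cycles=100):
    """Return (seconds for `cycles` joins, seconds for `cycles` elementwise products)."""
    x = np.random.randn(n)
    # Both measurements are totals over all cycles, matching the question's framing.
    t_join = timeit.timeit(lambda: join_floats(x), number=cycles)
    t_mul = timeit.timeit(lambda: x * x, number=cycles)
    return t_join, t_mul
```

Calling `compare()` with its defaults mirrors the question's setup (100 cycles over 100,000 values); pass smaller values for `n` or `cycles` for a quick check.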

Recommended answer

A very good write-up on the performance of various string concatenation techniques in Python: http://www.skymind.com/~ocrow/python_string/

I'm a little surprised that some of the latter approaches perform as well as they do, but it looks like you can certainly find something there that will work better for you than what you're doing now.

The fastest method mentioned on the site:

Method 6: List comprehension

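def method6(loop_count):
    # Python 3: str() replaces Python 2 backticks (repr), range() replaces xrange();
    # loop_count is passed in rather than read from a module-level global
    return ''.join([str(num) for num in range(loop_count)])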

This method is the shortest. I'll spoil the surprise and tell you it's also the fastest. It's extremely compact, and also pretty understandable. Create a list of numbers using a list comprehension and then join them all together. Couldn't be simpler than that. This is really just an abbreviated version of Method 4, and it consumes pretty much the same amount of memory. It's faster though because we don't have to call the list.append() function each time round the loop.
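Applied to the array from the question, the difference the quote describes (building the list with `list.append()` as in Method 4 versus a list comprehension as in Method 6) can be sketched as follows. This is an illustrative adaptation, not code from the linked article, and all function names are hypothetical. The `tolist()` variant is an extra assumption worth timing: it converts the NumPy scalars to plain Python floats in one call before formatting, which may reduce per-element overhead.

```python
import timeit

import numpy as np


def join_with_append(values):
    """Method 4 style: grow a list with append(), then join once."""
    parts = []
    for v in values:
        parts.append(str(v))
    return ",".join(parts)


def join_with_listcomp(values):
    """Method 6 style: build the list with a comprehension, then join once."""
    return ",".join([str(v) for v in values])


def join_with_tolist(values):
    """Assumed variant (not from the article): convert to Python floats first."""
    return ",".join([str(v) for v in values.tolist()])


def time_all(values, number=10):
    """Return {function name: total seconds for `number` joins of `values`}."""
    funcs = (join_with_append, join_with_listcomp, join_with_tolist)
    return {
        f.__name__: timeit.timeit(lambda f=f: f(values), number=number)
        for f in funcs
    }
```

With the question's 100,000-element array, `time_all(x)` reports the total for ten joins per function. The comprehension and `append()` versions produce identical strings; the `tolist()` version formats Python floats instead of NumPy scalars, so check that its output matches your formatting needs before switching.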
