groupby() seems slow


Problem description

I'm applying groupby() in a very simplistic way to split up some data,
but when I timeit against another method, it takes twice as long. The
following groupby() code groups the data between the "</tr>" strings:

data = [
    "1.5","</tr>","2.5","3.5","4.5","</tr>","</tr>","5.5","6.5","</tr>",
    "1.5","</tr>","2.5","3.5","4.5","</tr>","</tr>","5.5","6.5","</tr>",
    "1.5","</tr>","2.5","3.5","4.5","</tr>","</tr>","5.5","6.5","</tr>",
]

import itertools

def key(s):
    if s[0] == "<":
        return 'a'
    else:
        return 'b'

def test3():
    master_list = []
    for group_key, group in itertools.groupby(data, key):
        if group_key == "b":
            master_list.append(list(group))

def test1():
    master_list = []
    row = []

    for elmt in data:
        if elmt[0] != "<":
            row.append(elmt)
        else:
            if row:
                master_list.append(" ".join(row))
                row = []

import timeit

t = timeit.Timer("test3()", "from __main__ import test3, key, data")
print(t.timeit())
t = timeit.Timer("test1()", "from __main__ import test1, data")
print(t.timeit())

--output:---
42.791079998
19.0128788948

I thought groupby() would be faster. Am I doing something wrong?

Recommended answer

On Oct 15, 11:02 pm, 7stud <bbxx789_0...@yahoo.com> wrote:

> I thought groupby() would be faster. Am I doing something wrong?



Yes and no. Yes, the groupby version can be improved a little by
calling a builtin method instead of a Python function. No, test1 still
beats it hands down (and with Psyco even further); it is almost as good
as it gets in pure Python.

FWIW, here's a faster and more compact version with groupby:

def test3b(data):
    join = ' '.join
    return [join(group) for key, group in
            itertools.groupby(data, "</tr>".__eq__)
            if not key]

George


Shouldn't this

>>> print(re.sub('a', '\\n', 'bab'))
b
b

output

b\nb

instead?

Massimo

On Oct 16, 2007, at 1:34 AM, George Sakkis wrote:


> Yes and no. Yes, the groupby version can be improved a little by
> calling a builtin method instead of a Python function.

-
http://mail.python.org/mailman/listinfo/python-list


Even stranger:

>>> re.sub('a', '\\n', 'bab')
'b\nb'
>>> print(re.sub('a', '\\n', 'bab'))
b
b

Massimo

On Oct 16, 2007, at 1:54 AM, DiPierro, Massimo wrote:


> Shouldn't this
>
> >>> print(re.sub('a', '\\n', 'bab'))
> b
> b
>
> output
>
> b\nb
>
> instead?
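
The behaviour Massimo describes follows from how re.sub treats a string replacement: backslash escapes in the replacement are processed, so the two characters backslash and n are turned into a newline. A minimal sketch (Python 3 syntax) of that, plus two ways to get a literal backslash followed by n, assuming that is the output wanted:

```python
import re

# '\\n' is the two-character string backslash + n; re.sub expands it to a newline.
expanded = re.sub('a', '\\n', 'bab')

# Doubling the backslash in the replacement yields a literal backslash + n.
literal = re.sub('a', r'\\n', 'bab')

# A function's return value is inserted as-is, with no escape processing.
literal_fn = re.sub('a', lambda match: '\\n', 'bab')

print(repr(expanded))    # 'b\nb'  (a real newline)
print(repr(literal))     # 'b\\nb' (backslash followed by n)
print(repr(literal_fn))  # 'b\\nb'
```

The function form is often the simpler choice when the replacement text comes from elsewhere and may contain backslashes.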

