groupby() seems slow
Problem description
I'm applying groupby() in a very simplistic way to split up some data,
but when I time it with timeit against another method, it takes twice as
long. The following groupby() code groups the data between the "</tr>" strings:

data = [
    "1.5","</tr>","2.5","3.5","4.5","</tr>","</tr>","5.5","6.5","</tr>",
    "1.5","</tr>","2.5","3.5","4.5","</tr>","</tr>","5.5","6.5","</tr>",
    "1.5","</tr>","2.5","3.5","4.5","</tr>","</tr>","5.5","6.5","</tr>",
]

import itertools

def key(s):
    if s[0] == "<":
        return 'a'
    else:
        return 'b'

def test3():
    master_list = []
    for group_key, group in itertools.groupby(data, key):
        if group_key == "b":
            master_list.append(list(group))

def test1():
    master_list = []
    row = []
    for elmt in data:
        if elmt[0] != "<":
            row.append(elmt)
        else:
            if row:
                master_list.append(" ".join(row))
                row = []

import timeit

t = timeit.Timer("test3()", "from __main__ import test3, key, data")
print(t.timeit())
t = timeit.Timer("test1()", "from __main__ import test1, data")
print(t.timeit())

--output:---
42.791079998
19.0128788948

I thought groupby() would be faster. Am I doing something wrong?
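
For readers unfamiliar with groupby(), here is a minimal sketch (written for Python 3, which is an assumption; the thread itself dates from Python 2) of what it yields for one repetition of the data above. The name `sample` is introduced only for this illustration. It shows that groupby() calls the key function once per element and groups consecutive elements with equal keys, so a key written in Python adds a function call for every item.

```python
import itertools

# One repetition of the rows from the question (name `sample` is illustrative).
sample = ["1.5", "</tr>", "2.5", "3.5", "4.5", "</tr>", "</tr>", "5.5", "6.5", "</tr>"]

def key(s):
    # Same classification as in the question: 'a' for tags, 'b' for values.
    return "a" if s[0] == "<" else "b"

# groupby() yields (key, iterator) pairs for each run of consecutive equal keys.
runs = [(k, list(group)) for k, group in itertools.groupby(sample, key)]

for k, group in runs:
    print(k, group)
```

Only the runs keyed 'b' carry data; the 'a' runs are the separators that test3() skips.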
Recommended answer
On Oct 15, 11:02 pm, 7stud <bbxx789_0...@yahoo.com> wrote:
> I thought groupby() would be faster. Am I doing something wrong?
Yes and no. Yes, the groupby version can be improved a little by
calling a builtin method instead of a Python function. No, test1 still
beats it hands down (and with Psyco even further); it is almost as good
as it gets in pure Python.

FWIW, here's a faster and more compact version with groupby:

def test3b(data):
    join = ' '.join
    return [join(group) for key, group in
            itertools.groupby(data, "</tr>".__eq__)
            if not key]

George
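
To try the comparison described above, here is a rough sketch in Python 3 (an assumption; the original timings came from a 2007 Python 2 interpreter). The function names and the `compare` helper are hypothetical and not part of the thread. All three variants return joined rows so they can be checked against each other, and `manual_loop` adds a trailing flush that test1() does not have. Timings depend on the interpreter and machine, so no particular ranking is claimed here.

```python
import itertools
import timeit

data = ["1.5", "</tr>", "2.5", "3.5", "4.5", "</tr>", "</tr>", "5.5", "6.5", "</tr>"] * 3

def key(s):
    return "a" if s[0] == "<" else "b"

def groupby_python_key(items):
    # Key is a Python-level function, called once per element.
    return [" ".join(group) for k, group in itertools.groupby(items, key) if k == "b"]

def groupby_bound_method_key(items):
    # Key is the bound method "</tr>".__eq__, as in the answer's test3b.
    join = " ".join
    return [join(group) for is_sep, group in
            itertools.groupby(items, "</tr>".__eq__) if not is_sep]

def manual_loop(items):
    # Same idea as test1, but returns its result.
    rows, row = [], []
    for elmt in items:
        if elmt[0] != "<":
            row.append(elmt)
        elif row:
            rows.append(" ".join(row))
            row = []
    if row:  # addition: keep a final row when the data lacks a closing "</tr>"
        rows.append(" ".join(row))
    return rows

def compare(number=20000):
    # Hypothetical helper: time each variant on the same data.
    for func in (groupby_python_key, groupby_bound_method_key, manual_loop):
        seconds = timeit.timeit(lambda: func(data), number=number)
        print("%s: %.3fs" % (func.__name__, seconds))

if __name__ == "__main__":
    compare()
```

Note that "</tr>".__eq__ only matches the exact string "</tr>", whereas key() treats anything starting with "<" as a separator, so the two groupby variants agree on this data but not necessarily on other input.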
Shouldn't this

>>> print re.sub('a','\\n','bab')
b
b

output

b\nb

instead?

Massimo
On Oct 16, 2007, at 1:34 AM, George Sakkis wrote:
> Yes and no. Yes, the groupby version can be improved a little by
> calling a builtin method instead of a Python function. No, test1 still
> beats it hands down (and with Psyco even further); it is almost as good
> as it gets in pure Python.
--
http://mail.python.org/mailman/listinfo/python-list
Even stranger

>>> re.sub('a', '\\n','bab')
'b\nb'
>>> print re.sub('a', '\\n','bab')
b
b

Massimo
On Oct 16, 2007, at 1:54 AM, DiPierro, Massimo wrote:
> Shouldn't this
>
> >>> print re.sub('a','\\n','bab')
> b
> b
>
> output
>
> b\nb
>
> instead?
>
> Massimo
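
The thread does not include a reply to the re.sub question. As a hedged note: the re module documents that backslash escapes in a string replacement template are processed, so the two-character template backslash + n is turned into a newline. The sketch below (Python 3, with variable names chosen only for illustration) shows that behavior next to two ways of getting a literal backslash followed by n.

```python
import re

# Template is the two characters backslash + n; re.sub converts it to a newline.
newline_result = re.sub("a", "\\n", "bab")

# A callable replacement is inserted as returned, with no escape processing.
literal_result = re.sub("a", lambda match: "\\n", "bab")

# Doubling the backslash in the template also yields a literal backslash + n.
escaped_result = re.sub("a", r"\\n", "bab")

print(repr(newline_result))
print(repr(literal_result))
print(repr(escaped_result))
```

The callable form is usually the easier choice when the replacement text comes from data that may contain backslashes.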