" GROUPBY"太棒了! [英] "groupby" is brilliant!


Problem description

Hi all

This is probably old hat to most of you, but for me it was a
revelation, so I thought I would share it in case someone has a similar
requirement.

I had to convert an old program that does a traditional pass through a
sorted data file, breaking on a change of certain fields, processing
each row, accumulating various totals, and doing additional processing
at each break. I am not using a database for this one, as the file
sizes are not large - a few thousand rows at most. I am using csv
files, and using the csv module so that each row is nicely formatted
into a list.

The traditional approach is quite fiddly, saving the values of the
various break fields, comparing the values on each row with the saved
values, and taking action if the values differ. The more break fields
there are, the fiddlier it gets.
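
For comparison, the manual bookkeeping described above might look something like the following sketch. It is written for Python 3, and the amount column (index 3), the function name and the result layout are assumptions made for illustration; they are not taken from the original program.

```python
LEVELS = ("brn", "acc", "date")  # break fields, outermost first


def control_break_totals(rows):
    """Total a hypothetical amount column (index 3) at every break, by hand.

    Assumes rows are already sorted by branch, account and date.
    Returns (level, key, total) tuples, innermost break first.
    """
    results = []
    saved = None                 # break-field values of the previous row
    totals = [0] * len(LEVELS)   # one running total per break level

    def close_levels(first_changed):
        # Emit and reset every level from the innermost up to the one that changed.
        for level in range(len(LEVELS) - 1, first_changed - 1, -1):
            results.append((LEVELS[level], saved[:level + 1], totals[level]))
            totals[level] = 0

    for row in rows:
        key = tuple(row[:len(LEVELS)])
        if saved is not None and key != saved:
            # Find the outermost field that differs from the saved values.
            first_changed = next(
                i for i in range(len(LEVELS)) if key[i] != saved[i]
            )
            close_levels(first_changed)
        saved = key
        for level in range(len(LEVELS)):
            totals[level] += int(row[3])

    if saved is not None:
        close_levels(0)          # flush the final groups at end of input
    return results


rows = [
    ["A", "1", "d1", "10"],
    ["A", "1", "d1", "5"],
    ["A", "1", "d2", "1"],
    ["A", "2", "d1", "2"],
    ["B", "1", "d1", "3"],
]
results = control_break_totals(rows)
```

Every extra break field adds another saved value to compare and another total to flush, both inside the loop and again at end of input.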

I was going to do the same in python, but then I vaguely remembered
reading about 'groupby'. It took a little while to figure it out, but
once I had cracked it, it transformed the task into one of utter
simplicity.

Here is an example. Imagine a transaction file sorted by branch,
account number, and date, and you want to break on all three.

-----------------------------
import csv
from itertools import groupby
from operator import itemgetter

BRN = 0
ACC = 1
DATE = 2

reader = csv.reader(open('trans.csv', 'rb'))
rows = []
for row in reader:
    rows.append(row)

for brn, brnList in groupby(rows, itemgetter(BRN)):
    for acc, accList in groupby(brnList, itemgetter(ACC)):
        for date, dateList in groupby(accList, itemgetter(DATE)):
            for row in dateList:
                [do something with row]
            [do something on change of date]
        [do something on change of acc]
    [do something on change of brn]
-----------------------------

Hope someone finds this of interest.

Frank Millman
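
A runnable variant of the nested groupby pattern from this post, as a minimal Python 3 sketch. The sample data, the AMOUNT column and the nested_totals function are assumptions added for illustration; the post itself only shows placeholders for the per-row and per-break processing. Note that groupby only groups consecutive rows with equal keys, so the input has to be sorted on the same fields, and each group is an iterator that has to be consumed before the enclosing loop moves on.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

BRN, ACC, DATE, AMOUNT = 0, 1, 2, 3  # AMOUNT is an assumed extra column

# Hypothetical stand-in for trans.csv, already sorted by branch, account, date.
SAMPLE = """\
A,1,2006-01-01,10
A,1,2006-01-01,5
A,1,2006-01-02,1
A,2,2006-01-01,2
B,1,2006-01-01,3
"""


def nested_totals(rows):
    """Return (level, key, total) tuples, innermost break first."""
    results = []
    for brn, brn_rows in groupby(rows, itemgetter(BRN)):
        brn_total = 0
        for acc, acc_rows in groupby(brn_rows, itemgetter(ACC)):
            acc_total = 0
            for date, date_rows in groupby(acc_rows, itemgetter(DATE)):
                # Per-row processing: here, just sum the amounts.
                date_total = sum(int(row[AMOUNT]) for row in date_rows)
                # Change of date.
                results.append(("date", (brn, acc, date), date_total))
                acc_total += date_total
            # Change of account.
            results.append(("acc", (brn, acc), acc_total))
            brn_total += acc_total
        # Change of branch.
        results.append(("brn", (brn,), brn_total))
    return results


rows = list(csv.reader(io.StringIO(SAMPLE)))
results = nested_totals(rows)
for level, key, total in results:
    print(level, key, total)
```

The break handling sits at the indentation level of the loop it belongs to, so no saved values or end-of-input flush are needed.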

Solution

Hi Frank

This is one of the reasons why I love Python: you can write readable
code.
I strive to write clean code, but I find that exception handling code
(e.g. try:) makes my code ugly and significantly harder to read. Does
anyone have any good pointers for a former C++ / Perl coder?

/vpr
> reader = csv.reader(open('trans.csv', 'rb'))
> rows = []
> for row in reader:
>     rows.append(row)



This is untested, but you might think about converting your explicit "for...
append" loop into either a list comp,

rows = [row for row in reader]

or just a plain list constructor:

rows = list(reader)

Neh?

-- Paul
(Oh, and I like groupby too! Combine it with sort to quickly create
histograms.)

# tally a histogram of a list of values from 1-10
import itertools
import random

dataValueRange = range(1,11)
data = [random.choice(dataValueRange) for i in xrange(10000)]

hist = [ (k,len(list(g))) for k,g in itertools.groupby(sorted(data)) ]
print hist

histAsDict = dict((k,len(list(g))) for k,g in
                  itertools.groupby(sorted(data)))
print histAsDict

Gives:

[(1, 979), (2, 1034), (3, 985), (4, 969), (5, 1020), (6, 975), (7, 981), (8, 1070), (9, 1003), (10, 984)]
{1: 979, 2: 1034, 3: 985, 4: 969, 5: 1020, 6: 975, 7: 981, 8: 1070, 9: 1003, 10: 984}
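
The histogram snippet above uses Python 2 syntax (print statement, xrange). A rough Python 3 equivalent might look like the sketch below. The seed is an arbitrary choice to make runs repeatable, and the collections.Counter comparison is an addition for illustration, not part of the original post.

```python
import random
from collections import Counter
from itertools import groupby

random.seed(0)  # arbitrary seed, only so that repeated runs give the same data
data = [random.choice(range(1, 11)) for _ in range(10000)]

# Sorting first puts equal values next to each other, so each groupby
# group holds every occurrence of one value.
hist = [(k, len(list(g))) for k, g in groupby(sorted(data))]
hist_as_dict = dict(hist)

# Counter produces the same tallies without needing a sort.
counts = Counter(data)

print(hist)
print(hist_as_dict == dict(counts))
```

Counter is usually the shorter route when only the counts are needed; the groupby form also gives access to the grouped items themselves.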



Paul McGuire wrote:

> reader = csv.reader(open('trans.csv', 'rb'))
> rows = []
> for row in reader:
>     rows.append(row)
>
> This is untested, but you might think about converting your explicit "for...
> append" loop into either a list comp,
>
> rows = [row for row in reader]
>
> or just a plain list constructor:
>
> rows = list(reader)
>
> Neh?
>
> -- Paul



Yup, they both work fine.

There may be times when you want to massage the data before appending
it, in which case you obviously have to do it the long way. Otherwise
these are definitely neater, the last one especially.

You could even do it as a one-liner -
rows = list(csv.reader(open('trans.csv', 'rb')))

It still looks perfectly readable to me.

Thanks

Frank
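
The snippets in this thread open the file in 'rb' mode, which is the Python 2 convention for the csv module. In Python 3 the csv documentation recommends opening the file in text mode with newline=''. A minimal sketch of the same one-step read follows; the read_rows helper and the temporary demo file are assumptions for illustration.

```python
import csv
import os
import tempfile


def read_rows(path):
    """Read a whole CSV file into a list of rows (Python 3 style)."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="") as f:
        return list(csv.reader(f))


# Demo on a throwaway file standing in for trans.csv.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "trans.csv")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(
            [["A", "1", "2006-01-01"], ["B", "2", "2006-01-02"]]
        )
    rows = read_rows(path)

print(rows)
```

The with block also closes the file promptly, which the one-liner leaves to the garbage collector.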

