Python - Grouping by multiple columns and getting max or sum

Question

I am interested in getting the maximum value of the product price.

Below is the input data. I want to group by State and Country.

How do I group on these two columns to get the maximum value of Price?

import csv
import locale
from itertools import groupby

locale.setlocale( locale.LC_ALL, 'en_US.UTF-8' ) 

total_price = 0
max_price = 0
reader = csv.DictReader(open('/Users/myuser/Downloads/SalesData.csv', 'rU'), dialect='excel')

groups = groupby(reader, lambda d: d['State'])

result = [max(g, key=lambda d: d['State']) for k, g in groups]

for row in reader:
    print row["State"], row["Country"], locale.atoi(row["Price"])
    max_price = max(row.iteritems(), key=operator.itemgetter(1))
    total_price += locale.atoi(row["Price"])    

With pandas I can do it as follows. Can I get the same result without using pandas?

import pandas as pd
from pandas import DataFrame
import locale

locale.setlocale( locale.LC_ALL, 'en_US.UTF-8' ) 


df = pd.read_csv('/Users/myuser/Downloads/SalesData.csv', index_col=False, header=0,thousands=',')

print df.groupby(["Country","State"]).max()["Price"]

Recommended answer

itertools.groupby only groups as expected when the input is already sorted with the same key function that is passed to groupby, as stated in the documentation:

itertools.groupby(iterable[, key])

Make an iterator that returns consecutive keys and groups from the iterable. The key is a function computing a key value for each element. If not specified or is None, key defaults to an identity function and returns the element unchanged. Generally, the iterable needs to already be sorted on the same key function.
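
To see what "consecutive" means in practice, here is a small sketch. It is written in Python 3 syntax, and the rows and the `state_of` helper are made up for illustration; they are not taken from the question's CSV file. Grouping unsorted rows yields the same key more than once, while sorting first on the same key function yields one group per key.

```python
from itertools import groupby

# Hypothetical rows, deliberately not sorted by state
rows = [
    {"State": "CA", "Price": 10},
    {"State": "NY", "Price": 20},
    {"State": "CA", "Price": 30},
]

def state_of(row):
    """Key function shared by sorted() and groupby()."""
    return row["State"]

# Without sorting, groupby starts a new group every time the key changes
unsorted_keys = [key for key, _ in groupby(rows, key=state_of)]

# After sorting on the same key function, each state forms a single group
sorted_keys = [key for key, _ in groupby(sorted(rows, key=state_of), key=state_of)]

print(unsorted_keys)  # ['CA', 'NY', 'CA']
print(sorted_keys)    # ['CA', 'NY']
```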

So, to achieve what you want with itertools.groupby, you would most probably need to sort the data first on both 'Country' and 'State', and then apply groupby with that same key.

Also, when taking max() you should use 'Price' as the key, not 'State'. Example:

reader = csv.DictReader(open('/Users/myuser/Downloads/SalesData.csv', 'rU'), dialect='excel')

sortedreader = sorted(reader, key=lambda d: (d['Country'], d['State']))

groups = groupby(sortedreader, key=lambda d: (d['Country'], d['State']))

# Prices are read from the CSV as strings, so convert them before comparing
result = [(k, max(g, key=lambda d: locale.atoi(d['Price']))) for k, g in groups]

I added the key to the result to identify which Country/State each maximum corresponds to. After this you can iterate over result and print each entry, if that is what you really want.
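
Putting the pieces together, the following is a minimal sketch that computes both the maximum and the sum of Price per (Country, State). It uses Python 3 syntax rather than the Python 2 of the question. The sample rows and the `parse_price`, `group_key` and `summarize` helpers are hypothetical and only for illustration, since the real contents of SalesData.csv are not shown. It also assumes that prices use a comma as the thousands separator and strips it directly instead of calling `locale.atoi`, so the result does not depend on which locales are installed.

```python
import csv
import io
from itertools import groupby

# Hypothetical stand-in for SalesData.csv; the real data is not shown in the question
SAMPLE_CSV = """State,Country,Price
CA,United States,"1,200"
NY,United States,800
CA,United States,3600
ON,Canada,"1,500"
ON,Canada,900
"""

def parse_price(text):
    """Convert a price string such as '1,200' to an int (assumes ',' is a thousands separator)."""
    return int(text.replace(",", ""))

def group_key(row):
    """Group on both columns, Country first and then State."""
    return (row["Country"], row["State"])

def summarize(rows):
    """Return {(country, state): (max_price, total_price)} for the given rows."""
    summary = {}
    # groupby only merges consecutive rows, so sort on the same key first
    for key, group in groupby(sorted(rows, key=group_key), key=group_key):
        prices = [parse_price(row["Price"]) for row in group]
        summary[key] = (max(prices), sum(prices))
    return summary

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
summary = summarize(reader)
for (country, state), (max_price, total_price) in summary.items():
    print(country, state, max_price, total_price)
```

Converting to numbers before comparing matters here: compared as strings, '800' would rank above '3600'. To run this against a real file, the `io.StringIO(SAMPLE_CSV)` argument could be replaced with an open file object.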
