Simple cross-tabulation in pandas

Views: 96 | Published: 2020/5/24 1:45:35 | Tags: python pandas dataframe pivot-table


Question

I stumbled across pandas and it looks ideal for the simple calculations I'd like to do. I have a SAS background and was thinking it could replace proc freq; it also looks like it will scale to what I may want to do in the future. However, I can't seem to get my head around a simple task (I'm not sure whether I should be looking at pivot/crosstab/indexing, or whether I should have a Panel or DataFrames, etc.). Could someone give me some pointers on how to do the following:

I have two CSV files of simple transactional data, one for 2010 and one for 2011. The columns are category and amount.

2010:

AB,100.00
AB,200.00
AC,150.00
AD,500.00

2011:

AB,500.00
AC,250.00
AX,900.00

These are loaded into separate DataFrame objects.

What I'd like to do is get the category, the sum of the amounts in that category, and the frequency of the category, e.g.:

2010:

AB,300.00,2
AC,150.00,1
AD,500.00,1

2011:

AB,500.00,1
AC,250.00,1
AX,900.00,1

I can't work out whether I should be using pivot/crosstab/groupby/an index etc. I can get either the sum or the frequency, but I can't seem to get both. It gets a bit more complex because I would like to do it on a month-by-month basis, but if someone would be so kind as to point me to the right technique/direction, I'll be able to go from there.

Recommended answer

Assuming that you have a file called 2010.csv with contents

category,value
AB,100.00
AB,200.00
AC,150.00
AD,500.00

Then, using the ability to apply multiple aggregation functions following a groupby, you can say:

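import pandas
# Load the 2010 transactions; the path is a placeholder.
data_2010 = pandas.read_csv("/path/to/2010.csv")
# Group by category, then compute the row count (len) and the total (sum) of each group.
data_2010.groupby("category").agg([len, sum])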

You should get something like:

          value     
            len  sum
category            
AB            2  300
AC            1  150
AD            1  500
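For readers without the CSV file at hand, here is a minimal self-contained sketch of the same idea. It is not the answerer's code: it reads the sample data from an in-memory string, and it uses named aggregation (available in reasonably recent pandas versions) so the result has flat column names. The output column names `total` and `frequency` are illustrative choices, not part of the original answer.

```python
import io

import pandas as pd

# Inline copy of the 2010 sample data so the example runs without a file on disk.
csv_2010 = io.StringIO(
    "category,value\n"
    "AB,100.00\n"
    "AB,200.00\n"
    "AC,150.00\n"
    "AD,500.00\n"
)
data_2010 = pd.read_csv(csv_2010)

# Named aggregation: each keyword becomes one output column.
# "total" and "frequency" are hypothetical names chosen for this sketch.
# "count" counts the non-missing values in each group.
summary_2010 = data_2010.groupby("category")["value"].agg(
    total="sum",
    frequency="count",
)
print(summary_2010)
```

Selecting the `value` column before aggregating avoids the two-level column header shown in the output above, which may be easier to work with downstream.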

Note that Wes (Wes McKinney, the author of pandas) will likely come by to point out that sum is optimized and that you should probably use np.sum.
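On the choice of aggregation function, one further option is to pass string names such as "sum" and "count", which refer to pandas' own groupby methods. The sketch below uses them and also covers both years from the question in one pass. It rests on assumptions not found in the original post: the data is inlined with a `category,value` header, and a `year` column is added by hand to tell the two files apart.

```python
import io

import pandas as pd

# Inline copies of the sample data from the question (header row added).
raw = {
    2010: "category,value\nAB,100.00\nAB,200.00\nAC,150.00\nAD,500.00\n",
    2011: "category,value\nAB,500.00\nAC,250.00\nAX,900.00\n",
}

# Tag each frame with its year (a column introduced for this sketch),
# then stack the frames into a single DataFrame.
frames = [
    pd.read_csv(io.StringIO(text)).assign(year=year)
    for year, text in raw.items()
]
combined = pd.concat(frames, ignore_index=True)

# Group on both keys and aggregate by method name rather than by callable.
by_year = combined.groupby(["year", "category"])["value"].agg(["sum", "count"])
print(by_year)
```

The same pattern would presumably extend to the month-by-month case by grouping on a month key instead of `year`, but that requires a date column, which the sample data does not include.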
