Filtering based on the "rows" data after creating a pivot table in python pandas

Question

I have a set of data that I'm getting from a SQL database and reading into a pandas dataframe. The resulting df is about 250M rows and growing every day. Therefore, I'd like to pivot the table to give me a much smaller table to work with (a few thousand rows).

The table looks something like this, but much bigger:

data

  report_date             item_id        views   category
0  2013-06-01                   2            3          a
1  2013-06-01                   2            2          b
2  2013-06-01                   5           16          a 
3  2013-06-01                   2            4          c
4  2013-06-01                   2            5          d

I'd like to make this much smaller by ignoring the "category" column and just getting a total for views by date and item_id.

I'm doing this like so:

pivot = data.pivot_table(values=['views'], rows=['report_date','item_id'], aggfunc='sum')

                                 views  
report_date item_id
2013-06-01        2                 14           
2013-06-01        5                 16

Now imagine this is much bigger, with the date range going for months and thousands of item_ids. I'd like to select the total views for item_id = 2 and report_date between '2013-06-01' and '2013-06-10', or something along those lines.

I've searched for several hours straight, but I can't see how to select and/or filter on values in my "rows" (i.e. report_date and item_id) section. I can only filter/select data in the "values" section (e.g. views). This question is similar, and at the very end the asker raised the same question I'm asking in a comment, but it was never answered. I just wanted to try to draw attention to it.

Filtering and selecting from pivot tables made with python pandas

I appreciate all the help. This site and the community have been absolutely invaluable.

Recommended answer

You should be able to slice like this:

In [11]: pivot.loc[('2013-06-01', 3):('2013-06-01', 6)]
Out[11]:
                     views
report_date item_id
2013-06-01  5           16

See the advanced indexing docs.
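For the specific selection asked about (item_id = 2, report_date between '2013-06-01' and '2013-06-10'), here is a minimal sketch of how it might look on a recent pandas. It assumes `index=` in place of the `rows=` argument used in the question (the argument was renamed in later pandas versions) and `.loc` for label-based slicing. The two extra dates in the sample frame are made up for illustration, and report_date is assumed to be ISO-formatted strings so that string comparison matches date order.

```python
import pandas as pd

# Rows from the question, plus two hypothetical later dates for illustration.
data = pd.DataFrame({
    "report_date": ["2013-06-01"] * 5 + ["2013-06-05", "2013-06-12"],
    "item_id": [2, 2, 5, 2, 2, 2, 2],
    "views": [3, 2, 16, 4, 5, 7, 9],
    "category": ["a", "b", "a", "c", "d", "a", "a"],
})

# Same pivot as in the question, with index= instead of rows=.
pivot = data.pivot_table(values=["views"],
                         index=["report_date", "item_id"],
                         aggfunc="sum")
pivot = pivot.sort_index()  # label slicing on a MultiIndex needs a sorted index

# Tuple slice over (report_date, item_id), as in the answer.
same_day = pivot.loc[("2013-06-01", 3):("2013-06-01", 6)]

# Date range on the first level, a single item_id on the second.
idx = pd.IndexSlice
item2 = pivot.loc[idx["2013-06-01":"2013-06-10", 2], :]
total_views = item2["views"].sum()

# Alternative: a boolean mask built from the index levels.
dates = pivot.index.get_level_values("report_date")
items = pivot.index.get_level_values("item_id")
mask = (items == 2) & (dates >= "2013-06-01") & (dates <= "2013-06-10")
item2_masked = pivot[mask]

print(item2)
print(total_views)
```

The `pd.IndexSlice` form reads closest to the question as asked. The mask form is more verbose but does not depend on the index being sorted. If report_date is stored as real datetimes rather than strings, the same slices should work with date-like bounds.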
