How to filter a crosstab created in pandas by a specific column


Problem description

I have created a cross tabulation in pandas using:

grouped_missing_analysis = pd.crosstab(clean_sessions.action_type, clean_sessions.action, margins=True).unstack()    
print(grouped_missing_analysis[:20])

This displays:

action  action_type     
10      Missing                0
        Unknown                0
        booking_request        0
        booking_response       0
        click                  0
        data                   0
        message_post        3215
        modify                 0
        partner_callback       0
        submit                 0
        view                   0
        All                 3215
11      Missing                0
        Unknown                0
        booking_request        0
        booking_response       0
        click                  0
        data                   0
        message_post         716
        modify                 0
dtype: int64

I only want to show the rows where action_type is 'Unknown', 'Missing' or 'Other', and ignore every other action_type for each action. I have a feeling the answer has something to do with:

.where(clean_sessions.action_type.isin(('Missing', 'Unknown')), 'Other')

This comes from a previous snippet of mine, but I can't get it to work. Maybe pivot_table would be easier; this exercise is just for me to learn how to do data analysis in Python with the different functions.

Raw data for clean_sessions looks like:

   user_id          action action_type            action_detail  \
0  d1mm9tcy42          lookup     Missing                  Missing   
1  d1mm9tcy42  search_results       click      view_search_results   
2  d1mm9tcy42          lookup     Missing                  Missing   
3  d1mm9tcy42  search_results       click      view_search_results   
4  d1mm9tcy42          lookup     Missing                  Missing   
5  d1mm9tcy42  search_results       click      view_search_results   
6  d1mm9tcy42          lookup     Missing                  Missing   
7  d1mm9tcy42     personalize        data  wishlist_content_update   
8  d1mm9tcy42           index        view      view_search_results   
9  d1mm9tcy42          lookup     Missing                  Missing   

       device_type secs_elapsed  
0  Windows Desktop          319  
1  Windows Desktop        67753  
2  Windows Desktop          301  
3  Windows Desktop        22141  
4  Windows Desktop          435  
5  Windows Desktop         7703  
6  Windows Desktop          115  
7  Windows Desktop          831  
8  Windows Desktop        20842  
9  Windows Desktop          683 

Solution

Those are index levels, not columns, so you need to select the rows of interest by label.

You can pass slice(None) for the first level and then a list for the second level:

In [102]:
grouped_missing_analysis.loc[slice(None), ['Missing', 'Unknown', 'Other']]

Out[102]:
action          action_type
index           Missing        0
lookup          Missing        5
personalize     Missing        0
search_results  Missing        0
All             Missing        5
dtype: int64
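
For anyone who wants to reproduce this, here is a minimal sketch built from the ten sample rows in the question (only the action and action_type columns are copied; the names wanted, mask and filtered are mine, not from the original post). Instead of passing the label list to .loc, it builds a boolean mask from the action_type level with get_level_values and isin. That is a swapped-in variant, chosen because I believe some pandas versions raise a KeyError when a list passed to .loc contains a label such as 'Other' that is absent from the level, while isin simply ignores absent labels.

```python
import pandas as pd

# First ten rows of clean_sessions from the question (two relevant columns only).
clean_sessions = pd.DataFrame({
    "action": ["lookup", "search_results", "lookup", "search_results", "lookup",
               "search_results", "lookup", "personalize", "index", "lookup"],
    "action_type": ["Missing", "click", "Missing", "click", "Missing",
                    "click", "Missing", "data", "view", "Missing"],
})

# Same construction as in the question: a Series indexed by (action, action_type).
grouped_missing_analysis = pd.crosstab(
    clean_sessions.action_type, clean_sessions.action, margins=True
).unstack()

# Keep only the rows whose action_type level is one of the wanted labels.
# Labels that do not occur in the data (here 'Unknown' and 'Other') are ignored.
wanted = ["Missing", "Unknown", "Other"]
mask = grouped_missing_analysis.index.get_level_values("action_type").isin(wanted)
filtered = grouped_missing_analysis[mask]

print(filtered)
```

On this sample only 'Missing' occurs, so the result has one row per action plus the 'All' margin, matching the output shown above.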

The docs give more detail on this style of indexing.
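
If the intent in the question was instead to collapse every other action_type into a single 'Other' bucket (which is what the .where snippet suggests), one option is to relabel the column before building the crosstab rather than filtering afterwards. This is a rough sketch under that assumption, again using the ten sample rows; bucketed and bucketed_counts are illustrative names of mine.

```python
import pandas as pd

# First ten rows of clean_sessions from the question (two relevant columns only).
clean_sessions = pd.DataFrame({
    "action": ["lookup", "search_results", "lookup", "search_results", "lookup",
               "search_results", "lookup", "personalize", "index", "lookup"],
    "action_type": ["Missing", "click", "Missing", "click", "Missing",
                    "click", "Missing", "data", "view", "Missing"],
})

# Series.where keeps values where the condition is True and replaces the rest,
# so 'Missing' and 'Unknown' stay as they are and everything else becomes 'Other'.
bucketed = clean_sessions.action_type.where(
    clean_sessions.action_type.isin(["Missing", "Unknown"]), "Other"
)

# Cross-tabulate the relabelled action_type against action, as in the question.
bucketed_counts = pd.crosstab(
    bucketed, clean_sessions.action, margins=True
).unstack()

print(bucketed_counts)
```

The condition has to be computed on the original column, before the crosstab, because .where needs a mask aligned with the object it is called on; applying a row-level mask from clean_sessions to the already aggregated Series does not line up.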
