How to filter a crosstab created in pandas by a specific column
Problem description
I have created a cross tabulation in pandas using:
grouped_missing_analysis = pd.crosstab(clean_sessions.action_type, clean_sessions.action, margins=True).unstack()
print(grouped_missing_analysis[:20])
This displays:
action  action_type
10      Missing                0
        Unknown                0
        booking_request        0
        booking_response       0
        click                  0
        data                   0
        message_post        3215
        modify                 0
        partner_callback       0
        submit                 0
        view                   0
        All                 3215
11      Missing                0
        Unknown                0
        booking_request        0
        booking_response       0
        click                  0
        data                   0
        message_post         716
        modify                 0
dtype: int64
I want to show only the action_type values 'Unknown', 'Missing' or 'Other', and ignore every other action_type, for each action. I have a feeling the answer has to do with:
.where(clean_sessions.action_type.isin(('Missing', 'Unknown')), 'Other')
That comes from an earlier snippet of mine, but I can't get it to work. Maybe pivot_table is easier; this exercise is just for me to learn how to do data analysis in Python with the different functions.
The raw data for clean_sessions looks like:
      user_id          action  action_type            action_detail  \
0  d1mm9tcy42          lookup      Missing                  Missing
1  d1mm9tcy42  search_results        click      view_search_results
2  d1mm9tcy42          lookup      Missing                  Missing
3  d1mm9tcy42  search_results        click      view_search_results
4  d1mm9tcy42          lookup      Missing                  Missing
5  d1mm9tcy42  search_results        click      view_search_results
6  d1mm9tcy42          lookup      Missing                  Missing
7  d1mm9tcy42     personalize         data  wishlist_content_update
8  d1mm9tcy42           index         view      view_search_results
9  d1mm9tcy42          lookup      Missing                  Missing

       device_type  secs_elapsed
0  Windows Desktop           319
1  Windows Desktop         67753
2  Windows Desktop           301
3  Windows Desktop         22141
4  Windows Desktop           435
5  Windows Desktop          7703
6  Windows Desktop           115
7  Windows Desktop           831
8  Windows Desktop         20842
9  Windows Desktop           683
Those are levels of your index, not columns, so you need to select the rows of interest by label. You can pass slice(None) for the first level and then a list of labels for the second level:
In [102]:
grouped_missing_analysis.loc[slice(None), ['Missing', 'Unknown', 'Other']]
Out[102]:
action          action_type
index           Missing        0
lookup          Missing        5
personalize     Missing        0
search_results  Missing        0
All             Missing        5
dtype: int64
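Note that 'Unknown' and 'Other' do not occur in the sample rows, and depending on the pandas version a list-based .loc selection may raise a KeyError when some of the requested labels are missing from the index. A minimal sketch of an alternative that tolerates absent labels is a boolean mask built from the action_type level. The small clean_sessions frame below is a hypothetical stand-in reconstructed from the ten rows shown in the question, and the names wanted, mask and filtered are illustrative only:

```python
import pandas as pd

# Hypothetical stand-in for clean_sessions, using only the two columns
# that the crosstab needs (values copied from the ten rows in the question).
clean_sessions = pd.DataFrame({
    "action": ["lookup", "search_results", "lookup", "search_results", "lookup",
               "search_results", "lookup", "personalize", "index", "lookup"],
    "action_type": ["Missing", "click", "Missing", "click", "Missing",
                    "click", "Missing", "data", "view", "Missing"],
})

# Same construction as in the question: a Series indexed by (action, action_type).
grouped_missing_analysis = pd.crosstab(
    clean_sessions.action_type, clean_sessions.action, margins=True
).unstack()

# Keep only rows whose action_type level is one of the wanted labels.
# isin() simply returns False for labels that are not present.
wanted = ["Missing", "Unknown", "Other"]
mask = grouped_missing_analysis.index.get_level_values("action_type").isin(wanted)
filtered = grouped_missing_analysis.loc[mask]

print(filtered)
```

On this sample the mask keeps the same five 'Missing' rows as the .loc selection above.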
The pandas documentation on MultiIndex slicers gives more detail on the slice(None) style of indexing.
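The .where(...) idea from the question can also be made to work if it is applied to the action_type column before the crosstab is built, so that every type other than 'Missing' and 'Unknown' is collapsed into a single 'Other' row. The following is only a sketch under that assumption; the sample frame is again a hypothetical reconstruction of the question's ten rows, and keep, action_type_grouped and table are illustrative names:

```python
import pandas as pd

# Hypothetical stand-in for clean_sessions (values copied from the question).
clean_sessions = pd.DataFrame({
    "action": ["lookup", "search_results", "lookup", "search_results", "lookup",
               "search_results", "lookup", "personalize", "index", "lookup"],
    "action_type": ["Missing", "click", "Missing", "click", "Missing",
                    "click", "Missing", "data", "view", "Missing"],
})

# Series.where keeps values where the condition is True and substitutes
# 'Other' everywhere else.
keep = ["Missing", "Unknown"]
action_type_grouped = clean_sessions["action_type"].where(
    clean_sessions["action_type"].isin(keep), "Other"
)

# Cross-tabulate the collapsed action_type against action.
table = pd.crosstab(action_type_grouped, clean_sessions["action"], margins=True)

print(table)
```

Unlike filtering the finished crosstab, this keeps the counts of the discarded types, aggregated under 'Other', which seems to be what the question's snippet was aiming for.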