Map column values to 'miscellaneous' if their value count is below a threshold - Categorical Column - Pandas Dataframe

Views: 129  Published: 2020/5/23 23:34:26  Tags: python pandas

Problem description

I have a pandas dataframe of shape ~ [200K, 40]. The dataframe has a categorical column (one of many) with over 1000 unique values. I can view how often each unique value occurs in that column by using:

df['column_name'].value_counts()

How do I now group values together:

whose value count is below a threshold (say 100), mapping them to "miscellaneous"?
or based on a cumulative percentage of the row count?

Recommended answer

You can extract the values you want to mask from the index of value_counts and then map them to "miscellaneous" using replace:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0, 10, (2000, 2)), columns=['A', 'B'])

frequencies = df['A'].value_counts()

condition = frequencies < 200   # boolean mask over the counts; pick any threshold you want
mask_obs = frequencies[condition].index   # the values that occur fewer than 200 times
mask_dict = dict.fromkeys(mask_obs, 'miscellaneous')

df['A'] = df['A'].replace(mask_dict)  # overwrites column A; work on a copy to keep the original data
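
The same thresholding can also be wrapped in a small helper that looks up each row's frequency with map and overwrites the rare rows with where. This is only a sketch under some assumptions: the function name lump_rare and its parameters are made up for illustration, and the conversion to object dtype is there in case the column uses the pandas category dtype, where a label that is not already one of the categories cannot be written in directly.

```python
import pandas as pd

def lump_rare(series, min_count=100, other_label="miscellaneous"):
    """Replace values occurring fewer than min_count times with other_label.

    Hypothetical helper, not part of pandas.
    """
    was_categorical = isinstance(series.dtype, pd.CategoricalDtype)
    # Work on plain objects so a brand-new label can be written in.
    values = series.astype(object) if was_categorical else series
    # Per-row frequency: look up each row's value in the value_counts table.
    row_counts = values.map(values.value_counts())
    # Keep frequent values and missing values; overwrite everything else.
    keep = (row_counts >= min_count) | values.isna()
    lumped = values.where(keep, other_label)
    # Restore the category dtype (rare categories are dropped in the process).
    return lumped.astype("category") if was_categorical else lumped

s = pd.Series(["x"] * 5 + ["y"] * 3 + ["z"] * 1, dtype="category")
out = lump_rare(s, min_count=3)
print(out.value_counts())
```

Unlike replace with a dictionary, this version returns a new Series and leaves the input untouched, so the caller decides whether to assign it back to the dataframe.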

Now, using value_counts will group all the values below your threshold as miscellaneous:

df['A'].value_counts()
Out[18]: 
miscellaneous    947
3                226
1                221
0                204
7                201
2                201
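
The answer above covers the fixed-count threshold. For the second option in the question, grouping by a cumulative percentage of rows, one possible sketch is to sort the values by frequency, keep the most common ones until they cover a chosen share of the rows, and map the rest to "miscellaneous". The function name lump_by_cumulative_share, the keep_share parameter and the toy data are assumptions made for illustration, and the rule that the value crossing the cutoff is still kept is one design choice among several.

```python
import pandas as pd

def lump_by_cumulative_share(series, keep_share=0.8, other_label="miscellaneous"):
    """Keep the most frequent values until they cover keep_share of the rows.

    Hypothetical helper, not part of pandas.
    """
    # Relative frequency of each value, sorted from most to least common.
    shares = series.value_counts(normalize=True)
    # Share of rows already covered *before* each value is added.
    covered_before = shares.cumsum() - shares
    # A value is kept while the values ahead of it have not reached the cutoff.
    keep_values = shares.index[covered_before < keep_share]
    # Leave kept values and missing values alone; overwrite everything else.
    keep = series.isin(keep_values) | series.isna()
    return series.where(keep, other_label)

s = pd.Series(["a"] * 50 + ["b"] * 30 + ["c"] * 10 + ["d"] * 6 + ["e"] * 4)
lumped = lump_by_cumulative_share(s, keep_share=0.85)
print(lumped.value_counts())
```

With this toy data, "a", "b" and "c" together cover 90% of the rows and are kept, while "d" and "e" (10 rows in total) become "miscellaneous". The example uses plain string values; a column stored with the pandas category dtype would first need the new label added to its categories or a conversion to object dtype.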
