Mapping pandas dataframe column to a dictionary

Views: 251 Published: 2020/5/24 3:31:22 Tags: python python-3.x pandas series categorical-data

Problem description

I have a dataframe containing a categorical variable of high cardinality (many unique values). I would like to re-code that variable to a small set of values (the most frequent ones) and replace all other values with a catch-all category ("other"). To give a simple example:

Here are the two values which should stay unchanged:

top_values = ['apple', 'orange']

I established them based on their frequency in the following dataframe column:

{'fruits': {0: 'apple',
1: 'apple',
2: 'orange',
3: 'orange',
4: 'banana',
5: 'grape'}}
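
For illustration, one way such a list could be derived from the column is with `value_counts`. This is only a sketch: the cutoff `n_top` is a hypothetical name and value, not something stated in the question.

```python
import pandas as pd

# Rebuild the example column from the question.
df = pd.DataFrame({"fruits": ["apple", "apple", "orange", "orange", "banana", "grape"]})

n_top = 2  # hypothetical cutoff: how many of the most frequent values to keep

# Count occurrences of each value, then keep the labels of the n_top largest counts.
counts = df["fruits"].value_counts()
top_values = counts.nlargest(n_top).index.tolist()

print(sorted(top_values))
```

If several values tie at the cutoff, `nlargest` keeps the first ones it encounters by default, so the cutoff may need adjusting for real data.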

That dataframe column should be re-coded as follows:

{'fruits': {0: 'apple',
1: 'apple',
2: 'orange',
3: 'orange',
4: 'other',
5: 'other'}}

How can I do that? (The dataframe has millions of records.)

Recommended answer

`loc` + Boolean indexing

df.loc[~df['fruits'].isin(top_values), 'fruits'] = 'other'
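
As a self-contained sketch, the line above can be run against the example data from the question. The two alternatives shown alongside it (`Series.where` and a dictionary passed to `Series.map`) are not part of the original answer. They are included only as other ways that should give the same result on this data, and the names `recoded`, `via_where`, `via_map` and `mapping` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"fruits": ["apple", "apple", "orange", "orange", "banana", "grape"]})
top_values = ["apple", "orange"]

# Approach from the answer: overwrite every row whose value is not in top_values.
recoded = df.copy()
recoded.loc[~recoded["fruits"].isin(top_values), "fruits"] = "other"

# Alternative 1: Series.where keeps values where the condition is True
# and substitutes "other" everywhere else.
via_where = df["fruits"].where(df["fruits"].isin(top_values), "other")

# Alternative 2: map through a dictionary; values missing from the
# dictionary become NaN and are then filled with "other".
mapping = {value: value for value in top_values}
via_map = df["fruits"].map(mapping).fillna("other")

print(recoded["fruits"].tolist())
# ['apple', 'apple', 'orange', 'orange', 'other', 'other']
```

The `loc` form modifies the column in place, while `where` and `map` return a new series that would still need to be assigned back to the dataframe.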

After this process, you will probably want to turn your series into a categorical:

df['fruits'] = df['fruits'].astype('category')

Converting to a categorical before the value replacement probably won't help, because the input series has high cardinality.
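
To make the ordering concrete, here is a rough sketch that replaces the rare values first and only then converts to a categorical, comparing memory use before and after. The synthetic data (50,000 apples, 40,000 oranges and 10,000 distinct rare values) is invented for the example, and the actual savings will depend on the data and the pandas version.

```python
import pandas as pd

# Hypothetical high-cardinality column: two frequent values plus many rare ones.
values = ["apple"] * 50_000 + ["orange"] * 40_000 + [f"rare_{i}" for i in range(10_000)]
df = pd.DataFrame({"fruits": values})
top_values = ["apple", "orange"]

# Step 1: collapse everything outside top_values into "other".
df.loc[~df["fruits"].isin(top_values), "fruits"] = "other"
before = df["fruits"].memory_usage(deep=True)

# Step 2: convert to a categorical now that only three distinct values remain.
df["fruits"] = df["fruits"].astype("category")
after = df["fruits"].memory_usage(deep=True)

print(df["fruits"].cat.categories.tolist())
print(f"bytes before: {before}, bytes after: {after}")
```

After the replacement only three categories remain, so the categorical stores one small integer code per row instead of a string per row.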
