Add numeric column to pandas dataframe based on other textual column

Question

I have this dataframe:

import pandas as pd

df = pd.DataFrame(
    [['137', 'earn'], ['158', 'earn'], ['144', 'ship'], ['111', 'trade'], ['132', 'trade']],
    columns=['value', 'topic'],
)
print(df)
    value  topic
0   137   earn
1   158   earn
2   144   ship
3   111  trade
4   132  trade

And I want an additional numeric column like this:

    value  topic  topic_id
0   137   earn    0
1   158   earn    0
2   144   ship    1
3   111  trade    2
4   132  trade    2

So basically I want to generate a column that encodes a string column as numeric values. I implemented this solution:

import numpy as np

# Map each distinct topic (np.unique returns them sorted) to a consecutive integer.
topics = np.unique(df['topic']).tolist()
topics_dict = {topic: i for i, topic in enumerate(topics)}
df['topic_id'] = [topics_dict[t] for t in df['topic']]

However, I am quite sure there is a more elegant, more idiomatic pandas way to solve this, but I couldn't find anything on Google or SO. I read about pandas' get_dummies, but that creates a separate column for each distinct value in the original column.

Thanks for any help or guidance!

Recommended answer

You can convert the column to the category dtype and read off its integer codes:

In [63]: df['topic'].astype('category').cat.codes
Out[63]:
0    0
1    0
2    1
3    2
4    2
dtype: int8
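
To get the column asked for in the question, the codes can be assigned back to the dataframe. The following is a minimal sketch rather than part of the original answer: the names topic_cat and id_to_topic are made up for illustration, and pd.factorize is shown only as a possible alternative that numbers labels in order of first appearance instead of sorted order.

```python
import pandas as pd

df = pd.DataFrame(
    [['137', 'earn'], ['158', 'earn'], ['144', 'ship'], ['111', 'trade'], ['132', 'trade']],
    columns=['value', 'topic'],
)

# Convert once so the category labels can be reused for decoding later.
topic_cat = df['topic'].astype('category')
df['topic_id'] = topic_cat.cat.codes

# Hypothetical helper mapping: integer code -> original label.
id_to_topic = dict(enumerate(topic_cat.cat.categories))

# Alternative: factorize assigns codes in order of first appearance.
codes, uniques = pd.factorize(df['topic'])

print(df)
print(id_to_topic)
```

For this data both approaches give the same codes, because the topics already appear in alphabetical order. With unsorted input they can differ, so it is worth checking which numbering you actually need.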
