Replace any string in columns with 1


Question

I'm working with pandas. My goal is to convert several columns within a dataframe from containing either NaN or string data, into more or less a dummy variable (0's for NaN; 1's for any string). I'd like to do this without using a complete list of strings and replacing them each, because there are typos and this would lead to errors. I've been able to replace all the NaN data with 0's using the fillna function, which works like a dream!
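
A minimal sketch of that goal in one step, assuming the T_ columns still hold the raw NaN/string values (the two-column frame below is hypothetical): notna() marks every non-missing cell, so casting it to int gives 0 for NaN and 1 for any string, typos included, with no fillna pass first.

```python
import numpy as np
import pandas as pd

# Hypothetical raw input: each T_ column holds either NaN or some string.
raw = pd.DataFrame({
    "T_opp": [np.nan, "vr", np.nan, np.nan],
    "T_Dir": [np.nan, np.nan, np.nan, "bt"],
})

# notna() is True for every present value (any string) and False for NaN;
# astype(int) maps True/False to 1/0.
dummies = raw.notna().astype(int)
print(dummies)
```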

I am hoping for something similar that will replace all string data with 1's but leave the 0's in place. I've searched Stack Overflow and elsewhere, to little avail.

The data look roughly like this, where I only want this to apply to columns starting with T_:

    fol    T_opp    T_Dir    T_Enh   Activity
    1      0        0        vo      hf
    2      vr       0        0       hx
    2      0        0        0       fe
    3      0        bt       0       rn
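
To act only on the columns that start with T_, one option is to select them by name prefix. The frame below is a hand-built copy of the table above, assuming the 0s are integers left by an earlier fillna(0):

```python
import pandas as pd

# Toy data copied from the table; the 0s are assumed to be integer 0 from fillna(0).
df = pd.DataFrame({
    "fol": [1, 2, 2, 3],
    "T_opp": [0, "vr", 0, 0],
    "T_Dir": [0, 0, 0, "bt"],
    "T_Enh": ["vo", 0, 0, 0],
    "Activity": ["hf", "hx", "fe", "rn"],
})

# Keep only the column names with the "T_" prefix; "fol" and "Activity" are excluded.
t_cols = [c for c in df.columns if c.startswith("T_")]
print(df[t_cols])
```

df.filter(regex="^T_") is another way to get the same sub-frame.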

I'd like the output to look the same, but with "vr", "bt" and "vo" each replaced with the integer 1. From what I can tell, pd.get_dummies is not what I'm looking for. I also can't make this work with replace(). I tried something using a True/False mask and a list of zeros, but the outcome was so wrong I won't bother to post the code here.
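
For what it's worth, a True/False mask can work here. A minimal sketch, assuming the zero-filled toy frame from the question and that every placeholder is the integer 0 (not the string "0"): compare each T_ cell with 0, then cast the boolean mask to int.

```python
import pandas as pd

df = pd.DataFrame({
    "fol": [1, 2, 2, 3],
    "T_opp": [0, "vr", 0, 0],
    "T_Dir": [0, 0, 0, "bt"],
    "T_Enh": ["vo", 0, 0, 0],
    "Activity": ["hf", "hx", "fe", "rn"],
})
t_cols = [c for c in df.columns if c.startswith("T_")]

# Mask is True wherever a cell is anything other than integer 0 (i.e. any string).
mask = df[t_cols].ne(0)

# Turn True/False into 1/0 and write back to the T_ columns only.
df[t_cols] = mask.astype(int)
print(df)
```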

Edited: I've added an additional column in the toy data above. The 'Activity' column is some data, also strings, that I do not want to touch.

Answer

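Another option is to do this the other way around: first convert to numeric. (DataFrame.convert_objects was deprecated in pandas 0.17 and has since been removed, so this transcript only runs on old pandas versions.)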

In [11]: df.convert_objects(convert_numeric=True)
Out[11]: 
   fol  T_opp  T_Dir  T_Enh Activity
0    1      0      0    NaN       hf
1    2    NaN      0      0       hx
2    2      0      0      0       fe
3    3      0    NaN      0       rn
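
On current pandas, a rough equivalent of this first step is pd.to_numeric with errors="coerce", applied column by column. This sketch limits it to the T_ columns (following the question's edit), because coercing Activity would turn all of its strings into NaN as well:

```python
import pandas as pd

df = pd.DataFrame({
    "fol": [1, 2, 2, 3],
    "T_opp": [0, "vr", 0, 0],
    "T_Dir": [0, 0, 0, "bt"],
    "T_Enh": ["vo", 0, 0, 0],
    "Activity": ["hf", "hx", "fe", "rn"],
})
t_cols = [c for c in df.columns if c.startswith("T_")]

# errors="coerce" makes every value that cannot be parsed as a number NaN.
converted = df[t_cols].apply(pd.to_numeric, errors="coerce")
print(converted)
```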

Then fill the NaNs with 1:

In [12]: df.convert_objects(convert_numeric=True).fillna(1)
Out[12]: 
   fol  T_opp  T_Dir  T_Enh Activity
0    1      0      0      1       hf
1    2      1      0      0       hx
2    2      0      0      0       fe
3    3      0      1      0       rn
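
A sketch of the whole answer on current pandas, under the same assumptions (toy frame rebuilt by hand, only T_ columns touched). The NaNs force a float dtype, so astype(int) is added at the end to get integer 0/1 back:

```python
import pandas as pd

df = pd.DataFrame({
    "fol": [1, 2, 2, 3],
    "T_opp": [0, "vr", 0, 0],
    "T_Dir": [0, 0, 0, "bt"],
    "T_Enh": ["vo", 0, 0, 0],
    "Activity": ["hf", "hx", "fe", "rn"],
})
t_cols = [c for c in df.columns if c.startswith("T_")]

df[t_cols] = (
    df[t_cols]
    .apply(pd.to_numeric, errors="coerce")  # strings -> NaN
    .fillna(1)                               # former strings -> 1
    .astype(int)                             # float back to int
)
print(df)
```

As in the original answer, a value that is already numeric passes through unchanged, so a stray 5 in a T_ column would stay 5 rather than become 1.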