Pandas strip function removes numeric values as well
Question
I have a dataframe that can be generated with the code below:
import pandas as pd

data_file = pd.DataFrame({
    'studyid': [1, 2, 3],
    'age_interview': [' 56', '57 ', '55'],
    'ethnicity': ['Chinese', 'Indian', 'European'],
    'Marital_status': ['Single', 'Married', 'Widowed'],
    'Smoke_status': ['Yes', 'No', 'No'],
})
Once I have created the dataframe above, I melt it and apply the strip function:
obs = data_file.melt('studyid', value_name='valuestring').sort_values('studyid')
obs['valuestring'].str.strip()
Although this works fine on the sample data, on my real data it removes the numeric values as well. I run the same code as above; only the data is different.
Please find screenshots of the output before and after applying the strip function.
Output before obs['valuestring'].str.strip(): (screenshot)
Output after obs['valuestring'].str.strip(): (screenshot)
How can I prevent the numeric values from being removed?
Recommended answer
It looks like your column contains a mix of integers and strings. Here's a reproducible example:
s = pd.Series([1, np.nan, 'abc ', 2.0, ' def '])
s.str.strip()
0    NaN
1    NaN
2    abc
3    NaN
4    def
dtype: object
If a value is not a string, the .str accessor implicitly returns NaN for it.
The solution is to convert the column and all of its values to strings before calling strip.
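To confirm that a column really mixes strings with other types, one option is to count the Python type of each element. This is only a diagnostic sketch on the example series from this answer; the variable names `type_counts` and `is_str` are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 'abc ', 2.0, ' def '])

# Count how many elements of each Python type the object column holds.
type_counts = s.map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# Boolean mask marking the elements that are actual strings;
# every other element is what .str.strip() would turn into NaN.
is_str = s.map(lambda v: isinstance(v, str))
print(is_str.tolist())
```

Anything other than `str` in the counts is a value that `.str.strip()` will replace with NaN.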
s.astype(str).str.strip()
0      1
1    nan
2    abc
3    2.0
4    def
dtype: object
In your case, that would be:
obs['valuestring'] = obs['valuestring'].astype(str).str.strip()
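The following sketch reproduces the problem end to end on a melted frame. The data is a hypothetical variant of the sample from the question, assumed to mix integer ages with padded strings; the real data was not shown, so this is only a guess at its shape.

```python
import pandas as pd

# Hypothetical variant of the sample data: age_interview mixes
# integers with whitespace-padded strings.
data_file = pd.DataFrame({
    'studyid': [1, 2, 3],
    'age_interview': [56, '57 ', 55],
    'ethnicity': ['Chinese', ' Indian', 'European'],
})
obs = data_file.melt('studyid', value_name='valuestring').sort_values('studyid')

# Plain .str.strip() returns NaN for the two integer ages.
stripped_only = obs['valuestring'].str.strip()
print(stripped_only.isna().sum())

# Casting to str first keeps every value (as text) and strips the padding.
obs['valuestring'] = obs['valuestring'].astype(str).str.strip()
print(obs)
```

Note that after `astype(str)` every value is text, so the ages would need to be converted back if they are used numerically later.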
Note that if you want to preserve the NaNs, use mask at the end:
s.astype(str).str.strip().mask(s.isna())
0      1
1    NaN
2    abc
3    2.0
4    def
dtype: object
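If the numeric values should stay numeric rather than become text, an alternative (not part of the original answer) is to strip only the elements that are strings and pass everything else through unchanged. The helper name `strip_if_str` is made up for this sketch.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 'abc ', 2.0, ' def '])

def strip_if_str(value):
    """Strip whitespace from strings; return any other value untouched."""
    return value.strip() if isinstance(value, str) else value

# Element-wise application keeps ints, floats and NaN as they were.
cleaned = s.map(strip_if_str)
print(cleaned.tolist())
```

This is slower than the vectorized `.str` methods on large columns, but it avoids the round trip through `astype(str)`.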