str error when replacing values in a pandas dataframe
Problem description
My code scrapes information from a website and puts it into a dataframe, but I'm not sure why the order of the statements gives rise to this error: AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
Basically, the scraped data has over 20 rows and 10 columns.
- Some values are within brackets, e.g. (2,333), and I want to change them to -2333.
- Some values contain the text n.a., and I want to change them to numpy.nan.
- Some values are -, and I want to change them to numpy.nan too.
Does not work
for final_df, engine_name in zip((df_foo, df_bar, df_far), (['engine_foo', 'engine_bar', 'engine_far'])):
    # Replacing necessary items for final clean up
    final_df.replace('-', numpy.nan, inplace=True)
    final_df.replace('n.a.', numpy.nan, inplace=True)
    for i in final_df.columns:
        final_df[i] = final_df[i].str.replace(')', '')
        final_df[i] = final_df[i].str.replace(',', '')
        final_df[i] = final_df[i].str.replace('(', '-')
    # Appending Code to dataframe
    final_df = final_df.T
    final_df.insert(loc=0, column='Code', value=some_code)
# This produces the error - AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
Works
for final_df, engine_name in zip((df_foo, df_bar, df_far), (['engine_foo', 'engine_bar', 'engine_far'])):
    # Replacing necessary items for final clean up
    for i in final_df.columns:
        final_df[i] = final_df[i].str.replace(')', '')
        final_df[i] = final_df[i].str.replace(',', '')
        final_df[i] = final_df[i].str.replace('(', '-')
    final_df.replace('-', numpy.nan, inplace=True)
    final_df.replace('n.a.', numpy.nan, inplace=True)
    # Appending Code to dataframe
    final_df = final_df.T
    final_df.insert(loc=0, column='Code', value=some_code)
# This doesn't give me any errors and returns what I want.
Any thoughts on why this happens?
Recommended answer
A double replace works for me: the first call, with regex=True, replaces substrings, and the second replaces whole values:
import numpy as np
import pandas as pd

np.random.seed(23)
df = pd.DataFrame(np.random.choice(['(2,333)', 'n.a.', '-', 2.34], size=(3, 3)),
                  columns=list('ABC'))
print(df)
      A     B        C
0  2.34     -  (2,333)
1  n.a.     -  (2,333)
2  2.34  n.a.  (2,333)
# Raw strings keep the regex escapes intact; the comma needs no escaping
df1 = df.replace([r'\(', r'\)', ','], ['-', '', ''], regex=True).replace(['-', 'n.a.'], np.nan)
print(df1)
      A   B      C
0  2.34 NaN  -2333
1   NaN NaN  -2333
2  2.34 NaN  -2333
# The reverse order also works, because DataFrame.replace skips non-string columns
df1 = df.replace(['-', 'n.a.'], np.nan).replace([r'\(', r'\)', ','], ['-', '', ''], regex=True)
print(df1)
      A   B      C
0  2.34 NaN  -2333
1   NaN NaN  -2333
2  2.34 NaN  -2333
Your error means that str.replace is being called on a non-string column (for example column B, where all values are NaN). Applying str.replace first and replacing whole values afterwards works:
df1 = (df.apply(lambda x: x.str.replace(r'\(', '-', regex=True)
                           .str.replace(r'\)', '', regex=True)
                           .str.replace(',', '', regex=True))
         .replace(['-', 'n.a.'], np.nan))
print(df1)
      A   B      C
0  2.34 NaN  -2333
1   NaN NaN  -2333
2  2.34 NaN  -2333
# Replacing whole values first leaves column B without strings, so .str fails
df1 = (df.replace(['-', 'n.a.'], np.nan)
         .apply(lambda x: x.str.replace(r'\(', '-', regex=True)
                           .str.replace(r'\)', '', regex=True)
                           .str.replace(',', '', regex=True)))
print(df1)
AttributeError: ('Can only use .str accessor with string values, which use np.object_ dtype in pandas', 'occurred at index B')
The error occurs because the dtype of column B is float64:
df1 = df.replace(['-', 'n.a.'], np.nan)
print(df1)
      A   B        C
0  2.34 NaN  (2,333)
1   NaN NaN  (2,333)
2  2.34 NaN  (2,333)
print(df1.dtypes)
A     object
B    float64
C     object
dtype: object
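One way to make the clean-up independent of statement order is to process each cell with plain Python string methods instead of the .str accessor, so that non-string cells simply pass through. The following is a minimal sketch, not the original answer's method: clean_value is a hypothetical helper name, and the frame is hand-built to mirror the sample data.

```python
import numpy as np
import pandas as pd

def clean_value(value):
    """Clean one scraped cell; non-string cells (e.g. NaN) pass through unchanged."""
    if not isinstance(value, str):
        return value
    if value in ('-', 'n.a.'):
        return np.nan
    # Turn '(2,333)' into '-2333'
    return value.replace('(', '-').replace(')', '').replace(',', '')

# Hand-built frame mirroring the sample data (an assumption for illustration)
df = pd.DataFrame({'A': ['2.34', 'n.a.', '2.34'],
                   'B': ['-', '-', 'n.a.'],
                   'C': ['(2,333)', '(2,333)', '(2,333)']})

# Series.map applies clean_value to every cell of each column
cleaned = df.apply(lambda col: col.map(clean_value))
print(cleaned)
```

Because the helper checks the type of each cell, it does not matter whether a column has already been turned into float64 by an earlier replace.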