Pandas DataFrame with NA values throwing ValueError
Problem description
I have a DataFrame in pandas that looks like this:
df.head(2)
Out[25]:
CompanyName Region MachineType
recvd_dttm
2014-07-13 12:40:40 Company1 NA Machine1
2014-07-13 15:31:39 Company2 NA Machine2
I first select the data in a certain date range, then try to select the rows whose Region is NA and whose MachineType is Machine1.
However, I keep getting this error: ValueError: Length mismatch: Expected axis has 4 elements, new values have 3 elements
This code worked until I added the region column and used this line: df = df[(df['Region']=='NA') & (df['CallType']=='Optia')]
Because the NA (North America) values were initially being read in as NaN, I passed keep_default_na=False to my read_csv call.
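A minimal sketch of that parsing difference, using a small hypothetical CSV sample (the third row with an empty Region is invented to show a genuinely missing value). With the default settings, pandas treats the string "NA" as missing, so a filter on Region == 'NA' can never match. Passing keep_default_na=False together with na_values=[""] keeps "NA" as a literal string while still reading empty fields as NaN.

```python
import io
import pandas as pd

# Hypothetical sample: "NA" means North America, an empty field means truly missing.
csv_text = """recvd_dttm,CompanyName,Region,MachineType
2014-07-13 12:40:40,Company1,NA,Machine1
2014-07-13 15:31:39,Company2,NA,Machine2
2014-07-14 09:02:11,Company3,,Machine1
"""

# Default parsing: "NA" is in pandas' default missing-value list, so it becomes NaN.
default_df = pd.read_csv(io.StringIO(csv_text), index_col="recvd_dttm", parse_dates=True)

# keep_default_na=False drops that default list; na_values=[""] keeps empty fields as NaN.
df = pd.read_csv(
    io.StringIO(csv_text),
    index_col="recvd_dttm",
    parse_dates=True,
    keep_default_na=False,
    na_values=[""],
)

# Rows in North America with MachineType Machine1.
mask = (df["Region"] == "NA") & (df["MachineType"] == "Machine1")

print(default_df["Region"].isna().sum())  # 3: every Region value was read as missing
print(df["Region"].tolist())              # ['NA', 'NA', nan]
print(df[mask])
```

If other columns legitimately contain "NA" meaning missing, those strings would need to be listed in na_values as well, possibly per column via a dict.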
However, I then build a pivot table like this:
result = df.groupby([lambda idx: idx.month, 'CompanyName']).agg(len).reset_index()
result.columns = ['Month', 'CompanyName', 'NumberCalls']
pivot_table = result.pivot(index='Month', columns='CompanyName', values='NumberCalls').fillna(0)
The error is raised at the result.columns line, though I wouldn't be surprised if the fillna(0) call were also misbehaving, since some other NA values were actually supposed to be NaN, not North America.
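One likely explanation for the Length mismatch, inferred from the error message rather than stated in the original thread: groupby(...).agg(len) applies len to every column that is not a grouping key. Assuming the frame holds only the three columns shown above, adding Region leaves two non-key columns (Region and MachineType), so reset_index produces four columns while result.columns assigns only three names. The sketch below reproduces this on hypothetical data and swaps in groupby(...).size(), which counts rows per group and always yields a single count column.

```python
import pandas as pd

# Hypothetical sample mirroring the question's layout: DatetimeIndex plus three columns.
df = pd.DataFrame(
    {
        "CompanyName": ["Company1", "Company2", "Company1"],
        "Region": ["NA", "NA", "NA"],
        "MachineType": ["Machine1", "Machine2", "Machine1"],
    },
    index=pd.to_datetime(
        ["2014-07-13 12:40:40", "2014-07-13 15:31:39", "2014-08-02 09:15:00"]
    ),
)
df.index.name = "recvd_dttm"

# agg(len) computes one count per remaining column (Region AND MachineType),
# so after reset_index there are 4 columns, not the 3 names assigned later.
counts_per_column = df.groupby([lambda idx: idx.month, "CompanyName"]).agg(len).reset_index()
print(counts_per_column.shape[1])  # 4

# size() counts rows per group and returns one Series, however many columns exist.
result = (
    df.groupby([lambda idx: idx.month, "CompanyName"])
    .size()
    .reset_index(name="NumberCalls")
)
result.columns = ["Month", "CompanyName", "NumberCalls"]

# Missing (Month, CompanyName) combinations become 0 calls.
pivot_table = result.pivot(index="Month", columns="CompanyName", values="NumberCalls").fillna(0)
print(pivot_table)
```

Because size() does not depend on the other columns, adding or removing columns such as Region or CallType no longer changes the shape of result.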
How do I fix the ValueError and avoid NA confusion?
Recommended answer
Here's what you can do to replace the NaN values in one column only:
import pandas as pd
import numpy as np

df = pd.read_clipboard()  # the sample data, plus a Test column I added
print(df)
# recvd_dttm CompanyName Region MachineType Test
# 2014-07-13 12:40:40 Company1 NaN Machine1 NaN
# 2014-07-13 15:31:39 Company2 NaN Machine2 NaN

# Replace NaN with 'NorthAm' in the Region column only; Test keeps its NaN values
df['Region'] = df['Region'].replace(np.nan, 'NorthAm')
print(df)
# recvd_dttm CompanyName Region MachineType Test
# 2014-07-13 12:40:40 Company1 NorthAm Machine1 NaN
# 2014-07-13 15:31:39 Company2 NorthAm Machine2 NaN
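An alternative way to restrict the fill to one column is to pass a dict to DataFrame.fillna, mapping column names to fill values; columns not named in the dict are left untouched. This is a minimal sketch on a hypothetical frame built directly (not via read_clipboard), with the same Region and Test columns as above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: Region was read as NaN, Test should stay NaN.
df = pd.DataFrame(
    {
        "CompanyName": ["Company1", "Company2"],
        "Region": [np.nan, np.nan],
        "MachineType": ["Machine1", "Machine2"],
        "Test": [np.nan, np.nan],
    }
)

# The dict maps column name -> fill value; only Region is filled.
df = df.fillna({"Region": "NorthAm"})
print(df)
```

This also avoids the concern about a blanket fillna(0) overwriting NaN values in columns where they are meaningful.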