Pandas Dataframe with NA values throwing ValueError

Problem description

I have a DataFrame in pandas that looks like this:

df.head(2)
Out[25]:
                    CompanyName Region MachineType
recvd_dttm
2014-07-13 12:40:40    Company1     NA    Machine1
2014-07-13 15:31:39    Company2     NA    Machine2

I first take the data in a certain date range, then try to keep only the rows whose Region is NA and whose MachineType is Machine1.
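A minimal sketch of those two steps, using a small hypothetical DataFrame shaped like the one above (the third row and its EU region are invented for illustration). It assumes recvd_dttm is a sorted DatetimeIndex, so a date range can be taken with partial-string slicing in .loc, and it compares Region against the literal string 'NA', which only works if that value was not parsed as NaN:

```python
import pandas as pd

# Hypothetical sample data mirroring the question's columns
df = pd.DataFrame(
    {
        "CompanyName": ["Company1", "Company2", "Company3"],
        "Region": ["NA", "NA", "EU"],
        "MachineType": ["Machine1", "Machine2", "Machine1"],
    },
    index=pd.DatetimeIndex(
        ["2014-07-13 12:40:40", "2014-07-13 15:31:39", "2014-08-02 09:00:00"],
        name="recvd_dttm",
    ),
)

# Step 1: keep a date range (partial-string slicing on a sorted DatetimeIndex)
in_range = df.loc["2014-07-01":"2014-07-31"]

# Step 2: keep rows whose Region is the string "NA" and whose MachineType is Machine1
mask = (in_range["Region"] == "NA") & (in_range["MachineType"] == "Machine1")
subset = in_range[mask]
print(subset)
```

Each condition is wrapped in parentheses because & binds more tightly than == in Python.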

However, I keep getting this error: ValueError: Length mismatch: Expected axis has 4 elements, new values have 3 elements

This code worked until I added the Region column and used this line: df = df[(df['Region']=='NA') & (df['CallType']=='Optia')]

Because the NA (North America) values were initially being read in as NaN, I used keep_default_na=False in my read_csv call.
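For reference, a sketch of a read_csv call that keeps 'NA' as a string while still treating genuinely empty fields as missing, assuming the real file uses empty cells for missing data (the CSV text below is hypothetical). With keep_default_na=False, pandas only treats the strings listed in na_values as NaN:

```python
import io

import pandas as pd

# Hypothetical CSV: "NA" means North America; an empty field is genuinely missing
csv_text = """recvd_dttm,CompanyName,Region,MachineType
2014-07-13 12:40:40,Company1,NA,Machine1
2014-07-13 15:31:39,Company2,NA,Machine2
2014-07-14 09:05:12,Company3,,Machine1
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    keep_default_na=False,  # do not treat "NA", "N/A", "NaN", ... as missing
    na_values=[""],         # but still treat empty fields as missing
    parse_dates=["recvd_dttm"],
    index_col="recvd_dttm",
)
print(df["Region"].tolist())
```

Region then holds the strings 'NA', 'NA' followed by NaN. If missing values are written some other way in the real file (for example 'N/A'), that string would need to be added to na_values instead.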

However, I then build a pivot table like this:

result = df.groupby([lambda idx: idx.month, 'CompanyName']).agg(len).reset_index()
result.columns = ['Month', 'CompanyName', 'NumberCalls']

pivot_table = result.pivot(index='Month', columns='CompanyName', values='NumberCalls').fillna(0)

The error is raised at the result.columns line, though I wouldn't be surprised if the fillna(0) call is also acting up, since there were other NA values that really were supposed to be NaN, not North America.

How do I fix the ValueError and avoid NA confusion?

Recommended answer

Here's what you can do to replace the NaN in one column only:

import pandas as pd
import numpy as np

# Read the sample rows from the clipboard (I added a Test column for comparison)
df = pd.read_clipboard()
print(df)
#            recvd_dttm CompanyName  Region MachineType  Test
# 2014-07-13   12:40:40    Company1     NaN    Machine1   NaN
# 2014-07-13   15:31:39    Company2     NaN    Machine2   NaN

# Replace NaN in the Region column only; the Test column keeps its NaN values
df['Region'] = df['Region'].replace(np.nan, 'NorthAm')
print(df)
#            recvd_dttm CompanyName   Region MachineType  Test
# 2014-07-13   12:40:40    Company1  NorthAm    Machine1   NaN
# 2014-07-13   15:31:39    Company2  NorthAm    Machine2   NaN
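Separately from the NA labels, the "Length mismatch" message itself may come from the pivot code rather than from fillna(0). agg(len) returns one count column for every column that is not a group key, so once Region is added, reset_index() yields four columns (the month level, CompanyName, Region, MachineType) while result.columns assigns only three names, which would match "Expected axis has 4 elements, new values have 3 elements". A minimal sketch with hypothetical data that counts rows with groupby(...).size() instead, so the number of columns no longer depends on how many other columns the frame has:

```python
import pandas as pd

# Hypothetical sample data with the question's three columns
df = pd.DataFrame(
    {
        "CompanyName": ["Company1", "Company2", "Company1"],
        "Region": ["NA", "NA", "NA"],
        "MachineType": ["Machine1", "Machine1", "Machine1"],
    },
    index=pd.DatetimeIndex(
        ["2014-07-13 12:40:40", "2014-07-13 15:31:39", "2014-08-01 08:00:00"],
        name="recvd_dttm",
    ),
)

# agg(len) yields one count column per remaining column (Region and MachineType),
# so reset_index() gives 4 columns and assigning 3 names would raise ValueError
counts_per_column = (
    df.groupby([lambda idx: idx.month, "CompanyName"]).agg(len).reset_index()
)
print(counts_per_column.shape[1])

# size() yields a single row count per group, however many columns df has
result = (
    df.groupby([lambda idx: idx.month, "CompanyName"])
    .size()
    .reset_index(name="NumberCalls")
)
result.columns = ["Month", "CompanyName", "NumberCalls"]

pivot_table = result.pivot(
    index="Month", columns="CompanyName", values="NumberCalls"
).fillna(0)
print(pivot_table)
```

With this approach, adding further columns to df later does not change the shape of result, and fillna(0) only fills month/company combinations that had no calls.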
