Pandas Dataframe with NA values throwing ValueError

Views: 1949 Published: 2017/3/26 4:46:16 Tags: python pandas dataframe pivot-table na

Question

I have a dataframe in pandas that looks like this:

df.head(2)
Out[25]: 
                    CompanyName Region MachineType
recvd_dttm                                        
2014-07-13 12:40:40    Company1     NA    Machine1
2014-07-13 15:31:39    Company2     NA    Machine2

I first take the data in a certain date range, then try to get the rows where Region is NA and MachineType is Machine1.

However, I keep getting this error: ValueError: Length mismatch: Expected axis has 4 elements, new values have 3 elements

This code worked until I added the Region column and used this line: df = df[(df['Region']=='NA') & (df['CallType']=='Optia')]

Because the data for NA (North America) was at first being read in as NaN, I used keep_default_na=False in my read_csv command.
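As a rough illustration of that read_csv setup, here is a minimal sketch. The CSV text and the Test column are hypothetical stand-ins for the real file. It assumes that only some columns should treat "NA" as missing, which can be expressed by passing na_values as a per-column dict alongside keep_default_na=False.

```python
import io

import pandas as pd

# Hypothetical CSV standing in for the real file: "NA" in Region means
# North America, while "NA" in the (made-up) Test column means missing.
csv_text = """recvd_dttm,CompanyName,Region,MachineType,Test
2014-07-13 12:40:40,Company1,NA,Machine1,NA
2014-07-13 15:31:39,Company2,NA,Machine2,5
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    keep_default_na=False,      # do not treat "NA" as NaN by default
    na_values={"Test": ["NA"]}, # but do treat it as NaN in the Test column
)

print(df["Region"].tolist())  # Region keeps the literal string "NA"
print(df["Test"].isna().tolist())  # only the Test column gets real NaN
```

With this setup the Region filter df['Region'] == 'NA' keeps working, and fillna(0) only touches values that were genuinely missing.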

I then build a pivot table this way:

result = df.groupby([lambda idx: idx.month, 'CompanyName']).agg(len).reset_index()
result.columns = ['Month', 'CompanyName', 'NumberCalls']

pivot_table = result.pivot(index='Month', columns='CompanyName', values='NumberCalls').fillna(0)

The error comes up at the result.columns line, though I wouldn't be surprised if the fillna(0) call is also acting up, since there were other NA values that were actually supposed to be NaN, not North America.

How do I fix the ValueError and avoid the NA confusion?

Recommended answer

Here's what you can do to replace the NaN in one column only:

import pandas as pd
import numpy as np

# Read the sample frame copied from the question
df = pd.read_clipboard()
print(df)

# Output (I added a Test column to the sample data)
           recvd_dttm CompanyName  Region MachineType  Test
2014-07-13   12:40:40    Company1     NaN    Machine1   NaN
2014-07-13   15:31:39    Company2     NaN    Machine2   NaN

# Replace NaN in the Region column only; the Test column keeps its NaN values
df['Region'] = df['Region'].replace(np.nan, 'NorthAm')
print(df)

           recvd_dttm CompanyName   Region MachineType  Test
2014-07-13   12:40:40    Company1  NorthAm    Machine1   NaN
2014-07-13   15:31:39    Company2  NorthAm    Machine2   NaN
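
The replacement above deals with the NA confusion, but it does not by itself explain the length mismatch. A likely cause, offered here as an assumption rather than something confirmed in the question, is that agg(len) returns one count column for every non-key column, so adding Region widens the result from 3 to 4 columns and the three-name assignment no longer fits. A minimal sketch with made-up sample rows, using size() so the column count does not depend on how many columns the frame has:

```python
import pandas as pd

# Made-up sample rows shaped like the frame in the question.
df = pd.DataFrame(
    {
        "CompanyName": ["Company1", "Company2", "Company1"],
        "Region": ["NA", "NA", "NA"],
        "MachineType": ["Machine1", "Machine2", "Machine1"],
    },
    index=pd.to_datetime(
        ["2014-07-13 12:40:40", "2014-07-13 15:31:39", "2014-08-02 09:00:00"]
    ),
)
df.index.name = "recvd_dttm"

# agg(len) produces one count column per non-key column, so the width of
# the result grows whenever a column such as Region is added to df.
wide = df.groupby([lambda idx: idx.month, "CompanyName"]).agg(len).reset_index()
print(wide.shape[1])

# size() yields a single count per group regardless of the other columns.
result = (
    df.groupby([lambda idx: idx.month, "CompanyName"])
    .size()
    .reset_index(name="NumberCalls")
)
result.columns = ["Month", "CompanyName", "NumberCalls"]

pivot_table = result.pivot(
    index="Month", columns="CompanyName", values="NumberCalls"
).fillna(0)
print(pivot_table)
```

Here fillna(0) only fills month and company combinations that have no calls, which is independent of how "NA" was parsed in the Region column.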
