Pandas DataFrame constructor introduces NaN when including the index argument

Question

I'm creating a pandas DataFrame object using the DataFrame constructor. My data is a dict of lists and categorical data Series objects. When I pass an index to the constructor, my categorical data series gets reset with NaN values. What's going on here? Thanks in advance!

Example:

import pandas as pd
import numpy as np
a = pd.Series(['a','b','c'],dtype="category")  # default integer index 0, 1, 2
b = pd.Series(['a','b','c'],dtype="object")
c = pd.Series(['a','b','cc'],dtype="object")

# Explicit string index: column 'A' comes out as all NaN
A = pd.DataFrame({'A':a,'B':[1,2,3]},index=["0","1","2"])
# No index argument: column 'A' keeps its values
AA = pd.DataFrame({'A':a,'B':[1,2,3]})
B = pd.DataFrame({'A':b,'C':[4,5,6]})

print("DF A:")
print(A)
print("\nDF A, without specifying an index in the constructor:")
print(AA)
print("\nDF B:")
print(B)

Recommended answer

This doesn't have anything to do with categories vs. object, it has to do with index alignment.

You're getting NaNs in A because you're telling the constructor you want an index of three strings, ["0", "1", "2"]. But a has an index of its own, consisting of the integers [0, 1, 2], and the string "0" is a different label from the integer 0. Since a's index doesn't match the index you said you want, the data can't be aligned: you get a DataFrame with the index you asked for, and the NaNs show that no matching data was found. By contrast, column 'B' is a plain list, so it has no index to align against, and the constructor assumes its values are already in index order. In AA you pass no index at all, so the constructor takes the index from a, and nothing goes missing.
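If you want to keep the string labels, one option is to make the Series behave like the list in column 'B'. Below is a minimal sketch of three ways to do that, assuming a recent pandas version (1.x or 2.x); the names `labels`, `df1`, `df2` and `df3` are only illustrative. `a.array` hands the constructor the underlying Categorical with no index attached, so it is placed positionally; `a.to_numpy()` would also work, but it returns a plain object array, so the category dtype would be lost.

```python
import pandas as pd

# Categorical Series with the default integer index 0, 1, 2
a = pd.Series(['a', 'b', 'c'], dtype="category")
labels = ["0", "1", "2"]  # string labels, as in the question

# Option 1: drop a's index; the bare Categorical is placed by position
df1 = pd.DataFrame({'A': a.array, 'B': [1, 2, 3]}, index=labels)

# Option 2: relabel the Series first, so alignment finds every label
df2 = pd.DataFrame({'A': a.set_axis(labels), 'B': [1, 2, 3]}, index=labels)

# Option 3: build the frame without an index, then replace the row labels
df3 = pd.DataFrame({'A': a, 'B': [1, 2, 3]})
df3.index = labels

print(df1)
```

Which option fits best depends on whether a's labels mean anything: if they do, relabeling explicitly (options 2 and 3) keeps that intent visible; if they don't, dropping them with `.array` is the shortest route.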

This might be easier to see than to explain. Regardless of dtype, if the indices don't match, you get NaN:

In [147]: pd.DataFrame({'A':pd.Series(list("abc"), dtype="category"),'B':[1,2,3]},
          index=["0","1","2"])
Out[147]: 
     A  B
0  NaN  1
1  NaN  2
2  NaN  3

In [148]: pd.DataFrame({'A':pd.Series(list("abc"), dtype="object"),'B':[1,2,3]},
          index=["0","1","2"])
Out[148]: 
     A  B
0  NaN  1
1  NaN  2
2  NaN  3
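Roughly speaking, when you pass an index, the constructor aligns each Series to it the way `Series.reindex` does, looking up every requested label in the Series' own index. A small sketch (the names `s` and `wanted` are hypothetical) that reproduces the all-NaN column directly:

```python
import pandas as pd

s = pd.Series(list("abc"), dtype="object")  # index is the integers 0, 1, 2
wanted = ["0", "1", "2"]                    # string labels, as in the question

# reindex looks up each wanted label in s.index; "0" != 0, so every lookup misses
aligned = s.reindex(wanted)
print(aligned)

# Building the frame with the same index gives the same all-NaN column 'A'
df = pd.DataFrame({'A': s, 'B': [1, 2, 3]}, index=wanted)
print(df)
```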

If you use a fully-matching index, it works:

In [149]: pd.DataFrame({'A':pd.Series(list("abc"), dtype="object"),'B':[1,2,3]},
          index=[0,1,2])
Out[149]: 
   A  B
0  a  1
1  b  2
2  c  3

And if you use a partially-matching index, you'll get values where the indices align and NaN where they don't:

In [150]: pd.DataFrame({'A':pd.Series(list("abc"), dtype="object"),'B':[1,2,3]},
          index=[0,1,10])
Out[150]: 
      A  B
0     a  1
1     b  2
10  NaN  3
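When a column unexpectedly comes out as NaN, it can help to check which requested labels have no match in the Series' index before building the frame. A minimal sketch using `Index.difference`; `missing_labels` is a hypothetical helper, not part of pandas:

```python
import pandas as pd

s = pd.Series(list("abc"))  # default integer index 0, 1, 2

def missing_labels(series, index):
    """Hypothetical helper: requested labels with no match in series.index."""
    return pd.Index(index).difference(series.index)

print(missing_labels(s, ["0", "1", "2"]))  # strings never match integer labels
print(missing_labels(s, [0, 1, 10]))       # only 10 has no match
```

An empty result means every requested label is present, so alignment will not introduce NaNs into that column.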
