Pandas DataFrame constructor introduces NaN when including the index argument
Problem description
I'm creating a pandas DataFrame object using the DataFrame constructor. My data is a dict of lists and categorical data Series objects. When I pass an index to the constructor, my categorical data series gets reset with NaN values. What's going on here? Thanks in advance!
Example:
import pandas as pd
import numpy as np
a = pd.Series(['a','b','c'],dtype="category")
b = pd.Series(['a','b','c'],dtype="object")
c = pd.Series(['a','b','cc'],dtype="object")
A = pd.DataFrame({'A':a,'B':[1,2,3]},index=["0","1","2"])
AA = pd.DataFrame({'A':a,'B':[1,2,3]})
B = pd.DataFrame({'A':b,'C':[4,5,6]})
print("DF A:")
print(A)
print("\nDF A, without specifying an index in the constructor:")
print(AA)
print("\nDF B:")
print(B)
Recommended answer
This has nothing to do with category vs. object dtype; it has to do with index alignment.
You're getting NaNs in A because you're telling the constructor you want an index of three strings. But a has an index of its own, consisting of the integers [0, 1, 2]. Since that doesn't match the index you asked for, the data doesn't align, so you get a DataFrame with the requested index, and the NaNs mark where the data is missing. By contrast, the value for column B is simply a list, so there's no index to align on, and the constructor assumes the data is given in index-appropriate order.
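The difference between the Series column and the list column can be checked directly. The following is a minimal sketch, not part of the original answer: the variable names `labels`, `misaligned`, `relabeled`, and `from_values` are made up for illustration, and the two workarounds shown are only two of several reasonable options (both assume the Series values are already in the intended row order).

```python
import pandas as pd

a = pd.Series(["a", "b", "c"], dtype="category")
labels = ["0", "1", "2"]  # hypothetical string labels, as in the question

# The Series carries its own integer index; the plain list has none.
print(a.index)

# Series column is aligned on labels (no matches, so NaN);
# list column is filled positionally.
misaligned = pd.DataFrame({"A": a, "B": [1, 2, 3]}, index=labels)
print(misaligned)

# Workaround 1 (assumption: positional order is what you want):
# build with the default index, then relabel the rows.
relabeled = pd.DataFrame({"A": a, "B": [1, 2, 3]})
relabeled.index = labels
print(relabeled)

# Workaround 2: pass the underlying values, which have no index to align on.
from_values = pd.DataFrame({"A": a.array, "B": [1, 2, 3]}, index=labels)
print(from_values)
```

Both workarounds discard the Series' own labels, so they are only appropriate when those labels carry no meaning.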
This might be easier to see than to explain. Regardless of dtype, if the indices don't match, you get NaN:
In [147]: pd.DataFrame({'A':pd.Series(list("abc"), dtype="category"),'B':[1,2,3]},
                       index=["0","1","2"])
Out[147]:
     A  B
0  NaN  1
1  NaN  2
2  NaN  3
In [148]: pd.DataFrame({'A':pd.Series(list("abc"), dtype="object"),'B':[1,2,3]},
                       index=["0","1","2"])
Out[148]:
     A  B
0  NaN  1
1  NaN  2
2  NaN  3
If you use a fully-matching index, it works:
In [149]: pd.DataFrame({'A':pd.Series(list("abc"), dtype="object"),'B':[1,2,3]},
                       index=[0,1,2])
Out[149]:
   A  B
0  a  1
1  b  2
2  c  3
And if you use a partially-matching index, you'll get values where the indices align and NaN where they don't:
In [150]: pd.DataFrame({'A':pd.Series(list("abc"), dtype="object"),'B':[1,2,3]},
                       index=[0,1,10])
Out[150]:
      A  B
0     a  1
1     b  2
10  NaN  3
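One way to think about the partially-matching case is that the Series column ends up looking the same as if it had been reindexed to the requested labels. The sketch below compares the two for this particular example; the names `s`, `target`, `via_constructor`, and `via_reindex` are illustrative and not from the original answer, and the comparison is meant as an aid to intuition rather than a statement about pandas internals.

```python
import pandas as pd

s = pd.Series(list("abc"), dtype="object")
target = [0, 1, 10]  # labels 0 and 1 exist in s; label 10 does not

# Column "A" as produced by the DataFrame constructor with an explicit index.
via_constructor = pd.DataFrame({"A": s, "B": [1, 2, 3]}, index=target)["A"]

# The same Series reindexed by hand to the same labels.
via_reindex = s.reindex(target)

print(via_constructor)
print(via_reindex)
```

In both results the rows labeled 0 and 1 keep their values, and the row labeled 10 is NaN because `s` has no entry under that label.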