Missing data: insert rows in Pandas and fill with NaN
Question
I'm new to Python and Pandas, so there might be a simple solution which I don't see.
I have a number of discontinuous datasets which look like this:
ind A B C
0 0.0 1 3
1 0.5 4 2
2 1.0 6 1
3 3.5 2 0
4 4.0 4 5
5 4.5 3 3
I'm now looking for a solution to get the following:
ind A B C
0 0.0 1 3
1 0.5 4 2
2 1.0 6 1
3 1.5 NAN NAN
4 2.0 NAN NAN
5 2.5 NAN NAN
6 3.0 NAN NAN
7 3.5 2 0
8 4.0 4 5
9 4.5 3 3
The problem is that the gap in A varies from dataset to dataset in position and length...
Answer
set_index and reset_index are your friends.
from numpy import arange
from pandas import DataFrame, Index

df = DataFrame({"A": [0, 0.5, 1.0, 3.5, 4.0, 4.5], "B": [1, 4, 6, 2, 4, 3], "C": [3, 2, 1, 0, 5, 3]})
First move column A to the index:
In [64]: df.set_index("A")
Out[64]:
B C
A
0.0 1 3
0.5 4 2
1.0 6 1
3.5 2 0
4.0 4 5
4.5 3 3
Then reindex with a new index; the missing rows are filled in with NaNs. We use an Index object because we can give it a name; this will be used in the next step.
In [66]: new_index = Index(arange(0,5,0.5), name="A")
In [67]: df.set_index("A").reindex(new_index)
Out[67]:
B C
0.0 1 3
0.5 4 2
1.0 6 1
1.5 NaN NaN
2.0 NaN NaN
2.5 NaN NaN
3.0 NaN NaN
3.5 2 0
4.0 4 5
4.5 3 3
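One caveat worth knowing: reindex looks labels up by exact equality. That is harmless here because 0.5 is exactly representable in binary floating point, so arange(0, 5, 0.5) yields exactly 0.0, 0.5, 1.0, and so on. With a step such as 0.1, arange can produce values like 0.30000000000000004 that never match a stored 0.3, and the real row would be replaced by NaN. The sketch below uses made-up data and assumes the values are recorded to one decimal place; it rounds the grid to that precision before reindexing. It also shows that the inserted NaNs upcast an integer column to float64.

```python
import numpy as np
import pandas as pd

# Hypothetical data sampled every 0.1, with a gap between 0.3 and 0.6
df = pd.DataFrame({"A": [0.0, 0.1, 0.2, 0.3, 0.6], "B": [1, 2, 3, 4, 5]})

raw = np.arange(0, 0.65, 0.1)   # the fourth value is 0.30000000000000004, not 0.3
print(raw[3] == 0.3)            # False: this label would never match the stored 0.3

# Round to the data's precision (assumed here: one decimal) so labels compare equal
grid = pd.Index(np.round(raw, 1), name="A")

filled = df.set_index("A").reindex(grid).reset_index()
print(filled)
# B was int64; the inserted NaNs force it to float64
print(filled["B"].dtype)
```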
Finally, move the index back into the columns with reset_index. Since we named the index, reset_index turns it back into a column called A:
In [69]: df.set_index("A").reindex(new_index).reset_index()
Out[69]:
A B C
0 0.0 1 3
1 0.5 4 2
2 1.0 6 1
3 1.5 NaN NaN
4 2.0 NaN NaN
5 2.5 NaN NaN
6 3.0 NaN NaN
7 3.5 2 0
8 4.0 4 5
9 4.5 3 3
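Since the gap differs from dataset to dataset, the hard-coded arange(0, 5, 0.5) only fits this example. A minimal sketch of a reusable helper follows, assuming every dataset is sampled on a regular step (0.5 here) and that the A values are multiples of that step up to rounding. The function name fill_gaps, its parameters, and the second dataset are hypothetical; the grid is built from each dataset's own minimum and maximum.

```python
import numpy as np
import pandas as pd


def fill_gaps(df: pd.DataFrame, col: str = "A", step: float = 0.5, decimals: int = 6) -> pd.DataFrame:
    """Insert NaN rows so that `col` runs from its min to its max in steps of `step`."""
    # Round the stored values so they compare equal to the rounded grid labels
    df = df.assign(**{col: df[col].round(decimals)})
    start, stop = df[col].min(), df[col].max()
    # Number of grid points; round() guards against e.g. 8.999999 steps
    n = int(round((stop - start) / step)) + 1
    # Grid start, start + step, ..., stop, named after the column
    grid = pd.Index(np.round(start + step * np.arange(n), decimals), name=col)
    return df.set_index(col).reindex(grid).reset_index()


# The dataset from the question, plus a hypothetical second one with a different gap
d1 = pd.DataFrame({"A": [0, 0.5, 1.0, 3.5, 4.0, 4.5], "B": [1, 4, 6, 2, 4, 3], "C": [3, 2, 1, 0, 5, 3]})
d2 = pd.DataFrame({"A": [1.0, 1.5, 3.0], "B": [7, 8, 9], "C": [0, 1, 2]})

for d in (d1, d2):
    print(fill_gaps(d))
```

For d1 this reproduces the ten-row result above; for d2 it inserts NaN rows at 2.0 and 2.5.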