NaN values in pivot_table index cause loss of data
Problem description
Here is a simple DataFrame:
> import pandas as pd
> df = pd.DataFrame({'a': ['a1', 'a2', 'a3'],
                     'b': ['optional1', None, 'optional3'],
                     'c': ['c1', 'c2', 'c3'],
                     'd': [1, 2, 3]})
> df
    a          b   c  d
0  a1  optional1  c1  1
1  a2       None  c2  2
2  a3  optional3  c3  3
Pivot method 1
The data can be pivoted to this:
> df.pivot_table(index=['a','b'], columns='c')
                d
c              c1   c3
a  b
a1 optional1  1.0  NaN
a3 optional3  NaN  3.0
Downside: the data in the 2nd row is lost because df['b'][1] is None.
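A likely explanation, offered as a sketch rather than a description of pandas internals: pivot_table aggregates by grouping on its index and column keys, and groupby leaves out rows whose key is missing unless dropna=False is passed (that parameter exists from pandas 1.1 onward). The snippet below only compares the number of groups with and without that flag on the question's DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'a': ['a1', 'a2', 'a3'],
                   'b': ['optional1', None, 'optional3'],
                   'c': ['c1', 'c2', 'c3'],
                   'd': [1, 2, 3]})

# Default behaviour: the row whose 'b' key is None forms no group at all.
default_groups = df.groupby(['a', 'b']).size()

# dropna=False (pandas >= 1.1) keeps the missing key as a group of its own.
kept_groups = df.groupby(['a', 'b'], dropna=False).size()

print(len(default_groups), len(kept_groups))
```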
Pivot method 2
> df.pivot_table(index=['a'], columns='c')
      d
c    c1   c2   c3
a
a1  1.0  NaN  NaN
a2  NaN  2.0  NaN
a3  NaN  NaN  3.0
Downside: column b is lost.
How can the two methods be combined so that column b and the 2nd row are both kept, like so:
                d
c              c1   c2   c3
a  b
a1 optional1  1.0  NaN  NaN
a2 None       NaN  2.0  NaN
a3 optional3  NaN  NaN  3.0
More generally: how can the information in a row be retained during pivoting if one of its keys has a NaN value?
Recommended answer
Use set_index and unstack to perform the pivot:
df = df.set_index(['a', 'b', 'c']).unstack('c')
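For reference, here is that one-liner in a self-contained form, run against the DataFrame from the question. The only assumption is the usual `import pandas as pd`; the result is stored in `wide` instead of overwriting `df`.

```python
import pandas as pd

df = pd.DataFrame({'a': ['a1', 'a2', 'a3'],
                   'b': ['optional1', None, 'optional3'],
                   'c': ['c1', 'c2', 'c3'],
                   'd': [1, 2, 3]})

# Move all three key columns into a MultiIndex. The None in 'b' becomes a
# missing label in the index rather than a group key that gets dropped.
# unstack('c') then turns the 'c' level into columns.
wide = df.set_index(['a', 'b', 'c']).unstack('c')

print(wide)
```

Note that, unlike pivot_table, this does not aggregate: if the same (a, b, c) combination occurs more than once, unstack raises a ValueError because the index contains duplicate entries.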
This is essentially what pandas does under the hood for pivot. The stack and unstack methods are closely related to pivot and can generally be used to perform pivot-like operations that don't quite fit the built-in pivot functions.
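As a side check of that relationship (not needed for the answer itself), the two spellings can be compared on the sample data. This sketch assumes pandas 1.1 or later, where DataFrame.pivot accepts a list of columns for `index`; it selects the single value column `d` so both results have plain `c1`/`c2`/`c3` columns.

```python
import pandas as pd

df = pd.DataFrame({'a': ['a1', 'a2', 'a3'],
                   'b': ['optional1', None, 'optional3'],
                   'c': ['c1', 'c2', 'c3'],
                   'd': [1, 2, 3]})

# Built-in pivot with a multi-column index (pandas >= 1.1).
via_pivot = df.pivot(index=['a', 'b'], columns='c', values='d')

# The same reshape spelled out with set_index + unstack.
via_unstack = df.set_index(['a', 'b', 'c'])['d'].unstack('c')

# Raises AssertionError if the two frames differ in values, index or columns.
pd.testing.assert_frame_equal(via_pivot, via_unstack)
```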
Resulting output of df.set_index(['a', 'b', 'c']).unstack('c'):
                d
c              c1   c2   c3
a  b
a1 optional1  1.0  NaN  NaN
a2 NaN        NaN  2.0  NaN
a3 optional3  NaN  NaN  3.0
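If duplicate (a, b, c) combinations have to be aggregated, which set_index/unstack does not do, a different workaround (not the one from the answer) is to replace the missing key with a placeholder before calling pivot_table and turn it back into a missing value afterwards. A minimal sketch, assuming the hypothetical placeholder string `<missing>` never occurs as a real value in column b:

```python
import pandas as pd

df = pd.DataFrame({'a': ['a1', 'a2', 'a3'],
                   'b': ['optional1', None, 'optional3'],
                   'c': ['c1', 'c2', 'c3'],
                   'd': [1, 2, 3]})

# Hypothetical placeholder; assumes this string never appears as a real 'b' value.
SENTINEL = '<missing>'

# Fill the missing key so pivot_table has no NaN group to drop.
filled = df.fillna({'b': SENTINEL})
wide = filled.pivot_table(index=['a', 'b'], columns='c', values='d')

# Move the keys back into columns, turn the placeholder into NaN,
# and rebuild the (a, b) MultiIndex.
flat = wide.reset_index()
flat['b'] = flat['b'].mask(flat['b'] == SENTINEL)
wide = flat.set_index(['a', 'b'])

print(wide)
```

The trade-off is the extra bookkeeping around the placeholder; in exchange, pivot_table's aggfunc (mean by default) still applies when keys repeat.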