Pandas long to wide format with multi-index


Question

I have a dataframe that looks like this:

data.head()
Out[2]: 
        Area Area Id                  Variable Name Variable Id  Year  \
0  Argentina       9  Conservation agriculture area        4454  1982   
1  Argentina       9  Conservation agriculture area        4454  1987   
2  Argentina       9  Conservation agriculture area        4454  1992   
3  Argentina       9  Conservation agriculture area        4454  1997   
4  Argentina       9  Conservation agriculture area        4454  2002   
     Value Symbol Md  
0      2.0            
1      6.0            
2    500.0       

I would like to pivot it so that Variable Name becomes the columns, Area and Year form the index, and Value supplies the values. The most intuitive way to me is using:

data.pivot(index=['Area', 'Year'], columns='Variable Name', values='Value')

But I get this error:

Traceback (most recent call last):
  File "C:\Users\patri\Miniconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-4c786386b703>", line 1, in <module>
    pd.concat(data_list).pivot(index=['Area', 'Year'], columns='Variable Name', values='Value')
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\frame.py", line 3853, in pivot
    return pivot(self, index=index, columns=columns, values=values)
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\reshape\reshape.py", line 377, in pivot
    index=MultiIndex.from_arrays([index, self[columns]]))
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\series.py", line 250, in __init__
    data = SingleBlockManager(data, index, fastpath=True)
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\internals.py", line 4117, in __init__
    fastpath=True)
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\internals.py", line 2719, in make_block
    return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\internals.py", line 1844, in __init__
    placement=placement, **kwargs)
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\internals.py", line 115, in __init__
    len(self.mgr_locs)))
ValueError: Wrong number of items passed 119611, placement implies 2

How should I interpret this? I've also tried another way:

data.set_index(['Area', 'Variable Name', 'Year']).loc[:, 'Value'].unstack('Variable Name')

to try to get the same result, but I get this error:

Traceback (most recent call last):
  File "C:\Users\patri\Miniconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-222325ea01e1>", line 1, in <module>
    pd.concat(data_list).set_index(['Area', 'Variable Name', 'Year']).loc[:, 'Value'].unstack('Variable Name')
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\series.py", line 2028, in unstack
    return unstack(self, level, fill_value)
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\reshape\reshape.py", line 458, in unstack
    fill_value=fill_value)
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\reshape\reshape.py", line 110, in __init__
    self._make_selectors()
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\reshape\reshape.py", line 148, in _make_selectors
    raise ValueError('Index contains duplicate entries, '
ValueError: Index contains duplicate entries, cannot reshape

Is there something wrong with the data? I've confirmed that there are no duplicate combinations of Area, Variable Name, and Year in any row of the dataframe, so I don't think there should be any duplicate entries, but I could be wrong. How can I convert from long to wide format, given that neither of these methods currently works? I've checked answers here and here, but they are both cases where some type of aggregation is involved.

I've tried using pivot_table like this:

data.pivot_table(index=['Area', 'Year'], columns='Variable Name', values='Value')

but I think some type of aggregation is being done, and there are a lot of missing values in the dataset, which leads to this error:

Traceback (most recent call last):
  File "C:\Users\patri\Miniconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-7-77b28d2f0dbb>", line 1, in <module>
    pd.concat(data_list).pivot_table(index=['Area', 'Year'], columns='Variable Name', values='Value')
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\reshape\pivot.py", line 136, in pivot_table
    agged = grouped.agg(aggfunc)
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\groupby.py", line 4036, in aggregate
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\groupby.py", line 3468, in aggregate
    result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\base.py", line 435, in _aggregate
    **kwargs), None
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\base.py", line 391, in _try_aggregate_string_function
    return f(*args, **kwargs)
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\groupby.py", line 1037, in mean
    return self._cython_agg_general('mean', **kwargs)
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\groupby.py", line 3354, in _cython_agg_general
    how, alt=alt, numeric_only=numeric_only)
  File "C:\Users\patri\Miniconda3\lib\site-packages\pandas\core\groupby.py", line 3425, in _cython_agg_blocks
    raise DataError('No numeric types to aggregate')
pandas.core.base.DataError: No numeric types to aggregate

Recommended answer

I think you first need to convert the column Value to numeric and then use pivot_table with its default aggregate function, mean:

# if all values are floats saved as strings
data['Value'] = data['Value'].astype(float)
# if there is some bad data (e.g. non-numeric strings) and the first method raises a ValueError
data['Value'] = pd.to_numeric(data['Value'], errors='coerce')
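
To illustrate the difference between the two conversions, here is a minimal sketch. The sample values, including the 'n/a' entry, are made up for illustration and are not taken from the asker's dataset:

```python
import pandas as pd

# Hypothetical sample: numbers stored as strings, plus one unparseable entry.
values = pd.Series(["2.0", "6.0", "500.0", "n/a"], dtype=object)

# astype(float) raises ValueError as soon as it meets the unparseable entry.
try:
    values.astype(float)
    astype_failed = False
except ValueError:
    astype_failed = True

# to_numeric with errors='coerce' replaces unparseable entries with NaN.
converted = pd.to_numeric(values, errors="coerce")

print(astype_failed)
print(converted)
```

With errors='coerce', bad entries silently become NaN, so it may be worth counting them with converted.isna().sum() before reshaping.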

data.pivot_table(index=['Area', 'Year'], columns='Variable Name', values='Value')

Or:

data.groupby(['Area', 'Variable Name', 'Year'])['Value'].mean().unstack('Variable Name')
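
As a rough end-to-end sketch of both approaches, the following uses a small made-up frame. The 'Total population' variable and all values are assumptions for illustration, and one duplicated Area / Variable Name / Year combination is included on purpose to show how mean collapses it:

```python
import pandas as pd

# Hypothetical long-format data; 'Value' is stored as strings.
data = pd.DataFrame({
    "Area": ["Argentina", "Argentina", "Argentina", "Argentina", "Brazil"],
    "Variable Name": [
        "Conservation agriculture area",
        "Conservation agriculture area",  # deliberate duplicate key
        "Conservation agriculture area",
        "Total population",
        "Total population",
    ],
    "Year": [1982, 1982, 1987, 1982, 1982],
    "Value": ["2.0", "4.0", "6.0", "100.0", "200.0"],
})

# Convert to numeric first, otherwise mean has nothing numeric to aggregate.
data["Value"] = pd.to_numeric(data["Value"], errors="coerce")

# Rows sharing the same key; these are what make pivot/unstack fail.
keys = ["Area", "Variable Name", "Year"]
dupes = data[data.duplicated(subset=keys, keep=False)]

# Both reshapes average duplicated keys and leave NaN where no value exists.
wide_pivot = data.pivot_table(index=["Area", "Year"],
                              columns="Variable Name", values="Value")
wide_unstack = data.groupby(keys)["Value"].mean().unstack("Variable Name")

print(dupes)
print(wide_pivot)
```

If dupes turns out to be non-empty on the real data, that would explain the "Index contains duplicate entries" error, and it is worth deciding whether averaging those rows is really the intended behaviour.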
