如果只有一列,为什么Pandas Transform失败 [英] Why Pandas Transform fails if you only have a single column

查看:148
本文介绍了如果只有一列,为什么Pandas Transform失败的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

查看此

After looking at this question I did some messing about and found this:

import pandas as pd

df = pd.DataFrame({'a':[1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4]})
df['num_totals'] = df.groupby('a').transform('count')

gives ValueError:

ValueError                                Traceback (most recent call last)
<ipython-input-38-157c6339ad93> in <module>()
      3 #df = pd.DataFrame({'a':[1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4], 'b':[1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4]})
      4 df = pd.DataFrame({'a':[1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4]})
----> 5 df['num_totals'] = df.groupby('a').transform('count')
      6 
      7 #df['num_totals']=df.groupby('a')[['a']].transform('count')

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\frame.pyc in __setitem__(self, key, value)
   2117         else:
   2118             # set column
-> 2119             self._set_item(key, value)
   2120 
   2121     def _setitem_slice(self, key, value):

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\frame.pyc in _set_item(self, key, value)
   2164         """
   2165         value = self._sanitize_column(key, value)
-> 2166         NDFrame._set_item(self, key, value)
   2167 
   2168     def insert(self, loc, column, value, allow_duplicates=False):

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\generic.pyc in _set_item(self, key, value)
    677 
    678     def _set_item(self, key, value):
--> 679         self._data.set(key, value)
    680         self._clear_item_cache()
    681 

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\internals.pyc in set(self, item, value)
   1779         except KeyError:
   1780             # insert at end
-> 1781             self.insert(len(self.items), item, value)
   1782 
   1783         self._known_consolidated = False

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\internals.pyc in insert(self, loc, item, value, allow_duplicates)
   1793 
   1794             # new block
-> 1795             self._add_new_block(item, value, loc=loc)
   1796 
   1797         except:

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\internals.pyc in _add_new_block(self, item, value, loc)
   1909             loc = self.items.get_loc(item)
   1910         new_block = make_block(value, self.items[loc:loc + 1].copy(),
-> 1911                                self.items, fastpath=True)
   1912         self.blocks.append(new_block)
   1913 

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\internals.pyc in make_block(values, items, ref_items, klass, fastpath, placement)
    964             klass = ObjectBlock
    965 
--> 966     return klass(values, items, ref_items, ndim=values.ndim, fastpath=fastpath, placement=placement)
    967 
    968 # TODO: flexible with index=None and/or items=None

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\internals.pyc in __init__(self, values, items, ref_items, ndim, fastpath, placement)
     42         if len(items) != len(values):
     43             raise ValueError('Wrong number of items passed %d, indices imply %d'
---> 44                              % (len(items), len(values)))
     45 
     46         self.set_ref_locs(placement)

ValueError: Wrong number of items passed 1, indices imply 0

但是,如果我有2列,则可以正常工作:

But if I have 2 columns then it works fine:

df = pd.DataFrame({'a':1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4],'b':1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4]})
df['num_totals'] = df.groupby('a').transform('count')
df



Out[40]:
    a  b  num_totals
0   1  1           4
1   1  1           4
2   1  1           4
3   1  1           4
4   2  2           2
5   2  2           2
6   3  3           3
7   3  3           3
8   3  3           3
9   4  4           7
10  4  4           7
11  4  4           7
12  4  4           7
13  4  4           7
14  4  4           7
15  4  4           7

或者如果我使用单列df执行此操作:

or if I do this using a single column df:

df['num_totals']=df.groupby('a')[['a']].transform('count')

有类似的 SO帖子,但目前尚不清楚我为什么在上面的示例中,一系列操作会失败并且一个数据框应该工作,为什么具有两列或更多列会工作.

There is a similar SO post but it is unclear to me why a series should fail and a dataframe should work in the immediate above example, and why having 2 or more columns would work.

我正在使用Python 2.7 64位和Pandas 0.12

I am using Python 2.7 64-bit and Pandas 0.12

推荐答案

DF中的单列

如上所述,这将返回与原始尺寸相同的系列

Single Column in the DF

As you noted above, this returns a series the same size as the original

In [32]: df.groupby('a')['a'].transform('count')
Out[32]: 
0     4
1     4
2     4
3     4
4     2
5     2
6     3
7     3
8     3
9     7
10    7
11    7
12    7
13    7
14    7
15    7
Name: a, dtype: int64

但是,这正在重现空白帧

However, this is returing an empty frame

In [33]: df.groupby('a').transform('count')
Out[33]: 
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

您不能将一个空框架作为一列分配给另一个框架,因为这本质上是一个模棱两可的分配(您可以证明它应该起作用")

you cannot assign a an empty frame as a column to another frame because this is essentially an ambiguous assignment (you can make a case that it should 'work' though)

两栏情况返回一个单列DataFrame

The two column case return a single-column DataFrame

In [42]: df2.groupby('a').transform('count')
Out[42]: 
    b
0   4
1   4
2   4
3   4
4   2
5   2
6   3
7   3
8   3
9   7
10  7
11  7
12  7
13  7
14  7
15  7

In [43]: type(df2.groupby('a').transform('count'))
Out[43]: pandas.core.frame.DataFrame

Or a series

In [45]: df2.groupby('a')['a'].transform('count')
Out[45]: 
0     4
1     4
2     4
3     4
4     2
5     2
6     3
7     3
8     3
9     7
10    7
11    7
12    7
13    7
14    7
15    7
Name: a, dtype: int64

In [46]: type(df.groupby('a')['a'].transform('count'))
Out[46]: pandas.core.series.Series

之所以行之有效",是因为熊猫确实允许分配单个列框架,因为它将采用基础系列.

This 'works' because pandas DOES allow assignment of a single column frame to work, as it will take the underlying series.

因此,熊猫实际上正在尝试提供帮助.就是说,我发现这是一条不清楚的错误消息,用于尝试分配一个空框架.

So pandas is actually trying to be helpful. That said, I find this an unclear error message for trying to assign an empty frame.

这篇关于如果只有一列,为什么Pandas Transform失败的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆