Python pandas pivot: How to do a proper tidyr-like spread?


Problem description

In Python I am missing a quick and easy conversion from long to wide format and vice versa. Imagine I have a large tidy DataFrame with many property columns and a single column that contains all the actual values, like:

PropA  ...  PropZ   Value
green  ...  Saturn  400
green  ...  Venus   3
red    ...  Venus   2
...
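For concreteness, a small version of this table can be built directly. The columns between PropA and PropZ are elided in the question, so this sketch assumes just those two property columns. It shows the wide-to-long direction (the counterpart of tidyr's `gather`), which pandas covers with `pd.melt`:

```python
import pandas as pd

# Long ("tidy") table: property columns plus one Value column.
# Assumption: only PropA and PropZ; the elided columns are left out.
long_df = pd.DataFrame({
    "PropA": ["green", "green", "red"],
    "PropZ": ["Saturn", "Venus", "Venus"],
    "Value": [400, 3, 2],
})

# The same data in wide form: one column per PropA level, NaN where a
# combination does not exist.
wide_df = pd.DataFrame({
    "PropZ": ["Saturn", "Venus"],
    "green": [400.0, 3.0],
    "red": [float("nan"), 2.0],
})

# Wide -> long: PropZ stays a column, the level columns become (PropA, Value) pairs.
back = (
    pd.melt(wide_df, id_vars=["PropZ"], var_name="PropA", value_name="Value")
    .dropna(subset=["Value"])  # drop combinations that never existed
)
print(back)
```

The long-to-wide direction is the part pandas makes awkward, which is what the rest of the question is about.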

The data itself is handled very nicely by keeping it tidy. But sometimes I have to perform operations across certain properties (for instance, it might be interesting to compare being red vs. being green for all items that are alike with respect to the other properties). So the straightforward way would be to keep the data as tidy as possible and only untidy the one property I am interested in (PropA). Then I could perform a row-wise map with whatever function I like and create an additional PropA entry that contains the function output.
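The workflow described here (untidy only PropA, apply a row-wise function, add its output as a new PropA entry) can be sketched as follows. The toy data is an assumption, with PropZ standing in for all other properties, and `red - green` is only a stand-in for "whatever function I like":

```python
import pandas as pd

# Assumed toy data: PropZ stands in for all the other property columns.
long_df = pd.DataFrame({
    "PropA": ["green", "green", "red", "red"],
    "PropZ": ["Saturn", "Venus", "Venus", "Saturn"],
    "Value": [400, 3, 2, 100],
})

# 1. Untidy only PropA; the other properties go into the index temporarily.
others = [c for c in long_df.columns if c not in ("PropA", "Value")]
wide = long_df.set_index(others + ["PropA"])["Value"].unstack("PropA")

# 2. Row-wise function across the PropA levels (stand-in: red minus green).
wide["red-green"] = wide["red"] - wide["green"]

# 3. Back to long: the function output becomes just another PropA entry.
result = wide.reset_index().melt(id_vars=others, var_name="PropA", value_name="Value")
print(result)
```

Because step 3 uses `melt`, the result has the same three-column layout as the input, so it can be concatenated with or compared against the original tidy table.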

However, keeping all the other properties tidy is not as easy in Python as I was used to in R. The reason is that every pandas method I found moves all the non-pivoted properties into the index. That's a whole mess if there are more columns I want to keep.
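The properties that end up in the index can be turned back into ordinary columns with `reset_index()`. A sketch with an assumed extra property column `PropB`; note that `pivot_table` aggregates duplicate combinations (mean by default) instead of raising:

```python
import pandas as pd

long_df = pd.DataFrame({
    "PropA": ["green", "red", "green", "red"],
    "PropB": ["x", "x", "y", "y"],  # assumed extra property column
    "PropZ": ["Venus", "Venus", "Venus", "Venus"],
    "Value": [3, 2, 5, 7],
})

keep = ["PropB", "PropZ"]  # properties that should stay long

# pivot_table moves every non-pivoted property into the (Multi)Index ...
wide = long_df.pivot_table(index=keep, columns="PropA", values="Value")
print(list(wide.index.names))  # ['PropB', 'PropZ']

# ... and reset_index() turns those index levels back into plain columns.
flat = wide.reset_index()
flat.columns.name = None  # drop the leftover "PropA" label on the column axis
print(flat)
```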

So how do you get along with this problem? Is there some other neat way of dealing with this type of problem?

I have written a spread method myself, but it is awfully slow. Maybe you have some ideas how I can improve it.

import pandas as pd

# The idea is to group by the remaining properties, which should stay in the long format,
# and then spread the small tidy table of each group.
def spread(df, propcol, valcol):
    def flip(data, pc, vc):
        data = data.reset_index(drop=True)
        # one column per property value, each holding a single-element list
        return {data[pc][i]: [data[vc][i]] for i in range(len(data))}

    # index columns are all columns not affected by the spread
    indcols = list(df.columns)
    indcols.remove(propcol)
    indcols.remove(valcol)

    tmpdf = pd.DataFrame()
    for key, group in df.groupby(indcols):
        if not isinstance(key, tuple):  # a single grouping column may yield a scalar key
            key = (key,)
        dc1 = {a: [b] for (a, b) in zip(indcols, key)}
        dc2 = flip(group, propcol, valcol)
        # growing tmpdf with concat copies it on every iteration
        tmpdf = pd.concat([tmpdf, pd.concat([pd.DataFrame(dc1), pd.DataFrame(dc2)], axis=1)])

    return tmpdf.reset_index(drop=True)
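One way to keep the same group-by approach while avoiding the per-group DataFrame objects and the repeated `pd.concat` is to build one plain dict per group and construct the result once at the end. This is a sketch: `spread_rows` is a hypothetical name, and the assumption that the concat calls dominate the runtime is a guess, not a measured result:

```python
import pandas as pd

def spread_rows(df, propcol, valcol):
    """Hypothetical helper: same group-by idea, but the frame is built once.

    One output row per combination of the remaining property columns,
    one column per distinct value of ``propcol``.
    """
    indcols = [c for c in df.columns if c not in (propcol, valcol)]
    rows = []
    for key, group in df.groupby(indcols):
        if not isinstance(key, tuple):  # older pandas yields scalars for one key
            key = (key,)
        row = dict(zip(indcols, key))
        # later duplicates of the same propcol value overwrite earlier ones
        row.update(zip(group[propcol], group[valcol]))
        rows.append(row)
    # a single DataFrame construction instead of one concat per group
    return pd.DataFrame(rows)

long_df = pd.DataFrame({
    "PropA": ["green", "green", "red"],
    "PropZ": ["Saturn", "Venus", "Venus"],
    "Value": [400, 3, 2],
})
wide = spread_rows(long_df, "PropA", "Value")
print(wide)
```

Missing combinations (Saturn has no red entry) come out as NaN, just as with the concat-based version.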

Recommended answer

With the help of the hint, I've created a simpler version. I am still a little confused by the index mechanics, but time will help me get a better understanding.

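def spread(df, propcol, valcol):
    # every column except the value column (propcol included) goes into the index
    indcol = list(df.columns.drop(valcol))
    # move propcol from the index to the columns, then bring the rest back as columns
    df = df.set_index(indcol).unstack(propcol).reset_index()
    # columns are now tuples: (valcol, level) for the spread values and
    # (name, '') for the former index columns; flatten them to plain names
    df.columns = [i[1] if i[0] == valcol else i[0] for i in df.columns]
    return df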
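Since pandas 1.1, `DataFrame.pivot` also accepts a list for `index`, so the same reshape can be written without calling `set_index`/`unstack` directly and without flattening tuple column names. A sketch assuming pandas 1.1 or later; `spread_pivot` is a hypothetical name. Like `unstack`, `pivot` raises on duplicate (index, propcol) combinations; `pivot_table` with an `aggfunc` is the usual fallback in that case:

```python
import pandas as pd

def spread_pivot(df, propcol, valcol):
    """Hypothetical tidyr::spread-like helper built on DataFrame.pivot."""
    keep = [c for c in df.columns if c not in (propcol, valcol)]
    wide = df.pivot(index=keep, columns=propcol, values=valcol).reset_index()
    wide.columns.name = None  # remove the leftover propcol label on the column axis
    return wide

long_df = pd.DataFrame({
    "PropA": ["green", "green", "red"],
    "PropZ": ["Saturn", "Venus", "Venus"],
    "Value": [400, 3, 2],
})
w = spread_pivot(long_df, "PropA", "Value")
print(w)
```

Passing `values` as a single column name keeps the resulting columns flat, which is why no tuple handling is needed here.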
