Lengthening a DataFrame based on stacking columns within it in Pandas

Question

I am looking for a function that achieves the following. It is best shown in an example. Consider:

pd.DataFrame([ [1, 2, 3 ], [4, 5, np.nan ]], columns=['x', 'y1', 'y2'])

which looks like this:

   x  y1   y2
0  1   2    3
1  4   5  NaN

I would like to collapse the y1 and y2 columns, lengthening the DataFrame where necessary, so that the output is:

   x  y
0  1   2  
1  1   3  
2  4   5  

That is, one row for each combination between either x and y1, or x and y2. I am looking for a function that does this relatively efficiently, as I have multiple ys and many rows.

Recommended answer

Here's one based on NumPy, as you were looking for performance -

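import numpy as np
import pandas as pd

def gather_columns(df):
    # Boolean mask over the columns: True for every column named 'y...'
    col_mask = [i.startswith('y') for i in df.columns]
    ally_vals = df.iloc[:,col_mask].values
    # True wherever a y value is present (not NaN)
    y_valid_mask = ~np.isnan(ally_vals)

    # Repeat each x once per valid y value in its row
    reps = np.count_nonzero(y_valid_mask, axis=1)
    x_vals = np.repeat(df.x.values, reps)
    # Boolean indexing flattens row by row, so the order matches x_vals
    y_vals = ally_vals[y_valid_mask]
    return pd.DataFrame({'x':x_vals, 'y':y_vals})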

Sample run -

In [78]: df #(added more cols for variety)
Out[78]: 
   x  y1   y2   y5   y7
0  1   2  3.0  NaN  NaN
1  4   5  NaN  6.0  7.0

In [79]: gather_columns(df)
Out[79]: 
   x    y
0  1  2.0
1  1  3.0
2  4  5.0
3  4  6.0
4  4  7.0
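
The x and y arrays stay aligned because boolean-mask indexing on a 2D array returns elements in row-major order, which is the same order in which np.repeat expands x. A minimal trace of those steps, assuming the y values from the sample run above (the array literals here are copied from that output and are not part of the original answer):

```python
import numpy as np

# y block from the sample run: one row per record, columns y1, y2, y5, y7
ally_vals = np.array([[2.0, 3.0, np.nan, np.nan],
                      [5.0, np.nan, 6.0, 7.0]])
x = np.array([1, 4])

y_valid_mask = ~np.isnan(ally_vals)            # True where a y value exists
reps = np.count_nonzero(y_valid_mask, axis=1)  # number of valid y values per row
x_vals = np.repeat(x, reps)                    # x[i] repeated reps[i] times
y_vals = ally_vals[y_valid_mask]               # row 0's values first, then row 1's

print(reps)    # [2 3]
print(x_vals)  # [1 1 4 4 4]
print(y_vals)  # [2. 3. 5. 6. 7.]
```

Each entry of reps says how many output rows a given x contributes, so x_vals and y_vals always have the same length.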

If the y columns always run from the second column through the last, we can simply slice the dataframe and get a further performance boost, like so -

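def gather_columns_v2(df):
    # Assumes column 0 is x and every remaining column is a y column
    ally_vals = df.iloc[:,1:].values
    # True wherever a y value is present (not NaN)
    y_valid_mask = ~np.isnan(ally_vals)

    # Repeat each x once per valid y value in its row
    reps = np.count_nonzero(y_valid_mask, axis=1)
    x_vals = np.repeat(df.x.values, reps)
    # Boolean indexing flattens row by row, so the order matches x_vals
    y_vals = ally_vals[y_valid_mask]
    return pd.DataFrame({'x':x_vals, 'y':y_vals})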
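
For comparison, the same reshaping can be sketched with pandas alone, using DataFrame.melt followed by dropna. This is an alternative technique, not part of the answer above, and it has not been benchmarked here. The function name gather_columns_melt is hypothetical, and the sketch assumes pandas 1.1 or later (for the ignore_index parameter of melt) and that no column is already named 'y' or 'variable'.

```python
import numpy as np
import pandas as pd

def gather_columns_melt(df):
    # Hypothetical pandas-only variant; assumes every non-x column is a y column.
    long = df.melt(id_vars='x', value_name='y', ignore_index=False)
    long = long.dropna(subset=['y'])
    # melt emits column by column; a stable sort on the original row index
    # restores the row-by-row order produced by the NumPy version.
    long = long.sort_index(kind='mergesort')
    return long.drop(columns='variable').reset_index(drop=True)

df = pd.DataFrame([[1, 2, 3, np.nan, np.nan],
                   [4, 5, np.nan, 6, 7]],
                  columns=['x', 'y1', 'y2', 'y5', 'y7'])
out = gather_columns_melt(df)
print(out)
```

On the sample frame this yields the same five rows as the sample run above. The NumPy version avoids building the intermediate long frame, which is presumably why the answer prefers it when performance matters.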
