Lengthening a DataFrame based on stacking columns within it in Pandas

Published: 2020/5/18 20:23:15. Tags: python, python-3.x, pandas, numpy, dataframe


Question

I am looking for a function that achieves the following. It is best shown with an example. Consider:

pd.DataFrame([ [1, 2, 3 ], [4, 5, np.nan ]], columns=['x', 'y1', 'y2'])

which looks like this:

   x  y1   y2
0  1   2  3.0
1  4   5  NaN

I would like to collapse the y1 and y2 columns, lengthening the DataFrame where necessary, so that the output is:

   x    y
0  1  2.0
1  1  3.0
2  4  5.0

That is, one row for each combination of x with y1 or of x with y2. I am looking for a function that does this relatively efficiently, as I have multiple y columns and many rows.

Recommended answer

Here's one based on NumPy, since you were looking for performance -

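import numpy as np
import pandas as pd

def gather_columns(df):
    # Select every column whose name starts with 'y'
    col_mask = [i.startswith('y') for i in df.columns]
    ally_vals = df.iloc[:,col_mask].values
    # True wherever a y value is present (not NaN)
    y_valid_mask = ~np.isnan(ally_vals)

    # Repeat each x once per valid y value in its row
    reps = np.count_nonzero(y_valid_mask, axis=1)
    x_vals = np.repeat(df.x.values, reps)
    # Boolean indexing returns the valid y values in row-major order
    y_vals = ally_vals[y_valid_mask]
    return pd.DataFrame({'x':x_vals, 'y':y_vals})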

Sample run -

In [78]: df #(added more cols for variety)
Out[78]: 
   x  y1   y2   y5   y7
0  1   2  3.0  NaN  NaN
1  4   5  NaN  6.0  7.0

In [79]: gather_columns(df)
Out[79]: 
   x    y
0  1  2.0
1  1  3.0
2  4  5.0
3  4  6.0
4  4  7.0
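
To make the mechanics easier to follow, here is a minimal sketch that rebuilds the sample frame from the run above and prints the intermediate arrays. The frame construction is my own reconstruction of that sample, and the slice `df.iloc[:, 1:]` assumes x is the first column and everything after it is a y column.

```python
import numpy as np
import pandas as pd

# Reconstruction of the sample frame shown in the run above (an assumption).
df = pd.DataFrame(
    [[1, 2, 3.0, np.nan, np.nan],
     [4, 5, np.nan, 6.0, 7.0]],
    columns=["x", "y1", "y2", "y5", "y7"],
)

ally_vals = df.iloc[:, 1:].values              # 2-D float array of all y columns
y_valid_mask = ~np.isnan(ally_vals)            # True where a y value is present
reps = np.count_nonzero(y_valid_mask, axis=1)  # number of valid y values per row

x_vals = np.repeat(df["x"].values, reps)       # each x repeated once per valid y
y_vals = ally_vals[y_valid_mask]               # valid y values, row by row

print(reps)    # [2 3]
print(x_vals)  # [1 1 4 4 4]
print(y_vals)  # [2. 3. 5. 6. 7.]
```

Because the boolean mask is applied to a 2-D array, the selected values come out in row-major order, which is why they line up with the repeated x values.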

If the y columns always run from the second column through to the last, we can simply slice the dataframe and get a further performance boost, like so -

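import numpy as np
import pandas as pd

def gather_columns_v2(df):
    # Assumes column 0 is x and every remaining column is a y column
    ally_vals = df.iloc[:,1:].values
    y_valid_mask = ~np.isnan(ally_vals)

    # Repeat each x once per valid y value in its row
    reps = np.count_nonzero(y_valid_mask, axis=1)
    x_vals = np.repeat(df.x.values, reps)
    y_vals = ally_vals[y_valid_mask]
    return pd.DataFrame({'x':x_vals, 'y':y_vals})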
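
For cross-checking the NumPy versions, a pure-pandas equivalent can be handy. The sketch below is not part of the answer: `gather_columns_melt` is a hypothetical helper name, and it uses `DataFrame.melt` followed by `dropna` and a stable sort to restore the row-by-row order. It is likely slower than the NumPy approach, so treat it as a reference implementation rather than a replacement.

```python
import numpy as np
import pandas as pd

def gather_columns_melt(df):
    # Hypothetical helper, not from the answer: pure-pandas reference version.
    y_cols = [c for c in df.columns if c.startswith("y")]
    # Keep the original row position so the row-major order can be restored.
    long = df.reset_index().melt(
        id_vars=["index", "x"], value_vars=y_cols, value_name="y"
    )
    long = long.dropna(subset=["y"])
    # melt stacks column by column; a stable sort on the original row
    # position keeps the y columns in their left-to-right order within a row.
    long = long.sort_values("index", kind="mergesort")
    return long[["x", "y"]].reset_index(drop=True)

# Reconstruction of the sample frame used in the run above (an assumption).
df = pd.DataFrame(
    [[1, 2, 3.0, np.nan, np.nan],
     [4, 5, np.nan, 6.0, 7.0]],
    columns=["x", "y1", "y2", "y5", "y7"],
)
out = gather_columns_melt(df)
print(out)
```

On the sample frame this should produce the same x and y columns as `gather_columns`, which makes it a convenient check when adapting the NumPy code to other column layouts.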
