Incorrect results when applying solution to real data

Views: 110 Posted: 2017/3/26 0:28:54 pandas numpy dataframe


Problem description


I've tried to apply the solution provided in this question to my real data: Selecting rows in a MultiIndexed dataframe. Somehow I cannot get the results it should give. I've attached both the dataframe to select from, as well as the result.

What I need:

Rows 3, 11 and 12 should be returned. When you add the 4 columns consecutively, row 12 should be selected as well, but currently it isn't.

    import numpy as np
    import pandas as pd

    df_test = pd.read_csv('df_test.csv')

    def find_window(df):
        v = df.values
        s = np.vstack([np.zeros((1, v.shape[1])), v.cumsum(0)])

        threshold = 0

        r, c = np.triu_indices(s.shape[0], 1)
        d = (c - r)[:, None]
        e = s[c] - s[r]
        mask = (e / d < threshold).all(1)
        rng = np.arange(mask.shape[0])

        if mask.any():
            idx = rng[mask][d[mask].argmax()]

            i0, i1 = r[idx], c[idx]
            return pd.DataFrame(
                v[i0:i1],
                df.loc[df.name].index[i0:i1],
                df.columns
            )

    cols = ['2012', '2013', '2014', '2015']

    df_test.groupby(level=0)[cols].apply(find_window)
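For readers following along, here is a minimal sketch of the prefix-sum step that `find_window` relies on, run on a small made-up array (the values in `v` are hypothetical and are not taken from `df_test.csv`). It suggests that the function looks for the longest run of consecutive rows whose column-wise mean is below the threshold, which is not the same as testing each row's running total across the year columns.

```python
import numpy as np

# Hypothetical 4x2 array standing in for one group's values
v = np.array([[1.0, 2.0],
              [-3.0, -1.0],
              [-2.0, -4.0],
              [5.0, 1.0]])

# Prepend a row of zeros so that s[c] - s[r] is the column-wise
# sum of rows r .. c-1 of v
s = np.vstack([np.zeros((1, v.shape[1])), v.cumsum(0)])

r, c = np.triu_indices(s.shape[0], 1)  # every pair with r < c
d = (c - r)[:, None]                   # number of rows in each window
means = (s[c] - s[r]) / d              # column-wise mean of each window

# Keep windows whose mean is negative in every column,
# then pick the longest one (first one on ties)
mask = (means < 0).all(1)
best = d[mask].argmax()
i0, i1 = r[mask][best], c[mask][best]

print(i0, i1)      # start (inclusive) and end (exclusive) row positions
print(v[i0:i1])    # rows of the selected window
```

In this toy input the selected window includes the first row even though both of its values are positive, because the mean over the window is still negative in each column.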

The CSV file is here: https://docs.google.com/spreadsheets/d/19oOoBdAs3xRBWq6HReizlqrkWoQR2159nk8GWoR_4-g/edit?usp=sharing

EDIT: Correct dataframes added.

Note: Blue frame = rows which should be returned; yellow frames = consecutive column values which are < 0 (threshold).

Solution

According to the logic from your comment, you are looking for rows where either every value in columns 2012, 2013, 2014, 2015 is less than 0, or the cumulative sum across those columns is less than 0 at every step. The second condition is always true whenever the first one is (if every value is negative, every running total is negative too), so you only need to test the second condition.

    cols = ['2012', '2013', '2014', '2015']
    df.loc[(df[cols].cumsum(axis=1) < 0).all(axis=1), cols]

         2012   2013   2014   2015
    1   -6.74  -1.22   1.58  -0.42
    3   -3.14  -2.48  -0.02  -4.78
    4   -9.40 -11.20   0.68  12.04
    7   -3.12  -5.74   0.84   1.94
    8  -10.14 -12.24 -11.10  15.20
    11 -10.04 -10.60  -5.56  -8.44
    12  -7.30   5.96 -12.58  -6.86
    15 -10.24  -4.16   5.46 -14.00
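To make the reasoning easier to check, here is a small self-contained sketch of the same filter. It assumes a toy frame built from rows 3, 12 and 1 of the output above plus one invented row (index 99) that should be rejected; the variable names `running`, `all_negative` and `cumsum_negative` are only illustrative.

```python
import pandas as pd

# Rows 3, 12 and 1 are copied from the output above;
# row 99 is made up so that one row fails the test
df = pd.DataFrame(
    {
        "2012": [-3.14, -7.30, -6.74, 2.00],
        "2013": [-2.48, 5.96, -1.22, -5.00],
        "2014": [-0.02, -12.58, 1.58, -1.00],
        "2015": [-4.78, -6.86, -0.42, -1.00],
    },
    index=[3, 12, 1, 99],
)
cols = ["2012", "2013", "2014", "2015"]

# Running total across the year columns, row by row
running = df[cols].cumsum(axis=1)

# Condition 1: every individual value is negative
all_negative = (df[cols] < 0).all(axis=1)
# Condition 2: every running total is negative
cumsum_negative = (running < 0).all(axis=1)

# Rows passing condition 2 (a superset of the rows passing condition 1)
selected = df.loc[cumsum_negative, cols]
print(selected)
```

Row 12 is kept even though its 2013 value is positive, because its running total never rises above zero; row 99 is dropped because its very first value is already positive.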

Let me know in the comments if this is not what you want.
