Incorrect results when applying solution to real data

Problem description

I've tried to apply the solution from this question, "Selecting rows in a MultiIndexed dataframe", to my real data, but I can't get the results it should give. I've attached both the dataframe to select from and the result.

What I need:

Rows 3, 11 and 12 should be returned. Row 12 should be selected as well when the four columns are added consecutively, but currently it isn't.

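    import numpy as np
    import pandas as pd

    df_test = pd.read_csv('df_test.csv')

    def find_window(df):
        v = df.values
        # Running sums down the rows, with a leading row of zeros so that
        # s[c] - s[r] is the column-wise sum of rows r..c-1
        s = np.vstack([np.zeros((1, v.shape[1])), v.cumsum(0)])

        threshold = 0

        # Every (start, end) pair with start < end, i.e. every contiguous block of rows
        r, c = np.triu_indices(s.shape[0], 1)
        d = (c - r)[:, None]  # window length (number of rows)
        e = s[c] - s[r]       # window sums, one per column
        # Keep windows whose mean is below the threshold in every column
        mask = (e / d < threshold).all(1)
        rng = np.arange(mask.shape[0])

        if mask.any():
            # Among the qualifying windows, take the (first) longest one
            idx = rng[mask][d[mask].argmax()]

            i0, i1 = r[idx], c[idx]
            return pd.DataFrame(
                v[i0:i1],
                df.loc[df.name].index[i0:i1],
                df.columns
            )

    cols = ['2012', '2013', '2014', '2015']

    # find_window was written for the MultiIndexed frame in the linked question,
    # grouping on the first index level
    df_test.groupby(level=0)[cols].apply(find_window)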
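
`find_window` scores every contiguous block of rows at once with a prefix-sum trick. A minimal sketch on a small made-up array (illustrative values, not the asker's data) checks that `s[c] - s[r]`, built exactly as in `find_window`, equals the direct sum of rows `r` through `c - 1`. Note that these windows run down consecutive rows of each group (axis 0), not across the year columns.

```python
import numpy as np

# Hypothetical 4x2 array standing in for one group's values
v = np.array([[-1.0, 2.0],
              [-3.0, -1.0],
              [4.0, -2.0],
              [-5.0, -6.0]])

# Same construction as in find_window: a zero row, then running sums down the rows
s = np.vstack([np.zeros((1, v.shape[1])), v.cumsum(0)])

# All (start, end) pairs with start < end, i.e. every contiguous row window v[r:c]
r, c = np.triu_indices(s.shape[0], 1)
window_sums = s[c] - s[r]

for start, end, total in zip(r, c, window_sums):
    # Each difference of prefix sums equals the direct sum of that window
    assert np.allclose(total, v[start:end].sum(axis=0))

print(len(r))  # 10 windows for 4 rows (4 * 5 / 2)
```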

The CSV file is here: https://docs.google.com/spreadsheets/d/19oOoBdAs3xRBWq6HReizlqrkWoQR2159nk8GWoR_4-g/edit?usp=sharing

EDIT: Correct dataframes added.

Note: blue frames mark the rows that should be returned; yellow frames mark consecutive column values that are < 0 (the threshold).

Solution

Based on the logic in your comment, you are looking for rows where either every value in columns 2012, 2013, 2014 and 2015 is less than 0, or the cumulative sum across those columns stays below 0 at every step. Since the second condition is always true whenever the first one is, you only need to test the second condition.
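
To spell out why that shortcut is safe: write a row's values as $x_1, \dots, x_4$ (for 2012 through 2015) and its running sums as $S_k = x_1 + \dots + x_k$. A sum of negative numbers is negative, so

$$x_j < 0 \ \text{for all } j \;\Longrightarrow\; S_k = \sum_{j=1}^{k} x_j < 0 \ \text{for all } k.$$

The converse does not hold: row 12 in the result has 5.96 in 2013, yet its running sums are -7.30, -1.34, -13.92 and -20.78, so it passes only the cumulative-sum test.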

    cols = ['2012', '2013', '2014', '2015']
    # Keep rows whose running sum across 2012..2015 stays below 0 at every step
    df.loc[(df[cols].cumsum(axis=1) < 0).all(axis=1), cols]

         2012   2013   2014   2015
    1   -6.74  -1.22   1.58  -0.42
    3   -3.14  -2.48  -0.02  -4.78
    4   -9.40 -11.20   0.68  12.04
    7   -3.12  -5.74   0.84   1.94
    8  -10.14 -12.24 -11.10  15.20
    11 -10.04 -10.60  -5.56  -8.44
    12  -7.30   5.96 -12.58  -6.86
    15 -10.24  -4.16   5.46 -14.00
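
A minimal, self-contained check of the filter, assuming a small frame built from four rows copied from the result (1, 3, 11 and 12) plus one hypothetical row (label 99, made up for illustration) whose running sum turns positive in 2013, so it should be dropped. It also compares the cumulative-sum test with the stricter "every value below 0" test.

```python
import pandas as pd

cols = ['2012', '2013', '2014', '2015']

# Rows 1, 3, 11, 12 copied from the output above; row 99 is hypothetical
df = pd.DataFrame(
    [[-6.74, -1.22, 1.58, -0.42],
     [-3.14, -2.48, -0.02, -4.78],
     [-10.04, -10.60, -5.56, -8.44],
     [-7.30, 5.96, -12.58, -6.86],
     [-1.00, 3.00, -0.50, -0.50]],
    index=[1, 3, 11, 12, 99],
    columns=cols,
)

# Running sum from 2012 to 2015 must stay below 0 at every step
keep = (df[cols].cumsum(axis=1) < 0).all(axis=1)
result = df.loc[keep, cols]

# Stricter test: every single value below 0
all_negative = (df[cols] < 0).all(axis=1)

print(result.index.tolist())                   # [1, 3, 11, 12]
print(df.loc[all_negative].index.tolist())     # [3, 11]
```

On this toy frame the strict test keeps only rows 3 and 11, while the cumulative-sum test also keeps rows 1 and 12, and every row kept by the strict test is kept by the cumulative one.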

Let me know in the comments if this is not what you want.
