How would I insert missing rows into this data set?


Question


I want to insert records into a data set wherever a row is missing.

If you look at the data set above, it contains 3 attribute columns followed by 2 numeric values. The third column, TTF, is incremental and should not skip any values. In this example it is missing 2 rows, which are shown at the bottom. I want my code to insert those 2 rows into the result set: Computer - Display is missing a TTF of 5, and Television - Power Supply is missing a TTF of 6. For each inserted row, the Repair value should be 0 and the Running Total should equal that of the previous row.

I was thinking of approaching it by splitting out the columns, looping over the first two, and then looping from 1 to 8 for the third.

for i in range(len(Product)):
    for j in range(len(Module)):
        for k in range(1, 9):  # TTF runs from 1 to 8 inclusive
            # Check whether the Repair value is there; if not, set it to 0
            # If the Repair value is missing, look up the previous Running Total
            pass

Does this seem like the best approach? Any help with the actual code to accomplish this would really be appreciated.

EDIT: Here is the code that reads in the DataFrame, since the Excel screenshot seems to be causing confusion.

>>> import pandas as pd
>>> 
>>> df = pd.read_csv('minimal.csv')
>>> 
>>> df
       Product         Module   TTF   Repair   Running Total
0     Computer        Display     1        3               3
1     Computer        Display     2        2               5
2     Computer        Display     3        1               6
3     Computer        Display     4        5              11
4     Computer        Display     6        4              15
5     Computer        Display     7        3              18
6     Computer        Display     8        2              20
7   Television   Power Supply     1        7               7
8   Television   Power Supply     2        6              13
9   Television   Power Supply     3        4              17
10  Television   Power Supply     4        5              22
11  Television   Power Supply     5        6              28
12  Television   Power Supply     7        7              35
13  Television   Power Supply     8        8              43

Solution

Let's use reindex with np.arange to create the TTF values that are missing from the sequence:

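import numpy as np
import pandas as pd

df = pd.DataFrame({'Product': ['Computer']*7 + ['Television']*7,
                   'Module': ['Display']*7 + ['Power Supply']*7,
                   'TTF': [1,2,3,4,6,7,8,1,2,3,4,5,7,8],
                   'Repair': np.random.randint(1, 8, 14)})  # random sample repair counts

# Running total accumulated over the whole frame
df['Running Total'] = df['Repair'].cumsum()

print(df)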

Input Dataframe:

          Module     Product  Repair  TTF  Running Total
0        Display    Computer       6    1              6
1        Display    Computer       2    2              8
2        Display    Computer       2    3             10
3        Display    Computer       4    4             14
4        Display    Computer       2    6             16
5        Display    Computer       3    7             19
6        Display    Computer       6    8             25
7   Power Supply  Television       3    1             28
8   Power Supply  Television       3    2             31
9   Power Supply  Television       5    3             36
10  Power Supply  Television       6    4             42
11  Power Supply  Television       4    5             46
12  Power Supply  Television       2    7             48
13  Power Supply  Television       2    8             50
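
Before applying this to every group, it may help to see what reindex does to a single series. The following is a minimal sketch with made-up values (the series name and numbers are assumptions for illustration, not taken from the data above): labels that are absent from the original index appear in the result as NaN, unless a fill_value is supplied.

```python
import numpy as np
import pandas as pd

# Hypothetical single group: TTF values 1 to 8 with 5 missing.
repair = pd.Series([3, 2, 1, 5, 4, 3, 2],
                   index=[1, 2, 3, 4, 6, 7, 8], name="Repair")
repair.index.name = "TTF"

# reindex aligns the data to the new index; labels that were absent
# (here TTF 5) become NaN, which turns the values into floats.
full = repair.reindex(np.arange(1, 9))

# Passing fill_value avoids the NaN and keeps the integer dtype.
filled = repair.reindex(np.arange(1, 9), fill_value=0)

print(full)
print(filled)
```

The NaN left by the first form is what makes a later fillna(0) or ffill() possible on a per-column basis.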


df_out = (df.set_index('TTF')
            .groupby(['Product', 'Module'], group_keys=False)
            .apply(lambda x: x.reindex(np.arange(1, 9))))

# Inserted rows get Repair = 0; the remaining columns are carried forward from the previous row
df_out['Repair'] = df_out['Repair'].fillna(0)

df_out = df_out.ffill().reset_index()

print(df_out)

Output:

    TTF        Module     Product  Repair  Running Total
0     1       Display    Computer     6.0            6.0
1     2       Display    Computer     2.0            8.0
2     3       Display    Computer     2.0           10.0
3     4       Display    Computer     4.0           14.0
4     5       Display    Computer     0.0           14.0
5     6       Display    Computer     2.0           16.0
6     7       Display    Computer     3.0           19.0
7     8       Display    Computer     6.0           25.0
8     1  Power Supply  Television     3.0           28.0
9     2  Power Supply  Television     3.0           31.0
10    3  Power Supply  Television     5.0           36.0
11    4  Power Supply  Television     6.0           42.0
12    5  Power Supply  Television     4.0           46.0
13    6  Power Supply  Television     0.0           46.0
14    7  Power Supply  Television     2.0           48.0
15    8  Power Supply  Television     2.0           50.0
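
As a possible alternative, the same result can be sketched without groupby.apply, whose handling of the grouping columns has changed across pandas versions. The version below uses the values from the question's minimal.csv, typed in by hand, and reindexes against an explicit MultiIndex of every expected (Product, Module, TTF) combination. The names keys, groups, full_index and out are illustrative, and falling back to 0 when a group's first TTF is missing is an assumption the question does not address.

```python
import pandas as pd

# Data from the question, entered inline instead of read from minimal.csv.
df = pd.DataFrame({
    "Product": ["Computer"] * 7 + ["Television"] * 7,
    "Module": ["Display"] * 7 + ["Power Supply"] * 7,
    "TTF": [1, 2, 3, 4, 6, 7, 8, 1, 2, 3, 4, 5, 7, 8],
    "Repair": [3, 2, 1, 5, 4, 3, 2, 7, 6, 4, 5, 6, 7, 8],
    "Running Total": [3, 5, 6, 11, 15, 18, 20, 7, 13, 17, 22, 28, 35, 43],
})

keys = ["Product", "Module"]
groups = df[keys].drop_duplicates()

# One entry per (Product, Module, TTF) combination that should exist.
# Only Product/Module pairs present in the data are expanded.
full_index = pd.MultiIndex.from_tuples(
    [(p, m, t) for p, m in groups.itertuples(index=False) for t in range(1, 9)],
    names=keys + ["TTF"],
)

out = df.set_index(keys + ["TTF"]).reindex(full_index)

# Inserted rows have no repairs.
out["Repair"] = out["Repair"].fillna(0).astype(int)

# Forward-fill within each group so a total never leaks across groups.
# Assumption: if a group's first TTF were missing, its total starts at 0.
out["Running Total"] = (
    out.groupby(level=keys)["Running Total"].ffill().fillna(0).astype(int)
)

out = out.reset_index()
print(out)
```

Because the forward fill is done per group, the inserted Television row takes its Running Total from the preceding Television row rather than from the last Computer row, and the numeric columns stay integers.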
