Inserting a row with values at a certain index takes too long


Question

I have the following table:

+-------------------------------------------------------+
| CarID  CarNumber   GPS     DateTime             Speed |
+-------------------------------------------------------+
| WFV303   303      104:58  04.02.2019 10:10:51    21   |
| WFV303   303      104:58  04.02.2019 10:10:54    23   |
| WFV303   303      104:58  04.02.2019 10:10:59    23   |
| WFV303   303      104:58  04.02.2019 10:11:01    24   |
| FBV404   404      105:59  04.02.2019 12:10:20    19   |
| FBV404   404      105:59  04.02.2019 12:10:25    19   |
+-------------------------------------------------------+

I want to insert a row of zeros wherever CarNumber in row i+1 differs from CarNumber in row i, so that the table looks like this:

+-------------------------------------------------------+
| CarID  CarNumber   GPS     DateTime             Speed |
+-------------------------------------------------------+
| WFV303   303      104:58  04.02.2019 10:10:51    21   |
| WFV303   303      104:58  04.02.2019 10:10:54    23   |
| WFV303   303      104:58  04.02.2019 10:10:59    23   |
| WFV303   303      104:58  04.02.2019 10:11:01    24   |
| 0        0        0       0                      0    |
| FBV404   404      105:59  04.02.2019 12:10:20    19   |
| FBV404   404      105:59  04.02.2019 12:10:25    19   |
+-------------------------------------------------------+

I tried the following:

for i in range(len(df['CarNumber'])):
    if df['CarNumber'].iloc[i]!=df['CarNumber'].iloc[i+1]:
        zero_row = pd.DataFrame({"CarNumber":0,"DateTime": 0}, index=[i+0.5])
        df = df.append(zero_row, ignore_index=False)
        df = df.sort_index().reset_index(drop=True)

I get no errors, but it takes a very long time to process and never finishes (my CSV file is about 50 MB).

What can I do about it? Is there a more efficient way of doing this?

Thanks!

Recommended answer

Use groupby. This should at least be more efficient than looping over all rows, because the frame is rebuilt once per group rather than once per row.


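import pandas as pd

df = pd.DataFrame({'CarNumber': [303] * 4 + [404] * 2 + [405] * 5,
                   'othercol': range(11)})

def zero_row(cols, idx):
    # single all-zero row with the given columns, placed at index idx
    return pd.DataFrame([[0] * len(cols)], columns=cols, index=[idx])

def add_zero_row(x):
    # DataFrame.append was removed in pandas 2.0, so use pd.concat;
    # the half-step index puts the zero row right after the group's last row
    return pd.concat([x, zero_row(x.columns, x.index.max() + 0.5)])

df = df.groupby('CarNumber').apply(add_zero_row)

# remove extra index from grouping
df = df.reset_index('CarNumber', drop=True)

# get rid of last zero row
df = df.iloc[:-1]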
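
If the per-group function calls are still too slow on a large file, the same result can probably be reached without `apply` at all. The following is a minimal sketch, not part of the original answer: it finds the positions where `CarNumber` changes with one array comparison, builds all zero rows in a single frame, and sorts once. The helper name `insert_zero_rows`, its `key` parameter, and the sample frame `cars` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def insert_zero_rows(df, key="CarNumber"):
    """Return a new frame with an all-zero row after each run of equal `key` values."""
    df = df.reset_index(drop=True)
    keys = df[key].to_numpy()
    # positions i where row i and row i+1 hold different keys
    positions = np.flatnonzero(keys[:-1] != keys[1:])
    if positions.size == 0:
        return df  # nothing to insert
    # one zero row per boundary, indexed halfway between the two neighbours
    zeros = pd.DataFrame(0, index=positions + 0.5, columns=df.columns)
    # a single concat and a single sort replace the per-row append/sort loop
    return pd.concat([df, zeros]).sort_index().reset_index(drop=True)

# hypothetical sample data modelled on the table in the question
cars = pd.DataFrame({"CarNumber": [303] * 4 + [404] * 2,
                     "Speed": [21, 23, 23, 24, 19, 19]})
result = insert_zero_rows(cars)
print(result)
```

Unlike the groupby version, this keeps the rows in their original order and treats every change of `CarNumber` as a boundary, so a car number that reappears later in the file starts a new block instead of being merged with the earlier one.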
