panda 根据其他列的值同时添加几个新列? [英] panda add several new columns based on values from other columns at the same time?

查看:44
本文介绍了panda 根据其他列的值同时添加几个新列?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如何根据其他列的值同时添加多个新列?我只找到了一次添加一行的示例.

How to add several new columns based on values from other columns at the same time? I only found examples to add a row one at a time.

我可以添加 3 个新列,但这似乎效率不高,因为它必须遍历所有行 3 次.有没有办法遍历DF一次?

I am able to add 3 new columns but this does not seem efficient since it has to go through all the rows 3 times. Is there a way to traverse the DF once?

import pandas as pd
from decimal import Decimal
d = [
    {'A': 2, 'B': Decimal('628.00')},
    {'A': 1, 'B': Decimal('383.00')},
    {'A': 3, 'B': Decimal('651.00')},
    {'A': 2, 'B': Decimal('575.00')},
    {'A': 4, 'B': Decimal('1114.00')},
]

df = pd.DataFrame(d)

In : df
Out:
   A        B
0  2   628.00
1  1   383.00
2  3   651.00
3  2   575.00
4  4  1114.00

# How to do those in one operation to avoid traversing the DF 3 times
df['C'] = df.apply(lambda row: row['B']-1000, axis=1)
df['D'] = df.apply(lambda row: row['B']*row['B'], axis=1)
df['E'] = df.apply(lambda row: row['B']/2, axis=1)

In : df
Out:
   A        B        C             D       E
0  2   628.00  -372.00   394384.0000  314.00
1  1   383.00  -617.00   146689.0000  191.50
2  3   651.00  -349.00   423801.0000  325.50
3  2   575.00  -425.00   330625.0000  287.50
4  4  1114.00   114.00  1240996.0000  557.00

推荐答案

我不会使用 lambda 函数.简单的矢量化实现既快速又易于阅读.

I wouldn't use a lambda function. Simple vectorized implementation is both faster and easier to read.

df['C'] = df['B'] - 1000
df['D'] = df['B'] ** 2
df['E'] = df['B'] / 2

>>> df
   A        B        C             D       E
0  2   628.00  -372.00   394384.0000  314.00
1  1   383.00  -617.00   146689.0000  191.50
2  3   651.00  -349.00   423801.0000  325.50
3  2   575.00  -425.00   330625.0000  287.50
4  4  1114.00   114.00  1240996.0000  557.00

让我们在一个有一百万行的数据帧上计时:

Let's time it on a dataframe with one million rows:

df = pd.concat([df for _ in range(200000)], ignore_index=True)
>>> df.shape
(1000000, 2)

>>> %%timeit -n 3
    df['C'] = df.apply(lambda row: row['B'] - 1000, axis=1)
    df['D'] = df.apply(lambda row: row['B'] * row['B'], axis=1)
    df['E'] = df.apply(lambda row: row['B'] / 2, axis=1)
3 loops, best of 3: 1min 20s per loop

>>> %%timeit -n 3
    df['C'] = df['B'] - 1000
    df['D'] = df['B'] ** 2
    df['E'] = df['B'] / 2
3 loops, best of 3: 49.7 s per loop

如果您取消 Decimal 类型而使用浮点数,速度会明显更快:

The speed is significantly faster if you did away with the Decimal type and used a float instead:

d = [
    {'A': 2, 'B': 628.00},
    {'A': 1, 'B': 383.00},
    {'A': 3, 'B': 651.00},
    {'A': 2, 'B': 575.00},
    {'A': 4, 'B': 1114.00}]

df = pd.DataFrame(d)
df = pd.concat([df for _ in range(200000)], ignore_index=True)

>>> %%timeit -n 3
    df['C'] = df['B'] - 1000
    df['D'] = df['B'] ** 2
    df['E'] = df['B'] / 2
3 loops, best of 3: 33.1 ms per loop

>>> df.shape
(1000000, 5)

这篇关于panda 根据其他列的值同时添加几个新列?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆