pandas: add several new columns based on values from other columns at the same time?
Question
How can I add several new columns based on values from other columns at the same time? I only found examples that add one column at a time.
I am able to add 3 new columns, but this does not seem efficient since it has to go through all the rows 3 times. Is there a way to traverse the DataFrame only once?
import pandas as pd
from decimal import Decimal
d = [
{'A': 2, 'B': Decimal('628.00')},
{'A': 1, 'B': Decimal('383.00')},
{'A': 3, 'B': Decimal('651.00')},
{'A': 2, 'B': Decimal('575.00')},
{'A': 4, 'B': Decimal('1114.00')},
]
df = pd.DataFrame(d)
In : df
Out:
   A        B
0  2   628.00
1  1   383.00
2  3   651.00
3  2   575.00
4  4  1114.00
# How to do those in one operation to avoid traversing the DF 3 times
df['C'] = df.apply(lambda row: row['B']-1000, axis=1)
df['D'] = df.apply(lambda row: row['B']*row['B'], axis=1)
df['E'] = df.apply(lambda row: row['B']/2, axis=1)
In : df
Out:
   A        B        C             D       E
0  2   628.00  -372.00   394384.0000  314.00
1  1   383.00  -617.00   146689.0000  191.50
2  3   651.00  -349.00   423801.0000  325.50
3  2   575.00  -425.00   330625.0000  287.50
4  4  1114.00   114.00  1240996.0000  557.00
Recommended answer
I wouldn't use a lambda function with apply here. A simple vectorized implementation is both faster and easier to read.
df['C'] = df['B'] - 1000
df['D'] = df['B'] ** 2
df['E'] = df['B'] / 2
>>> df
   A        B        C             D       E
0  2   628.00  -372.00   394384.0000  314.00
1  1   383.00  -617.00   146689.0000  191.50
2  3   651.00  -349.00   423801.0000  325.50
3  2   575.00  -425.00   330625.0000  287.50
4  4  1114.00   114.00  1240996.0000  557.00
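If the goal is mainly to create all three columns in one statement, `DataFrame.assign` accepts several keyword arguments at once. The following is a minimal sketch rather than part of the original answer; the name `result` is chosen here for illustration. Note that each expression is still evaluated as its own vectorized operation over column B, so this is a readability convenience and not a single pass over the data.

```python
from decimal import Decimal

import pandas as pd

df = pd.DataFrame([
    {'A': 2, 'B': Decimal('628.00')},
    {'A': 1, 'B': Decimal('383.00')},
    {'A': 3, 'B': Decimal('651.00')},
    {'A': 2, 'B': Decimal('575.00')},
    {'A': 4, 'B': Decimal('1114.00')},
])

# assign returns a new DataFrame with the extra columns; df itself is unchanged
result = df.assign(
    C=df['B'] - 1000,
    D=df['B'] ** 2,
    E=df['B'] / 2,
)
print(result)
```

Because `assign` does not modify `df` in place, reassign the return value (`df = df.assign(...)`) if the new columns should replace the original frame.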
Let's time it on a DataFrame with one million rows:
df = pd.concat([df for _ in range(200000)], ignore_index=True)
>>> df.shape
(1000000, 2)
>>> %%timeit -n 3
df['C'] = df.apply(lambda row: row['B'] - 1000, axis=1)
df['D'] = df.apply(lambda row: row['B'] * row['B'], axis=1)
df['E'] = df.apply(lambda row: row['B'] / 2, axis=1)
3 loops, best of 3: 1min 20s per loop
>>> %%timeit -n 3
df['C'] = df['B'] - 1000
df['D'] = df['B'] ** 2
df['E'] = df['B'] / 2
3 loops, best of 3: 49.7 s per loop
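The `%%timeit` magic only works inside IPython or Jupyter. As a rough sketch of how the same comparison could be run from a plain Python script, the standard-library `timeit` module can be used instead. The helper names `with_apply` and `vectorized` and the 5,000-row size are assumptions made here to keep the run short; absolute timings will differ by machine and pandas version.

```python
import timeit
from decimal import Decimal

import pandas as pd

base = pd.DataFrame({
    'A': [2, 1, 3, 2, 4],
    'B': [Decimal('628.00'), Decimal('383.00'), Decimal('651.00'),
          Decimal('575.00'), Decimal('1114.00')],
})
# 5,000 rows keeps the run short; raise the multiplier to approach one million rows
small = pd.concat([base] * 1000, ignore_index=True)


def with_apply(frame):
    """Row-wise version from the question: three separate apply passes."""
    out = frame.copy()
    out['C'] = out.apply(lambda row: row['B'] - 1000, axis=1)
    out['D'] = out.apply(lambda row: row['B'] * row['B'], axis=1)
    out['E'] = out.apply(lambda row: row['B'] / 2, axis=1)
    return out


def vectorized(frame):
    """Column-wise version from the answer."""
    out = frame.copy()
    out['C'] = out['B'] - 1000
    out['D'] = out['B'] ** 2
    out['E'] = out['B'] / 2
    return out


# Total time in seconds for 3 runs of each version
t_apply = timeit.timeit(lambda: with_apply(small), number=3)
t_vec = timeit.timeit(lambda: vectorized(small), number=3)
print(f"apply:      {t_apply:.3f} s for 3 runs")
print(f"vectorized: {t_vec:.3f} s for 3 runs")
```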
The vectorized version is dramatically faster if you drop the Decimal type and use floats instead, because the arithmetic then runs on a native float64 column rather than on an object column of Python Decimal instances:
d = [
{'A': 2, 'B': 628.00},
{'A': 1, 'B': 383.00},
{'A': 3, 'B': 651.00},
{'A': 2, 'B': 575.00},
{'A': 4, 'B': 1114.00}]
df = pd.DataFrame(d)
df = pd.concat([df for _ in range(200000)], ignore_index=True)
>>> %%timeit -n 3
df['C'] = df['B'] - 1000
df['D'] = df['B'] ** 2
df['E'] = df['B'] / 2
3 loops, best of 3: 33.1 ms per loop
>>> df.shape
(1000000, 5)
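When the data already arrives as Decimal values, one way to get the float speed-up is to convert the column before doing the arithmetic. This is a minimal sketch that assumes the loss of exact decimal precision is acceptable for the use case; if exact decimal arithmetic is required (for example for currency), the Decimal column has to stay as it is.

```python
from decimal import Decimal

import pandas as pd

df = pd.DataFrame({
    'A': [2, 1, 3],
    'B': [Decimal('628.00'), Decimal('383.00'), Decimal('651.00')],
})
print(df['B'].dtype)  # object: each cell holds a Python Decimal

# Convert to a native float64 column; exact decimal semantics are lost here
df['B'] = df['B'].astype(float)
print(df['B'].dtype)  # float64

# The same vectorized expressions now operate on a numeric column
df['C'] = df['B'] - 1000
df['D'] = df['B'] ** 2
df['E'] = df['B'] / 2
print(df)
```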