Return multiple columns from pandas apply()
Question
I have a pandas DataFrame, df_test. It contains a column 'size' which represents size in bytes. I've calculated KB, MB, and GB using the following code:
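import locale

import pandas as pd

# Use the user's default locale so that grouping=True inserts thousands separators
locale.setlocale(locale.LC_ALL, '')

df_test = pd.DataFrame([
    {'dir': '/Users/uname1', 'size': 994933},
    {'dir': '/Users/uname2', 'size': 109338711},
])
# locale.format() was removed in Python 3.12; locale.format_string() is the replacement
df_test['size_kb'] = df_test['size'].astype(int).apply(lambda x: locale.format_string("%.1f", x / 1024.0, grouping=True) + ' KB')
df_test['size_mb'] = df_test['size'].astype(int).apply(lambda x: locale.format_string("%.1f", x / 1024.0 ** 2, grouping=True) + ' MB')
df_test['size_gb'] = df_test['size'].astype(int).apply(lambda x: locale.format_string("%.1f", x / 1024.0 ** 3, grouping=True) + ' GB')
df_test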
             dir       size       size_kb   size_mb size_gb
0  /Users/uname1     994933      971.6 KB    0.9 MB  0.0 GB
1  /Users/uname2  109338711  106,776.1 KB  104.3 MB  0.1 GB

[2 rows x 5 columns]
I've run this over 120,000 rows, and according to %timeit it takes about 2.97 seconds per column, so roughly 9 seconds for all three.
Is there any way I can make this faster? For example, instead of returning one column at a time from apply and running it three times, can I return all three columns in one pass and insert them back into the original DataFrame?
The other questions I've found all want to take multiple values and return a single value. I want to take a single value and return multiple columns.
Recommended answer
You can return a Series from the applied function that contains the new data, which avoids iterating three times. Passing axis=1 to apply calls the function sizes once per row of the DataFrame; each call receives the row as a Series, s, and returns it with the new values added alongside the original data. The returned rows are assembled into a new DataFrame.
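def sizes(s):
    # s is one row of the DataFrame; add the three formatted sizes to it
    s['size_kb'] = locale.format_string("%.1f", s['size'] / 1024.0, grouping=True) + ' KB'
    s['size_mb'] = locale.format_string("%.1f", s['size'] / 1024.0 ** 2, grouping=True) + ' MB'
    s['size_gb'] = locale.format_string("%.1f", s['size'] / 1024.0 ** 3, grouping=True) + ' GB'
    return s

# df_test is the DataFrame built in the question. The original answer first ran
# df_test.append(rows_list), but rows_list is not defined here and
# DataFrame.append was removed in pandas 2.0, so that step is omitted.
df_test = df_test.apply(sizes, axis=1)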
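
As a variation on the same idea, the sketch below applies a function to the 'size' column alone and returns only the three new values, then joins them back onto the original frame. The helper name size_columns is hypothetical, and Python's "," format specifier is swapped in for the locale-based grouping so the output does not depend on the machine's locale settings. Whether this is actually faster than the row-wise version is not established here; it is worth measuring with %timeit on real data.

```python
import pandas as pd

df_test = pd.DataFrame([
    {'dir': '/Users/uname1', 'size': 994933},
    {'dir': '/Users/uname2', 'size': 109338711},
])


def size_columns(size_bytes):
    """Hypothetical helper: format one byte count as KB, MB and GB strings."""
    return pd.Series({
        'size_kb': f"{size_bytes / 1024.0:,.1f} KB",
        'size_mb': f"{size_bytes / 1024.0 ** 2:,.1f} MB",
        'size_gb': f"{size_bytes / 1024.0 ** 3:,.1f} GB",
    })


# When the applied function returns a Series, Series.apply produces a
# DataFrame with one column per key, so all three columns come from one pass.
new_cols = df_test['size'].astype(int).apply(size_columns)

# Attach the new columns to the original frame, aligned on the index.
df_test = df_test.join(new_cols)
```

The join aligns on the index, so the original 'dir' and 'size' columns are left untouched and the new columns are appended to their right.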