Return multiple columns from pandas apply()

Published: 2020/5/23 21:37:23. Tags: python, pandas, dataframe, apply


Question


I have a pandas DataFrame, df_test. It contains a column 'size' which represents size in bytes. I've calculated KB, MB, and GB using the following code:

import locale

import pandas as pd

df_test = pd.DataFrame([
    {'dir': '/Users/uname1', 'size': 994933},
    {'dir': '/Users/uname2', 'size': 109338711},
])

# locale.format() was removed in Python 3.12; locale.format_string() replaces it.
# grouping=True only adds thousands separators under a locale that defines them (e.g. en_US).
df_test['size_kb'] = df_test['size'].astype(int).apply(lambda x: locale.format_string("%.1f", x / 1024.0, grouping=True) + ' KB')
df_test['size_mb'] = df_test['size'].astype(int).apply(lambda x: locale.format_string("%.1f", x / 1024.0 ** 2, grouping=True) + ' MB')
df_test['size_gb'] = df_test['size'].astype(int).apply(lambda x: locale.format_string("%.1f", x / 1024.0 ** 3, grouping=True) + ' GB')

df_test


             dir       size       size_kb   size_mb size_gb
0  /Users/uname1     994933      971.6 KB    0.9 MB  0.0 GB
1  /Users/uname2  109338711  106,776.1 KB  104.3 MB  0.1 GB

[2 rows x 5 columns]


I've run this over 120,000 rows, and according to %timeit it takes about 2.97 seconds per column, so roughly 3 × 2.97 ≈ 9 seconds for all three columns.


Is there any way I can make this faster? For example, instead of returning one column at a time from apply() and running it three times, can I return all three columns in one pass and insert them back into the original DataFrame?


The other questions I've found all want to take multiple values and return a single value. I want to take a single value and return multiple columns.
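For reference, a minimal sketch of the one-pass idea applied to the 'size' column itself: the applied function returns a 3-tuple per value, and the tuples are then expanded into three columns. It assumes Python's ',' format specifier (which always uses a comma, regardless of locale) is an acceptable stand-in for locale.format_string, and the helper name to_units is hypothetical. It has not been timed here, so any speedup over three separate apply() calls is something to confirm with %timeit.

```python
import pandas as pd

df_test = pd.DataFrame([
    {'dir': '/Users/uname1', 'size': 994933},
    {'dir': '/Users/uname2', 'size': 109338711},
])


def to_units(n):
    """Hypothetical helper: KB, MB and GB strings for one byte count, as a tuple."""
    return (
        f"{n / 1024.0:,.1f} KB",
        f"{n / 1024.0 ** 2:,.1f} MB",
        f"{n / 1024.0 ** 3:,.1f} GB",
    )


# One pass over the column: each value is mapped to a 3-tuple ...
triples = df_test['size'].apply(to_units)

# ... and the tuples are expanded into three columns aligned on the original index.
expanded = pd.DataFrame(
    triples.tolist(),
    index=df_test.index,
    columns=['size_kb', 'size_mb', 'size_gb'],
)
df_test = df_test.join(expanded)
```

The formatting function runs once per value instead of three times; the join keeps the new columns aligned with the original rows by index.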

Recommended answer


This is an old question, but for completeness: you can return a Series containing the new data from the applied function, which avoids iterating three times. Passing axis=1 to apply() calls the function sizes once per row of the DataFrame; each call returns a Series, and apply() assembles those Series into a new DataFrame. Each returned Series, s, contains the new values as well as the original row data.

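import locale

def sizes(s):
    # Work on a copy: pandas does not support mutating the object passed to apply()
    s = s.copy()
    s['size_kb'] = locale.format_string("%.1f", s['size'] / 1024.0, grouping=True) + ' KB'
    s['size_mb'] = locale.format_string("%.1f", s['size'] / 1024.0 ** 2, grouping=True) + ' MB'
    s['size_gb'] = locale.format_string("%.1f", s['size'] / 1024.0 ** 3, grouping=True) + ' GB'
    return s

# axis=1 calls sizes() once per row; the returned Series become the rows of a new DataFrame
df_test = df_test.apply(sizes, axis=1)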
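If speed is the main concern, a different technique from the row-wise apply() above may help: do the unit conversion with vectorized column arithmetic and keep only the string formatting per element. This is a sketch under stated assumptions, not the answer's method: the ',' format specifier stands in for locale-aware grouping, the units mapping is a hypothetical helper, and no timings are claimed.

```python
import pandas as pd

df_test = pd.DataFrame([
    {'dir': '/Users/uname1', 'size': 994933},
    {'dir': '/Users/uname2', 'size': 109338711},
])

# Hypothetical mapping: new column name -> (unit suffix, power of 1024)
units = {'size_kb': ('KB', 1), 'size_mb': ('MB', 2), 'size_gb': ('GB', 3)}

for col, (suffix, power) in units.items():
    # Vectorized: one division over the whole column, no per-row Python call
    scaled = df_test['size'] / 1024.0 ** power
    # Formatting to text is still per element; ',' inserts thousands separators
    df_test[col] = scaled.map(('{:,.1f} ' + suffix).format)
```

Because row-wise apply(axis=1) builds a Series for every row, it is not necessarily faster than three column-wise passes; keeping the scaled values numeric and formatting them only for display is another option worth measuring.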
