Variable column name in dask assign() or apply()

Views: 43 | Published: 2020/5/24 3:35:18 | Tags: python, pandas, dask

Question

I have code that works in pandas, but I'm having trouble converting it to use dask. There is a partial solution here, but it does not allow me to use a variable as the name of the column I am creating/assigning to.

Here is the pandas code that works:

percent_cols = ['num_unique_words', 'num_words_over_6']

def find_fraction(row, col):
    return row[col] / row['num_words']

for c in percent_cols:
    df[c] = df.apply(find_fraction, col=c, axis=1)

Here's the dask code that doesn't do what I want:

data = dd.from_pandas(df, npartitions=8)

for c in percent_cols:
    data = data.assign(c = data[c] / data.num_words)

This assigns the result to a new column literally named c rather than overwriting data[c] (which is what I want). Creating a new column would also be fine if I could have the column name be a variable. E.g., if this worked:

for c in percent_cols:
    name = c + "new"
    data = data.assign(name = data[c] / data.num_words)

For obvious reasons, Python doesn't allow an expression to the left of the = in a keyword argument, and it ignores the current value of name: the new column is simply called name.
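The behaviour described above can be reproduced without dask. In this small sketch, `show_kwargs` is a hypothetical helper (not part of dask or pandas) that just reports the keyword names it receives:

```python
def show_kwargs(**kwargs):
    """Return the keyword names exactly as the function received them."""
    return list(kwargs)

name = "num_unique_words_new"

# The word before "=" is taken literally; the variable `name` is never read.
received = show_kwargs(name=1.0)
print(received)  # ['name']
```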

How can I use a variable for the name of the column I am assigning to? The loop iterates far more times than I'm willing to copy/paste.

Answer

Dask.dataframe solution

For your particular question I recommend the following:

# Map each column name to its new values, then unpack the dict as keyword arguments
d = {col: df[col] / df['num_words'] for col in percent_cols}
df = df.assign(**d)

Consider doing this with Pandas as well

The .assign method is available in Pandas as well and may be faster than using .apply, since .apply with axis=1 calls a Python function once per row, while dividing one column by another is a vectorized operation.
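A small sketch comparing the two approaches in pandas, on assumed toy data (the values are made up for illustration). Both versions should produce the same numbers; only the way they are computed differs.

```python
import pandas as pd

# Toy data (assumed for illustration); the column names follow the question.
df = pd.DataFrame({
    "num_words": [10, 20, 40],
    "num_unique_words": [5, 10, 10],
    "num_words_over_6": [1, 5, 20],
})
percent_cols = ["num_unique_words", "num_words_over_6"]

def find_fraction(row, col):
    return row[col] / row["num_words"]

# Row-wise version from the question: one Python call per row and column.
via_apply = df.copy()
for c in percent_cols:
    via_apply[c] = via_apply.apply(find_fraction, col=c, axis=1)

# Column-wise version: one vectorized division per column.
via_assign = df.assign(**{c: df[c] / df["num_words"] for c in percent_cols})
```

On a frame this small the difference is negligible; any speedup would only show up on larger data, and it is worth timing on your own dataset rather than assuming a particular factor.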
