Apply pandas function to column to create multiple new columns?

Problem description

How to do this in pandas:

I have a function extract_text_features on a single text column, returning multiple output columns. Specifically, the function returns 6 values.

The function works, but there doesn't seem to be any return type (pandas DataFrame, numpy array, or Python list) that lets the output be assigned correctly with df.ix[:, 10:16] = df.textcol.map(extract_text_features)

So I think I need to drop back to iterating with df.iterrows(), as per this?

UPDATE: Iterating with df.iterrows() is at least 20x slower, so I surrendered and split out the function into six distinct .map(lambda ...) calls.

UPDATE 2: This question was asked back around pandas v0.11.0, so much of the question and its answers is no longer very relevant.

Solution

Building on user1827356's answer, you can do the assignment in one pass using df.merge:

df.merge(df.textcol.apply(lambda s: pd.Series({'feature1':s+1, 'feature2':s-1})), 
    left_index=True, right_index=True)

    textcol  feature1  feature2
0  0.772692  1.772692 -0.227308
1  0.857210  1.857210 -0.142790
2  0.065639  1.065639 -0.934361
3  0.819160  1.819160 -0.180840
4  0.088212  1.088212 -0.911788
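
For anyone who wants to run the snippet above end to end, here is a minimal self-contained sketch. The sample values in `df` and the helper name `two_features` are made up for illustration; only the `apply` + `merge` pattern comes from the answer.

```python
import pandas as pd

# Made-up sample data standing in for the answer's random textcol values.
df = pd.DataFrame({"textcol": [0.5, 1.5, 2.5]})

def two_features(s):
    # Returning a Series per element makes Series.apply produce a DataFrame
    # with one column per key.
    return pd.Series({"feature1": s + 1, "feature2": s - 1})

features = df["textcol"].apply(two_features)

# Align the new columns with the original rows via the shared index.
result = df.merge(features, left_index=True, right_index=True)
print(result)
```

Note that `merge` returns a new frame; `df` itself is unchanged unless you assign the result back.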

EDIT: Please be aware of the huge memory consumption and low speed of this approach: https://ys-l.github.io/posts/2015/08/28/how-not-to-use-pandas-apply/
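
One commonly used way to avoid building a pandas Series for every row is to have the function return a plain tuple and construct the new columns in a single DataFrame call. This is a different technique from the merge answer, shown here only as a rough sketch: the body of `extract_text_features` and the six column names are hypothetical stand-ins, since the question does not show them.

```python
import pandas as pd

def extract_text_features(text):
    # Hypothetical stand-in for the question's function: returns 6 values.
    words = text.split()
    return (
        len(text),                                # number of characters
        len(words),                               # number of words
        sum(ch.isdigit() for ch in text),         # digit count
        sum(ch.isupper() for ch in text),         # uppercase count
        text.count("!"),                          # exclamation marks
        max((len(w) for w in words), default=0),  # longest word length
    )

# Made-up sample data.
df = pd.DataFrame({"textcol": ["Hello world!", "pandas 2 is FAST", ""]})

# Assumed column names, chosen for this example only.
feature_names = ["n_chars", "n_words", "n_digits",
                 "n_upper", "n_exclaim", "longest_word"]

# Build all six columns at once from a list of tuples, keeping df's index
# so the rows line up when joined.
features = pd.DataFrame(
    [extract_text_features(t) for t in df["textcol"]],
    columns=feature_names,
    index=df.index,
)
result = df.join(features)
print(result)
```

Whether this is actually faster for a given workload depends on the function and the data size, so it is worth timing both versions on your own data.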
