Executing a function that adds columns and populates them depending on other columns in Pandas


Question

I have a DataFrame that contains a text column and a result column:

             Text    Result
0  some text...      True
1  another one...    False

I also have a function that performs feature extraction on a text. It returns a dict with about 1000 keys, where each key is a word and each value is True or False depending on whether the word occurs in the text.

words = ["some", "text", "another", "one", "other", "words"]

def extract(text):
    # Map each known word to True/False depending on whether it occurs in the text.
    result = dict()
    for w in words:
        result[w] = (w in text)
    return result

The result I expect is:

             Text   some   text  another    one  other  words  Result
0    some text...   True   True    False  False  False  False    True
1  another one...  False  False     True   True  False  False   False

But I don't know how to apply this to a DataFrame. What I have done so far is create the columns with a default value of False, but I have no clue how to populate them with True values.

for feature in words:
    df[feature] = False

I guess there is a better way to do this in pandas?

Recommended answer

Use pd.Series.str.get_dummies and pd.DataFrame.reindex:

exp = (
    df.Text.str.get_dummies(' ')
      .reindex(columns=words, fill_value=0)
      .astype(bool)
)

df.drop(columns='Result').join(exp).join(df.Result)

          Text   some   text  another    one  other  words  Result
0    some text   True   True    False  False  False  False    True
1  another one  False  False     True   True  False  False   False
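
For reference, the snippet above can be run end to end as the following sketch. The sample DataFrame is an assumption reconstructed from the output shown (the texts have no trailing "..."), and the name `out` is introduced here only for illustration. Note that get_dummies splits on the separator and matches whole tokens, so a token with punctuation attached, such as "text...", would not match the word "text".

```python
import pandas as pd

words = ["some", "text", "another", "one", "other", "words"]

# Assumed sample data, mirroring the output shown in the answer.
df = pd.DataFrame({
    "Text": ["some text", "another one"],
    "Result": [True, False],
})

# One indicator column per space-separated token, restricted to `words`.
exp = (
    df["Text"].str.get_dummies(" ")
      .reindex(columns=words, fill_value=0)
      .astype(bool)
)

# Place the feature columns between Text and Result.
out = df.drop(columns="Result").join(exp).join(df["Result"])
print(out)
```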


Explanation

get_dummies creates a dummy (indicator) column for each word it finds, which is simple enough. I use reindex so that the result has a column for every word we care about, including words that never appear in any text. The fill_value and astype(bool) are there to match the OP's output. I use drop and join(df.Result) as a concise way to move Result to the end of the DataFrame.
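
If the existing extract function has to be reused as-is, one possible alternative (not part of the original answer) is to map it over the Text column and build a DataFrame from the resulting dicts. This is a minimal sketch; the names `features` and `out` are hypothetical. Because extract uses a substring test (w in text), its results can differ from the token-based get_dummies approach: for example, "other" is reported as True for "another one..." since "another" contains "other".

```python
import pandas as pd

words = ["some", "text", "another", "one", "other", "words"]

def extract(text):
    # Substring test, as in the question: True if the word occurs anywhere in the text.
    return {w: (w in text) for w in words}

df = pd.DataFrame({
    "Text": ["some text...", "another one..."],
    "Result": [True, False],
})

# One dict per row becomes one row of boolean feature columns,
# aligned with the original index.
features = pd.DataFrame(df["Text"].map(extract).tolist(), index=df.index)

# Keep Text first and Result last, with the features in between.
out = df[["Text"]].join(features).join(df["Result"])
print(out)
```

Calling a Python function once per row is generally slower than the vectorized string methods, so for large frames the get_dummies approach is likely the better fit when whole-token matching is acceptable.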
