Executing a function that adds columns and populates them depending on other columns in Pandas
Problem description
I have a DataFrame that contains a text column and a result column:
   Text            Result
0  some text...    True
1  another one...  False
I also have a function that performs feature extraction on the text: it returns a dict with about 1000 keys, one per word, each mapped to True or False depending on whether that word occurs in the text.
words = ["some", "text", "another", "one", "other", "words"]

def extract(text):
    result = dict()
    for w in words:
        # substring test: note that "other" also matches inside "another"
        result[w] = (w in text)
    return result
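A quick check of extract on the second sample text shows why a plain substring test does not quite produce the expected output below. extract_words is a hypothetical whole-word variant added only for comparison; splitting on whitespace is an assumption about how the text should be tokenized.

```python
words = ["some", "text", "another", "one", "other", "words"]

def extract(text):
    # the question's function: substring test per word
    return {w: (w in text) for w in words}

def extract_words(text):
    # hypothetical whole-word variant: compare against whitespace-separated tokens
    tokens = set(text.split())
    return {w: (w in tokens) for w in words}

print(extract("another one")["other"])        # True: "other" occurs inside "another"
print(extract_words("another one")["other"])  # False: "other" is not a separate token
```

If the real texts contain punctuation (such as the trailing "..."), whitespace splitting alone would leave tokens like "text..." and would need extra cleanup.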
The result I expect is:
   Text            some   text   another  one    other  words  Result
0  some text...    True   True   False    False  False  False  True
1  another one...  False  False  True     True   False  False  False
But I don't know how to apply this to a DataFrame. So far I have created the columns with a default value of False, but I have no idea how to fill in the True values.
for feature in words:
    df[feature] = False
I guess there is a better way to do this in pandas?
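A minimal sketch of applying a dict-returning extractor to every row, assuming a two-row df built from the sample data without the trailing "...". The extractor here is a whole-word variant of extract (an assumption, so that "other" is not matched inside "another"). Each row's dict becomes one row of a new frame, which is joined back on the shared index, so no pre-filled False columns are needed.

```python
import pandas as pd

words = ["some", "text", "another", "one", "other", "words"]

def extract(text):
    # assumed whole-word variant of the question's extractor
    tokens = set(text.split())
    return {w: (w in tokens) for w in words}

df = pd.DataFrame({"Text": ["some text", "another one"], "Result": [True, False]})

# One dict per row -> one row per dict; reuse df's index so the join lines up
features = pd.DataFrame(df["Text"].map(extract).tolist(), index=df.index)

# Place the feature columns between Text and Result
out = df[["Text"]].join(features).join(df["Result"])
print(out)
```

This calls a Python function once per row, so for large frames the vectorized get_dummies approach is likely to be faster.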
Recommended answer
Use pd.Series.str.get_dummies together with pd.DataFrame.reindex:
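import pandas as pd

exp = (
    df.Text.str.get_dummies(' ')           # one 0/1 column per word found
    .reindex(columns=words, fill_value=0)  # keep exactly `words`; missing words -> 0
    .astype(bool)                          # 0/1 -> False/True
)
# drop(columns=...) instead of drop('Result', 1): a positional axis is rejected in pandas 2.0+
df.drop(columns='Result').join(exp).join(df.Result)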
   Text         some   text   another  one    other  words  Result
0  some text    True   True   False    False  False  False  True
1  another one  False  False  True     True   False  False  False
Explanation
get_dummies gives a dummy column for each word found, which is simple enough. However, I use reindex so that all the words we care about are represented, including ones that never appear in the text. The fill_value and astype(bool) are there to match the OP's output. I use drop and join(df.Result) as a pithy way to move Result to the end of the DataFrame.
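The steps described above can be run one at a time to inspect the intermediate frames. This sketch assumes a df built from the sample rows without the trailing "...", since get_dummies(' ') would otherwise produce a token "text..." rather than "text".

```python
import pandas as pd

words = ["some", "text", "another", "one", "other", "words"]
df = pd.DataFrame({"Text": ["some text", "another one"], "Result": [True, False]})

# Step 1: one 0/1 column per distinct token found in the data (sorted by name)
dummies = df["Text"].str.get_dummies(" ")
print(dummies.columns.tolist())  # ['another', 'one', 'some', 'text']

# Step 2: force the column set and order to `words`; unseen words are filled with 0
exp = dummies.reindex(columns=words, fill_value=0)

# Step 3: convert 0/1 to False/True to match the desired output
exp = exp.astype(bool)

# Step 4: drop Result, append the feature columns, then re-append Result at the end
out = df.drop(columns="Result").join(exp).join(df["Result"])
print(out)
```

Note that "other" and "words" only exist after step 2, because no text contains them.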