python pandas: find a specific string in a column and fill the column matching the string
Problem description
I have a dataframe with several columns. One of them holds movie "genres" separated by |. I've split this column into several others, so that I get X columns, each filled with one of the split values. What I actually need, however, is one column per genre, filled with 1 or 0 depending on whether the column header is found in the original genres column or in one of the split columns. My dataframe is set up like this:
import pandas as pd

df = pd.DataFrame({'A': ['Drama|Action', 'Drama', 'Action'],
                   'A_split1': ['Drama', 'Drama', 'Action'],
                   'A_split2': ['Action', 'None', 'None'],
                   'Drama': [0, 0, 0], 'Action': [0, 0, 0], 'Western': [0, 0, 0]},
                  index=['a1', 'a2', 'a3'])
df
But I haven't found how to check whether a header name occurs within a string so that I can set the 1 or 0.
Recommended answer
I think you need pop to extract the column, str.get_dummies to build the indicator columns, and join to attach them to the original DataFrame:
import pandas as pd

df = pd.DataFrame({'A': ['Drama|Action', 'Drama', 'Action'], 'B': range(3)},
                  index=['a1', 'a2', 'a3'])
print(df)
               A  B
a1  Drama|Action  0
a2         Drama  1
a3        Action  2

# pop removes column A from df and returns it; str.get_dummies splits on '|' by default
df = df.join(df.pop('A').str.get_dummies())
print(df)
    B  Action  Drama
a1  0       1      1
a2  1       0      1
a3  2       1      0
If you want to keep the original column:
# start again from the original df, which still contains column A
df = df.join(df['A'].str.get_dummies())
print(df)
               A  B  Action  Drama
a1  Drama|Action  0       1      1
a2         Drama  1       0      1
a3        Action  2       1      0
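The question also lists a Western column that no row matches, and str.get_dummies only creates columns for values that actually occur. One possible way to get a fixed set of genre columns anyway is to reindex the dummy columns against a known list. This is a minimal sketch, assuming the genre list is known in advance; the names genres, dummies and result are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': ['Drama|Action', 'Drama', 'Action'], 'B': range(3)},
                  index=['a1', 'a2', 'a3'])

# Assumption: the full list of genres is known up front.
genres = ['Drama', 'Action', 'Western']

# Build 0/1 indicator columns, then force the column set to match `genres`.
# Genres that never occur (here: Western) become all-zero columns.
dummies = df['A'].str.get_dummies(sep='|').reindex(columns=genres, fill_value=0)

result = df.join(dummies)
print(result)
```

Reindexing also fixes the column order, which may be convenient when several dataframes need the same layout.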