Pandas: Convert lists within a single column to multiple columns
Question
I have a dataframe that includes columns with multiple attributes separated by commas:
import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3], 'labels': ['a,b,c', 'c,a', 'd,a,b']})
   id  labels
0   1   a,b,c
1   2     c,a
2   3   d,a,b
(I know this isn't an ideal situation, but the data originates from an external source.) I want to turn the multi-attribute columns into multiple columns, one for each label, so that I can treat them as categorical variables. Desired output:
   id     a      b      c      d
0   1  True   True   True  False
1   2  True  False   True  False
2   3  True   True  False   True
I can get the set of all possible attributes ([a,b,c,d]) fairly easily, but cannot figure out a way to determine whether a given row has a particular attribute without row-by-row iteration for each attribute. Is there a better way to do this?
Recommended answer
You can use str.get_dummies to build one indicator column per label, cast the resulting 1 and 0 values to boolean with astype, and finally concat the id column back on:
print(df['labels'].str.get_dummies(sep=',').astype(bool))
      a      b      c      d
0  True   True   True  False
1  True  False   True  False
2  True   True  False   True
print(pd.concat([df['id'], df['labels'].str.get_dummies(sep=',').astype(bool)], axis=1))
   id     a      b      c      d
0   1  True   True   True  False
1   2  True  False   True  False
2   3  True   True  False   True
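For reference, the steps above can be put together as one self-contained script. This is a minimal sketch written for Python 3, assuming the labels are plain comma-separated strings with no surrounding whitespace. The variable names indicators and result are illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample data from the question: comma-separated labels in one string column.
df = pd.DataFrame({'id': [1, 2, 3], 'labels': ['a,b,c', 'c,a', 'd,a,b']})

# Build one 0/1 indicator column per distinct label, then cast to bool.
indicators = df['labels'].str.get_dummies(sep=',').astype(bool)

# Put the id column back in front; both pieces share the same index.
result = pd.concat([df['id'], indicators], axis=1)
print(result)
```

If the real data has spaces after the commas (for example 'a, b'), the pieces would likely come out as distinct labels such as ' b', so it may be worth normalising the strings before calling str.get_dummies.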