Quickest way to make a get_dummies-type DataFrame from a column with multiple strings
Question
I have a column, 'col2', in which each row holds a comma-separated list of strings. My current code is too slow: there are about 2000 unique strings (the letters in the example below) and 4000 rows, so the result has 2000 columns and 4000 rows.
In [268]: df.head()
Out[268]:
   col1   col2
0     6    A,B
1    15  C,G,A
2    25      B
Is there a fast way to convert this into a get_dummies-style format, where each string has its own column and each column holds a 1 if that row has the string in col2 and a 0 otherwise?
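def get_list(df):
    # Collect every unique string that appears in col2
    d = []
    for row in df.col2:
        row_list = row.split(',')
        for string in row_list:
            if string not in d:
                d.append(string)
    return d

df_list = get_list(df)

def make_cols(df, lst):
    # Add one zero-filled column per unique string
    for string in lst:
        df[string] = 0
    return df

df = make_cols(df, df_list)

# Mark each string that is present in a row
for idx in range(0, len(df['col2'])):
    row_list = df['col2'].iloc[idx].split(',')
    for string in row_list:
        # Index the frame directly; chained df[string].iloc[idx] may not write back
        df.loc[df.index[idx], string] += 1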
Out[113]:
   col1   col2  A  B  C  G
0     6    A,B  1  1  0  0
1    15  C,G,A  1  0  1  1
2    25      B  0  1  0  0
This is my current code, but it's too slow.
Thanks for any help!
Solution
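You can use Series.str.get_dummies, which splits each string on the separator and returns one 0/1 indicator column per unique value: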
>>> df['col2'].str.get_dummies(sep=',')
   A  B  C  G
0  1  1  0  0
1  1  0  1  1
2  0  1  0  0
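For anyone who wants to try this without the original data, here is a minimal self-contained sketch. It rebuilds the three-row frame from the question by hand (the construction is an assumption; only the values come from the post) and applies the same call.

```python
import pandas as pd

# Rebuild the example frame shown in the question (values copied from the post)
df = pd.DataFrame({
    "col1": [6, 15, 25],
    "col2": ["A,B", "C,G,A", "B"],
})

# Split each entry of col2 on ',' and build one indicator column per unique string
dummies = df["col2"].str.get_dummies(sep=",")

print(dummies)
```

The indicator columns come back in sorted order (A, B, C, G), not in order of first appearance, which matches the output shown above.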
To join the indicator columns back onto the original DataFrame:
>>> pd.concat([df, df['col2'].str.get_dummies(sep=',')], axis=1)
   col1   col2  A  B  C  G
0     6    A,B  1  1  0  0
1    15  C,G,A  1  0  1  1
2    25      B  0  1  0  0
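As a variation on the concat step, DataFrame.join should give the same result here, since both frames share the same index. This is a sketch of an alternative, not part of the original answer; the frame is rebuilt by hand from the question's example, and the name `result` is only illustrative.

```python
import pandas as pd

# Example frame rebuilt from the question (an assumption about how it was created)
df = pd.DataFrame({
    "col1": [6, 15, 25],
    "col2": ["A,B", "C,G,A", "B"],
})

dummies = df["col2"].str.get_dummies(sep=",")

# join aligns on the index, just like pd.concat(..., axis=1) does in the answer
result = df.join(dummies)

print(result)
```

Note that join raises an error if a dummy column name collides with an existing column (for example, a string literally equal to "col1"), whereas concat would silently produce duplicate column names.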