Quickest way to make a get_dummies type dataframe from a column with a multiple of strings

Views: 133 Posted: 2017/3/26 1:40:53 Tags: python pandas split dataframe

Problem description

I have a column, 'col2', that holds a comma-separated list of strings in each row. My current code is too slow: there are about 2000 unique strings (the letters in the example below) and 4000 rows, so the result ends up with 2000 columns and 4000 rows.

In [268]: df.head()
Out[268]:
    col1    col2
0   6       A,B
1   15      C,G,A
2   25      B

Is there a fast way to convert this into a get_dummies format, where each string has its own column and each column holds a 1 if that row has the string in col2 and a 0 otherwise?

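def get_list(df):
    # Collect every unique string in col2, in order of first appearance.
    d = []
    for row in df.col2:
        row_list = row.split(',')
        for string in row_list:
            if string not in d:
                d.append(string)
    return d

df_list = get_list(df)

def make_cols(df, lst):
    # Add one zero-filled column per unique string.
    for string in lst:
        df[string] = 0
    return df

df = make_cols(df, df_list)

# Mark each string that appears in a row's col2 value.
for idx in range(len(df['col2'])):
    row_list = df['col2'].iloc[idx].split(',')
    for string in row_list:
        # A single .loc assignment replaces the chained df[string].iloc[idx],
        # which may write to a temporary copy instead of df.
        df.loc[df.index[idx], string] += 1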

Out[113]:
   col1   col2  A  B  C  G
0     6    A,B  1  1  0  0
1    15  C,G,A  1  0  1  1
2    25      B  0  1  0  0

This is my current code, but it's too slow.

Thanks for any help!

Solution

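You can use Series.str.get_dummies, which splits each string on sep and returns one 0/1 indicator column per unique value: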

>>> df['col2'].str.get_dummies(sep=',')
   A  B  C  G
0  1  1  0  0
1  1  0  1  1
2  0  1  0  0
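
To try this end to end, here is a minimal sketch that rebuilds the three sample rows from the question and applies str.get_dummies. The DataFrame construction is an assumption, since the post only shows the output of df.head(); the variable name dummies is also just illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question's df.head() output (assumed construction).
df = pd.DataFrame({"col1": [6, 15, 25], "col2": ["A,B", "C,G,A", "B"]})

# Split each string on ',' and build one 0/1 indicator column per unique token.
dummies = df["col2"].str.get_dummies(sep=",")
print(dummies)
```

This replaces both the unique-string scan and the row-by-row loop from the question with a single call.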

To join the result to the original DataFrame:

>>> pd.concat([df, df['col2'].str.get_dummies(sep=',')], axis=1)
   col1   col2  A  B  C  G
0     6    A,B  1  1  0  0
1    15  C,G,A  1  0  1  1
2    25      B  0  1  0  0
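
As an alternative sketch, DataFrame.join attaches the indicator columns by aligning on the index, which should give the same layout as the pd.concat call here. The sample DataFrame is again an assumed reconstruction of the question's data, and the name joined is hypothetical.

```python
import pandas as pd

# Sample data reconstructed from the question's df.head() output (assumed construction).
df = pd.DataFrame({"col1": [6, 15, 25], "col2": ["A,B", "C,G,A", "B"]})

# Build the indicator columns, then attach them to df by index alignment.
joined = df.join(df["col2"].str.get_dummies(sep=","))
print(joined)
```

Note that join raises an error if df already has a column named after one of the strings, so this assumes the tokens in col2 do not collide with existing column names.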
