Create pandas DataFrame from list of lists, but there are different separators


Problem description

I have a list of lists:

     [['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
     ['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
     ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']]

I want to end up with a pandas DataFrame with these columns:

cols = ['MovieID', 'Name', 'Year', 'Adventure', 'Children', 'Comedy', 'Fantasy', 'Romance']

For the 'Adventure', 'Children', 'Comedy', 'Fantasy', 'Romance' columns, the data will be 1 or 0.

I tried:

for row in movies_list:
    for element in row:
        if '|' in element:
            element = element.split('|')

However, nothing happens to the original list. I'm completely stumped here.

Recommended answer

Use the DataFrame constructor with str.get_dummies:

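import pandas as pd

L = [['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
     ['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
     ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']]
# Keep the pipe-separated genres in a temporary 'Data' column
df = pd.DataFrame(L, columns=['MovieID','Name','Data'])

# One 0/1 indicator column per genre; '|' is the default separator
df1 = df['Data'].str.get_dummies()
print(df1)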
   Adventure  Animation  Children's  Comedy  Fantasy  Romance
0          0          1           1       1        0        0
1          1          0           1       0        1        0
2          0          0           0       1        0        1
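
The question's column list uses 'Children' rather than "Children's". A minimal sketch of one way to get that name, assuming a plain rename is acceptable; the variable name `genres` is only for illustration:

```python
import pandas as pd

L = [['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
     ['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
     ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']]
df = pd.DataFrame(L, columns=['MovieID', 'Name', 'Data'])

# sep='|' is the default; it is spelled out here only for clarity.
genres = df['Data'].str.get_dummies(sep='|')

# Rename "Children's" to "Children" to match the column list in the question.
genres = genres.rename(columns={"Children's": 'Children'})
print(genres)
```

Note that 'Animation' still appears as a column, because get_dummies creates one column for every distinct value in the data.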

For the Name and Year columns, use split, then rstrip to remove the trailing ); Year is also converted to int.

df[['Name','Year']] = df['Name'].str.split(r'\s\(', expand=True)
df['Year'] = df['Year'].str.rstrip(')').astype(int)
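
The split approach assumes each title contains exactly one " (" sequence. If some titles could carry extra parentheses, str.extract with a regular expression may be a safer alternative. This is only a sketch: the second title below is a made-up example, and the pattern assumes every title ends with a four-digit year in parentheses.

```python
import pandas as pd

# The second entry is a hypothetical title with an extra parenthesised part.
names = pd.Series(['Toy Story (1995)',
                   'Some Film (Alternate Title) (1995)'])

# Greedy '.*' captures everything before the final " (YYYY)" as Name;
# the four digits inside the last parentheses become Year.
parts = names.str.extract(r'^(?P<Name>.*)\s\((?P<Year>\d{4})\)$')
parts['Year'] = parts['Year'].astype(int)
print(parts)
```

Rows that do not match the pattern would come back as NaN, in which case the astype(int) call would fail, so this is suited to data where the trailing year is always present.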

Finally, drop the Data column and add df1 to the original with join:

df = df.drop('Data', axis=1).join(df1)
print(df)
  MovieID              Name  Year  Adventure  Animation  Children's  Comedy  \
0       1         Toy Story  1995          0          1           1       1   
1       2           Jumanji  1995          1          0           1       0   
2       3  Grumpier Old Men  1995          0          0           0       1   

   Fantasy  Romance  
0        0        0  
1        1        0  
2        0        1  
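
As for why the loop in the question leaves the list unchanged: `element = element.split('|')` only rebinds the loop variable `element` to a new list; it never writes back into `row`. A minimal plain-Python sketch of assigning through the index instead, using the question's `movies_list` name:

```python
movies_list = [['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
               ['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
               ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']]

for row in movies_list:
    for i, element in enumerate(row):
        if '|' in element:
            # Assign through the index so the list itself is updated.
            row[i] = element.split('|')

print(movies_list[0])
```

This only splits the genre strings in place; building the 0/1 columns is still simpler with str.get_dummies as shown in the answer.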
