Create pandas DataFrame from list of lists, but there are different separators
Question
I have a list of lists:
[['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']]
I want to end up with a pandas DataFrame with these columns:
cols = ['MovieID', 'Name', 'Year', 'Adventure', 'Children', 'Comedy', 'Fantasy', 'Romance']
For the 'Adventure', 'Children', 'Comedy', 'Fantasy', 'Romance' columns, the data will be 1 or 0.
I tried:
for row in movies_list:
    for element in row:
        if '|' in element:
            element = element.split('|')
However, nothing happens to the original list. I'm completely stumped here.
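The loop in the question only rebinds the local name element on each pass; it never writes the result back into the list. As a minimal sketch (not part of the original thread), assuming movies_list holds the rows shown in the question, assigning through the index does update each row in place:

```python
movies_list = [
    ['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
    ['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
    ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance'],
]

for row in movies_list:
    for i, element in enumerate(row):
        if '|' in element:
            # Write back through the index so the row itself changes;
            # plain `element = ...` would only rebind the loop variable.
            row[i] = element.split('|')
```

This gives nested lists of genres, which still need further work to become 0/1 columns.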
Recommended answer
Use the DataFrame constructor with str.get_dummies, which splits on '|' by default:
import pandas as pd

L = [['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
     ['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
     ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']]
df = pd.DataFrame(L, columns=['MovieID','Name','Data'])
# One 0/1 indicator column per genre found in the '|'-separated strings
df1 = df['Data'].str.get_dummies()
print (df1)
Adventure Animation Children's Comedy Fantasy Romance
0 0 1 1 1 0 0
1 1 0 1 0 1 0
2 0 0 0 1 0 1
For the Name and Year columns, use split to separate the title from the year and rstrip to remove the trailing ). Year is also converted to int.
# Raw string avoids invalid-escape warnings for the regex pattern
df[['Name','Year']] = df['Name'].str.split(r'\s\(', expand=True)
df['Year'] = df['Year'].str.rstrip(')').astype(int)
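As an alternative to split plus rstrip, the title and year can be captured in one step with str.extract. This is a sketch rather than the answer's method, and it assumes every name ends with a four-digit year in parentheses; rows that do not match would produce NaN and make the astype(int) call fail.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Toy Story (1995)',
                            'Jumanji (1995)',
                            'Grumpier Old Men (1995)']})

# Named groups become the column names of the extracted frame.
parts = df['Name'].str.extract(r'^(?P<Name>.*?)\s*\((?P<Year>\d{4})\)\s*$')
df['Name'] = parts['Name']
df['Year'] = parts['Year'].astype(int)
```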
Finally, remove the Data column and add df1 to the original with join:
df = df.drop('Data', axis=1).join(df1)
print (df)
MovieID Name Year Adventure Animation Children's Comedy \
0 1 Toy Story 1995 0 1 1 1
1 2 Jumanji 1995 1 0 1 0
2 3 Grumpier Old Men 1995 0 0 0 1
Fantasy Romance
0 0 0
1 1 0
2 0 1
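The steps above can be gathered into one function. The sketch below is illustrative only: build_movies_df is a hypothetical helper name, and the final rename of "Children's" to "Children" is an assumption based on the column list in the question, not something the answer does.

```python
import pandas as pd

def build_movies_df(rows):
    """Hypothetical helper: rows are [MovieID, 'Title (Year)', 'A|B|C'] lists."""
    df = pd.DataFrame(rows, columns=['MovieID', 'Name', 'Data'])
    # 0/1 indicator columns, one per '|'-separated genre
    genres = df['Data'].str.get_dummies(sep='|')
    # Split 'Title (Year)' into the title and 'Year)' parts
    df[['Name', 'Year']] = df['Name'].str.split(r'\s\(', expand=True)
    df['Year'] = df['Year'].str.rstrip(')').astype(int)
    return df.drop('Data', axis=1).join(genres)

rows = [['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
        ['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
        ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']]

movies = build_movies_df(rows)
# Optional: match the column name used in the question (assumption)
movies = movies.rename(columns={"Children's": 'Children'})
```

Note that MovieID stays a string here, as in the answer; converting it with astype(int) is a separate choice.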