List of lists to dictionary to pandas DataFrame
Question
I am trying to fit this data:
[['Manufacturer: Hyundai',
'Model: Tucson',
'Mileage: 258000 km',
'Registered: 07/2019'],
['Manufacturer: Mazda',
'Model: 6',
'Year: 2014',
'Registered: 07/2019']]
into a pandas DataFrame.
Not all labels are present in every record: for example, some records have 'Mileage' and others don't, and vice versa. There are 26 features in total, and very few items have all of them.
I would like to construct a pandas DataFrame that holds the features in columns; if a feature doesn't exist for a record, the content should be 'NaN'.
So far I have:
colnames=['Manufacturer', 'Model', 'Mileage', 'Registered', 'Year'...(all 26 features here)]
df = pd.read_csv("./data/output.csv", sep=",", names=colnames, header=None)
The first few mandatory columns give the expected output, but when it comes to optional features, missing data causes the features that follow to end up under the wrong columns. Records are mapped correctly only if all features are present.
I forgot to mention that some features with a missing value also lack the ':' but are still present in the list. So there are two cases:
- 'Mileage' appears in the record with no value and no ':'
- 'Mileage' is missing from the record entirely
The assignment in both cases should be 'NaN'.
Recommended answer
Use a nested list comprehension to build a list of dictionaries and pass it to the DataFrame constructor; if a key is missing from a record, NaN is added for it:
import pandas as pd

L = [['Manufacturer: Hyundai',
'Model: Tucson',
'Mileage: 258000 km',
'Registered: 07/2019'],
['Manufacturer: Mazda',
'Model: 6',
'Year: 2014',
'Registered: 07/2019']]
# split on the first ': ' only, so values carry no leading space
df = pd.DataFrame([dict(y.split(': ', 1) for y in x) for x in L])
print (df)
  Manufacturer    Mileage   Model Registered  Year
0      Hyundai  258000 km  Tucson    07/2019   NaN
1        Mazda        NaN       6    07/2019  2014
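The comprehension above assumes that every entry contains a colon. For the case described in the question, where a label such as 'Mileage' may appear with no ':' and no value, a minimal sketch based on str.partition might look like the following. The helper name parse_record and the sample data are hypothetical and only serve the illustration.

```python
import numpy as np
import pandas as pd

def parse_record(items):
    """Turn ['Label: value', ...] into a dict; entries without a value map to NaN."""
    record = {}
    for item in items:
        # partition splits on the first ':' only and does not raise
        # when the separator is absent
        label, _, value = item.partition(':')
        value = value.strip()
        record[label.strip()] = value if value else np.nan
    return record

# Hypothetical sample: 'Mileage' is present in the first record without ':' or value
L = [['Manufacturer: Hyundai', 'Model: Tucson', 'Mileage', 'Registered: 07/2019'],
     ['Manufacturer: Mazda', 'Model: 6', 'Year: 2014', 'Registered: 07/2019']]

df = pd.DataFrame([parse_record(x) for x in L])
print(df)
```

Both cases from the question end up as NaN here: the first record has the label without a value, and the second record lacks the label altogether.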
If the labels are not followed by a colon, you can use .split(maxsplit=1) to split on the first whitespace:
L = [['Manufacturer Hyundai',
'Model Tucson',
'Mileage 258000 km',
'Registered 07/2019'],
['Manufacturer Mazda',
'Model 6',
'Year 2014',
'Registered 07/2019']]
df = pd.DataFrame([dict(y.split(maxsplit=1) for y in x) for x in L])
print (df)
  Manufacturer    Mileage   Model Registered  Year
0      Hyundai  258000 km  Tucson    07/2019   NaN
1        Mazda        NaN       6    07/2019  2014
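If the full list of expected features is known in advance (the question mentions 26 of them), DataFrame.reindex can be used to add columns that appear in no record and to fix the column order. The sketch below uses a shortened colnames list, and 'Colour' is a hypothetical feature added only to show a column that is absent from every record.

```python
import pandas as pd

# Shortened, illustrative list of expected feature names
colnames = ['Manufacturer', 'Model', 'Mileage', 'Registered', 'Year', 'Colour']

L = [['Manufacturer Hyundai', 'Model Tucson', 'Mileage 258000 km', 'Registered 07/2019'],
     ['Manufacturer Mazda', 'Model 6', 'Year 2014', 'Registered 07/2019']]

df = pd.DataFrame([dict(y.split(maxsplit=1) for y in x) for x in L])

# reindex keeps the existing columns, orders them as in colnames,
# and fills columns that no record provided with NaN
df = df.reindex(columns=colnames)
print(df)
```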
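If some labels consist of two words, list them in words2; the loop below joins the two words with an underscore to form the column name:
L = [['Manufacturer Hyundai',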
'Model Tucson',
'Mileage 258000 km',
'Registered 07/2019'],
['Manufacturer Mazda',
'Model 6',
'Year 2014',
'Registered 07/2019',
'Additional equipment aaa']]
words2 = ['Additional equipment']
L1 = []
for x in L:
    di = {}
    for y in x:
        for word in words2:
            if set(word.split(maxsplit=2)[:2]) < set(y.split()):
                i, j, k = y.split(maxsplit=2)
                di['_'.join([i, j])] = k
            else:
                i, j = y.split(maxsplit=1)
                di[i] = j
    L1.append(di)
df = pd.DataFrame(L1)
print (df)
  Additional_equipment Manufacturer    Mileage   Model Registered  Year
0                  NaN      Hyundai  258000 km  Tucson    07/2019   NaN
1                  aaa        Mazda        NaN       6    07/2019  2014
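The loop above works with a single entry in words2. With several multi-word labels, its else branch also runs for every label that does not match, which can add an unwanted 'Additional' column. One possible alternative, sketched here under the assumption that the multi-word labels are known up front, checks each entry with str.startswith. The names multiword_labels and split_item, and the 'Body type' label, are hypothetical.

```python
import pandas as pd

# Labels known to span more than one word (illustrative values)
multiword_labels = ['Additional equipment', 'Body type']

def split_item(item):
    """Return (column_name, value); multi-word labels are joined with '_'."""
    for label in multiword_labels:
        if item == label or item.startswith(label + ' '):
            value = item[len(label):].strip()
            return label.replace(' ', '_'), value or None
    # Single-word label: split on the first space; a missing value becomes None
    label, _, value = item.partition(' ')
    return label, value.strip() or None

L = [['Manufacturer Hyundai', 'Model Tucson', 'Mileage 258000 km', 'Registered 07/2019'],
     ['Manufacturer Mazda', 'Model 6', 'Year 2014', 'Registered 07/2019',
      'Additional equipment aaa', 'Body type sedan']]

df = pd.DataFrame([dict(split_item(y) for y in x) for x in L])
print(df)
```

Returning None for an empty value lets pandas treat a label that appears without a value as missing data, alongside labels that are absent from a record.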