Create pandas dataframe from dictionary of dictionaries


Question

I have a dictionary of dictionaries of the form:

{'user': {'movie': rating}}

For example:

{'Jill': {'Avenger: Age of Ultron': 7.0,
          'Django Unchained': 6.5,
          'Gone Girl': 9.0,
          'Kill the Messenger': 8.0},
 'Toby': {'Avenger: Age of Ultron': 8.5,
          'Django Unchained': 9.0,
          'Zoolander': 2.0}}

I want to convert this dict of dicts into a pandas dataframe whose first column is the user name and whose other columns are the movie ratings, i.e.

user  Gone_Girl  Horrible_Bosses_2  Django_Unchained  Zoolander  etc.

However, some users did not rate every movie, so those movies are missing from that user's inner dict. In those cases it would be nice to simply fill the entry with NaN.

As of now, I iterate over the keys, fill a list, and then use this list to create a dataframe:

import pandas as pd

# movie_user_preferences is the dict of dicts shown above
data = []
for user, ratings in movie_user_preferences.items():
    try:
        data.append((user,
                     ratings['Gone Girl'],
                     ratings['Horrible Bosses 2'],
                     ratings['Django Unchained'],
                     ratings['Zoolander'],
                     ratings['Avenger: Age of Ultron'],
                     ratings['Kill the Messenger']))
    except KeyError:
        # skip any user who is missing at least one of these ratings
        continue
df = pd.DataFrame(data, columns=['user', 'Gone_Girl', 'Horrible_Bosses_2',
                                 'Django_Unchained', 'Zoolander',
                                 'Avenger_Age_of_Ultron', 'Kill_the_Messenger'])

But this only gives me a dataframe of the users who rated all the movies in the set.

My goal is, first, to build the data list by iterating over the movie labels (rather than the brute-force approach shown above) and, second, to create a dataframe that includes all users, with null values wherever a user has no rating for a movie.

Recommended answer

You can pass the dict of dicts directly to the DataFrame constructor. The outer keys become the columns, the inner keys become the index, and missing entries are filled with NaN:

In [11]: d = {'Jill': {'Django Unchained': 6.5, 'Gone Girl': 9.0, 'Kill the Messenger': 8.0, 'Avenger: Age of Ultron': 7.0}, 'Toby': {'Django Unchained': 9.0, 'Zoolander': 2.0, 'Avenger: Age of Ultron': 8.5}}

In [12]: pd.DataFrame(d)
Out[12]:
                        Jill  Toby
Avenger: Age of Ultron   7.0   8.5
Django Unchained         6.5   9.0
Gone Girl                9.0   NaN
Kill the Messenger       8.0   NaN
Zoolander                NaN   2.0
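
For comparison, here is a rough hand-written equivalent of what the constructor does, iterating over the movie labels as the question proposes. It is only a sketch: the names `movies`, `rows`, and `manual_df` are made up for illustration, and it puts users in rows rather than columns.

```python
import math
import pandas as pd

d = {
    'Jill': {'Avenger: Age of Ultron': 7.0, 'Django Unchained': 6.5,
             'Gone Girl': 9.0, 'Kill the Messenger': 8.0},
    'Toby': {'Avenger: Age of Ultron': 8.5, 'Django Unchained': 9.0,
             'Zoolander': 2.0},
}

# Collect every movie label rated by at least one user (sorted for a stable order).
movies = sorted({movie for ratings in d.values() for movie in ratings})

# One row per user; dict.get supplies NaN where the user has no rating.
rows = [[user] + [ratings.get(movie, math.nan) for movie in movies]
        for user, ratings in d.items()]

manual_df = pd.DataFrame(rows, columns=['user'] + movies)
print(manual_df)
```

In practice the constructor is shorter and less error-prone, so the loop is mainly useful for seeing where the NaN values come from.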

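Or use the from_dict method. By default it gives the same result as the constructor; with orient='index' the outer keys (users) become the rows instead: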

In [13]: pd.DataFrame.from_dict(d)
Out[13]:
                        Jill  Toby
Avenger: Age of Ultron   7.0   8.5
Django Unchained         6.5   9.0
Gone Girl                9.0   NaN
Kill the Messenger       8.0   NaN
Zoolander                NaN   2.0

In [14]: pd.DataFrame.from_dict(d, orient='index')
Out[14]:
      Django Unchained  Gone Girl  Kill the Messenger  Avenger: Age of Ultron  Zoolander
Jill               6.5          9                   8                     7.0        NaN
Toby               9.0        NaN                 NaN                     8.5          2
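
To get the exact layout asked for in the question (a `user` column first, underscores in the column names), one possible follow-up is sketched below. The renaming rule (drop colons, replace spaces with underscores) is an assumption based on the column names used in the question.

```python
import pandas as pd

d = {
    'Jill': {'Avenger: Age of Ultron': 7.0, 'Django Unchained': 6.5,
             'Gone Girl': 9.0, 'Kill the Messenger': 8.0},
    'Toby': {'Avenger: Age of Ultron': 8.5, 'Django Unchained': 9.0,
             'Zoolander': 2.0},
}

df = (
    pd.DataFrame.from_dict(d, orient='index')  # users as rows, movies as columns
      .rename_axis('user')                     # name the index so reset_index labels it
      .reset_index()                           # move the user names into a regular column
)

# Assumed naming convention: 'Avenger: Age of Ultron' -> 'Avenger_Age_of_Ultron'
df.columns = [c.replace(':', '').replace(' ', '_') for c in df.columns]
print(df)
```

A movie that nobody rated (such as 'Horrible Bosses 2' in this sample) does not appear at all; if such a column is needed, `df.reindex(columns=[...])` with the full list of names adds it filled with NaN.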
