Nested dict of lists to pandas DataFrame


Problem description

I have a rather messy nested dictionary that I am trying to convert to a pandas data frame. The data is stored in a dictionary of lists contained in a broader dictionary, where each key/value breakdown follows: {userID_key: {postID_key: [list of hash tags]}}

Here's a more specific example of what the data looks like:

   {'user_1': {'postID_1':  ['#fitfam',
                             '#gym',
                             '#bro'],
               'postID_2':  ['#swol',
                             '#anotherhashtag']},
    'user_2': {'postID_78': ['#ripped',
                             '#bro',
                             '#morehashtags'],
               'postID_1':  ['#buff',
                             '#othertags']},
    'user_3': ...and so on }

I want to create a data frame that gives me the frequency counts of each hashtag for each (userID,postID) pair like below:

+------------+------------+--------+-----+-----+------+-----+
| UserID_key | PostID_key | fitfam | gym | bro | swol | ... |
+------------+------------+--------+-----+-----+------+-----+
| user_1     | postID_1   | 1      | 1   | 1   | 0    | ... |
| user_1     | postID_2   | 0      | 0   | 0   | 1    | ... |
| user_2     | postID_78  | 0      | 0   | 1   | 0    | ... |
| user_2     | postID_1   | 0      | 0   | 0   | 0    | ... |
| user_3     | ...        | ...    | ... | ... | ...  | ... |
+------------+------------+--------+-----+-----+------+-----+

I had scikit-learn's CountVectorizer as an idea but it's not going to be able to process a nested dictionary. Would appreciate any help getting it into that desired form.
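
For reference, the CountVectorizer route is workable if the nested dict is first flattened into one space-joined string of hashtags per (userID, postID) pair. A minimal sketch, assuming the example data above is bound to a variable named data:

    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer

    # `data` is the nested {userID: {postID: [hashtags]}} dict from the question.
    data = {'user_1': {'postID_1': ['#fitfam', '#gym', '#bro'],
                       'postID_2': ['#swol', '#anotherhashtag']},
            'user_2': {'postID_78': ['#ripped', '#bro', '#morehashtags'],
                       'postID_1': ['#buff', '#othertags']}}

    # Flatten: one "document" per (userID, postID) pair.
    index = [(user, post) for user, posts in data.items() for post in posts]
    docs = [' '.join(tags) for posts in data.values() for tags in posts.values()]

    # The default token_pattern drops the leading '#', so pass one that keeps it.
    vec = CountVectorizer(token_pattern=r'#\w+')
    counts = vec.fit_transform(docs)

    df = pd.DataFrame(counts.toarray(),
                      index=pd.MultiIndex.from_tuples(index, names=['UserID_key', 'PostID_key']),
                      columns=vec.get_feature_names_out())  # get_feature_names() on older scikit-learn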

Recommended answer

Building on my answer to another question, you can build and concatenate sub-frames using pd.concat, then use stack and get_dummies:

import pandas as pd

# `dct` is the nested {userID: {postID: [hashtags]}} dictionary shown in the question.
(pd.concat({k: pd.DataFrame.from_dict(v, orient='index') for k, v in dct.items()})
   .stack()
   .str.get_dummies()
   .sum(level=[0, 1]))

                  #anotherhashtag  #bro  #buff  #fitfam  #gym  #morehashtags  #othertags  #ripped  #swol
user_1 postID_1                 0     1      0        1     1              0           0        0      0
       postID_2                 1     0      0        0     0              0           0        0      1
user_2 postID_78                0     1      0        0     0              1           0        1      0
       postID_1                 0     0      1        0     0              0           1        0      0
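
Note that sum(level=...) was later deprecated and removed in pandas 2.0; on newer pandas the same result can be obtained by grouping on the first two index levels instead. A sketch, assuming the same dct as above:

import pandas as pd

# Same pipeline, with groupby replacing the removed sum(level=...) call;
# sort=False keeps the (user, post) groups in their original order.
(pd.concat({k: pd.DataFrame.from_dict(v, orient='index') for k, v in dct.items()})
   .stack()
   .str.get_dummies()
   .groupby(level=[0, 1], sort=False)
   .sum())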
