pandas dataframe from a nested dictionary (elasticsearch result)


Question

I am having a hard time translating results from Elasticsearch aggregations to pandas. I am trying to write an abstract function that takes a nested dictionary (with an arbitrary number of levels) and flattens it into a pandas dataframe.

A typical result looks like this (I have also added the parent key):

x1 = {u'xColor': {u'buckets': [
    {u'doc_count': 4,
     u'key': u'red',
     u'xMake': {u'buckets': [
         {u'doc_count': 3,
          u'key': u'honda',
          u'xCity': {u'buckets': [{u'doc_count': 2, u'key': u'ROME'},
                                  {u'doc_count': 1, u'key': u'Paris'}],
                     u'doc_count_error_upper_bound': 0,
                     u'sum_other_doc_count': 0}},
         {u'doc_count': 1,
          u'key': u'bmw',
          u'xCity': {u'buckets': [{u'doc_count': 1, u'key': u'Paris'}],
                     u'doc_count_error_upper_bound': 0,
                     u'sum_other_doc_count': 0}}],
         u'doc_count_error_upper_bound': 0,
         u'sum_other_doc_count': 0}},
    {u'doc_count': 2,
     u'key': u'blue',
     u'xMake': {u'buckets': [
         {u'doc_count': 1,
          u'key': u'ford',
          u'xCity': {u'buckets': [{u'doc_count': 1, u'key': u'Paris'}],
                     u'doc_count_error_upper_bound': 0,
                     u'sum_other_doc_count': 0}},
         {u'doc_count': 1,
          u'key': u'toyota',
          u'xCity': {u'buckets': [{u'doc_count': 1, u'key': u'Berlin'}],
                     u'doc_count_error_upper_bound': 0,
                     u'sum_other_doc_count': 0}}],
         u'doc_count_error_upper_bound': 0,
         u'sum_other_doc_count': 0}},
    {u'doc_count': 2,
     u'key': u'green',
     u'xMake': {u'buckets': [
         {u'doc_count': 1,
          u'key': u'ford',
          u'xCity': {u'buckets': [{u'doc_count': 1, u'key': u'Berlin'}],
                     u'doc_count_error_upper_bound': 0,
                     u'sum_other_doc_count': 0}},
         {u'doc_count': 1,
          u'key': u'toyota',
          u'xCity': {u'buckets': [{u'doc_count': 1, u'key': u'Berlin'}],
                     u'doc_count_error_upper_bound': 0,
                     u'sum_other_doc_count': 0}}],
         u'doc_count_error_upper_bound': 0,
         u'sum_other_doc_count': 0}}],
    u'doc_count_error_upper_bound': 0,
    u'sum_other_doc_count': 0}}

What I would like to have is a dataframe with the doc_count of the lowest level.

For the first records:

 red-honda-rome-2 

 red-honda-paris-1

 red-bmw-paris-1

I came across json_normalize in pandas but do not understand how to pass the arguments. I have also seen different suggestions for flattening a nested dictionary, but I can't really understand how they work. Any help to get me started would be appreciated (related question: "Elasticsearch result to table").

UPDATE

I tried dpath, which is a great library, but I do not see how to abstract this into a function that takes just the bucket names as arguments, because dpath cannot handle a structure in which the values are lists rather than other dictionaries.

import dpath.util  # import the submodule explicitly so dpath.util.get resolves
import pandas as pd

xListData = []
# Walk the three aggregation levels: color -> make -> city
for q1 in dpath.util.get(x1, 'xColor/buckets'):
    xColor = q1['key']
    for q2 in dpath.util.get(q1, 'xMake/buckets'):
        xMake = q2['key']
        for q3 in dpath.util.get(q2, 'xCity/buckets'):
            xCity = q3['key']
            doc_count = q3['doc_count']
            # One row per lowest-level (city) bucket
            xDict = {'color': xColor, 'make': xMake, 'city': xCity, 'doc_count': doc_count}
            xListData.append(xDict)

pd.DataFrame(xListData)

This gives:

     city  color  doc_count    make
0    ROME    red          2   honda
1   Paris    red          1   honda
2   Paris    red          1     bmw
3   Paris   blue          1    ford
4  Berlin   blue          1  toyota
5  Berlin  green          1    ford
6  Berlin  green          1  toyota
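
The nested loops above hard-code three levels. One way they might be generalized to any depth, taking only the aggregation names as arguments, is a small recursive function. The following is a minimal sketch that uses plain dictionary access instead of dpath; the names `flatten_buckets` and `sample` are hypothetical and not part of the original post, and the sketch assumes every level is a terms-style aggregation with a `buckets` list.

```python
def flatten_buckets(result, names, parents=None):
    """Return one dict per lowest-level bucket of a nested bucket aggregation.

    result  -- dict that contains the aggregation named names[0]
    names   -- aggregation names from outermost to innermost
    parents -- keys collected from the enclosing buckets (used by the recursion)
    """
    parents = parents or {}
    name, rest = names[0], names[1:]
    rows = []
    for bucket in result[name]['buckets']:
        row = dict(parents)          # copy so sibling buckets do not share state
        row[name] = bucket['key']
        if rest:
            # Not yet at the lowest level: descend into this bucket
            rows.extend(flatten_buckets(bucket, rest, row))
        else:
            # Lowest level: record its doc_count and emit the row
            row['doc_count'] = bucket['doc_count']
            rows.append(row)
    return rows


# Shortened, hypothetical sample in the same shape as x1
sample = {'xColor': {'buckets': [
    {'key': 'red', 'doc_count': 3, 'xMake': {'buckets': [
        {'key': 'honda', 'doc_count': 3, 'xCity': {'buckets': [
            {'key': 'ROME', 'doc_count': 2},
            {'key': 'Paris', 'doc_count': 1}]}}]}}]}}

rows = flatten_buckets(sample, ['xColor', 'xMake', 'xCity'])
```

Passing `rows` to `pd.DataFrame` should then give one dataframe row per lowest-level bucket, with one column per aggregation name plus `doc_count`.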

Recommended answer

Try with a recursive function:

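import pandas as pd

def elasticToDataframe(elasticResult, aggStructure, record=None, fulllist=None):
    # None defaults avoid shared mutable defaults ({} / []), which would make
    # a second call return the rows of the first call as well.
    record = {} if record is None else record
    fulllist = [] if fulllist is None else fulllist
    for agg in aggStructure:
        buckets = elasticResult[agg['key']]['buckets']
        for bucket in buckets:
            record = record.copy()
            record[agg['key']] = bucket['key']
            if 'aggs' in agg:
                # Descend into the sub-aggregations of this bucket
                elasticToDataframe(bucket, agg['aggs'], record, fulllist)
            else:
                # Lowest level: copy the requested bucket values into the row
                for var in agg['variables']:
                    record[var['dfName']] = bucket[var['elasticName']]
                fulllist.append(record)

    df = pd.DataFrame(fulllist)
    return df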

Then call the function with your data (x1) and a properly configured aggStructure (a list of dicts). The nesting of the data must be mirrored in this structure.

aggStructure=[{'key':'xColor','aggs':[{'key':'xMake','aggs':[{'key':'xCity','variables':[{'elasticName':'doc_count','dfName':'count'}]}]}]}]
elasticToDataframe(x1,aggStructure)
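
Because the question asked for a function that takes just the bucket names, the nested aggStructure could also be generated from a flat list of names rather than written by hand. A minimal sketch, assuming every level nests exactly one sub-aggregation and that the variables are read only at the lowest level; `build_agg_structure` is a hypothetical helper, not part of the original answer.

```python
def build_agg_structure(names, variables=None):
    """Build the nested aggStructure list from aggregation names (outermost first)."""
    if variables is None:
        # Default mirrors the hand-written example: doc_count -> 'count' column
        variables = [{'elasticName': 'doc_count', 'dfName': 'count'}]
    structure = None
    # Work from the innermost aggregation outwards
    for name in reversed(names):
        level = {'key': name}
        if structure is None:
            level['variables'] = variables   # lowest level carries the values
        else:
            level['aggs'] = structure        # outer levels wrap the inner ones
        structure = [level]
    return structure


aggStructure = build_agg_structure(['xColor', 'xMake', 'xCity'])
```

The result has the same shape as the hand-written aggStructure above, so it can be passed to elasticToDataframe unchanged. Note that an empty list of names yields None in this sketch.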

Cheers
