Creating DataFrame from ElasticSearch Results

Views: 574 Published: 2017/8/7 0:19:44 Tags: python pandas elasticsearch

Problem description

I am trying to build a DataFrame in pandas from the results of a very basic query to ElasticSearch. I am getting the data I need, but the problem is slicing the results in a way that builds the proper DataFrame. I really only care about the timestamp and path of each result. I have tried a few different es.search patterns.

Code:

from datetime import datetime
from elasticsearch import Elasticsearch
from pandas import DataFrame, Series
import pandas as pd
import matplotlib.pyplot as plt
es = Elasticsearch(host="192.168.121.252")
res = es.search(index="_all", doc_type='logs', body={"query": {"match_all": {}}}, size=2, fields=('path','@timestamp'))

This gives 4 chunks of data: [u'hits', u'_shards', u'took', u'timed_out']. My results are inside hits.

res['hits']['hits']
Out[47]: 
[{u'_id': u'a1XHMhdHQB2uV7oq6dUldg',
  u'_index': u'logstash-2014.08.07',
  u'_score': 1.0,
  u'_type': u'logs',
  u'fields': {u'@timestamp': u'2014-08-07T12:36:00.086Z',
   u'path': u'app2.log'}},
 {u'_id': u'TcBvro_1QMqF4ORC-XlAPQ',
  u'_index': u'logstash-2014.08.07',
  u'_score': 1.0,
  u'_type': u'logs',
  u'fields': {u'@timestamp': u'2014-08-07T12:36:00.200Z',
   u'path': u'app1.log'}}]

The only things I care about are the timestamp and path for each hit.

res['hits']['hits'][0]['fields']
Out[48]: 
{u'@timestamp': u'2014-08-07T12:36:00.086Z',
 u'path': u'app2.log'}
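
The indexing shown above picks out one hit at a time. As a minimal sketch of doing the same for every hit, the loop below uses a hand-written dictionary in place of a live response. The name `records` and the placeholder values for `took` and `_shards` are assumptions made for illustration, not part of the original post.

```python
# Hypothetical stand-in for the `res` object returned by es.search,
# so the example runs without a cluster. Only the "hits" part mirrors
# the output shown above; the other values are placeholders.
res = {
    "took": 1,
    "timed_out": False,
    "_shards": {},
    "hits": {
        "hits": [
            {
                "_id": "a1XHMhdHQB2uV7oq6dUldg",
                "_index": "logstash-2014.08.07",
                "_score": 1.0,
                "_type": "logs",
                "fields": {
                    "@timestamp": "2014-08-07T12:36:00.086Z",
                    "path": "app2.log",
                },
            },
            {
                "_id": "TcBvro_1QMqF4ORC-XlAPQ",
                "_index": "logstash-2014.08.07",
                "_score": 1.0,
                "_type": "logs",
                "fields": {
                    "@timestamp": "2014-08-07T12:36:00.200Z",
                    "path": "app1.log",
                },
            },
        ]
    },
}

# Keep only the "fields" dictionary of each hit.
records = [hit["fields"] for hit in res["hits"]["hits"]]

for record in records:
    print(record["@timestamp"], record["path"])
```

Each element of `records` is a flat dictionary with the two keys of interest, which is a convenient shape to hand to pandas.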

I cannot for the life of me figure out how to get that result into a DataFrame in pandas. For the 2 results I have returned, I would expect a DataFrame like this:

   timestamp                   path
0  2014-08-07T12:36:00.086Z    app2.log
1  2014-08-07T12:36:00.200Z    app1.log
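
One possible way to arrive at that column layout is sketched here, under the assumption that the per-hit fields dictionaries have already been collected into a list (the name `records` is hypothetical). It renames '@timestamp' to 'timestamp' and, optionally, parses the strings into timestamps.

```python
import pandas as pd

# Hypothetical list of per-hit "fields" dicts, copied from the hits shown above.
records = [
    {"@timestamp": "2014-08-07T12:36:00.086Z", "path": "app2.log"},
    {"@timestamp": "2014-08-07T12:36:00.200Z", "path": "app1.log"},
]

# One row per record; rename the '@timestamp' key to a plain column name.
df = pd.DataFrame(records).rename(columns={"@timestamp": "timestamp"})

# Optional: convert the ISO 8601 strings into pandas timestamps.
df["timestamp"] = pd.to_datetime(df["timestamp"])

print(df)
```

Parsing the timestamps is not required for the layout itself, but it makes later sorting or resampling by time easier.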

Recommended answer

There is a nice toy called pd.DataFrame.from_dict that you can use in a situation like this:

In [34]:

Data = [{u'_id': u'a1XHMhdHQB2uV7oq6dUldg',
      u'_index': u'logstash-2014.08.07',
      u'_score': 1.0,
      u'_type': u'logs',
      u'fields': {u'@timestamp': u'2014-08-07T12:36:00.086Z',
       u'path': u'app2.log'}},
     {u'_id': u'TcBvro_1QMqF4ORC-XlAPQ',
      u'_index': u'logstash-2014.08.07',
      u'_score': 1.0,
      u'_type': u'logs',
      u'fields': {u'@timestamp': u'2014-08-07T12:36:00.200Z',
       u'path': u'app1.log'}}]
In [35]:

df = pd.concat(map(pd.DataFrame.from_dict, Data), axis=1)['fields'].T
In [36]:

print(df.reset_index(drop=True))
                 @timestamp      path
0  2014-08-07T12:36:00.086Z  app2.log
1  2014-08-07T12:36:00.200Z  app1.log

Explained in four steps:

1, Read each item in the list (which is a dictionary) into a DataFrame.
2, Put all the items into one big DataFrame by concatenating them side by side (axis=1). Since step 1 has to be done for each item, we can use map to do it.
3, Then access the columns labeled 'fields'.
4, We probably want to rotate the DataFrame 90 degrees (transpose), and call reset_index if we want the index to be the default int sequence.
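
The four steps can also be written out one at a time, which may make the intermediate shapes easier to inspect. This is only a sketch: the names `frames`, `combined`, `fields`, `result` and `direct` are introduced here for illustration, and a list comprehension stands in for map. The last lines show a shorter alternative that is not part of the answer above, building the frame straight from the fields dictionaries.

```python
import pandas as pd

data = [
    {"_id": "a1XHMhdHQB2uV7oq6dUldg",
     "_index": "logstash-2014.08.07",
     "_score": 1.0,
     "_type": "logs",
     "fields": {"@timestamp": "2014-08-07T12:36:00.086Z",
                "path": "app2.log"}},
    {"_id": "TcBvro_1QMqF4ORC-XlAPQ",
     "_index": "logstash-2014.08.07",
     "_score": 1.0,
     "_type": "logs",
     "fields": {"@timestamp": "2014-08-07T12:36:00.200Z",
                "path": "app1.log"}},
]

# Step 1: one DataFrame per hit. The nested "fields" dict supplies the
# row labels ('@timestamp', 'path'); the scalar values are repeated per row.
frames = [pd.DataFrame.from_dict(hit) for hit in data]

# Step 2: place the per-hit frames side by side (axis=1).
combined = pd.concat(frames, axis=1)

# Step 3: select every column labeled 'fields' (one per hit).
fields = combined["fields"]

# Step 4: transpose so each hit becomes a row, then use a default int index.
result = fields.T.reset_index(drop=True)
print(result)

# Alternative sketch (not from the answer): build the frame directly
# from the "fields" dictionaries.
direct = pd.DataFrame([hit["fields"] for hit in data])
print(direct)
```

Both routes should give the same two rows. The direct construction skips the intermediate frames, while the step-by-step version mirrors the one-liner in the answer.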
  
