How can I read data from a list and index specific values into Elasticsearch, using python?


Problem description

I have used "paramiko" to connect from my PC to a devboard and execute a script. I then save the results of this script in a list (output). I want to extract some values from the list and insert them into Elasticsearch. I have done this manually for the first result in the list, but how can I automate it for the rest of the values? Do I need regex? Please give me some clues.

Thanks

This is the part of the code that connects to the devboard, executes a script, and retrieves a list (output):

def main():
    ssh = initialize_ssh()
    stdin, stdout, stderr = ssh.exec_command('cd coral/tflite/python/examples/classification/Auto_benchmark\n python3 auto_benchmark.py')
    output = stdout.readlines()
    type(output)
    #print(type(output))
    print('\n'.join(output))
    ssh.close()

The list looks like this:

labels: imagenet_labels.txt 

Model: efficientnet-edgetpu-S_quant_edgetpu.tflite 

Image: img0000.jpg 


----INFERENCE TIME----

Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.

Time: 6.2ms

Results: wall clock

Score: 0.25781

##################################### 

labels: imagenet_labels.txt 

Model: mobilenet_v1_1.0_224_quant_edgetpu.tflite 

Image: img0000.jpg 


----INFERENCE TIME----

Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.

Time: 2.8ms

Results: umbrella

Score: 0.22266

##################################### 
Temperature: 35C

This is the mapping that is needed to index data into Elasticsearch:

def initialize_mapping_classification(es):
    """
    Initialize the mappings
    """
    mapping_classification = {
        'properties': {
            '@timestamp': {'type': 'date'},
            'type': 'coralito',
            'Model': {'type': 'string'},
            'Time': {'type': 'float'},
            'Results': {'type': 'string'},
            'Score': {'type': 'float'},
            'Temperature': {'type': 'float'}
        }
    }

    if not es.indices.exists(CORAL):
        es.indices.create(CORAL)
        es.indices.put_mapping(body=mapping_classification, doc_type=DOC_TYPE, index=CORAL)

This is my attempt. I have done it manually with the first result of the list. I want to automate it:

if CLASSIFY == 1:
    doc = {
        '@timestamp': str(datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S")),
        'type': 'coralito',
        'Model': "efficientnet-edgetpu-S_quant_edgetpu.tflite",
        'Time': "6.2 ms",
        'Results': "wall clock",
        'Score': "0.25781",
        'Temperature': "35 C"
    }

    response = send_data_elasticsearch(CORAL, DOC_TYPE, doc, es)

    print(doc)

------------------------------EDIT 2---------------------------------------

This is what my data looks like after using regex to extract the values of interest.

This is what gets indexed:

This is my code:

import elasticsearch  
from elasticsearch import Elasticsearch, helpers
import datetime
import re

data = [
    'labels: imagenet_labels.txt \n', '\n',
    'Model: efficientnet-edgetpu-S_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 23.1\n', 'Time(ms): 5.7\n', '\n', '\n',
    'Inference: corkscrew, bottle screw\n',
    'Score: 0.03125 \n', '\n',
    'TPU_temp(°C): 57.05\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: efficientnet-edgetpu-M_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 29.3\n', 'Time(ms): 10.8\n', '\n', '\n',
    "Inference: dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk\n",
    'Score: 0.09375 \n', '\n',
    'TPU_temp(°C): 56.8\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: efficientnet-edgetpu-L_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 45.6\n', 'Time(ms): 31.0\n', '\n', '\n',
    'Inference: pick, plectrum, plectron\n',
    'Score: 0.09766 \n', '\n',
    'TPU_temp(°C): 57.55\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: inception_v3_299_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 68.8\n', 'Time(ms): 51.3\n', '\n', '\n',
    'Inference: ringlet, ringlet butterfly\n',
    'Score: 0.48047 \n', '\n',
    'TPU_temp(°C): 57.3\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: inception_v4_299_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 121.8\n', 'Time(ms): 101.2\n', '\n', '\n',
    'Inference: admiral\n',
    'Score: 0.59375 \n', '\n',
    'TPU_temp(°C): 57.05\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: inception_v2_224_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 34.3\n', 'Time(ms): 16.6\n', '\n', '\n',
    'Inference: lycaenid, lycaenid butterfly\n',
    'Score: 0.41406 \n', '\n',
    'TPU_temp(°C): 57.3\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: mobilenet_v2_1.0_224_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 14.4\n', 'Time(ms): 3.3\n', '\n', '\n',
    'Inference: leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea\n',
    'Score: 0.36328 \n', '\n',
    'TPU_temp(°C): 57.3\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: mobilenet_v1_1.0_224_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 14.5\n', 'Time(ms): 3.0\n', '\n', '\n',
    'Inference: bow tie, bow-tie, bowtie\n',
    'Score: 0.33984 \n', '\n',
    'TPU_temp(°C): 57.3\n',
    '##################################### \n', '\n',
    'labels: imagenet_labels.txt \n', '\n',
    'Model: inception_v1_224_quant_edgetpu.tflite \n', '\n',
    'Image: insect.jpg \n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 21.2\n', 'Time(ms): 3.6\n', '\n', '\n',
    'Inference: pick, plectrum, plectron\n',
    'Score: 0.17578 \n', '\n',
    'TPU_temp(°C): 57.3\n',
    '##################################### \n', '\n',
]


# declare a client instance of the Python Elasticsearch library
client = Elasticsearch("http://localhost:9200")

# using regex
regex = re.compile(r'(\w+)\((.+)\):\s(.*)|(\w+:)\s(.*)')
match_regex = list(filter(regex.match, data))
match = [line.rstrip('\n') for line in match_regex]


# using "bulk"
def yield_docs():
    """
    Initialize the mappings
    """

    doc_source = {
        "data": match
    }

    # use a yield generator so that the doc data isn't loaded into memory
    yield {
        "_index": "coralito",
        "_type": "coralote",
        "_source": doc_source
    }

try:
    # make the bulk call using 'actions' and get a response
    resp = helpers.bulk(
        client,
        yield_docs()
    )
    print("\nhelpers.bulk() RESPONSE:", resp)
    print("RESPONSE TYPE:", type(resp))
except Exception as err:
    print("\nhelpers.bulk() ERROR:", err)

-----------------------------EDIT 3---------------------

Recommended answer

  1. Remove the line breaks.
  2. Split the text by a common delimiter (----INFERENCE TIME---- would be a good start, I think).
  3. Extract the keys and values using, for example, r'(\w+:)\s(.*)' or a positive lookbehind such as r'(?<=Note: ).*'.
  4. Parse the numeric values (time, score, temperature, ...). You'll thank me later ;)
  5. Extend the Model mapping with a keyword datatype; otherwise the dot will be tokenized away and you'll wonder why you can't search for exact matches or aggregate on it.
  6. Prepare the objects that you'll want to sync.
  7. Bulk upload to Elasticsearch.
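
The steps above could be sketched roughly as follows. This is only an illustration under several assumptions: the input has the shape of the list shown in EDIT 2, the line of # characters is used as the record delimiter (rather than ----INFERENCE TIME----, which does not appear in that list), numeric values are bare numbers, and the cluster no longer requires a `_type` (Elasticsearch 7.x or later). The names `parse_records`, `to_actions`, `FIELDS`, and `MAPPING` are hypothetical, not part of any library.

```python
import re
from datetime import datetime, timezone

# Key/value line with an optional unit in parentheses,
# e.g. "Model: x.tflite" or "Time(ms): 5.7".
LINE_RE = re.compile(r'^(\w+)(?:\((.+)\))?:\s*(.*)$')

FIELDS = {"Model", "Time", "Inference", "Score", "TPU_temp"}  # values to keep
NUMERIC = {"Time", "Score", "TPU_temp"}                       # parsed as float

# Step 5 (assumed mapping): "keyword" keeps the model file name as one exact term.
MAPPING = {
    "properties": {
        "@timestamp": {"type": "date"},
        "Model": {"type": "keyword"},
        "Inference": {"type": "text"},
        "Time": {"type": "float"},
        "Score": {"type": "float"},
        "TPU_temp": {"type": "float"},
    }
}


def parse_records(lines, delimiter="#####"):
    """Group the raw output lines into one dict per benchmark run."""
    records, current = [], {}
    for raw in lines:
        line = raw.strip()                  # step 1: drop line breaks
        if line.startswith(delimiter):      # step 2: the delimiter closes a run
            if current:
                records.append(current)
            current = {}
            continue
        m = LINE_RE.match(line)             # step 3: extract key and value
        if not m or m.group(1) not in FIELDS:
            continue
        key, value = m.group(1), m.group(3)
        if key in NUMERIC:                  # step 4: store numbers as floats
            value = float(value)
        # "Time(ms)" appears twice per run; the later value overwrites the
        # first, which includes loading the model into Edge TPU memory.
        current[key] = value
    if current:
        records.append(current)
    return records


def to_actions(records, index="coralito"):
    """Step 6: wrap each record as one action for elasticsearch.helpers.bulk."""
    stamp = datetime.now(timezone.utc).isoformat()
    for rec in records:
        yield {"_index": index, "_source": {"@timestamp": stamp, **rec}}


sample = [
    'labels: imagenet_labels.txt \n', '\n',
    'Model: efficientnet-edgetpu-S_quant_edgetpu.tflite \n', '\n',
    'Time(ms): 23.1\n', 'Time(ms): 5.7\n', '\n',
    'Inference: corkscrew, bottle screw\n',
    'Score: 0.03125 \n', '\n',
    'TPU_temp(°C): 57.05\n',
    '##################################### \n', '\n',
    'Model: inception_v4_299_quant_edgetpu.tflite \n',
    'Time(ms): 121.8\n', 'Time(ms): 101.2\n',
    'Inference: admiral\n',
    'Score: 0.59375 \n',
    'TPU_temp(°C): 57.05\n',
    '##################################### \n', '\n',
]

records = parse_records(sample)
actions = list(to_actions(records))
print(records[0])
```

With a client such as the one declared in the question, step 7 would then be `helpers.bulk(client, to_actions(records))`, which sends one document per benchmark run instead of a single document holding every matched line. Keeping only the second Time value is a design choice made for this sketch; storing both readings under separate keys would work just as well.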
