How can I read data from a list and index specific values into Elasticsearch, using Python?
Question
I have used "paramiko" to connect from my PC to a devboard and execute a script. I then save the results of this script in a list (output). I want to extract some values from the list and insert them into Elasticsearch. I have done it manually for the first result in the list, but how can I automate it for the remaining values? Do I need "regex"? Please give me some clues.
Thanks
This is the part of the code that connects to the devboard, executes a script, and retrieves the list (output):
def main():
    ssh = initialize_ssh()
    stdin, stdout, stderr = ssh.exec_command('cd coral/tflite/python/examples/classification/Auto_benchmark\n python3 auto_benchmark.py')
    output = stdout.readlines()
    type(output)
    #print(type(output))
    print('\n'.join(output))
    ssh.close()
The list looks like this:
labels: imagenet_labels.txt
Model: efficientnet-edgetpu-S_quant_edgetpu.tflite
Image: img0000.jpg
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
Time: 6.2ms
Results: wall clock
Score: 0.25781
#####################################
labels: imagenet_labels.txt
Model: mobilenet_v1_1.0_224_quant_edgetpu.tflite
Image: img0000.jpg
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
Time: 2.8ms
Results: umbrella
Score: 0.22266
#####################################
Temperature: 35C
This is the mapping that is needed to index data into Elasticsearch:
def initialize_mapping_classification(es):
    """
    Initialize the mappings
    """
    mapping_classification = {
        'properties': {
            '@timestamp': {'type': 'date'},
            'type': 'coralito',
            'Model': {'type': 'string'},
            'Time': {'type': 'float'},
            'Results': {'type': 'string'},
            'Score': {'type': 'float'},
            'Temperature': {'type': 'float'}
        }
    }
    if not es.indices.exists(CORAL):
        es.indices.create(CORAL)
    es.indices.put_mapping(body=mapping_classification, doc_type=DOC_TYPE, index=CORAL)
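One caveat about the mapping above: current Elasticsearch versions no longer offer the 'string' field type (it was replaced by 'text' and 'keyword'), and every entry under 'properties' has to be a field definition, so 'type': 'coralito' would likely be rejected. Below is a minimal sketch of an equivalent mapping, assuming a recent Elasticsearch version and assuming that Model should be matched exactly. The helper invalid_fields and the VALID_TYPES set are hypothetical names used only for this illustration; they are not part of the Elasticsearch client.

```python
# Sketch of a mapping body for recent Elasticsearch versions (assumption:
# Model and type are exact-match fields, Results is free text).
mapping_classification = {
    "properties": {
        "@timestamp": {"type": "date"},
        "type": {"type": "keyword"},
        "Model": {"type": "keyword"},
        "Time": {"type": "float"},
        "Results": {"type": "text"},
        "Score": {"type": "float"},
        "Temperature": {"type": "float"},
    }
}

# Hypothetical local sanity check: only the field types used in this example.
VALID_TYPES = {"date", "keyword", "text", "float"}


def invalid_fields(mapping):
    """Return the names of properties that are not well-formed field definitions."""
    return [
        name
        for name, spec in mapping["properties"].items()
        if not isinstance(spec, dict) or spec.get("type") not in VALID_TYPES
    ]


print(invalid_fields(mapping_classification))
```

With 'keyword', the model file name is stored as a single term, so exact-match queries and aggregations on Model work without the dots being split apart.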
This is my attempt. I have done it manually with the first result of the list. I want to automate it:
if CLASSIFY == 1:
    doc = {
        '@timestamp': str(datetime.datetime.utcnow().strftime("%Y-%m-%d"'T'"%H:%M:%S")),
        'type': 'coralito',
        'Model': "efficientnet-edgetpu-S_quant_edgetpu.tflite",
        'Time': "6.2 ms",
        'Results': "wall clock",
        'Score': "0.25781",
        'Temperature': "35 C"
    }
    response = send_data_elasticsearch(CORAL, DOC_TYPE, doc, es)
    print(doc)
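Note that the mapping declares Time, Score and Temperature as float, while the document above sends strings such as "6.2 ms" and "35 C". A minimal sketch of one way to strip the unit before indexing follows; to_number is a hypothetical helper name, and taking the first decimal number found in the text is an assumption that fits the sample values shown here.

```python
import re


def to_number(text):
    """Return the first decimal number found in text as a float, or None."""
    match = re.search(r'-?\d+(?:\.\d+)?', text)
    return float(match.group()) if match else None


# Values taken from the hand-written document above.
for raw in ("6.2 ms", "0.25781", "35 C"):
    print(raw, "->", to_number(raw))
```

Text without any digits, such as "wall clock", yields None, so label fields can be left as strings.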
------------------------------EDIT 2---------------------------------------
This is how my data looks after using regex to extract the values of interest.
This is what gets indexed:
This is my code:
import elasticsearch
from elasticsearch import Elasticsearch, helpers
import datetime
import re
data = [
    'labels: imagenet_labels.txt\n', '\n',
    'Model: efficientnet-edgetpu-S_quant_edgetpu.tflite\n', '\n',
    'Image: insect.jpg\n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 23.1\n', 'Time(ms): 5.7\n', '\n', '\n',
    'Inference: corkscrew, bottle screw\n', 'Score: 0.03125\n', '\n',
    'TPU_temp(°C): 57.05\n', '#####################################\n', '\n',
    'labels: imagenet_labels.txt\n', '\n',
    'Model: efficientnet-edgetpu-M_quant_edgetpu.tflite\n', '\n',
    'Image: insect.jpg\n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 29.3\n', 'Time(ms): 10.8\n', '\n', '\n',
    "Inference: dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk\n",
    'Score: 0.09375\n', '\n',
    'TPU_temp(°C): 56.8\n', '#####################################\n', '\n',
    'labels: imagenet_labels.txt\n', '\n',
    'Model: efficientnet-edgetpu-L_quant_edgetpu.tflite\n', '\n',
    'Image: insect.jpg\n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 45.6\n', 'Time(ms): 31.0\n', '\n', '\n',
    'Inference: pick, plectrum, plectron\n', 'Score: 0.09766\n', '\n',
    'TPU_temp(°C): 57.55\n', '#####################################\n', '\n',
    'labels: imagenet_labels.txt\n', '\n',
    'Model: inception_v3_299_quant_edgetpu.tflite\n', '\n',
    'Image: insect.jpg\n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 68.8\n', 'Time(ms): 51.3\n', '\n', '\n',
    'Inference: ringlet, ringlet butterfly\n', 'Score: 0.48047\n', '\n',
    'TPU_temp(°C): 57.3\n', '#####################################\n', '\n',
    'labels: imagenet_labels.txt\n', '\n',
    'Model: inception_v4_299_quant_edgetpu.tflite\n', '\n',
    'Image: insect.jpg\n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 121.8\n', 'Time(ms): 101.2\n', '\n', '\n',
    'Inference: admiral\n', 'Score: 0.59375\n', '\n',
    'TPU_temp(°C): 57.05\n', '#####################################\n', '\n',
    'labels: imagenet_labels.txt\n', '\n',
    'Model: inception_v2_224_quant_edgetpu.tflite\n', '\n',
    'Image: insect.jpg\n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 34.3\n', 'Time(ms): 16.6\n', '\n', '\n',
    'Inference: lycaenid, lycaenid butterfly\n', 'Score: 0.41406\n', '\n',
    'TPU_temp(°C): 57.3\n', '#####################################\n', '\n',
    'labels: imagenet_labels.txt\n', '\n',
    'Model: mobilenet_v2_1.0_224_quant_edgetpu.tflite\n', '\n',
    'Image: insect.jpg\n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 14.4\n', 'Time(ms): 3.3\n', '\n', '\n',
    'Inference: leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea\n', 'Score: 0.36328\n', '\n',
    'TPU_temp(°C): 57.3\n', '#####################################\n', '\n',
    'labels: imagenet_labels.txt\n', '\n',
    'Model: mobilenet_v1_1.0_224_quant_edgetpu.tflite\n', '\n',
    'Image: insect.jpg\n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 14.5\n', 'Time(ms): 3.0\n', '\n', '\n',
    'Inference: bow tie, bow-tie, bowtie\n', 'Score: 0.33984\n', '\n',
    'TPU_temp(°C): 57.3\n', '#####################################\n', '\n',
    'labels: imagenet_labels.txt\n', '\n',
    'Model: inception_v1_224_quant_edgetpu.tflite\n', '\n',
    'Image: insect.jpg\n', '\n',
    '*The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory*\n',
    'Time(ms): 21.2\n', 'Time(ms): 3.6\n', '\n', '\n',
    'Inference: pick, plectrum, plectron\n', 'Score: 0.17578\n', '\n',
    'TPU_temp(°C): 57.3\n', '#####################################\n', '\n',
]
# declare a client instance of the Python Elasticsearch library
client = Elasticsearch("http://localhost:9200")
# using regex: keep only "key(unit): value" and "key: value" lines
regex = re.compile(r'(\w+)\((.+)\):\s(.*)|(\w+:)\s(.*)')
match_regex = list(filter(regex.match, data))
match = [line.rstrip('\n') for line in match_regex]
# using "bulk"
def yield_docs():
    """
    Initialize the mappings
    """
    doc_source = {
        "data": match
    }
    # use a yield generator so that the doc data isn't loaded into memory
    yield {
        "_index": "coralito",
        "_type": "coralote",
        "_source": doc_source
    }
try:
    # make the bulk call using 'actions' and get a response
    resp = helpers.bulk(
        client,
        yield_docs()
    )
    print("\nhelpers.bulk() RESPONSE:", resp)
    print("RESPONSE TYPE:", type(resp))
except Exception as err:
    print("\nhelpers.bulk() ERROR:", err)
-----------------------------EDIT 3---------------------
Recommended answer
- Remove the line breaks
- Split the text by a common delimiter (----INFERENCE TIME---- would be a good start, I think)
- Extract the keys & values using, for example, r'(\w+:)\s(.*)' or a positive lookbehind such as r'(?<=Note: ).*'
- Parse the numeric values (time, score, temperature, ...); you'll thank me later ;)
- Extend the Model mapping with a keyword datatype; otherwise the dots will be tokenized away and you'll wonder why you can neither search for exact matches nor aggregate on it
- Prepare the objects that you'll want to sync
- Bulk upload to Elasticsearch
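The steps above can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: it assumes the "Key(unit): value" output format from EDIT 2, uses the line of # characters as the record delimiter, and keeps the last of the two Time(ms) lines (the run after the slow first inference). The names LINE_RE, NUMERIC_KEYS, parse_records and to_actions are hypothetical; the index name "coralito" is taken from the question.

```python
import re
from datetime import datetime, timezone

# Matches "Key: value" and "Key(unit): value", e.g. "Time(ms): 5.7".
LINE_RE = re.compile(r'^(\w+)(?:\(([^)]*)\))?:\s*(.*)$')
NUMERIC_KEYS = {"Time", "Score", "TPU_temp"}  # assumption: fields to store as floats


def parse_records(lines):
    """Group raw output lines into one dict per benchmark run."""
    record = {}
    for raw in lines:
        line = raw.strip()              # removes the trailing line break
        if line.startswith("#####"):    # record delimiter
            if record:
                yield record
            record = {}
            continue
        match = LINE_RE.match(line)
        if not match:
            continue                    # blank lines and the "*...*" note
        key, _unit, value = match.groups()
        # A repeated key overwrites the earlier value, so the second
        # "Time(ms)" line wins.
        record[key] = float(value) if key in NUMERIC_KEYS else value
    if record:                          # output without a trailing delimiter
        yield record


def to_actions(records, index="coralito"):
    """Wrap each record in the action format expected by helpers.bulk."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    for rec in records:
        yield {"_index": index, "_source": {"@timestamp": stamp, **rec}}


# Shortened sample taken from the data in the question.
sample = [
    'Model: efficientnet-edgetpu-S_quant_edgetpu.tflite\n', '\n',
    'Time(ms): 23.1\n', 'Time(ms): 5.7\n', '\n',
    'Inference: corkscrew, bottle screw\n', 'Score: 0.03125\n',
    'TPU_temp(°C): 57.05\n', '#####################################\n', '\n',
    'Model: inception_v4_299_quant_edgetpu.tflite\n', '\n',
    'Time(ms): 121.8\n', 'Time(ms): 101.2\n', '\n',
    'Inference: admiral\n', 'Score: 0.59375\n',
    'TPU_temp(°C): 57.05\n', '#####################################\n', '\n',
]

actions = list(to_actions(parse_records(sample)))
for action in actions:
    print(action["_source"])
```

With the client from the question, the generator could then be passed straight to the bulk helper, for example helpers.bulk(client, to_actions(parse_records(output))), which indexes one document per benchmark run instead of a single document holding every line.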