Python - Remove extended ASCII

Question

Okay, so I am new to the whole python world so bear with me.

Background: We are trying to offload logs into mongo so that we can query and search them more quickly. The device already prints them in a decent format, EXCEPT for junk characters before the first { and between each closing } and the next opening {, something like this:

¾ïúÀï{"id":"xxx","timestamp":xxx,"payloadType":"xxx","payload":{"protocol":"xxx","zoneID":xxx,"zoneName":"xxx","eventType":"xxx"}}’ÂCº¾ïúÀï{"id":"xxx","timestamp":xxx,"payloadType":"xxx","payload":{"protocol":"xxx","zoneID":xxx,"zoneName":"xxx","eventType":"xx}}

Using the following I've been able to convert it to bytes then back to a string which outputs as:

f = open('logfile', 'r')
file_data = f.read()
f.close()

data = file_data.encode('utf-8')

print(str(data))

>>>b'\xc2\xbe\xc3\xaf\xc3\xba\xc3\x80\x01\xc3\xaf{"id":"xxx","timestamp":xxx,"payloadType":"xxx","payload":{"protocol":"xxx","zoneID":xxx,"zoneName":"xxx","eventType":"xxx"}}

In my mind this is easier to deal with than the ugly characters seen above but I don't know.

This is just a sample; there are thousands upon thousands of lines in this log. In my mind, the ideal approach would be to remove all characters at the beginning of the string before the first { and all characters between each }} and the following {.

Recommended answer

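Encode the string to bytes and then decode back to ASCII, ignoring anything that is not ASCII. Here data is the original str read from the file (file_data in the question), not the bytes object: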

data.encode().decode('ascii',errors='ignore')
# {"id":"xxx","timestamp":xxx,...}}
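As a rough illustration of what this round trip does, here is a small sketch on a made-up sample string. The sample and the helper name strip_non_ascii are assumptions for illustration, not taken from the real log. Note that junk characters which are themselves ASCII, such as the C and the control character \x01, survive this step.

```python
def strip_non_ascii(text: str) -> str:
    """Drop every character outside the 7-bit ASCII range."""
    # Encoding to UTF-8 turns each non-ASCII character into bytes >= 0x80;
    # decoding as ASCII with errors='ignore' silently discards those bytes.
    return text.encode('utf-8').decode('ascii', errors='ignore')


# Hypothetical sample modelled on the log excerpt in the question.
sample = '¾ïúÀ\x01ï{"id":"a"}’ÂCº¾ïúÀ\x01ï{"id":"b"}'
cleaned = strip_non_ascii(sample)

# repr() makes the surviving control character visible.
print(repr(cleaned))
```

The extended characters are gone, but the ASCII leftovers between the objects remain, so this approach alone may not be enough before parsing.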

You can also use regular expressions to remove all characters outside of the outermost curly braces:

import re
re.sub(r'^[^{]*(?={)|(?<=})[^}{]*(?={)|(?<=})[^}]*$', '', data)

The latter approach incidentally also removes the stray ASCII 'C' character in the junk between objects, which the encode/decode approach leaves behind.
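A minimal sketch of the regex approach on a made-up sample, followed by one possible way to split the cleaned text into individual objects with json.JSONDecoder.raw_decode. The sample data and the helper names clean_log and split_objects are assumptions for illustration. One caveat worth keeping in mind: the pattern treats everything between a } and the next { as junk, which fits the sample format but would also delete text such as ,"key": between two sibling nested objects.

```python
import json
import re

# Pattern from the answer: junk before the first '{', junk between a '}' and
# the next '{', and junk after the last '}'.
JUNK = re.compile(r'^[^{]*(?={)|(?<=})[^}{]*(?={)|(?<=})[^}]*$')


def clean_log(text: str) -> str:
    """Remove everything that sits outside the outermost curly braces."""
    return JUNK.sub('', text)


def split_objects(cleaned: str) -> list:
    """Parse back-to-back JSON objects from one string, one at a time."""
    decoder = json.JSONDecoder()
    objects, pos = [], 0
    while pos < len(cleaned):
        # raw_decode returns the parsed object and the index where it ended.
        obj, pos = decoder.raw_decode(cleaned, pos)
        objects.append(obj)
    return objects


# Hypothetical sample modelled on the log excerpt in the question.
sample = ('¾ïúÀ\x01ï{"id":"a","payload":{"zoneID":1}}'
          '’ÂCº¾ïúÀ\x01ï{"id":"b","payload":{"zoneID":2}}')
cleaned = clean_log(sample)
records = split_objects(cleaned)
print(records)
```

Each element of records is then an ordinary dict, which is the shape most document stores expect for inserts.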
