Python - Parsing JSON formatted text file with regex


Question

I have a text file formatted like a JSON file, but everything is on a single line (it could be a MongoDB file). Could someone point me in the right direction on how to extract values from it using a Python regex?

The text looks like this:

{"d":{"__type":"WikiFileNodeContent:http:\/\/samplesite.com.au\/ns\/business\/wiki","author":null,"description":null,"fileAssetId":"034b9317-60d9-45c2-b6d6-0f24b59e1991","filename":"Reports.pdf"},"createdBy":1531,"createdByUsername":"John Cash","icon":"\/Assets10.37.5.0\/pix\/16x16\/page_white_acrobat.png","id":3041,"inheritedPermissions":false,"name":"map","permissions":[23,87,35,49,65],"type":3,"viewLevel":2},{"__type":"WikiNode:http:\/\/samplesite.com.au\/ns\/business\/wiki","children":[],"content":

I want to get the "fileAssetId" and "filename" values. I tried to load the file with Python's json module, but I get an error.

For the fileAssetId I tried this regex:

regex = re.compile(r"([0-9a-f]{8})\S*-\S*([0-9a-f]{4})\S*-\S*([0-9a-f]{4})\S*-\S*([0-9a-f]{4})\S*-\S*([0-9a-f]{12})")

But I get the following: 034b9317, 60d9, 45c2, b6d6, 0f24b59e1991

I'm not sure how to get the value as a single string, the way it appears in the file.

Recommended answer

How about using a positive lookbehind and a positive lookahead:

(?<=\"fileAssetId\":\")[a-fA-F0-9-]+?(?=\")

to capture the fileAssetId, and

(?<=\"filename\":\").+?(?=\")

to match the filename.
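
As a minimal sketch of how the two patterns above could be applied in Python: the sample string below is a shortened copy of the text from the question, and it assumes the values contain no stray invisible characters. The variable names are illustrative only.

```python
import re

# Shortened sample of the single-line text from the question
# (assumption: the values contain no invisible characters).
text = (
    '{"d":{"author":null,"description":null,'
    '"fileAssetId":"034b9317-60d9-45c2-b6d6-0f24b59e1991",'
    '"filename":"Reports.pdf"},"createdBy":1531}'
)

# The lookbehind anchors the match right after the key, and the lookahead
# stops it before the closing quote, so only the value itself is matched.
asset_id_re = re.compile(r'(?<="fileAssetId":")[a-fA-F0-9-]+?(?=")')
filename_re = re.compile(r'(?<="filename":").+?(?=")')

asset_match = asset_id_re.search(text)
name_match = filename_re.search(text)

# search() returns None when nothing matches, so guard before calling group().
asset_id = asset_match.group(0) if asset_match else None
filename = name_match.group(0) if name_match else None

print(asset_id)
print(filename)
```

Because neither pattern contains a capturing group, group(0) is the whole value, not separate pieces of it.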

For a detailed explanation of the regex, have a look at the Regex101 example. (Note: in the example I combined both patterns with the OR operator | to show both matches at once.)

To get a list of all matches, use re.findall or re.finditer instead of re.match.

re.findall(pattern, string) returns a list of matching strings.

re.finditer(pattern, string) returns an iterator of match objects.
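
The difference between the two calls can be sketched as follows. The input is a hypothetical two-record string: the first ID comes from the question, while the second record is invented purely for illustration. Note that re.findall returns a list of tuples when the pattern has more than one capturing group, which is probably why the regex in the question produced five separate pieces. The lookaround pattern has no groups, so each result is the full value.

```python
import re

# Hypothetical input with two records; the second one is made up for this example.
text = (
    '{"fileAssetId":"034b9317-60d9-45c2-b6d6-0f24b59e1991","filename":"Reports.pdf"},'
    '{"fileAssetId":"0a1b2c3d-0000-1111-2222-333344445555","filename":"Notes.txt"}'
)

pattern = re.compile(r'(?<="fileAssetId":")[a-fA-F0-9-]+?(?=")')

# findall: a plain list of matched strings (the pattern has no capturing groups).
ids = pattern.findall(text)

# finditer: match objects, which also expose the position of each match.
located = [(m.group(0), m.start()) for m in pattern.finditer(text)]

for value, start in located:
    print(start, value)
```

re.finditer is the better fit when the position of each match matters or when the input is large, since it yields matches lazily and does not build the whole list up front.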
