Python - Parsing a JSON-formatted text file with regex
Question
I have a text file formatted like a JSON file, but everything is on a single line (it could be a MongoDB file). Could someone point me in the direction of how I could extract values from it using Python's regex module?
The text looks like this:
{"d":{"__type":"WikiFileNodeContent:http:\/\/samplesite.com.au\/ns\/business\/wiki","author":null,"description":null,"fileAssetId":"034b9317-60d9-45c2-b6d6-0f24b59e1991","filename":"Reports.pdf"},"createdBy":1531,"createdByUsername":"John Cash","icon":"\/Assets10.37.5.0\/pix\/16x16\/page_white_acrobat.png","id":3041,"inheritedPermissions":false,"name":"map","permissions":[23,87,35,49,65],"type":3,"viewLevel":2},{"__type":"WikiNode:http:\/\/samplesite.com.au\/ns\/business\/wiki","children":[],"content":
I want to get the "fileAssetId" and "filename" values. I've tried to load the line with Python's json module, but I get an error.
For the fileAssetId I tried this regex:
regex = re.compile(r"([0-9a-f]{8})\S*-\S*([0-9a-f]{4})\S*-\S*([0-9a-f]{4})\S*-\S*([0-9a-f]{4})\S*-\S*([0-9a-f]{12})")
But I get the following: 034b9317, 60d9, 45c2, b6d6, 0f24b59e1991
I'm not sure how to get the value as it is displayed in the file, that is, as one string with the hyphens included.
Recommended answer
How about using a positive lookbehind and a positive lookahead:
(?<=\"fileAssetId\":\")[a-fA-F0-9-]+?(?=\")
to capture the fileAssetId, and
(?<=\"filename\":\").+?(?=\")
to match the filename.
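As a minimal sketch of how the two patterns could be used from Python: the sample string below is a shortened copy of the line from the question, and the variable names (text, asset_id_re, filename_re) are only illustrative. The lookbehind works here because the text it checks for has a fixed length, which Python's re module requires.

```python
import re

# Shortened copy of the sample line from the question (illustrative only).
text = (
    '{"d":{"__type":"WikiFileNodeContent:http:\\/\\/samplesite.com.au\\/ns\\/business\\/wiki",'
    '"author":null,"description":null,'
    '"fileAssetId":"034b9317-60d9-45c2-b6d6-0f24b59e1991",'
    '"filename":"Reports.pdf"},"createdBy":1531}'
)

# Lookbehind anchors the match right after the key; lookahead stops it
# before the closing quote. Neither is part of the returned match.
asset_id_re = re.compile(r'(?<="fileAssetId":")[a-fA-F0-9-]+?(?=")')
filename_re = re.compile(r'(?<="filename":").+?(?=")')

asset_ids = asset_id_re.findall(text)
filenames = filename_re.findall(text)

print(asset_ids)   # ['034b9317-60d9-45c2-b6d6-0f24b59e1991']
print(filenames)   # ['Reports.pdf']
```

Because the patterns contain no capture groups, findall returns the whole matched strings, so the fileAssetId comes back in one piece with its hyphens.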
For a detailed explanation of the regex, have a look at the Regex101 example. (Note: in the example I combined both patterns with the OR operator | to show both matches at once.)
To get a list of all matches, use re.findall or re.finditer instead of re.match.
re.findall(pattern, string) returns a list of matching strings (or a list of tuples of groups if the pattern contains more than one capture group).
re.finditer(pattern, string) returns an iterator of match objects.
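The difference between the two calls also suggests why the regex in the question produced five separate pieces. The sketch below uses a simplified version of that pattern (the \S* parts are left out, which is an assumption made for readability) on a short illustrative string. With several capture groups, findall returns a tuple of the groups for each match, while a match object from finditer can still give the whole matched text through group(0).

```python
import re

# Short illustrative input, not the full line from the question.
text = '"fileAssetId":"034b9317-60d9-45c2-b6d6-0f24b59e1991","filename":"Reports.pdf"'

# Simplified form of the question's pattern: five capture groups.
grouped = re.compile(
    r"([0-9a-f]{8})-([0-9a-f]{4})-([0-9a-f]{4})-([0-9a-f]{4})-([0-9a-f]{12})"
)

# findall: one tuple of captured groups per match.
pieces = grouped.findall(text)

# finditer: match objects; group(0) is the entire match, hyphens included.
whole = [m.group(0) for m in grouped.finditer(text)]

print(pieces)  # [('034b9317', '60d9', '45c2', 'b6d6', '0f24b59e1991')]
print(whole)   # ['034b9317-60d9-45c2-b6d6-0f24b59e1991']
```

Dropping the parentheses, or making the groups non-capturing with (?:...), would make findall return the full string as well.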