Pyparsing: Parsing semi-JSON nested plaintext data to a list

Question

I have a bunch of nested data in a format that loosely resembles JSON:

company="My Company"
phone="555-5555"
people=
{
    person=
    {
        name="Bob"
        location="Seattle"
        settings=
        {
            size=1
            color="red"
        }
    }
    person=
    {
        name="Joe"
        location="Seattle"
        settings=
        {
            size=2
            color="blue"
        }
    }
}
places=
{
    ...
}

There are many different parameters with varying levels of depth--this is just a very small subset.

It might also be worth noting that whenever a new sub-array is created, there is always an equals sign, followed by a line break, followed by the opening brace (as seen above).

Is there any simple looping or recursion technique for converting this data to a system-friendly data format such as arrays or JSON? I want to avoid hard-coding the names of properties. I am looking for something that will work in Python, Java, or PHP. Pseudo-code is fine, too.

Thanks for your help.

I discovered the Pyparsing library for Python and it looks like it could be a big help. I can't find any examples for how to use Pyparsing to parse nested structures of unknown depth. Can anyone shed light on Pyparsing in terms of the data I described above?

EDIT 2: Okay, here is a working solution in Pyparsing:

from pyparsing import (Dict, Forward, Group, OneOrMore, Regex, Suppress,
                       Word, ZeroOrMore, alphanums, alphas, quotedString,
                       removeQuotes)

def parse_file(fileName):

    #get the input text file
    with open(fileName, "r") as file:
        inputText = file.read()

    #define the elements of our data pattern
    name = Word(alphas, alphanums+"_")
    EQ,LBRACE,RBRACE = map(Suppress, "={}")
    value = Forward() #this tells pyparsing that values can be recursive
    entry = Group(name + EQ + value) #this is the basic name-value pair

    #define data types that might be in the values
    real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda x: float(x[0]))
    integer = Regex(r"[+-]?\d+").setParseAction(lambda x: int(x[0]))
    quotedString.setParseAction(removeQuotes)

    #declare the overall structure of a nested data element
    struct = Dict(LBRACE + ZeroOrMore(entry) + RBRACE) #we will turn the output into a Dictionary

    #declare the types that might be contained in our data value - string, real, int, or the struct we declared
    value <<= (quotedString | struct | real | integer)

    #parse our input text and return it as a Dictionary
    result = Dict(OneOrMore(entry)).parseString(inputText)
    #note: dump() returns a formatted multi-line string, not a dict
    return result.dump()

This works, but when I try to write the results to a file with json.dump(result), the contents of the file are wrapped in double quotes. Also, there are \n characters between many of the data pairs. I tried suppressing them in the code above with LineEnd().suppress(), but I must not be using it correctly.
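A likely explanation for both symptoms, sketched with the standard library only: parse_file returns result.dump(), which is a formatted multi-line string rather than a dict, and the json module serializes any string as a quoted literal with its newlines escaped. The text in dumped_text below is a hand-written stand-in, not real pyparsing output.

```python
import json

# Stand-in for the multi-line text returned by dump()
# (the layout is illustrative only, an assumption for this sketch).
dumped_text = "- company: My Company\n- phone: 555-5555"

# Serializing a str gives a JSON string literal: wrapped in double
# quotes, with each newline escaped as a backslash followed by "n".
as_json_string = json.dumps(dumped_text)
print(as_json_string)

# Serializing a dict gives a JSON object instead.
as_json_object = json.dumps({"company": "My Company", "phone": "555-5555"})
print(as_json_object)
```

If that is the cause, LineEnd().suppress() would not help, because the line breaks come from the dumped text and not from the parsed tokens; passing a dict or list to json is what changes the output.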

Recommended answer

Okay, I came up with a final solution that actually transforms this data into a JSON-friendly dict, as I originally wanted. It first uses Pyparsing to convert the data into a series of nested lists, then loops through the lists and builds a dict that is serialized to JSON. This lets me work around the problem that Pyparsing's asDict() method could not handle an object that has two properties with the same name. To tell whether a list is a plain list or a property/value pair, the prependPropertyToken method adds the string __property__ in front of each property name when Pyparsing detects one.
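The rule for repeated property names can be looked at in isolation. The following is a minimal sketch of that merging step only, using a hypothetical helper named add_merged that does not appear in the original code: the first value for a key is stored as-is, and later values for the same key turn the entry into a list.

```python
def add_merged(target, key, value):
    """Store value under key; a repeated key collects its values in a list."""
    if key not in target:
        target[key] = value
    elif isinstance(target[key], list):
        target[key].append(value)
    else:
        target[key] = [target[key], value]
    return target


people = {}
add_merged(people, "person", {"name": "Bob"})
add_merged(people, "person", {"name": "Joe"})
print(people)  # {'person': [{'name': 'Bob'}, {'name': 'Joe'}]}
```

One consequence of this rule is that a value which is already a list cannot be told apart from a list of merged duplicates, so a repeated key whose first value is a plain array gets the second value appended to that array.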

import json

from pyparsing import (CaselessKeyword, Dict, Forward, Group, OneOrMore, Regex,
                       Suppress, Word, alphanums, dictOf, quotedString,
                       removeQuotes, replaceWith, restOfLine)

class NestedDataParser:  #class name is illustrative; the original post shows only the methods

    def parse_file(self, fileName):

        #get the input text file
        with open(fileName, "r") as file:
            inputText = file.read()

        #define data types that might be in the values
        real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda x: float(x[0]))
        integer = Regex(r"[+-]?\d+").setParseAction(lambda x: int(x[0]))
        yes = CaselessKeyword("yes").setParseAction(replaceWith(True))
        no = CaselessKeyword("no").setParseAction(replaceWith(False))
        quotedString.setParseAction(removeQuotes)
        unquotedString = Word(alphanums+"_-?\"")
        comment = Suppress("#") + Suppress(restOfLine)
        EQ,LBRACE,RBRACE = map(Suppress, "={}")

        data = (real | integer | yes | no | quotedString | unquotedString)

        #define structures (names chosen to avoid shadowing the builtins object/property)
        value = Forward()
        obj = Forward()

        dataList = Group(OneOrMore(data))
        simpleArray = (LBRACE + dataList + RBRACE)

        propertyName = Word(alphanums+"_-.").setParseAction(self.prependPropertyToken)
        prop = dictOf(propertyName + EQ, value)
        properties = Dict(prop)

        obj <<= (LBRACE + properties + RBRACE)
        value <<= (data | obj | simpleArray)

        dataset = properties.ignore(comment)

        #parse it
        result = dataset.parseString(inputText)

        #turn it into a JSON-like object
        output = self.convert_to_dict(result.asList())
        return json.dumps(output)

    def convert_to_dict(self, inputList):
        output = {}
        for item in inputList:
            #determine the key and value to be inserted into the dict
            dictval = None
            key = None

            if isinstance(item, list):
                try:
                    key = item[0].replace("__property__","")
                    if isinstance(item[1], list):
                        try:
                            #a nested list of properties is converted recursively
                            if item[1][0].startswith("__property__"):
                                dictval = self.convert_to_dict(item)
                            else:
                                dictval = item[1]
                        except AttributeError:
                            #first element is not a string, so this is a plain array
                            dictval = item[1]
                    else:
                        dictval = item[1]
                except IndexError:
                    dictval = None
            #determine whether to insert the value into the key or to merge the value with existing values at this key
            if key:
                if key in output:
                    if isinstance(output[key], list):
                        output[key].append(dictval)
                    else:
                        output[key] = [output[key], dictval]
                else:
                    output[key] = dictval
        return output

    def prependPropertyToken(self, t):
        return "__property__" + t[0]
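For comparison, the same recursion can be written without Pyparsing. This is a standard-library sketch and not the author's method: it swaps the grammar for a regex tokenizer plus a recursive function, and it only covers name=value pairs and nested brace blocks (no comments, yes/no keywords, or plain arrays). The names TOKEN_RE, convert_scalar, parse_block, and parse_text are all made up for this example.

```python
import json
import re

# Tokens: a double-quoted string, a single brace or equals sign, or a bare word.
TOKEN_RE = re.compile(r'"[^"]*"|[{}=]|[^\s{}="]+')


def convert_scalar(token):
    """Turn a token into str, int, or float."""
    if token.startswith('"'):
        return token[1:-1]
    if re.fullmatch(r"[+-]?\d+", token):
        return int(token)
    if re.fullmatch(r"[+-]?\d+\.\d*", token):
        return float(token)
    return token


def parse_block(tokens, pos):
    """Parse name=value pairs until a closing brace or the end of input.

    Returns (dict, position of the token that stopped the loop).
    """
    result = {}
    while pos < len(tokens) and tokens[pos] != "}":
        name = tokens[pos]
        if pos + 2 >= len(tokens) or tokens[pos + 1] != "=":
            raise ValueError(f"expected '=' and a value after {name!r}")
        pos += 2
        if tokens[pos] == "{":
            # Recurse into the nested block, then consume its closing brace.
            value, pos = parse_block(tokens, pos + 1)
            if pos >= len(tokens):
                raise ValueError(f"missing closing brace for {name!r}")
            pos += 1
        else:
            value = convert_scalar(tokens[pos])
            pos += 1
        # Repeated names are collected in a list instead of overwritten.
        if name not in result:
            result[name] = value
        elif isinstance(result[name], list):
            result[name].append(value)
        else:
            result[name] = [result[name], value]
    return result, pos


def parse_text(text):
    tokens = TOKEN_RE.findall(text)
    result, pos = parse_block(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected closing brace at top level")
    return result


SAMPLE = '''
company="My Company"
people=
{
    person=
    {
        name="Bob"
        settings=
        {
            size=1
            color="red"
        }
    }
    person=
    {
        name="Joe"
        settings=
        {
            size=2
            color="blue"
        }
    }
}
'''

parsed = parse_text(SAMPLE)
print(json.dumps(parsed, indent=2))
```

Because the tokenizer discards whitespace, the line break between the equals sign and the opening brace needs no special handling. The Pyparsing version remains the better fit once comments, keywords, and arrays are in play.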
