Can't parse YAML correctly


Problem description

I parse the following YAML data in Python:

>>> import yaml
>>> yaml.load("""
... ---
... categories: {1: Yes, 2: No}
... increasing: [00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10]
... ...
... """)

And get this as output:

{'increasing': [0, 1, 2, 3, 4, 5, 6, 7, '08', '09', 10], 'categories': {1: True, 2: False}}

  • Why are "Yes" and "No" converted to True and False?
  • Why are "08" and "09" parsed as strings whereas the other digits are parsed as numbers with leading zeros truncated?

Solution

Your deduction that the leading zeros of 00 to 07 are truncated is incorrect. Because of the leading 0, these are all octal numbers, and they are interpreted as such.

As octal numbers cannot contain the digits 8 or 9, 08 and 09 cannot be anything but strings, and your YAML parser loads them as such.
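
The octal reading can be checked in plain Python, without any YAML library. This is a minimal sketch, assuming the loader ends up converting a leading-zero scalar with base 8; the helper name `try_octal` is made up for illustration.

```python
def try_octal(text):
    """Return the base-8 value of text, or None if it is not valid octal."""
    try:
        return int(text, 8)
    except ValueError:
        # The digits 8 and 9 do not exist in base 8, so conversion fails.
        return None


for scalar in ["07", "010", "08", "09"]:
    print(scalar, "->", try_octal(scalar))
```

Note that "010" comes out as 8, not 10, which shows that the leading zero changes the base rather than being dropped.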

This is actually a leftover (kept for backwards compatibility) from YAML 1.1. In YAML 1.2, octal numbers should start with 0o.
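
The two octal spellings can be told apart with a pair of regular expressions. The sketch below assumes the octal form of the YAML 1.1 int type (a leading 0, optional sign, underscores allowed) and the octal form of the YAML 1.2 core schema (a `0o` prefix); the function name `octal_style` is hypothetical.

```python
import re

# Octal as written in YAML 1.1: optional sign, then a leading 0.
OCTAL_1_1 = re.compile(r"^[-+]?0[0-7_]+$")
# Octal as written in the YAML 1.2 core schema: explicit 0o prefix.
OCTAL_1_2 = re.compile(r"^0o[0-7]+$")


def octal_style(text):
    """Return '1.1', '1.2', or None depending on which octal form matches."""
    if OCTAL_1_2.match(text):
        return "1.2"
    if OCTAL_1_1.match(text):
        return "1.1"
    return None


for scalar in ["07", "0o7", "08", "10"]:
    print(scalar, "->", octal_style(scalar))
```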

That Yes and No are loaded as True and False, respectively, is also a YAML 1.1-ism. The 1.2 specification no longer refers to these alternatives. If you quote those strings, they will not be converted.
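
The difference between the two boolean vocabularies can be illustrated with regular expressions alone. This is a sketch, assuming the YAML 1.1-style pattern that PyYAML-derived resolvers use and a stricter pattern that accepts only the true/false spellings; `resolves_as_bool` is a hypothetical helper.

```python
import re

# Spellings treated as booleans by YAML 1.1-style resolvers.
BOOL_1_1 = re.compile(
    r"^(?:yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$")
# Stricter vocabulary: only the true/false spellings.
BOOL_STRICT = re.compile(r"^(?:true|True|TRUE|false|False|FALSE)$")


def resolves_as_bool(text, pattern):
    """Return True if the plain scalar text matches the given bool pattern."""
    return pattern.match(text) is not None


for scalar in ["Yes", "No", "on", "true", "False"]:
    print(scalar,
          resolves_as_bool(scalar, BOOL_1_1),
          resolves_as_bool(scalar, BOOL_STRICT))
```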

You can relatively easily build a resolver that doesn't accept the Yes/No/On/Off variants for True/False by adding the following rule:

MyResolver.add_implicit_resolver(
    u'tag:yaml.org,2002:bool',
    re.compile(u'''^(?:true|True|TRUE|false|False|FALSE)$''', re.X),
    list(u'tTfF'))

or by using the normal Resolver and deleting the appropriate start symbol entries:

import ruamel.yaml as yaml
from ruamel.yaml.resolver import Resolver

yaml_str = """\
categories: {1: Yes, 2: No}
"""

for ch in list(u'yYnNoO'):
    del Resolver.yaml_implicit_resolvers[ch]


data = yaml.load(yaml_str, Loader=yaml.Loader)
print(data)

gives you:

{'categories': {1: 'Yes', 2: 'No'}}
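
Why deleting entries by start character works can be shown with a toy model of the lookup table. This is not the library's code, only a sketch that assumes the table maps the first character of a plain scalar to a list of (tag, pattern) pairs; all names below are illustrative. It also shows a side effect worth keeping in mind: in this model null is registered under n and N too, so deleting those entries stops `null` from resolving.

```python
import re

# Toy model: first character -> list of (tag, compiled pattern).
implicit_resolvers = {}


def add_implicit_resolver(tag, pattern, first_chars):
    """Register pattern under every character a matching scalar may start with."""
    for ch in first_chars:
        implicit_resolvers.setdefault(ch, []).append((tag, pattern))


def resolve(text):
    """Return the tag of the first matching pattern, or 'str' if none match."""
    for tag, pattern in implicit_resolvers.get(text[:1], []):
        if pattern.match(text):
            return tag
    return "str"


add_implicit_resolver(
    "bool",
    re.compile(r"^(?:yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE"
               r"|on|On|ON|off|Off|OFF)$"),
    "yYnNtTfFoO")
add_implicit_resolver(
    "null", re.compile(r"^(?:~|null|Null|NULL|)$"), ["~", "n", "N", ""])

print(resolve("Yes"), resolve("null"))   # before deletion

# Drop every entry keyed by y, Y, n, N, o, or O.
for ch in "yYnNoO":
    del implicit_resolvers[ch]

print(resolve("Yes"), resolve("null"), resolve("true"), resolve("~"))
```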

Making every digits-only scalar that starts with 0 load as a normal decimal integer is not as simple. Changing the implicit resolver for int so that it passes on strings starting with 0 is not enough: you then get a parsing problem, because 08 is still converted as octal. The constructor has to be adapted as well ¹:

import re
import ruamel.yaml as yaml
from ruamel.yaml.reader import Reader
from ruamel.yaml.resolver import BaseResolver, Resolver
from ruamel.yaml.scanner import RoundTripScanner
from ruamel.yaml.parser_ import Parser
from ruamel.yaml.composer import Composer
from ruamel.yaml.constructor import RoundTripConstructor
from ruamel.yaml import RoundTripLoader
from ruamel.yaml.compat import to_str


yaml_str = """\
categories: {1: Yes, 2: No}
increasing: [00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10]
"""


class MyResolver(BaseResolver):
    pass

MyResolver.add_implicit_resolver(
    u'tag:yaml.org,2002:bool',
    re.compile(u'''^(?:true|True|TRUE|false|False|FALSE)$''', re.X),
    list(u'tTfF'))

MyResolver.add_implicit_resolver(
    u'tag:yaml.org,2002:float',
    re.compile(u'''^(?:
     [-+]?(?:[0-9][0-9_]*)\\.[0-9_]*(?:[eE][-+]?[0-9]+)?
    |[-+]?(?:[0-9][0-9_]*)(?:[eE][-+]?[0-9]+)
    |\\.[0-9_]+(?:[eE][-+][0-9]+)?
    |[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+\\.[0-9_]*
    |[-+]?\\.(?:inf|Inf|INF)
    |\\.(?:nan|NaN|NAN))$''', re.X),
    list(u'-+0123456789.'))

MyResolver.add_implicit_resolver(
    u'tag:yaml.org,2002:int',
    re.compile(u'''^(?:[-+]?0b[0-1_]+
    |[-+]?[0-9]+
    |[-+]?0o?[0-7_]+
    |[-+]?(?:0|[1-9][0-9_]*)
    |[-+]?0x[0-9a-fA-F_]+
    |[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+)$''', re.X),
    list(u'-+0123456789'))

MyResolver.add_implicit_resolver(
    u'tag:yaml.org,2002:merge',
    re.compile(u'^(?:<<)$'),
    [u'<'])

MyResolver.add_implicit_resolver(
    u'tag:yaml.org,2002:null',
    re.compile(u'''^(?: ~
    |null|Null|NULL
    | )$''', re.X),
    [u'~', u'n', u'N', u''])

MyResolver.add_implicit_resolver(
    u'tag:yaml.org,2002:timestamp',
    re.compile(u'''^(?:[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]
    |[0-9][0-9][0-9][0-9] -[0-9][0-9]? -[0-9][0-9]?
    (?:[Tt]|[ \\t]+)[0-9][0-9]?
    :[0-9][0-9] :[0-9][0-9] (?:\\.[0-9]*)?
    (?:[ \\t]*(?:Z|[-+][0-9][0-9]?(?::[0-9][0-9])?))?)$''', re.X),
    list(u'0123456789'))

MyResolver.add_implicit_resolver(
    u'tag:yaml.org,2002:value',
    re.compile(u'^(?:=)$'),
    [u'='])

# The following resolver is only for documentation purposes. It cannot work
# because plain scalars cannot start with '!', '&', or '*'.
MyResolver.add_implicit_resolver(
    u'tag:yaml.org,2002:yaml',
    re.compile(u'^(?:!|&|\\*)$'),
    list(u'!&*'))


class MyRoundTripConstructor(RoundTripConstructor):
    def construct_yaml_int(self, node):
        value = to_str(self.construct_scalar(node))
        value = value.replace('_', '')
        sign = +1
        if value[0] == '-':
            sign = -1
        if value[0] in '+-':
            value = value[1:]
        if value == '0':
            return 0
        elif value.startswith('0b'):
            return sign*int(value[2:], 2)
        elif value.startswith('0x'):
            return sign*int(value[2:], 16)
        elif value.startswith('0o'):
            return sign*int(value[2:], 8)
        #elif value[0] == '0':
        #    return sign*int(value, 8)
        elif ':' in value:
            digits = [int(part) for part in value.split(':')]
            digits.reverse()
            base = 1
            value = 0
            for digit in digits:
                value += digit*base
                base *= 60
            return sign*value
        else:
            return sign*int(value)

MyRoundTripConstructor.add_constructor(
    u'tag:yaml.org,2002:int',
    MyRoundTripConstructor.construct_yaml_int)


class MyRoundTripLoader(Reader, RoundTripScanner, Parser,
                      Composer, MyRoundTripConstructor, MyResolver):
    def __init__(self, stream):
        Reader.__init__(self, stream)
        RoundTripScanner.__init__(self)
        Parser.__init__(self)
        Composer.__init__(self)
        MyRoundTripConstructor.__init__(self)
        MyResolver.__init__(self)

for ch in list(u'yYnNoO'):
    del Resolver.yaml_implicit_resolvers[ch]

data = yaml.load(yaml_str, Loader=MyRoundTripLoader)
print(data['increasing'])

and that prints:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

(It also loads Yes and No as strings, because MyResolver never registers the patterns that would recognise them in its internal lookup table.)
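
The integer conversion itself does not depend on ruamel.yaml and can be tried in isolation. The function below is a standalone restatement of the constructor logic above, under the assumption that its argument has already matched the int pattern; the name `parse_yaml_int` is hypothetical.

```python
def parse_yaml_int(text):
    """Convert a scalar that already matched the int pattern to an int."""
    value = text.replace("_", "")
    sign = -1 if value.startswith("-") else 1
    if value[0] in "+-":
        value = value[1:]
    if value.startswith("0b"):
        return sign * int(value[2:], 2)
    if value.startswith("0x"):
        return sign * int(value[2:], 16)
    if value.startswith("0o"):
        return sign * int(value[2:], 8)
    if ":" in value:
        # Sexagesimal (base 60): "1:30" is 1*60 + 30.
        total = 0
        for part in value.split(":"):
            total = total * 60 + int(part)
        return sign * total
    # Plain decimal: a bare leading zero no longer selects octal,
    # so "08" becomes 8.
    return sign * int(value)


for scalar in ["00", "07", "08", "09", "10", "0o17", "-0x1F", "1:30"]:
    print(scalar, "->", parse_yaml_int(scalar))
```

Because the bare leading-zero branch is gone, octal values have to be written with the explicit 0o prefix to be read in base 8.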


¹ I used ruamel.yaml for this, of which I am the author. PyYAML, on which ruamel.yaml is based, should be able to support a similar derivation.
