How to parse a YAML file with multiple documents?

Problem description

Here is my parsing code:

import yaml

def yaml_as_python(val):
    """Convert YAML to dict"""
    try:
        return yaml.load_all(val)
    except yaml.YAMLError as exc:
        return exc

with open('circuits-small.yaml','r') as input_file:
    results = yaml_as_python(input_file)
    print(results)
    for value in results:
        print(value)

Here is a sample of the file:

ingests:
  - timestamp: 1970-01-01T00:00:00.000Z
    id: SwitchBank_35496721
    attrs:
      Feeder: Line_928
      Switch.normalOpen: 'true'
      IdentifiedObject.description: SwitchBank
      IdentifiedObject.mRID: SwitchBank_35496721
      PowerSystemResource.circuit: '928'
      IdentifiedObject.name: SwitchBank_35496721
      IdentifiedObject.aliasName: SwitchBank_35496721
    loc: vector [43.05292, -76.126800000000003, 0.0]
    kind: SwitchBank
  - timestamp: 1970-01-01T00:00:00.000Z
    id: UndergroundDistributionLineSegment_34862802
    attrs:
      Feeder: Line_928
      status: de-energized
      IdentifiedObject.description: UndergroundDistributionLineSegment
      IdentifiedObject.mRID: UndergroundDistributionLineSegment_34862802
      PowerSystemResource.circuit: '928'
      IdentifiedObject.name: UndergroundDistributionLineSegment_34862802
    path:
    - vector [43.052942000000002, -76.126716000000002, 0.0]
    - vector [43.052585000000001, -76.126515999999995, 0.0]
    kind: UndergroundDistributionLineSegment
  - timestamp: 1970-01-01T00:00:00.000Z
    id: UndergroundDistributionLineSegment_34806014
    attrs:
      Feeder: Line_928
      status: de-energized
      IdentifiedObject.description: UndergroundDistributionLineSegment
      IdentifiedObject.mRID: UndergroundDistributionLineSegment_34806014
      PowerSystemResource.circuit: '928'
      IdentifiedObject.name: UndergroundDistributionLineSegment_34806014
    path:
    - vector [43.05292, -76.126800000000003, 0.0]
    - vector [43.052928999999999, -76.126766000000003, 0.0]
    - vector [43.052942000000002, -76.126716000000002, 0.0]
    kind: UndergroundDistributionLineSegment
... 
ingests:
  - timestamp: 1970-01-01T00:00:00.000Z
    id: OverheadDistributionLineSegment_31168454

In the traceback, note that it starts having a problem at the ...

Traceback (most recent call last):
  File "convert.py", line 29, in <module>
    for value in results:
  File "/Users/conduce-laptop/anaconda2/lib/python2.7/site-packages/yaml/__init__.py", line 82, in load_all
    while loader.check_data():
  File "/Users/conduce-laptop/anaconda2/lib/python2.7/site-packages/yaml/constructor.py", line 28, in check_data
    return self.check_node()
  File "/Users/conduce-laptop/anaconda2/lib/python2.7/site-packages/yaml/composer.py", line 18, in check_node
    if self.check_event(StreamStartEvent):
  File "/Users/conduce-laptop/anaconda2/lib/python2.7/site-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
  File "/Users/conduce-laptop/anaconda2/lib/python2.7/site-packages/yaml/parser.py", line 174, in parse_document_start
    self.peek_token().start_mark)
yaml.parser.ParserError: expected '<document start>', but found '<block mapping start>'
  in "circuits-small.yaml", line 42, column 1

What I would like is for each of these documents to be parsed as a separate object, perhaps all collected in the same list, or pretty much anything else that works with the PyYAML module. I believe the ... is actually valid YAML, so I am surprised that it isn't handled automatically.

Recommended answer

The error message is quite specific: a document needs to start with a document start marker. Your first document doesn't have such a marker, although it has a document end marker. After you explicitly end the first document with ..., PyYAML no longer accepts a following document without document boundary markers; you have to start it explicitly with ---.

The end of your file should look like:

    kind: UndergroundDistributionLineSegment
...
---
ingests:
  - timestamp: 1970-01-01T00:00:00.000Z
    id: OverheadDistributionLineSegment_31168454

You can leave out the explicit document start marker from the first document, but you need to include a start marker for every following document. Document end markers are optional.
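As a rough illustration of the point about start markers, the sketch below parses a shortened version of the sample file twice: once with the --- marker in place and once without it. The helper name load_documents and the trimmed-down YAML text are made up for this example. It assumes PyYAML is installed and relies on the fact that safe_load_all() is lazy, so a parse error only surfaces while iterating, not when the function is called.

```python
import yaml

# Shortened stand-in for the sample file, with the "---" marker added.
FIXED = """\
ingests:
  - id: SwitchBank_35496721
    kind: SwitchBank
...
---
ingests:
  - id: OverheadDistributionLineSegment_31168454
"""

# Same text without the start marker for the second document.
BROKEN = FIXED.replace("...\n---\n", "...\n")


def load_documents(text):
    """Hypothetical helper: return (documents parsed so far, error or None)."""
    docs = []
    try:
        # safe_load_all() returns a generator; parsing happens during iteration,
        # so the try block has to wrap the loop, not just the call.
        for doc in yaml.safe_load_all(text):
            docs.append(doc)
    except yaml.YAMLError as exc:
        return docs, exc
    return docs, None


fixed_docs, fixed_err = load_documents(FIXED)
broken_docs, broken_err = load_documents(BROKEN)

print(len(fixed_docs), fixed_err)
print(len(broken_docs), type(broken_err).__name__)
```

With the marker present, both documents end up in one list. Without it, the first document is still returned and the error is reported for the second one.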

If you don't have complete control over the input, using .load_all() is not safe. There is normally no reason to take that risk: you should use .safe_load_all() and extend the SafeLoader to handle any specific tags that your YAML might contain.
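One possible way to extend the SafeLoader is sketched below. The !vector tag, the VectorSafeLoader class and the construct_vector function are invented for this example; the sample file in the question does not use any tags. The sketch assumes PyYAML and registers the constructor on a subclass so that the stock SafeLoader stays untouched.

```python
import yaml


class VectorSafeLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that also understands a !vector tag."""


def construct_vector(loader, node):
    # Build a tuple of floats from a YAML sequence tagged with !vector.
    return tuple(float(x) for x in loader.construct_sequence(node))


# Registered on the subclass only; yaml.SafeLoader itself is not modified.
VectorSafeLoader.add_constructor("!vector", construct_vector)

TEXT = """\
loc: !vector [43.05292, -76.1268, 0.0]
---
loc: !vector [1, 2, 3]
"""

docs = list(yaml.load_all(TEXT, Loader=VectorSafeLoader))
print(docs)

# The plain safe loader still rejects the unknown tag.
try:
    yaml.safe_load("loc: !vector [1, 2, 3]")
    plain_loader_rejected = False
except yaml.YAMLError:
    plain_loader_rejected = True
print(plain_loader_rejected)
```

Passing the subclass through the Loader argument keeps the safe behavior for everything except the tags you have explicitly registered.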

Apart from that, you should start your YAML documents with an explicit version directive before the document start indicator (which you should then also add to the first document):

%YAML 1.1
---

This is for the benefit of future editors of your YAML files, because you are using PyYAML, which only supports (most of) YAML 1.1 and not the YAML 1.2 specification (from 2009). The alternative is of course to upgrade your YAML parser to, e.g., ruamel.yaml, which would also have warned you about your use of the unsafe load_all() (disclaimer: I am the author of that parser). ruamel.yaml does not allow a bare document after an explicit end-of-document marker (which is allowed, as @flyx pointed out); that is a bug.
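To give a feel for why the YAML version matters to someone editing the file, here is a small sketch, assuming PyYAML, of one well-known YAML 1.1 behavior: an unquoted yes is read as a boolean, while a quoted 'yes' stays a string (compare the quoted 'true' in the sample file). The key name enabled is made up for the example. Each document carries its own %YAML 1.1 directive, with ... closing the first document before the next directive.

```python
import yaml

TEXT = """\
%YAML 1.1
---
enabled: yes
...
%YAML 1.1
---
enabled: 'yes'
"""

# PyYAML follows YAML 1.1 resolution rules: the plain scalar "yes" becomes
# the boolean True, the quoted scalar remains the string "yes".
docs = list(yaml.safe_load_all(TEXT))
print(docs)
```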
