How to parse xsd:dateTime format?

Question

Values of type xsd:dateTime can have a variety of forms, as described in RELAX NG.

How can I parse all the forms into either time or datetime objects?

Recommended answer

It's actually a pretty restricted format, especially compared to all of ISO 8601. Using a regex is mostly the same as using strptime plus handling the offset yourself (which strptime doesn't do).

import datetime
import re

def parse_timestamp(s):
  """Returns (datetime, tz offset in minutes) or (None, None)."""
  m = re.match(r""" ^
    (?P<year>-?[0-9]{4}) - (?P<month>[0-9]{2}) - (?P<day>[0-9]{2})
    T (?P<hour>[0-9]{2}) : (?P<minute>[0-9]{2}) : (?P<second>[0-9]{2})
    (?P<microsecond>\.[0-9]{1,6})?
    (?P<tz>
      Z | (?P<tz_hr>[-+][0-9]{2}) : (?P<tz_min>[0-9]{2})
    )?
    $ """, s, re.X)
  if m is not None:
    values = m.groupdict()
    if values["tz"] in ("Z", None):
      tz = 0
    else:
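      # Apply the sign to hours and minutes together, so -05:30 is -330, not -270.
      tz = int(values["tz_hr"][1:]) * 60 + int(values["tz_min"])
      if values["tz_hr"].startswith("-"):
        tz = -tz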
    if values["microsecond"] is None:
      values["microsecond"] = 0
    else:
      values["microsecond"] = values["microsecond"][1:]
      values["microsecond"] += "0" * (6 - len(values["microsecond"]))
    values = dict((k, int(v)) for k, v in values.items()
                  if not k.startswith("tz"))
    try:
      return datetime.datetime(**values), tz
    except ValueError:
      pass
  return None, None

This doesn't apply the time zone offset to the datetime, and negative years are a problem for datetime. Both problems would be solved by a different timestamp type that handles the full range required by xsd:dateTime.
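
If the offset does need to be applied, Python 3's `datetime.timezone` can do it. The following is only a sketch, assuming Python 3 and the `(datetime, offset in minutes)` pair that `parse_timestamp` returns; the helper name `to_aware` is made up for illustration and is not part of the original answer. Negative years stay unsupported, since `datetime.MINYEAR` is 1.

```python
import datetime

def to_aware(dt, offset_minutes):
  """Attach a fixed UTC offset (in minutes) to a naive datetime.

  Hypothetical helper, not part of the original answer.
  """
  tz = datetime.timezone(datetime.timedelta(minutes=offset_minutes))
  return dt.replace(tzinfo=tz)

# 21:32:52 at +02:00 is the same instant as 19:32:52 UTC.
local = to_aware(datetime.datetime(2001, 10, 26, 21, 32, 52), 120)
utc = local.astimezone(datetime.timezone.utc)
print(utc.isoformat())
```

Note that `parse_timestamp` returns 0 both for "Z" and for a missing zone, so a caller using this helper would treat zone-less values as UTC.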

valid = [
  "2001-10-26T21:32:52",
  "2001-10-26T21:32:52+02:00",
  "2001-10-26T19:32:52Z",
  "2001-10-26T19:32:52+00:00",
  #"-2001-10-26T21:32:52",
  "2001-10-26T21:32:52.12679",
]
for v in valid:
  print("")
  print(v)
  r = parse_timestamp(v)
  assert all(x is not None for x in r), v

  # quick and dirty, and slightly wrong
  # (doesn't distinguish +00:00 from Z among other issues)
  # but gets through the above cases

  tz = ":".join("%02d" % x for x in divmod(r[1], 60)) if r[1] else "Z"
  if r[1] > 0: tz = "+" + tz
  r = r[0].isoformat() + tz

  print(r)
  assert r.startswith(v[:len("CCYY-MM-DDThh:mm:ss")]), v

print("---")
invalid = [
  "2001-10-26",
  "2001-10-26T21:32",
  "2001-10-26T25:32:52+02:00",
  "01-10-26T21:32",
]
for v in invalid:
  print(v)
  r = parse_timestamp(v)
  assert all(x is None for x in r), v
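
For comparison, the strptime route mentioned at the top of the answer might look roughly like this. It is a sketch under assumptions, not a drop-in replacement: `parse_with_strptime` is a hypothetical name, the offset is peeled off by hand, and malformed input raises ValueError instead of returning `(None, None)`.

```python
import datetime

def parse_with_strptime(s):
  """Parse the date/time part with strptime and the offset by hand.

  Hypothetical helper; returns (naive datetime, offset in minutes).
  Raises ValueError on malformed input.
  """
  offset = 0
  if s.endswith("Z"):
    s = s[:-1]
  elif len(s) > 6 and s[-6] in "+-" and s[-3] == ":":
    # Trailing "+hh:mm" or "-hh:mm": convert to signed minutes.
    sign = -1 if s[-6] == "-" else 1
    offset = sign * (int(s[-5:-3]) * 60 + int(s[-2:]))
    s = s[:-6]
  # %f accepts one to six fractional digits and pads on the right.
  fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in s else "%Y-%m-%dT%H:%M:%S"
  return datetime.datetime.strptime(s, fmt), offset

print(parse_with_strptime("2001-10-26T21:32:52+02:00"))
```

One difference worth knowing: strptime is more lenient than the regex. For example, it accepts single-digit month or hour fields, which xsd:dateTime does not allow.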
