Why can't the R package lubridate parse a vector with multiple formats?


Question

I'm using the lubridate package to parse a vector of heterogeneously formatted date strings into date-times, like this:

parse_date_time(c('12/17/1996 04:00:00 PM','4/18/1950 0130'), c('%m/%d/%Y %I:%M:%S %p','%m/%d/%Y %H%M'))

This is the result:

[1] NA NA
Warning message:
All formats failed to parse. No formats found.

If I remove the %p from the first format string, the first date string is parsed incorrectly and the second still fails to parse:

[1] "1996-12-17 04:00:00 UTC" NA                       
Warning message:
 1 failed to parse. 

The 4 PM time in the string is parsed as 4 AM in the result.

Has anyone experienced this strange behavior?

Recommended answer

The problem with the %p part is locale-related. See this issue.
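One way to check whether the locale is involved is to look at what the session's LC_TIME locale produces for %p. The sketch below uses only base R (strptime and format, not lubridate) and assumes, as the answer suggests, that the failure comes from a locale whose AM/PM markers differ from the English "AM"/"PM" in the data. It switches temporarily to the "C" locale, where %p does mean AM/PM; the variable names are illustrative only.

```r
# Remember the time locale the session is currently using.
old_locale <- Sys.getlocale("LC_TIME")

# What does %p produce in this locale? An empty string or a non-English
# marker means a literal "PM" in the input cannot be matched by %p.
afternoon <- as.POSIXct("1996-12-17 16:00:00", tz = "UTC")
marker_in_session <- format(afternoon, "%p")

# Temporarily switch to the "C" locale, which uses AM/PM.
invisible(Sys.setlocale("LC_TIME", "C"))
parsed <- strptime("12/17/1996 04:00:00 PM",
                   "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
hour_parsed <- format(parsed, "%H")  # hour on the 24-hour clock

# Restore the original locale.
invisible(Sys.setlocale("LC_TIME", old_locale))
```

If marker_in_session is not "PM" while hour_parsed is "16", the locale is a plausible explanation for the %p failure in the question.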

The inability to parse has to do with the way the lubridate guesser works.

There are two ways lubridate infers formats: flex and exact. With flex matching, all numeric elements can have flexible length (for example, both 4 and 04 will work for the day), but there must be non-numeric separators between the elements. With the exact matcher, non-numeric separators are not required, but each element must have an exact number of digits (like 04).
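The difference between the two matchers can be illustrated with a pair of regular expressions. This is only a sketch of the idea for a month/day pair: the patterns flex_md and exact_md are hypothetical and are not lubridate's internal implementation.

```r
# Hypothetical patterns that mimic the two matching styles.
flex_md  <- "^(\\d{1,2})\\D+(\\d{1,2})$"  # flexible widths, separator required
exact_md <- "^(\\d{2})\\D*(\\d{2})$"      # fixed widths, separator optional

x <- c("4/18", "04/18", "0418", "418")

flex  <- grepl(flex_md, x, perl = TRUE)
exact <- grepl(exact_md, x, perl = TRUE)

# Side-by-side view of which style accepts which string.
print(data.frame(x, flex, exact))
```

"4/18" is accepted only by the flex pattern, "0418" only by the exact pattern, "04/18" by both, and "418" by neither, because without separators or fixed widths the split between month and day is ambiguous.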

Unfortunately, you cannot combine both matchers within one expression. It would be extremely hard to fix this while preserving the current flexibility of the lubridate parser.

In your example

> parse_date_time('4/18/1950 0130', 'mdY HM')
[1] NA
Warning message:
All formats failed to parse. No formats found. 

you want to perform flex matching on the date part 4/18/1950 and exact matching on the time part 0130.

Note that if your date-time is in a fully flex or fully exact format, the parsing works as expected:

> parse_date_time('04/18/1950 0130', 'mdY HM')
[1] "1950-04-18 01:30:00 UTC"
> parse_date_time('4/18/1950 1:30', 'mdY HM')
[1] "1950-04-18 01:30:00 UTC"
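Building on this, one possible workaround (not part of the original answer) is to rewrite the input so that it is fully flex before parsing. The sketch below assumes the time is always the last field, is preceded by a space, and has exactly four digits; the name x_flex is illustrative.

```r
library(lubridate)

x <- c("4/18/1950 0130", "12/17/1996 1600")

# Insert a colon into a trailing 4-digit time so every element is
# separated by a non-digit: " 0130" becomes " 01:30".
x_flex <- sub(" (\\d{2})(\\d{2})$", " \\1:\\2", x)

# The strings are now fully flex, so the order-based guesser can be used.
parsed <- parse_date_time(x_flex, "mdY HM")
print(parsed)
```

Strings that already contain a colon in the time do not match the pattern and are left unchanged.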

lubridate 1.4.1 "fixes" this by adding a new argument to parse_date_time, exact = FALSE. When it is set to TRUE, the orders argument is interpreted as containing exact strptime formats, and no guessing or training is performed. This way you can supply as many exact formats as you want, and parsing is also faster because no guessing is performed at all.

> parse_date_time(c('12/17/1996 04:00:00','4/18/1950 0130'),
+                 c('%m/%d/%Y %I:%M:%S','%m/%d/%Y %H%M'),
+                 exact = TRUE)
[1] "1996-12-17 04:00:00 UTC" "1950-04-18 01:30:00 UTC"

Relatedly, there was an explicit request asking for such an option.
