SimpleDateFormat leniency leads to unexpected behavior

Question

I found this in the documentation:

By default, parsing is lenient: If the input is not in the form used by this object's format method but can still be parsed as a date, then the parse succeeds.

If I set leniency to false, the documentation says that with strict parsing, inputs must match this object's format. I used SimpleDateFormat for parsing with lenient mode turned off, and by mistake I had a typo in the date (the letter o instead of the digit 0). Here is the brief working code:

// PASSED (year 199)
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.mm.yyyy");
System.out.println(simpleDateFormat.parse("03.12.199o"));
simpleDateFormat.setLenient(false);
System.out.println(simpleDateFormat.parse("03.12.199o"));        // strict, yet no ParseException

To my surprise, this passed and no ParseException was thrown. I went further:

// PASSED (year 1990)
String string = "just a String to mess with SimpleDateFormat";

SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.mm.yyyy");
System.out.println(simpleDateFormat.parse("03.12.1990" + string));
simpleDateFormat.setLenient(false);
System.out.println(simpleDateFormat.parse("03.12.1990" + string));

Let's continue:

// FAILED on the 2nd line
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.mm.yyyy");
System.out.println(simpleDateFormat.parse("o3.12.1990"));
simpleDateFormat.setLenient(false);
System.out.println(simpleDateFormat.parse("o3.12.1990"));

Finally, the exception is thrown: Unparseable date: "o3.12.1990". I wonder what difference leniency actually makes, and why the last line of my first code snippet did not throw an exception. The documentation says:

With strict parsing, inputs must match this object's format.

My input clearly doesn't strictly match the format - I expect this parsing to be really strict. Why does this (not) happen?

Recommended answer

Why does this (not) happen?

It’s not very well explained in the documentation.

With lenient parsing, the parser may use heuristics to interpret inputs that do not precisely match this object's format. With strict parsing, inputs must match this object's format.

The documentation does help a bit, though, by mentioning that it is the Calendar object that the DateFormat uses that is lenient. That Calendar object is not used for the parsing itself, but for interpreting the parsed values into a date and time (I am quoting DateFormat documentation since SimpleDateFormat is a subclass of DateFormat).

  • SimpleDateFormat, whether lenient or not, will accept a 3-digit year, for example 199, even though you have specified yyyy in the format pattern string. The documentation says about the year:

For parsing, if the number of pattern letters is more than 2, the year is interpreted literally, regardless of the number of digits. So using the pattern "MM/dd/yyyy", "01/11/12" parses to Jan 11, 12 A.D.

  • DateFormat, no matter if lenient or not, accepts and ignores text after the parsed text, like the small letter o in your first example. It objects to unexpected text before or inside the text, as when in your last example you put the letter o in front. The documentation of DateFormat.parse says:

    The method may not use the entire text of the given string.

  • As I indirectly said, leniency makes a difference when interpreting the parsed values into a date and time. So a lenient SimpleDateFormat will interpret 29.02.2019 as 01.03.2019 because there are only 28 days in February 2019. A strict SimpleDateFormat will refuse to do that and will throw an exception. The default lenient behaviour can lead to very surprising and downright inexplicable results. As a simple example, giving the day, month and year in the wrong order: 1990.03.12 will result in August 11 year 17 AD (2001 years ago).
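
The two behaviours described above (date rollover when lenient, silently ignored trailing text) can be checked with a short program. This is a minimal sketch, assuming you have to stay with SimpleDateFormat; the class StrictLegacyParse and its helpers lenientRoundTrip, parseWholeString and accepts are hypothetical names introduced only for illustration. It uses the pattern dd.MM.yyyy (uppercase MM for month) and the two-argument DateFormat.parse(String, ParsePosition) to find out how much of the input was actually consumed.

```java
import java.text.ParseException;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class StrictLegacyParse {

    // Builds a formatter with a fixed locale and time zone so results are reproducible.
    static SimpleDateFormat newFormat(String pattern, boolean lenient) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        format.setLenient(lenient);
        return format;
    }

    // Parses leniently and formats the result back, to make rollover visible.
    static String lenientRoundTrip(String text) throws ParseException {
        SimpleDateFormat format = newFormat("dd.MM.yyyy", true);
        return format.format(format.parse(text));
    }

    // Hypothetical helper: strict parsing that also rejects unconsumed trailing text.
    static Date parseWholeString(String pattern, String text) throws ParseException {
        SimpleDateFormat format = newFormat(pattern, false);
        ParsePosition position = new ParsePosition(0);
        Date result = format.parse(text, position); // returns null on failure
        if (result == null) {
            throw new ParseException("Unparseable date: " + text, position.getErrorIndex());
        }
        if (position.getIndex() != text.length()) {
            throw new ParseException("Unparsed trailing text in: " + text, position.getIndex());
        }
        return result;
    }

    static boolean accepts(String pattern, String text) {
        try {
            parseWholeString(pattern, text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws ParseException {
        // Lenient: 29 February 2019 does not exist and rolls over to 1 March 2019.
        System.out.println(lenientRoundTrip("29.02.2019"));           // 01.03.2019
        System.out.println(accepts("dd.MM.yyyy", "29.02.2019"));      // false: strict rejects it
        System.out.println(accepts("dd.MM.yyyy", "03.12.1990"));      // true
        System.out.println(accepts("dd.MM.yyyy", "03.12.199o"));      // false: "o" was not consumed
        System.out.println(accepts("dd.MM.yyyy", "03.12.1990 junk")); // false: trailing text
    }
}
```

Even with this check, a 3-digit year such as 03.12.199 is still accepted, because yyyy does not enforce a field width when parsing.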

    In a comment, VGR already mentioned LocalDate from java.time, the modern Java date and time API. In my experience java.time is so much nicer to work with than the old date and time classes, so let’s give it a shot. Try a correct date string first:

        DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern("dd.mm.yyyy");
        System.out.println(LocalDate.parse("03.12.1990", dateFormatter));
    

    We get:

    java.time.format.DateTimeParseException: Text '03.12.1990' could not be parsed: Unable to obtain LocalDate from TemporalAccessor: {Year=1990, DayOfMonth=3, MinuteOfHour=12},ISO of type java.time.format.Parsed

    This is because I used your format pattern string of dd.mm.yyyy, where lowercase mm means minute. When we read the error message closely enough, it does state that the DateTimeFormatter interpreted 12 as minute of hour, which was not what we intended. While SimpleDateFormat tacitly accepted this (even when strict), java.time is more helpful in pointing out our mistake. What the message only indirectly says is that it is missing a month value. We need to use uppercase MM for month. At the same time I am trying your date string with the typo:

        DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern("dd.MM.yyyy");
        System.out.println(LocalDate.parse("03.12.199o", dateFormatter));
    

    We get:

    java.time.format.DateTimeParseException: Text '03.12.199o' could not be parsed at index 6

    Index 6 is where it says 199. It objects because we had specified 4 digits and are only supplying 3. The docs say:

    The count of letters determines the minimum field width …

    It would also object to unparsed text after the date. In short it seems to me that it gives you everything that you had expected.
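
To see all of the question's inputs against java.time in one place, here is a small sketch. The class name StrictModernParse and the helper accepts are hypothetical. One assumption goes beyond the answer: the formatter uses ResolverStyle.STRICT, which in turn calls for uuuu instead of yyyy in the pattern, so that a non-existent date such as 29.02.2019 is rejected too. As far as I know, the default SMART style would instead adjust it to the last valid day of the month.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

class StrictModernParse {

    // uuuu (proleptic year) is used because strict resolving of yyyy
    // (year of era) would additionally require an era in the input.
    static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("dd.MM.uuuu").withResolverStyle(ResolverStyle.STRICT);

    // Returns true only if the whole text parses to a valid calendar date.
    static boolean accepts(String text) {
        try {
            LocalDate.parse(text, FORMATTER);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("03.12.1990"));      // true: valid date
        System.out.println(accepts("03.12.199o"));      // false: typo in the year
        System.out.println(accepts("03.12.1990 junk")); // false: trailing text
        System.out.println(accepts("o3.12.1990"));      // false: typo in front
        System.out.println(accepts("29.02.2019"));      // false: no such day
    }
}
```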

    • DateFormat.setLenient documentation
    • Oracle tutorial: Date Time explaining how to use java.time.
