Elastic Search and Y10k (years with more than 4 digits)


Question


I discovered this issue in connection with Elastic Search queries, but since the ES date format documentation links to the API documentation for the java.time.format.DateTimeFormatter class, the problem is not really ES specific.

Short summary: We are having problems with dates beyond year 9999, or more precisely, with years that have more than 4 digits.

The documents stored in ES have a date field that the index descriptor defines with the format "date", which corresponds to "yyyy-MM-dd" in the pattern language of DateTimeFormatter. We take user input, validate it with org.apache.commons.validator.DateValidator.isValid, also using the pattern "yyyy-MM-dd", and, if it is valid, create an ES query from it. This fails with an exception if the user enters something like 20202-12-03. Such a search term is probably not intentional, but the expected behaviour would be to find nothing rather than have the software cough up an exception.

The problem is that org.apache.commons.validator.DateValidator internally uses the older SimpleDateFormat class to verify that the input conforms to the pattern, and SimpleDateFormat interprets "yyyy" as something like: use at least 4 digits, but allow more if required. A SimpleDateFormat with the pattern "yyyy-MM-dd" will thus parse an input like "20202-07-14" and will likewise format a Date object with a year beyond 9999.
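The lenient year handling described above can be reproduced with the JDK alone. The following is a minimal sketch, not the code of commons-validator itself: the class and method names are made up for illustration, and the fixed UTC time zone and Locale.ROOT are assumptions added only to keep the round trip deterministic.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class, not part of any library.
class LegacyYearDemo {

    // Parses the input with the legacy API and formats the result back
    // with the same "yyyy-MM-dd" pattern.
    static String roundTrip(String input) {
        SimpleDateFormat legacy = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        legacy.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: fixed zone for repeatability
        legacy.setLenient(false); // still rejects impossible dates such as month 13
        try {
            return legacy.format(legacy.parse(input));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a date: " + input, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("2020-07-14"));  // 2020-07-14
        System.out.println(roundTrip("20202-07-14")); // 20202-07-14, the five-digit year is accepted
    }
}
```

Even with lenient parsing switched off, the five-digit year passes, which is why validation based on SimpleDateFormat lets such input through.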

The newer DateTimeFormatter class is much stricter and takes "yyyy" to mean exactly four digits. It will fail to parse an input string like "20202-07-14" and also fail to format a Temporal object with a year beyond 9999. It is worth noting that DateTimeFormatter itself is capable of handling variable-length fields. The constant DateTimeFormatter.ISO_LOCAL_DATE, for example, is not equivalent to "yyyy-MM-dd": in conformance with ISO 8601 it allows years with more than four digits but uses at least four. This constant is created programmatically with a DateTimeFormatterBuilder rather than from a pattern string.

ES can't be configured to use the constants defined in DateTimeFormatter, such as ISO_LOCAL_DATE, but only with a pattern string. ES also knows a list of predefined patterns, and its documentation occasionally refers to the ISO standard, but it seems to be mistaken there and ignores that a valid ISO date string can contain a five-digit year.

I can configure ES with a list of multiple allowed date patterns, e.g. "yyyy-MM-dd||yyyyy-MM-dd". That allows both four and five digits in the year but fails for a six-digit year. I can support six-digit years by adding yet another allowed pattern, "yyyy-MM-dd||yyyyy-MM-dd||yyyyyy-MM-dd", but then it fails for seven-digit years, and so on.

Am I overlooking something, or is it really not possible to configure ES (or a DateTimeFormatter instance using a pattern string) to have a year field with at least four digits (but potentially more), as used by the ISO standard?

Answer

Edit

ISO 8601

Since your requirement is to conform with ISO 8601, let’s first see what ISO 8601 says (quoted from the link at the bottom):

To represent years before 0000 or after 9999, the standard also permits the expansion of the year representation but only by prior agreement between the sender and the receiver. An expanded year representation [±YYYYY] must have an agreed-upon number of extra year digits beyond the four-digit minimum, and it must be prefixed with a + or − sign instead of the more common AD/BC (or CE/BCE) notation; …

So 20202-12-03 is not a valid date in ISO 8601. If you explicitly inform your users that you accept, say, up to 6 digit years, then +20202-12-03 and -20202-12-03 are valid, and only with the + or - sign.

Accepting more than 4 digits

The format pattern uuuu-MM-dd formats and parses dates in accordance with ISO 8601, including years with more than four digits. For example:

    DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern("uuuu-MM-dd");
    LocalDate date = LocalDate.parse("+20202-12-03", dateFormatter);
    System.out.println("Parsed: " + date);
    System.out.println("Formatted back: " + date.format(dateFormatter));

Output:

    Parsed: +20202-12-03
    Formatted back: +20202-12-03

It works quite similarly for a prefixed minus instead of the plus sign.
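As a quick check of the minus-sign case, here is a small sketch using the same uuuu-MM-dd pattern. The class and method names are invented for the example; only the pattern comes from the answer.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical demo class for signed years with the uuuu-MM-dd pattern.
class SignedYearDemo {

    static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("uuuu-MM-dd");

    // Parses the text and formats the resulting date back with the same formatter.
    static String roundTrip(String input) {
        return LocalDate.parse(input, FORMATTER).format(FORMATTER);
    }

    // Returns the proleptic year of the parsed date.
    static int yearOf(String input) {
        return LocalDate.parse(input, FORMATTER).getYear();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("-20202-12-03")); // -20202-12-03
        System.out.println(yearOf("-20202-12-03"));    // -20202
        System.out.println(roundTrip("2020-12-03"));   // 2020-12-03, no sign for four-digit years
    }
}
```

Four-digit years keep their familiar unsigned form; the sign only appears once the year no longer fits into four digits or is negative.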

Accepting more than 4 digits without sign

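For unsigned years, the only option is the one you already suggested, a list with one pattern per year width:

    yyyy-MM-dd||yyyyy-MM-dd||yyyyyy-MM-dd||yyyyyyy-MM-dd||yyyyyyyy-MM-dd||yyyyyyyyy-MM-dd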

As I said, this disagrees with ISO 8601. I also agree with you that it isn’t nice. And obviously it will fail for 10 or more digits, but that would fail for a different reason anyway: java.time handles years in the interval -999 999 999 through +999 999 999. So trying yyyyyyyyyy-MM-dd (10 digit year) would get you into serious trouble except in the corner case where the user enters a year with a leading zero.
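To see how such a pattern list behaves, the fallback can be emulated with plain java.time. This is only a sketch of the idea: it does not use ES's own parser, and the class, the method names and the maxYearDigits parameter are assumptions made for the example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical emulation of a "pattern1||pattern2||..." list; not ES code.
class PatternListDemo {

    // Builds one formatter per year width: yyyy-MM-dd, yyyyy-MM-dd, and so on.
    static List<DateTimeFormatter> formatters(int maxYearDigits) {
        List<DateTimeFormatter> result = new ArrayList<>();
        StringBuilder yearLetters = new StringBuilder("yyy");
        for (int digits = 4; digits <= maxYearDigits; digits++) {
            yearLetters.append('y');
            result.add(DateTimeFormatter.ofPattern(yearLetters + "-MM-dd"));
        }
        return result;
    }

    // Tries the formatters in order and returns the first successful parse.
    static Optional<LocalDate> parse(String input, int maxYearDigits) {
        for (DateTimeFormatter formatter : formatters(maxYearDigits)) {
            try {
                return Optional.of(LocalDate.parse(input, formatter));
            } catch (DateTimeParseException e) {
                // this width did not match; fall through to the next pattern
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("2020-07-14", 5));      // Optional[2020-07-14]
        System.out.println(parse("20202-07-14", 5));     // Optional[+20202-07-14]
        System.out.println(parse("202020-07-14", 5));    // Optional.empty, six digits not covered
        System.out.println(parse("999999999-12-31", 9)); // Optional[+999999999-12-31]
    }
}
```

With nine patterns the list covers every year java.time can represent, since java.time.Year.MAX_VALUE is 999999999.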

I am sorry, this is as good as it gets. DateTimeFormatter format patterns do not support all of what you are asking for. There is no (single) pattern that will give you four digit years in the range 0000 through 9999 and more digits for years after that.
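This limitation concerns pattern strings only. Where you construct the formatter yourself, for instance when validating user input before building the query, a DateTimeFormatterBuilder can express "at least four digits, no sign required". The sketch below is an assumption about how that could look and is not something ES can be configured with; the class name, the constant name and the maximum width of 9 (chosen because java.time years have at most nine digits) are illustrative.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.SignStyle;
import java.time.temporal.ChronoField;

// Hypothetical formatter for validation code outside ES.
class UnsignedYearFormatterDemo {

    // Year with 4 to 9 digits; SignStyle.NORMAL means no plus sign is
    // required or printed, only a minus sign for negative years.
    static final DateTimeFormatter UNSIGNED_YEAR_DATE = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.YEAR, 4, 9, SignStyle.NORMAL)
            .appendLiteral('-')
            .appendValue(ChronoField.MONTH_OF_YEAR, 2)
            .appendLiteral('-')
            .appendValue(ChronoField.DAY_OF_MONTH, 2)
            .toFormatter();

    static LocalDate parse(String input) {
        return LocalDate.parse(input, UNSIGNED_YEAR_DATE);
    }

    static String format(LocalDate date) {
        return date.format(UNSIGNED_YEAR_DATE);
    }

    public static void main(String[] args) {
        System.out.println(format(parse("2020-07-14")));       // 2020-07-14
        System.out.println(format(parse("20202-07-14")));      // 20202-07-14
        System.out.println(format(LocalDate.of(987, 7, 14)));  // 0987-07-14
    }
}
```

Note that this deliberately departs from ISO 8601, which requires the sign for expanded years, so it only suits cases where unsigned input has to be tolerated.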

The documentation of DateTimeFormatter says about formatting and parsing years:

Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two digit form is used. For printing, this outputs the rightmost two digits. For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive. If the count of letters is less than four (but not two), then the sign is only output for negative years as per SignStyle.NORMAL. Otherwise, the sign is output if the pad width is exceeded, as per SignStyle.EXCEEDS_PAD.

So no matter how many pattern letters you choose, you will be unable to parse unsigned years that have more digits than that, and years with fewer digits will be formatted with that many digits, padded with leading zeroes.
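Both consequences can be observed directly. The helper names below are made up for this sketch; the patterns are the ones discussed above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical demo of the padding and sign rules for year pattern letters.
class PadAndSignDemo {

    // Reports whether the input parses as a LocalDate with the given pattern.
    static boolean parses(String pattern, String input) {
        try {
            LocalDate.parse(input, DateTimeFormatter.ofPattern(pattern));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    static String format(String pattern, LocalDate date) {
        return date.format(DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        System.out.println(parses("uuuu-MM-dd", "20202-12-03"));   // false: more digits, no sign
        System.out.println(parses("uuuu-MM-dd", "+20202-12-03"));  // true
        System.out.println(parses("uuuuu-MM-dd", "20202-12-03"));  // true: exactly five digits
        System.out.println(parses("uuuuu-MM-dd", "2020-12-03"));   // false: fewer than five digits
        System.out.println(format("uuuuu-MM-dd", LocalDate.of(2020, 12, 3))); // 02020-12-03
        System.out.println(format("uuuu-MM-dd", LocalDate.of(987, 12, 3)));   // 0987-12-03
    }
}
```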

Original answer

You can probably get away with the pattern u-MM-dd. Demonstration:

    String formatPattern = "u-MM-dd";
    
    DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern(formatPattern);
    
    LocalDate normalDate = LocalDate.parse("2020-07-14", dateFormatter);
    String formattedAgain = normalDate.format(dateFormatter);
    System.out.format("LocalDate: %s. String: %s.%n", normalDate, formattedAgain);
    
    LocalDate largeDate = LocalDate.parse("20202-07-14", dateFormatter);
    String largeFormattedAgain = largeDate.format(dateFormatter);
    System.out.format("LocalDate: %s. String: %s.%n", largeDate, largeFormattedAgain);

Output:

    LocalDate: 2020-07-14. String: 2020-07-14.
    LocalDate: +20202-07-14. String: 20202-07-14.

Counter-intuitively but very practically, one format letter does not mean 1 digit but rather as many digits as it takes. The flip side is that years before year 1000 will be formatted with fewer than 4 digits, which, as you say, disagrees with ISO 8601.

For the difference between pattern letter y and u for year see the link at the bottom.

You might also consider a single M and/or a single d to accept 2020-007-014, but again, numbers less than 10 would then be formatted with just 1 digit, like 2020-7-14, which probably isn't what you want and again disagrees with ISO.
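The side effects of single pattern letters, for the year as well as for month and day, can be seen in this short sketch. The class and method names are illustrative only.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical demo of single-letter year, month and day patterns.
class SingleLetterDemo {

    static LocalDate parse(String pattern, String input) {
        return LocalDate.parse(input, DateTimeFormatter.ofPattern(pattern));
    }

    static String format(String pattern, LocalDate date) {
        return date.format(DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        // A single u prints the year without padding.
        System.out.println(format("u-MM-dd", LocalDate.of(987, 7, 14)));  // 987-07-14

        // Single M and d accept extra leading zeroes when parsing ...
        LocalDate padded = parse("u-M-d", "2020-007-014");
        System.out.println(padded);                                       // 2020-07-14

        // ... but print without any padding.
        System.out.println(format("u-M-d", padded));                      // 2020-7-14
    }
}
```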

Links
