How serious is the Java7 "Solr/Lucene" bug?

Problem description

Apparently Java 7 has a nasty bug regarding loop optimization: Google search.

From the reports and bug descriptions I find it hard to judge how significant this bug is (unless you use Solr or Lucene).

What I would like to know:

  • How likely is it that my (or any) program is affected?

  • Is the bug deterministic enough that normal testing would catch it?

Note: I can't make users of my program use -XX:-UseLoopPredicate to avoid the problem.
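For reference, that flag would normally be passed on the java command line when launching the application, along these lines (the jar name here is just a placeholder):

    java -XX:-UseLoopPredicate -jar myapp.jar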

Answer

The problem with any hotspot bug is that you need to reach the compilation threshold (e.g. 10000) before it can get you: so if your unit tests are "trivial", you probably won't catch it.
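To illustrate the threshold point, here is a minimal, hypothetical sketch (plain Java, nothing Lucene-specific): a method only becomes a candidate for the server compiler's loop optimizations after enough executions, so a test that calls it a handful of times never leaves the interpreter.

    // Hypothetical sketch: sum() must run enough times (on the order of
    // -XX:CompileThreshold, 10000 by default for the server VM) before C2
    // compiles it and applies loop predication / range-check elimination.
    // A "trivial" unit test that calls it a few times stays interpreted
    // and can never hit the miscompiled code path.
    public class WarmupSketch {
        static int sum(int[] data) {
            int s = 0;
            for (int i = 0; i < data.length; i++) {
                s += data[i];
            }
            return s;
        }

        public static void main(String[] args) {
            int[] data = new int[1000];
            for (int i = 0; i < data.length; i++) {
                data[i] = i;
            }
            for (int call = 0; call < 20000; call++) {  // cross the threshold
                sum(data);
            }
            System.out.println(sum(data));
        }
    }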

For example, we caught the incorrect-results issue in Lucene because this particular test creates 20,000-document indexes.

In our tests we randomize different interfaces (e.g. different Directory implementations) and indexing parameters and such, and the test only fails 1% of the time; of course it's then reproducible with the same random seed. We also run checkindex on every index that the tests create, which does some sanity checks to ensure the index is not corrupt.
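As an aside, the same check can be run outside the test suite. A minimal sketch of invoking it programmatically against an existing index, using the Lucene 3.x API (the index path is just a placeholder):

    import java.io.File;
    import org.apache.lucene.index.CheckIndex;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class CheckIndexSketch {
        public static void main(String[] args) throws Exception {
            // args[0]: path to an index directory created by a test (placeholder).
            Directory dir = FSDirectory.open(new File(args[0]));
            CheckIndex checker = new CheckIndex(dir);
            CheckIndex.Status status = checker.checkIndex();  // runs the sanity checks
            System.out.println(status.clean ? "index is clean" : "index is corrupt");
            dir.close();
        }
    }

The same checks are also reachable from the command line via the CheckIndex main class shipped in lucene-core.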

In the test we found, if you have a particular configuration (e.g. RAMDirectory + PulsingCodec + payloads stored for the field), then after it hits the compilation threshold, the enumeration loop over the postings returns incorrect results: in this case the number of documents returned for a term != the docFreq stored for the term.
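As a rough, hypothetical illustration of that invariant (not the actual Lucene test), using the Lucene 3.x reader API with placeholder field and term values: on an index without deletions, enumerating the postings of a term should visit exactly docFreq documents.

    import java.io.File;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class DocFreqCheckSketch {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.open(new File(args[0]));  // placeholder path
            IndexReader reader = IndexReader.open(dir);
            Term term = new Term("body", "lucene");               // placeholder term
            int seen = 0;
            TermDocs docs = reader.termDocs(term);
            while (docs.next()) {                                 // walk the postings list
                seen++;
            }
            docs.close();
            // On an index without deletions these should always match; the JIT bug
            // made the enumeration return a different count than the stored docFreq.
            System.out.println("enumerated=" + seen + ", docFreq=" + reader.docFreq(term));
            reader.close();
            dir.close();
        }
    }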

We have a good number of stress tests, and it's important to note that the normal assertions in this test actually pass; it's the checkindex part at the end that fails.

The big problem with this is that Lucene's incremental indexing fundamentally works by merging multiple segments into one: because of this, if these enums compute invalid data, that invalid data is then stored into the newly merged index: aka corruption.
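To make the merging point concrete, here is a hedged sketch (Lucene 3.x API, placeholder path): forcing a merge reads every source segment's postings through the same enumeration code, so whatever that code produces is what gets written into the new segment.

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class MergeSketch {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.open(new File(args[0]));  // placeholder path
            IndexWriterConfig cfg = new IndexWriterConfig(
                Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36));
            IndexWriter writer = new IndexWriter(dir, cfg);
            // The merge re-reads the postings of the existing segments; if the
            // miscompiled enumeration produces bogus counts here, they are written
            // permanently into the merged segment.
            writer.forceMerge(1);
            writer.close();
            dir.close();
        }
    }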

I'd say this bug is much sneakier than previous loop optimizer hotspot bugs we have hit (e.g. the sign-flip stuff, https://issues.apache.org/jira/browse/LUCENE-2975). In that case we got wacky negative document deltas, which made it easy to catch. We also only had to manually unroll a single method to dodge it. On the other hand, the only "test" we had initially for that was a huge 10GB index of http://www.pangaea.de/, so it was painful to narrow it down to this bug.

In this case, I spent a good amount of time (e.g. every night last week) trying to manually unroll/inline various things, trying to create some workaround so we could dodge the bug and not have the possibility of corrupt indexes being created. I could dodge some cases, but there were many more cases I couldn't... and I'm sure if we can trigger this stuff in our tests there are more cases out there...
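Purely for illustration (this is not Lucene's actual code), a manual-unrolling workaround of the kind described above looks something like this: rewrite the hot loop so it no longer matches the pattern the optimizer mangles, while keeping the same semantics.

    // Illustrative sketch only: the workaround attempts consisted of rewriting
    // hot loops by hand so the JIT would no longer apply the buggy transformation.
    public class UnrollSketch {
        // Original shape of a hot loop.
        static int sum(int[] values) {
            int s = 0;
            for (int i = 0; i < values.length; i++) {
                s += values[i];
            }
            return s;
        }

        // Manually unrolled variant: two elements per iteration plus a tail loop
        // for the remainder. Semantically identical, but it may no longer trigger
        // the optimization pattern the bug lives in.
        static int sumUnrolled(int[] values) {
            int s = 0;
            int i = 0;
            for (; i + 1 < values.length; i += 2) {
                s += values[i] + values[i + 1];
            }
            for (; i < values.length; i++) {
                s += values[i];
            }
            return s;
        }

        public static void main(String[] args) {
            int[] values = {1, 2, 3, 4, 5};
            System.out.println(sum(values) + " == " + sumUnrolled(values));
        }
    }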
