What makes a good autowarming query in Solr and how do they work?

Question

This question is a follow-up to this question about infrequent, isolated read timeouts in a Solr installation.

Missing or bad autowarming queries for new searchers were identified as a possible cause.

Now I am confused about what good autowarming queries should look like.

I read up on this but couldn't find any good documentation.

Should they hit a lot of documents in the index? Or should they have matches in all distinct fields that exist in the index?

Wouldn't just *:* be the best autowarming query? If not, why not?

The example Solr config has these sample queries in it:

<lst><str name="q">solr</str> <str name="start">0</str> <str name="rows">10</str></lst>
<lst><str name="q">rocks</str> <str name="start">0</str> <str name="rows">10</str></lst>

I changed them to:

<lst><str name="q">george</str> <str name="start">0</str> <str name="rows">10</str></lst>

Why? Because the index holds film entities with fields for titles and actors. Those are the most searched ones. And george appears in titles and actors.

I don't really know whether this makes sense. So my question is:

  • What would be good autowarming queries for my index, and why?
  • What makes a good autowarming query in general?

This is an example document from the index. The index has about 70,000 documents, and they all look like this (with different values, of course):

 <doc> 
  <arr name="actor"><str>Tommy Lee Jones</str><str>Will Smith</str><str>Rip Torn</str> 
    <str>Lara Flynn Boyle</str><str>Johnny Knoxville</str><str>Rosario Dawson</str><str>Tony Shalhoub</str> 
    <str>Patrick Warburton</str><str>Jack Kehler</str><str>David Cross</str><str>Colombe Jacobsen-Derstine</str> 
    <str>Peter Spellos</str><str>Michael Rivkin</str><str>Michael Bailey Smith</str><str>Lenny Venito</str> 
    <str>Howard Spiegel</str><str>Alpheus Merchant</str><str>Jay Johnston</str><str>Joel McKinnon Miller</str> 
    <str>Derek Cecil</str></arr> 
  <arr name="affiliate"><str>amazon</str></arr> 
  <arr name="aka_title"><str>Men in Black II</str><str>MIB 2</str><str>MIIB</str> 
    <str>Men in Black 2</str><str>Men in black II (Hombres de negro II)</str><str>Hombres de negro II</str><str>Hommes en noir II</str></arr> 
  <bool name="blockbuster">false</bool> 
  <arr name="country"><str>US</str></arr> 
  <str name="description">Agent J (Will Smith) muss die Erde wieder vor einigem Abschaum bewahren, denn in Gestalt des verführerischen Dessous-Models Serleena (Lara Flynn Boyle) will ein Alien den Planeten unterjochen. Dabei benötigt J die Hilfe seines alten Partners Agent K (Tommy Lee Jones). Der wurde aber bei seiner "Entlassung" geblitzdingst, und so muß J seine Erinnerung erst mal etwas auffrischen bevor es auf die Jagd gehen kann.</str> 
  <arr name="director"><str>Barry Sonnenfeld</str></arr> 
  <int name="film_id">120912</int> 
  <arr name="genre"><str>Action</str><str>Komödie</str><str>Science Fiction</str></arr> 
  <str name="id">120912</str> 
  <str name="image_url">/media/search/filmcovers/105x/kf/false/F6Q1XW.jpg</str> 
  <int name="imdb_id">120912</int> 
  <date name="last_modified">2011-03-01T18:51:35.903Z</date> 
  <str name="locale_title">Men in Black II</str> 
  <int name="malus">3238</int> 
  <int name="parent_id">0</int> 
  <arr name="product_dvd"><str>amazon</str></arr> 
  <arr name="product_type"><str>dvd</str></arr> 
  <int name="rating">49</int> 
  <str name="sort_title">meninblack</str> 
  <int name="type">1</int> 
  <str name="url">/film/Men-in-Black-II-Barry-Sonnenfeld-Tommy-Lee-Jones-F6Q1XW/</str> 
  <int name="year">2002</int> 
 </doc> 

Most queries are exact match queries on actor fields with some filters in place.

Example:

INFO: [] webapp=/solr path=/select/ params={facet=true&sort=score+asc,+malus+asc,+year+desc&hl.simple.pre=starthl&hl=true&version=2.2&fl=*,score&facet.query=year:[1900+TO+1950]&facet.query=year:[1951+TO+1980]&facet.query=year:[1981+TO+1990]&facet.query=year:[1991+TO+2000]&facet.query=year:[2001+TO+2011]&bf=div(sub(10000,malus),100)^10&hl.simple.post=endhl&facet.field=genre&facet.field=country&facet.field=blockbuster&facet.field=affiliate&facet.field=product_type&qs=5&qt=dismax&hl.fragsize=200&mm=2&facet.mincount=1&qf=actor^0.1&f.blockbuster.facet.mincount=0&f.genre.facet.limit=20&hl.fl=actor&wt=json&f.affiliate.facet.mincount=1&f.country.facet.limit=20&rows=10&pf=actor^5&start=0&q="Josi+Kleinpeter"&ps=3} hits=1 status=0 QTime=4

Recommended answer

There are two types of warming: query cache warming and document cache warming (there is also filter cache warming, but that works much like query cache warming). Query cache warming can be done through a setting that simply re-runs the X most recent queries from before the index was reloaded. Document cache warming is different.
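As a rough sketch of the setting mentioned above: in solrconfig.xml, each cache definition takes an autowarmCount attribute, which controls how many of the most recent entries from the old searcher's cache are re-run against the new searcher. The cache classes and all numbers below are assumptions modelled on older example configs, not values tuned for this index; newer Solr versions ship different cache implementations.

```xml
<!-- Illustrative values only; sizes and counts are assumptions, not recommendations. -->
<query>
  <!-- Re-run the 32 most recent cached queries when a new searcher opens. -->
  <queryResultCache class="solr.LRUCache"
                    size="512" initialSize="512" autowarmCount="32"/>

  <!-- Filter queries (fq) are warmed the same way. -->
  <filterCache class="solr.FastLRUCache"
               size="512" initialSize="512" autowarmCount="32"/>

  <!-- The document cache is keyed by internal document ids, which change
       between searchers, so it cannot be autowarmed from the old cache. -->
  <documentCache class="solr.LRUCache"
                 size="512" initialSize="512" autowarmCount="0"/>
</query>
```

Because the document cache cannot be populated this way, it has to be filled by explicit warming queries instead.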

The goal of document cache warming is to get a large number of your most frequently accessed documents into the document cache so they don't have to be read from disk. Your warming queries should focus on this: try to figure out which documents are retrieved most often and load those, preferably with a minimal number of queries. This has nothing to do with the actual content of the fields. To clarify: when warming the document cache, your primary interest is the documents that turn up in search results most often, regardless of how they are queried.

Personally, I'd run searches for things like:

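  • Load by country, if most of your searches are for US films.
  • Load by year, if most of your searches are for newer films.
  • Load by genre, if you have a short list of heavily searched genres.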
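The suggestions above could be sketched as a newSearcher listener in solrconfig.xml. The field names (country, year, genre) and the values come from the example document and the logged facet ranges in the question; the rows values are hypothetical and would need to be chosen from real traffic, since roughly that many matching documents get pulled into the document cache per query. This also assumes the default request handler accepts field:value syntax.

```xml
<!-- Sketch only: rows values are placeholders, not measured settings. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- Most searches end up on US films: load by country. -->
    <lst><str name="q">country:US</str> <str name="start">0</str> <str name="rows">1000</str></lst>
    <!-- Most searches end up on newer films: load by year. -->
    <lst><str name="q">year:[2001 TO 2011]</str> <str name="start">0</str> <str name="rows">1000</str></lst>
    <!-- A short list of heavily searched genres: one query per genre. -->
    <lst><str name="q">genre:Action</str> <str name="start">0</str> <str name="rows">1000</str></lst>
  </arr>
</listener>
```

A single term such as george only loads the handful of documents that happen to match it, which is why it warms less than queries chosen by result popularity.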

A last possibility is to load them all. Your documents look small, and 70,000 of them is nothing in terms of server memory nowadays. If your document cache is large enough and you have enough memory available, go for it. As a side note, some of your biggest gains will come from the document cache. A query cache only helps with repeated queries, and the share of repeated queries can be disappointingly low. You almost always benefit from a large document cache.
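A minimal sketch of the load-everything option, assuming the document cache is sized to hold the whole index of about 70,000 documents. The cache class and the exact numbers are assumptions; a single query returning every document can make opening a new searcher noticeably slower, so it is worth measuring before relying on it.

```xml
<!-- Assumption: cache sized to the ~70,000 documents mentioned in the question. -->
<documentCache class="solr.LRUCache"
               size="70000" initialSize="70000" autowarmCount="0"/>

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- Match every document and fetch all of them, so each one is read once. -->
    <lst><str name="q">*:*</str> <str name="start">0</str> <str name="rows">70000</str></lst>
  </arr>
</listener>
```

This is the one case where *:* is a sensible warming query: with rows left at 10 it would only load ten arbitrary documents.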
