What makes a good autowarming query in Solr and how do they work?


Question

This is a follow-up to an earlier question about infrequent, isolated read timeouts in a Solr installation.

Missing or poorly chosen autowarming queries for new searchers were identified as a possible cause.

Now I am confused about what good autowarming queries should look like.

I read up on this but couldn't find any good documentation.

Should they hit a lot of documents in the index? Or should they have matches in all distinct fields that exist in the index?

Wouldn't just *:* be the best autowarming query? If not, why not?

The example Solr config has these sample queries in it:

<lst><str name="q">solr</str> <str name="start">0</str> <str name="rows">10</str></lst>
<lst><str name="q">rocks</str> <str name="start">0</str> <str name="rows">10</str></lst>

I changed them to:

<lst><str name="q">george</str> <str name="start">0</str> <str name="rows">10</str></lst>

Why? Because the index holds film entities with fields for titles and actors. Those are the most searched ones. And george appears in titles and actors.

I don't really know whether this makes sense. So my question is:

  • What would be good autowarming queries for my index, and why?
  • What makes a good autowarming query in general?

This is an example document from the index. The index has about 70,000 documents, and they all look like this (with different values, of course):

 <doc> 
  <arr name="actor"><str>Tommy Lee Jones</str><str>Will Smith</str><str>Rip Torn</str> 
    <str>Lara Flynn Boyle</str><str>Johnny Knoxville</str><str>Rosario Dawson</str><str>Tony Shalhoub</str> 
    <str>Patrick Warburton</str><str>Jack Kehler</str><str>David Cross</str><str>Colombe Jacobsen-Derstine</str> 
    <str>Peter Spellos</str><str>Michael Rivkin</str><str>Michael Bailey Smith</str><str>Lenny Venito</str> 
    <str>Howard Spiegel</str><str>Alpheus Merchant</str><str>Jay Johnston</str><str>Joel McKinnon Miller</str> 
    <str>Derek Cecil</str></arr> 
  <arr name="affiliate"><str>amazon</str></arr> 
  <arr name="aka_title"><str>Men in Black II</str><str>MIB 2</str><str>MIIB</str> 
    <str>Men in Black 2</str><str>Men in black II (Hombres de negro II)</str><str>Hombres de negro II</str><str>Hommes en noir II</str></arr> 
  <bool name="blockbuster">false</bool> 
  <arr name="country"><str>US</str></arr> 
  <str name="description">Agent J (Will Smith) muss die Erde wieder vor einigem Abschaum bewahren, denn in Gestalt des verführerischen Dessous-Models Serleena (Lara Flynn Boyle) will ein Alien den Planeten unterjochen. Dabei benötigt J die Hilfe seines alten Partners Agent K (Tommy Lee Jones). Der wurde aber bei seiner "Entlassung" geblitzdingst, und so muß J seine Erinnerung erst mal etwas auffrischen bevor es auf die Jagd gehen kann.</str> 
  <arr name="director"><str>Barry Sonnenfeld</str></arr> 
  <int name="film_id">120912</int> 
  <arr name="genre"><str>Action</str><str>Komödie</str><str>Science Fiction</str></arr> 
  <str name="id">120912</str> 
  <str name="image_url">/media/search/filmcovers/105x/kf/false/F6Q1XW.jpg</str> 
  <int name="imdb_id">120912</int> 
  <date name="last_modified">2011-03-01T18:51:35.903Z</date> 
  <str name="locale_title">Men in Black II</str> 
  <int name="malus">3238</int> 
  <int name="parent_id">0</int> 
  <arr name="product_dvd"><str>amazon</str></arr> 
  <arr name="product_type"><str>dvd</str></arr> 
  <int name="rating">49</int> 
  <str name="sort_title">meninblack</str> 
  <int name="type">1</int> 
  <str name="url">/film/Men-in-Black-II-Barry-Sonnenfeld-Tommy-Lee-Jones-F6Q1XW/</str> 
  <int name="year">2002</int> 
 </doc> 

Most queries are exact match queries on actor fields with some filters in place.

Example:

INFO: [] webapp=/solr path=/select/ params={facet=true&sort=score+asc,+malus+asc,+year+desc&hl.simple.pre=starthl&hl=true&version=2.2&fl=*,score&facet.query=year:[1900+TO+1950]&facet.query=year:[1951+TO+1980]&facet.query=year:[1981+TO+1990]&facet.query=year:[1991+TO+2000]&facet.query=year:[2001+TO+2011]&bf=div(sub(10000,malus),100)^10&hl.simple.post=endhl&facet.field=genre&facet.field=country&facet.field=blockbuster&facet.field=affiliate&facet.field=product_type&qs=5&qt=dismax&hl.fragsize=200&mm=2&facet.mincount=1&qf=actor^0.1&f.blockbuster.facet.mincount=0&f.genre.facet.limit=20&hl.fl=actor&wt=json&f.affiliate.facet.mincount=1&f.country.facet.limit=20&rows=10&pf=actor^5&start=0&q="Josi+Kleinpeter"&ps=3} hits=1 status=0 QTime=4

Recommended answer

There are two types of warming: query cache warming and document cache warming (there is also filter cache warming, but it works much like query cache warming). Query cache warming can be done through a setting that simply re-runs the X most recent queries from before the index was reloaded. Document cache warming is different.
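As a rough sketch of the setting mentioned above: in `solrconfig.xml`, each cache declaration takes an `autowarmCount` attribute that controls how many entries from the old searcher's cache are regenerated in the new one. The sizes and counts below are placeholder values chosen for illustration, not figures from the question or the answer, and the cache class names are the ones shipped with Solr releases of that era (newer versions may use different implementations). The document cache is not autowarmed this way, because internal document ids change between searchers, which is why it depends on explicit warming queries.

```xml
<!-- Inside the <query> section of solrconfig.xml. All numbers are illustrative assumptions. -->

<!-- Filter cache: regenerate up to 128 recently used filters in each new searcher. -->
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>

<!-- Query result cache: re-run up to 128 recently used queries in each new searcher. -->
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>

<!-- Document cache: cannot be autowarmed, so autowarmCount stays at 0. -->
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
```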

The goal of document cache warming is to get a large number of your most frequently accessed documents into the document cache so they don't have to be read from disk. So, your queries should focus on this. You need to figure out which documents are returned most often and load those, preferably with a minimal number of queries. This has nothing to do with the actual content of the fields. To clarify: when warming the document cache, your primary interest is the documents that turn up in search RESULTS most often, regardless of how they are queried.

Personally, I'd run searches for things like:

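  • Load by country, if most of what gets searched is US films.
  • Load by year, if most of your searches are for recent films.
  • Load by genre, if you have a short list of heavily searched genres.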
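The suggestions in this list could be expressed as `newSearcher` warming queries along the following lines. This is only a sketch: the field names (`country`, `year`, `genre`) and the values `US` and `Action` are taken from the sample document, the year range mirrors one of the facet queries in the logged request, and the `rows` value of 100 is an arbitrary assumption. Whether `genre:Action` matches as written depends on how that field is analyzed in the schema.

```xml
<!-- solrconfig.xml: queries sent to each new searcher before it starts serving requests. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- By country: pull documents for the most searched country. -->
    <lst><str name="q">country:US</str> <str name="start">0</str> <str name="rows">100</str></lst>
    <!-- By year: pull recent films. -->
    <lst><str name="q">year:[2001 TO 2011]</str> <str name="start">0</str> <str name="rows">100</str></lst>
    <!-- By genre: one entry per heavily searched genre. -->
    <lst><str name="q">genre:Action</str> <str name="start">0</str> <str name="rows">100</str></lst>
  </arr>
</listener>
```

The same list can be registered under `event="firstSearcher"` so that the caches are also warmed at startup, when there is no old searcher to autowarm from.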

A last possibility is to load them all. Your documents look small, and 70,000 of them is nothing in terms of server memory nowadays. If your document cache is large enough, and you have enough memory available, go for it. As a side note, some of your biggest benefit will come from your document cache. A query cache is only beneficial for repeated queries, and the share of repeated queries can be disappointingly low. You almost always benefit from a large document cache.
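A minimal sketch of the "load them all" option, assuming the index stays at roughly 70,000 documents: size the document cache above the document count and warm it with a single match-all query that returns every row. The cache size of 80,000 is an assumed value with some headroom, not a number from the answer, and fetching every document on each new searcher will lengthen warm-up time, so it is worth measuring before relying on it.

```xml
<!-- solrconfig.xml: document cache sized to hold the whole index (assumed ~70,000 documents). -->
<documentCache class="solr.LRUCache" size="80000" initialSize="80000" autowarmCount="0"/>

<!-- One warming query that returns every document so each one is loaded into the cache. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str> <str name="start">0</str> <str name="rows">70000</str></lst>
  </arr>
</listener>
```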
