What makes a good autowarming query in Solr and how do they work?

Question

This question is a follow-up to this question about infrequent, isolated read timeouts in a Solr installation.

Missing or bad autowarming queries for new searchers were identified as a possible cause.

Now I am confused about what good autowarming queries should look like.

I read up on it but couldn't find any good documentation.

Should they hit a lot of documents in the index? Or should they have matches in all distinct fields that exist in the index?

Wouldn't *:* simply be the best autowarming query? If not, why not?

The example Solr config contains these sample queries:

<lst><str name="q">solr</str> <str name="start">0</str> <str name="rows">10</str></lst>
<lst><str name="q">rocks</str> <str name="start">0</str> <str name="rows">10</str></lst>

I changed them to:

<lst><str name="q">george</str> <str name="start">0</str> <str name="rows">10</str></lst>

Why? Because the index holds film entities with fields for titles and actors. Those are the most searched fields, and "george" appears in both titles and actors.

I don't really know whether this makes sense. So my question is:

  • What would be a good autowarming query for my index, and why?
  • What makes a good autowarming query?

This is an example document from the index. The index has about 70,000 documents, and they all look like this (with different values, of course):

 <doc> 
  <arr name="actor"><str>Tommy Lee Jones</str><str>Will Smith</str><str>Rip Torn</str> 
    <str>Lara Flynn Boyle</str><str>Johnny Knoxville</str><str>Rosario Dawson</str><str>Tony Shalhoub</str> 
    <str>Patrick Warburton</str><str>Jack Kehler</str><str>David Cross</str><str>Colombe Jacobsen-Derstine</str> 
    <str>Peter Spellos</str><str>Michael Rivkin</str><str>Michael Bailey Smith</str><str>Lenny Venito</str> 
    <str>Howard Spiegel</str><str>Alpheus Merchant</str><str>Jay Johnston</str><str>Joel McKinnon Miller</str> 
    <str>Derek Cecil</str></arr> 
  <arr name="affiliate"><str>amazon</str></arr> 
  <arr name="aka_title"><str>Men in Black II</str><str>MIB 2</str><str>MIIB</str> 
    <str>Men in Black 2</str><str>Men in black II (Hombres de negro II)</str><str>Hombres de negro II</str><str>Hommes en noir II</str></arr> 
  <bool name="blockbuster">false</bool> 
  <arr name="country"><str>US</str></arr> 
  <str name="description">Agent J (Will Smith) muss die Erde wieder vor einigem Abschaum bewahren, denn in Gestalt des verführerischen Dessous-Models Serleena (Lara Flynn Boyle) will ein Alien den Planeten unterjochen. Dabei benötigt J die Hilfe seines alten Partners Agent K (Tommy Lee Jones). Der wurde aber bei seiner "Entlassung" geblitzdingst, und so muß J seine Erinnerung erst mal etwas auffrischen bevor es auf die Jagd gehen kann.</str> 
  <arr name="director"><str>Barry Sonnenfeld</str></arr> 
  <int name="film_id">120912</int> 
  <arr name="genre"><str>Action</str><str>Komödie</str><str>Science Fiction</str></arr> 
  <str name="id">120912</str> 
  <str name="image_url">/media/search/filmcovers/105x/kf/false/F6Q1XW.jpg</str> 
  <int name="imdb_id">120912</int> 
  <date name="last_modified">2011-03-01T18:51:35.903Z</date> 
  <str name="locale_title">Men in Black II</str> 
  <int name="malus">3238</int> 
  <int name="parent_id">0</int> 
  <arr name="product_dvd"><str>amazon</str></arr> 
  <arr name="product_type"><str>dvd</str></arr> 
  <int name="rating">49</int> 
  <str name="sort_title">meninblack</str> 
  <int name="type">1</int> 
  <str name="url">/film/Men-in-Black-II-Barry-Sonnenfeld-Tommy-Lee-Jones-F6Q1XW/</str> 
  <int name="year">2002</int> 
 </doc> 

Most queries are exact match queries on actor fields with some filters in place.

Example:

INFO: [] webapp=/solr path=/select/ params={facet=true&sort=score+asc,+malus+asc,+year+desc&hl.simple.pre=starthl&hl=true&version=2.2&fl=*,score&facet.query=year:[1900+TO+1950]&facet.query=year:[1951+TO+1980]&facet.query=year:[1981+TO+1990]&facet.query=year:[1991+TO+2000]&facet.query=year:[2001+TO+2011]&bf=div(sub(10000,malus),100)^10&hl.simple.post=endhl&facet.field=genre&facet.field=country&facet.field=blockbuster&facet.field=affiliate&facet.field=product_type&qs=5&qt=dismax&hl.fragsize=200&mm=2&facet.mincount=1&qf=actor^0.1&f.blockbuster.facet.mincount=0&f.genre.facet.limit=20&hl.fl=actor&wt=json&f.affiliate.facet.mincount=1&f.country.facet.limit=20&rows=10&pf=actor^5&start=0&q="Josi+Kleinpeter"&ps=3} hits=1 status=0 QTime=4

Recommended answer

There are two types of warming: query cache warming and document cache warming. (There are also filter caches, but those behave similarly to query caches.) Query cache warming can be done through a setting that re-runs a number of the most recent queries from before the index was reloaded. Document cache warming is different.
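As a rough sketch of the setting mentioned above: in `solrconfig.xml`, each cache takes an `autowarmCount` attribute that controls how many entries from the old searcher's cache are replayed against the new searcher. The sizes and counts below are illustrative assumptions, not tuned values, and the cache class names vary by Solr version (newer releases ship `solr.CaffeineCache` instead).

```xml
<!-- Goes inside the <query> section of solrconfig.xml.
     All sizes and autowarmCount values here are illustrative assumptions. -->

<!-- Replays up to 128 of the old searcher's filter (fq) entries. -->
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>

<!-- Replays up to 32 of the old searcher's query result entries. -->
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>

<!-- The document cache is keyed by internal Lucene document IDs, which change
     between searchers, so it cannot be autowarmed from the old cache. -->
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
```

This is why the document cache needs explicit warming queries, while the query and filter caches can be carried over automatically.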

The goal of document cache warming is to get a large number of your most frequently accessed documents into the document cache so they don't have to be read from disk. Your warming queries should focus on this: figure out which documents are returned most often and load those, preferably with a minimal number of queries. This has nothing to do with the actual content of the fields. To clarify: when warming the document cache, your primary interest is the documents that turn up in search RESULTS most often, regardless of how they are queried.

Personally, I'd run searches for things like:

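  • Loading by country, if most searches are for US films.
  • Loading by year, if most searches are for recent films.
  • Loading by genre, if there is a short list of heavily searched genres.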
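The ideas in this list could be wired up roughly as follows, using a `newSearcher` listener with `solr.QuerySenderListener`, the same mechanism the sample queries in the question belong to. The field names come from the example document; the specific values (`US`, the year range, `Action`), the `rows` counts, and the assumption that the default request handler uses the standard Lucene query syntax are illustrative guesses that should be replaced with whatever the query logs show is actually popular.

```xml
<!-- Goes inside the <query> section of solrconfig.xml.
     Values and row counts are assumptions for illustration only. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- Load documents for the most searched country. -->
    <lst>
      <str name="q">country:US</str>
      <str name="start">0</str>
      <str name="rows">500</str>
    </lst>
    <!-- Load recent films; sorting on the same fields as live traffic
         should also load the data needed for those sorts. -->
    <lst>
      <str name="q">year:[2001 TO 2011]</str>
      <str name="sort">malus asc, year desc</str>
      <str name="start">0</str>
      <str name="rows">500</str>
    </lst>
    <!-- Load a heavily searched genre. -->
    <lst>
      <str name="q">genre:Action</str>
      <str name="start">0</str>
      <str name="rows">500</str>
    </lst>
  </arr>
</listener>
```

The `rows` value matters here: only the documents actually returned by a warming query are fetched, so a query with `rows` set to 10 warms at most ten documents no matter how many it matches.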

A last possibility is to load them all. Your documents look small, and 70,000 of them is nothing in terms of server memory nowadays. If your document cache is large enough and you have enough memory available, go for it. As a side note, some of your biggest benefit will come from the document cache. A query cache only helps with repeated queries, and the share of repeated queries can be disappointingly low. You almost always benefit from a large document cache.
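A minimal sketch of the "load them all" option, assuming the index stays near 70,000 documents and the JVM heap can hold them: size the document cache above the document count and use a single `*:*` warming query with `rows` large enough to return everything. The numbers are assumptions, and this would be an alternative to more selective warming queries rather than an addition to them.

```xml
<!-- Goes inside the <query> section of solrconfig.xml.
     80000 and 70000 are assumptions based on the index size given in the question. -->
<documentCache class="solr.LRUCache" size="80000" initialSize="80000" autowarmCount="0"/>

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- *:* matches every document; rows must cover the whole index,
         otherwise only the first page of results is loaded. -->
    <lst>
      <str name="q">*:*</str>
      <str name="start">0</str>
      <str name="rows">70000</str>
    </lst>
  </arr>
</listener>
```

This also speaks to the `*:*` question: on its own, with `rows` left at 10, `*:*` warms almost nothing useful. The trade-off of loading everything is that the warming query runs again for every new searcher, so each commit takes longer to become visible.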
