How to provide most relevant results with Multiple Factor Weighted Sorting

Views: 1023 Published: 2015/11/30 14:43:38 Tags: algorithm sorting bayesian relevance weighted-average


Problem description

I need to provide a weighted sort on 2+ factors, ordered by "relevancy". However, the factors aren't completely isolated: I want one or more of the factors to affect the "urgency" (weight) of the others.

Example: contributed content (articles) can be up-/down-voted, and thus have a rating; they have a post date, and they're also tagged with categories. Users write the articles and can vote, and may or may not have some kind of ranking themselves (expert, etc.). Probably similar to StackOverflow, right?

I want to provide each user with a list of articles grouped by tag but sorted by "relevancy", where relevancy is calculated based on the rating and age of the article, and possibly affected by the ranking of the author. That is, a highly ranked article that was written several years ago may not necessarily be as relevant as a medium-ranked article written yesterday. And maybe if an article was written by an expert it would be treated as more relevant than one written by "Joe Schmoe".

Another good example is assigning hotels a "meta score" composed of price, rating, and attractions (http://stackoverflow.com/questions/8661118/need-help-maximizing-3-factors-in-multiple-similar-objects-and-ordering-appropr/8759877#8759877).

My question is, what is the best algorithm for multiple factor sorting? This may be a duplicate of that question (http://stackoverflow.com/questions/8661118/need-help-maximizing-3-factors-in-multiple-similar-objects-and-ordering-appropr/8759877#8759877), but I'm interested in a generic algorithm for any number of factors (a more reasonable expectation is 2 - 4 factors), preferably a "fully-automatic" function that I don't have to tweak and that doesn't require user input. I also can't parse linear algebra and eigenvector wackiness.

Possibilities I've found so far:

Note: S is the "sorting score"

1. "Linearly weighted" - use a function like S = (w_1 * F_1) + (w_2 * F_2) + (w_3 * F_3), where w_x are arbitrarily assigned weights and F_x are the values of the factors. You'd also want to normalize F (i.e. F_{x,n} = F_x / F_max). I think this is kinda how Lucene search works (http://stackoverflow.com/questions/817998/how-to-sort-search-results-on-multiple-fields-using-a-weighting-function).
2. "Base-N weighted" - more like grouping than weighting, it's just a linear weighting where the weights are increasing powers of 10 (a similar principle to CSS selector specificity), so that more important factors are significantly higher: S = 1000 * F_1 + 100 * F_2 + 10 * F_3 ...
3. Estimated True Value (ETV) - this is apparently what Google Analytics introduced in their reporting, where the value of one factor influences (weights) another factor, the consequence being to sort on more "statistically significant" values. The link explains it pretty well, so here's just the equation: S = (F_2 / F_{2,max}) * F_1 + (1 - F_2 / F_{2,max}) * F_{1,avg}, where F_1 is the "more important" factor ("bounce rate" in the article) and F_2 is the "significance modifying" factor ("visits" in the article).
4. Bayesian Estimate - looks really similar to ETV; this is how IMDb calculates their rating. See this StackOverflow post for explanation; equation: S = (F_2 / (F_2 + F_{2,lim})) * F_1 + (F_{2,lim} / (F_2 + F_{2,lim})) * F_{1,avg}, where F_x are the same as in #3, and F_{2,lim} is the minimum threshold limit for the "significance" factor (i.e. any value less than X shouldn't be considered).
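For comparison, the four scoring functions listed above could be sketched in Python roughly as follows. This is only a minimal illustration under stated assumptions: the function names, the sample articles, and the choice of "rating" as F_1 and "votes" as F_2 are hypothetical and not taken from any of the linked posts.

```python
from typing import List, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    """Scale each value by the maximum (F_x / F_max); assumes non-negative values."""
    peak = max(values)
    return [v / peak if peak else 0.0 for v in values]


def linear_score(factors: Sequence[float], weights: Sequence[float]) -> float:
    """Option 1: S = sum of w_x * F_x, with factors normalized beforehand."""
    return sum(w * f for w, f in zip(weights, factors))


def base_n_score(factors: Sequence[float], base: int = 10) -> float:
    """Option 2: with three factors and base 10, S = 1000*F_1 + 100*F_2 + 10*F_3."""
    n = len(factors)
    return sum(f * base ** (n - i) for i, f in enumerate(factors))


def etv_score(f1: float, f2: float, f2_max: float, f1_avg: float) -> float:
    """Option 3: the closer F_2 is to F_2,max, the more F_1 is trusted over F_1,avg."""
    confidence = f2 / f2_max
    return confidence * f1 + (1 - confidence) * f1_avg


def bayesian_score(f1: float, f2: float, f2_lim: float, f1_avg: float) -> float:
    """Option 4: same idea, with the threshold F_2,lim in place of F_2,max."""
    total = f2 + f2_lim
    return (f2 / total) * f1 + (f2_lim / total) * f1_avg


# Hypothetical sample data: (title, rating as F_1, vote count as F_2).
articles = [("few votes, top rating", 5.0, 2), ("many votes, good rating", 4.0, 200)]
f1_avg = 3.0                           # assumed average rating across all articles
f2_max = max(a[2] for a in articles)   # largest vote count in the sample

ranked = sorted(articles, key=lambda a: etv_score(a[1], a[2], f2_max, f1_avg), reverse=True)
for title, rating, votes in ranked:
    print(title, round(etv_score(rating, votes, f2_max, f1_avg), 3),
          round(bayesian_score(rating, votes, 50, f1_avg), 3))
```

With these made-up numbers, both ETV and the Bayesian estimate pull the article with only two votes toward the average rating, so it ranks below the article with many votes even though its raw rating is higher. Options 1 and 2 would instead need hand-picked weights to get a similar effect.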

Options #3 and #4 look really promising, since you don't have to choose an arbitrary weighting scheme like you do in #1 and #2, but the problem is: how do you do this for more than two factors?

I also came across the SQL implementation for a two-factor weighting algorithm, which is basically what I'll need to write eventually.
