How is the "Consider including" feature in Gmail implemented?

Question

I would like to do something similar to Gmail's "consider including" suggestions on my blog, but with tags.

I was thinking of storing tag sets, each together with a "number of times used" counter.

I came up with the following algorithm:

//a blog post is published
//it has the tags "A", "B" & "C" :
if the tag set "A,B,C" doesn't exist
   create it
else
   add 1 to "number of times used"

and to suggest tags:

//a blog post is being written.
//the author includes the tags "A" and "C"
//which tags should I suggest?
find all the tags sets that contain "A" and "C"
  among them, find the one with the highest "number of times used"
    suggest the tags of the set not already picked (A & C)
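The two steps above can be sketched in Python as follows. This is a minimal in-memory version under stated assumptions: a tag set is stored order-independently as a `frozenset`, the "number of times used" counter lives in a `collections.Counter`, and `tag_set_counts`, `publish_post` and `suggest_tags` are hypothetical names. A real blog would keep the counts in a database table instead.

```python
from collections import Counter

# Hypothetical in-memory store: tag set -> "number of times used".
# A frozenset key makes {"A", "B", "C"} and {"C", "B", "A"} the same set.
tag_set_counts: Counter = Counter()

def publish_post(tags: set[str]) -> None:
    """Publishing step: create the tag set or add 1 to its counter."""
    # Counter yields 0 for unseen keys, so "create" and "increment" are one statement.
    tag_set_counts[frozenset(tags)] += 1

def suggest_tags(chosen: set[str]) -> set[str]:
    """Suggestion step: take the most-used stored set that contains
    every chosen tag and return its tags that are not chosen yet."""
    # Linear scan over all stored sets (the cost the question worries about).
    candidates = [s for s in tag_set_counts if chosen <= s]
    if not candidates:
        return set()
    best = max(candidates, key=lambda s: tag_set_counts[s])
    return set(best - chosen)

# Toy history: the list repeats references to the same sets, which is fine
# because publish_post never mutates its argument.
for tags in [{"A", "B", "C"}] * 5 + [{"A", "C", "D"}] * 2 + [{"B", "D"}] * 9:
    publish_post(tags)

print(suggest_tags({"A", "C"}))  # {'B'}
```

The scan in `suggest_tags` visits every stored tag set, which is exactly the cost the question asks about for searches like "sets that contain A & C".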

Is there a better/smarter way of accomplishing this task? What about the database model? Can I optimize it so that searches like "sets that contain A & C" won't be too slow?

Recommended answer

Search model issues:
Your model seems a bit oversimplified to me: very frequent tags will most likely always be the ones suggested, even when other tags are more closely related to the pair A, C.

You should probably consider the tf-idf model, which gives a boost to rare terms if they are also connected to the "query" [here the query is A and C]: if a rare term is commonly used with A and C, it is probably strongly related to them.

The idea is simple: if a tag is frequently used together with A and C, give it a boost [tf].
Also, if a tag is rare overall [measured by its total number of uses], give it a boost [idf].
The "score" for each tag is then the combined tf-idf score.
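As a rough sketch of that scoring, assuming posts are stored individually as tag sets (not as aggregated counts), one common variant takes tf(t) as the number of posts that contain every query tag plus t, idf(t) as log(N / df(t)) with N posts in total and df(t) posts using t, and combines them as tf(t) × idf(t). The answer does not fix an exact formula; `tfidf_suggestions` and the toy tags below are hypothetical.

```python
import math
from collections import Counter

def tfidf_suggestions(posts: list[set[str]], query: set[str]) -> list[tuple[str, float]]:
    """Rank tags that co-occur with every tag in `query` by tf * idf."""
    n_posts = len(posts)
    # df: in how many posts each tag appears overall (its popularity).
    df = Counter(tag for post in posts for tag in post)
    # tf: in how many posts each other tag appears together with all query tags.
    tf = Counter(tag for post in posts if query <= post for tag in post - query)
    # idf = log(N / df) shrinks toward 0 for tags used almost everywhere.
    scores = {tag: count * math.log(n_posts / df[tag]) for tag, count in tf.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy data: "python" co-occurs with A and C more often,
# but it is also used on many unrelated posts.
posts = (
    [{"A", "C", "python"}] * 3
    + [{"A", "C", "lucene"}] * 2
    + [{"python"}] * 5
)
ranking = tfidf_suggestions(posts, {"A", "C"})
print([tag for tag, _ in ranking])  # ['lucene', 'python']
```

With the same data, picking the most-used tag set would suggest "python"; the idf factor lets "lucene", which appears only alongside A and C, rank first.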

Performance issues:
You might also consider creating an inverted index for this task, to speed up searches.
If you are using Java, Apache Lucene is a mature library that can help you with it.
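Lucene is a Java library and its API is not shown here. The following toy Python sketch only illustrates the inverted-index idea: each tag maps to the set of post ids that use it (its posting list), so "posts that contain A & C" becomes an intersection of two posting lists instead of a scan over every stored tag set. `build_inverted_index` and `posts_with_all` are hypothetical names.

```python
from collections import defaultdict

def build_inverted_index(posts: dict[int, set[str]]) -> dict[str, set[int]]:
    """Map each tag to the ids of the posts that use it (its posting list)."""
    index: dict[str, set[int]] = defaultdict(set)
    for post_id, tags in posts.items():
        for tag in tags:
            index[tag].add(post_id)
    return index

def posts_with_all(index: dict[str, set[int]], query: set[str]) -> set[int]:
    """Ids of the posts tagged with every tag in `query`."""
    # .get avoids inserting empty entries into the defaultdict for unknown tags.
    postings = sorted((index.get(tag, set()) for tag in query), key=len)
    if not postings:
        return set()
    # Intersect starting from the shortest posting list to keep the work small.
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
        if not result:
            break
    return result

posts = {
    1: {"A", "B", "C"},
    2: {"A", "C", "D"},
    3: {"B", "D"},
    4: {"A", "D"},
}
index = build_inverted_index(posts)
print(sorted(posts_with_all(index, {"A", "C"})))  # [1, 2]
```

In a relational database, a (tag, post_id) table with an index on the tag column plays a similar role.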