How is the "Consider including" feature in Gmail implemented?


Question

I would like to do something similar to Gmail's "Consider including" suggestions on my blog, but with tags.

I was thinking of storing tag sets like this:

And I was thinking of the following algorithm:

//a blog post is published
//it has the tags "A", "B" & "C" :
if the tag set "A,B,C" doesn't exist
   create it
else
   add 1 to "number of times used"

And, to suggest tags:

//a blog post is being written.
//the author includes the tags "A" and "C"
//which tags should I suggest?
find all the tag sets that contain "A" and "C"
  among them, find the one with the highest "number of times used"
    suggest the tags of that set other than the ones already picked (A & C)
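
For reference, the two pseudocode steps above could be sketched in Python roughly as follows. This is only an illustration: the in-memory `Counter` stands in for whatever table would hold the tag sets, and the names `tag_set_counts`, `record_post` and `suggest` are hypothetical. One small assumption beyond the pseudocode: only tag sets that contain at least one extra tag are considered, so there is always something left to suggest.

```python
from collections import Counter

# Hypothetical in-memory store: distinct tag set -> "number of times used".
tag_set_counts = Counter()


def record_post(tags):
    """A post is published: create its tag set, or add 1 to its usage count."""
    tag_set_counts[frozenset(tags)] += 1


def suggest(chosen):
    """Suggest the unpicked tags of the most-used tag set containing all chosen tags."""
    chosen = frozenset(chosen)
    # Assumption: require a proper superset, so the result is never empty
    # just because the exact chosen set happens to be the most popular one.
    candidates = [
        (count, tags) for tags, count in tag_set_counts.items() if chosen < tags
    ]
    if not candidates:
        return set()
    _, best_set = max(candidates, key=lambda pair: pair[0])
    return set(best_set - chosen)
```

Note that `suggest` scans every stored tag set, which is the part of this approach that can become slow as the number of distinct sets grows.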

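Is there a better/smarter way to accomplish this? What about the database model? Can I optimize it so that searches like "the sets containing A and C" aren't too slow?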

Recommended answer

Search model issues:
Your model seems a bit too simplified to me: very frequent tags will most likely always be the ones suggested, even when other tags are more closely related to the pair A, C.

You should probably consider the tf-idf model, which gives a boost to rare terms if they are also connected to the "query" (here, the query is A and C): if a rare term is commonly used with A and C, it is probably closely related to them.

The idea is simple: if a tag is frequently used with A and C, give it a boost [tf].
Also, if a tag is rare (judged by the total number of times it is used), give it a boost [idf].
The "score" for each tag is then the combined tf-idf score.
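
One way this scoring could look in Python is sketched below. The answer does not fix an exact formula, so the weighting here is an assumption: tf is taken as the number of posts in which a tag appears together with all query tags, idf as `log(total posts / posts using the tag)`, and the score as their product. The function name `suggest_tags` and its parameters are hypothetical.

```python
import math
from collections import Counter


def suggest_tags(posts, query, limit=3):
    """Rank candidate tags for `query` by an assumed tf * idf score.

    posts: iterable of tag collections, one per published post.
    query: the tags the author has already picked.
    """
    posts = [set(tags) for tags in posts]
    query = set(query)
    total_posts = len(posts)

    # Document frequency: in how many posts each tag is used at all.
    usage = Counter(tag for tags in posts for tag in tags)

    # Term frequency (assumed definition): co-occurrences with the whole query.
    co_usage = Counter()
    for tags in posts:
        if query <= tags:
            co_usage.update(tags - query)

    scores = {
        tag: count * math.log(total_posts / usage[tag])
        for tag, count in co_usage.items()
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda tag: (-scores[tag], tag))[:limit]
```

With this weighting, a tag used in nearly every post gets an idf close to zero, so it can rank below a rarer tag even if it co-occurs with the query more often.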

Performance issues:
You might also consider building an inverted index for this task to speed up searches.
If you are using Java, Apache Lucene is a mature library that can help you with it.
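
To illustrate the idea of an inverted index (this is a plain in-memory sketch, not Lucene's API), each tag can map to the set of posts that use it, so that "posts containing A and C" becomes a set intersection instead of a scan over all tag sets. The class `TagIndex` and its method names are hypothetical.

```python
from collections import defaultdict


class TagIndex:
    """Minimal in-memory inverted index: tag -> ids of the posts that use it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add_post(self, post_id, tags):
        """Register a published post under each of its tags."""
        for tag in tags:
            self._postings[tag].add(post_id)

    def posts_with_all(self, tags):
        """Return the ids of posts that carry every tag in `tags`."""
        # .get avoids creating empty entries for unknown tags.
        postings = [self._postings.get(tag, set()) for tag in tags]
        if not postings:
            return set()
        # Intersect starting from the smallest posting set to keep work low.
        postings.sort(key=len)
        result = set(postings[0])
        for other in postings[1:]:
            result &= other
        return result
```

The posts returned by `posts_with_all` are then the only ones that need to be examined when counting co-occurring tags for a suggestion.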
