How can I address the 10GB limit on Google App Engine?


Question

We are trying to index inboxes by sitting on top of Gmail, using the App Engine Search API, but we are hitting the 10 GB limit. This is because we index the whole organization's email so that we can search across the whole team's inboxes. How can we work around this? One way might be to keep an individual index per person and somehow combine the results manually, but we are worried that merging results might be really complex. What options are available?

Solution

This is a typical problem in any document retrieval system, and the solution is to slice the entire corpus into multiple buckets. You should choose a slicing strategy based on your requirements/usage pattern.

One possibility is to slice messages by their date. You keep adding messages to an index until you come close to the limit, at which point you start a new index for newer messages. Or you can do it by calendar intervals (per year, per quarter or per month, depending on your volume).
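As a rough illustration of the calendar-interval variant, the sketch below maps a message's sent date to the name of the bucket it would go into. The function name `index_name_for`, the `mail` prefix, and the naming scheme are all assumptions made for this example; they are not part of the App Engine Search API, and the code only computes a string that you would then use as the index name.

```python
from datetime import date


def index_name_for(sent: date, granularity: str = "quarter", prefix: str = "mail") -> str:
    """Return a hypothetical index name for the bucket that holds `sent`.

    The naming scheme is an assumption for illustration only.
    """
    if granularity == "year":
        return f"{prefix}-{sent.year}"
    if granularity == "quarter":
        # Months 1-3 map to q1, 4-6 to q2, and so on.
        quarter = (sent.month - 1) // 3 + 1
        return f"{prefix}-{sent.year}-q{quarter}"
    if granularity == "month":
        return f"{prefix}-{sent.year}-{sent.month:02d}"
    raise ValueError(f"unknown granularity: {granularity!r}")


if __name__ == "__main__":
    print(index_name_for(date(2014, 5, 17)))           # quarterly bucket
    print(index_name_for(date(2014, 5, 17), "month"))  # monthly bucket
```

Picking a finer granularity keeps each bucket further below the size limit, at the cost of more indexes to query for a search that reaches far back.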

Merging results from several indexes is simple. You can also give users a chance to choose how far back in time they want to go in their search. Often people know that they are looking for something recent or something that happened a long time ago.
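To make the merging step concrete, here is a minimal sketch that assumes each index has already returned its hits sorted newest first. The `Hit` type and the `merge_hits` function are hypothetical names for this example, not Search API types; in practice you would build them from whatever each per-bucket query returns. It orders by sent time, which is comparable across buckets; relevance scores from separate indexes may not be.

```python
import heapq
from itertools import islice
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One search result; a hypothetical shape used only in this sketch."""
    sent_ts: int   # message timestamp, e.g. seconds since the epoch
    doc_id: str    # identifier of the indexed message


def merge_hits(per_index_hits: Iterable[List[Hit]], limit: int = 20) -> List[Hit]:
    """Merge per-index result lists (each sorted newest first) into one list.

    heapq.merge walks the lists lazily, so only `limit` items are materialized.
    """
    merged = heapq.merge(*per_index_hits, key=lambda h: h.sent_ts, reverse=True)
    return list(islice(merged, limit))


if __name__ == "__main__":
    recent = [Hit(50, "a1"), Hit(10, "a2")]
    older = [Hit(40, "b1"), Hit(30, "b2")]
    for hit in merge_hits([recent, older], limit=3):
        print(hit.doc_id, hit.sent_ts)
```

If the user limits the search to a recent period, you would pass in only the result lists from the buckets that cover that period.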
