Complex Queries using GAE datastore


Question

I am in the early stages of developing a sports statistics website (ultimate frisbee) and would like your opinion on whether Google App Engine is right for me.

I am writing it in Python using Django and have been comfortable with standard RDBMSes for years, but this site is a long-term project and I am expecting very large amounts of data, so I would like the "infinite" scaling that the GAE datastore offers. The vast majority of the queries to the database will return very standard results, which makes the datastore seem like a logical choice. However, I would like to be able to make extremely complex queries in the future to come up with new statistical metrics or simply to find interesting results. I plan on doing a lot of this, but I won't know what these queries are until the data is already collected.

For instance, you often see baseball stats analysts come up with ridiculous stats like "This is only the first time in the past 50 years that two left-handed pitchers whose last names start with 'Z' have thrown one-hit shutouts on back-to-back days". I would like to have the flexibility of making any queries whatsoever in the future. :)

However, I am under the impression that a non-relational database like Bigtable requires you to come up with models containing redundant data beforehand, and that all of the work takes place on the inserts rather than the fetches. I've already built Django models that would contain virtually all the data I would ever need to query on, but I have no idea what denormalized models I'll want to have a year or two from now. Thus, I feel like making complex queries in the future would be extremely difficult on the GAE datastore and would require me to pull a ton of information off the server before processing it in Python.
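To make that trade-off concrete, here is a minimal sketch in plain Python. It does not use the App Engine API: dicts and lists stand in for datastore kinds, and every name in it (`record_goal`, `goals_by_player_season`, and so on) is hypothetical. It only illustrates why a query anticipated at write time is cheap, while an unanticipated one falls back to scanning the raw records.

```python
from collections import defaultdict

# Plain Python containers stand in for datastore kinds (an assumption
# made for illustration; this is not the App Engine API).
raw_events = []                             # normalized source records
goals_by_player_season = defaultdict(int)   # denormalized, precomputed totals


def record_goal(player, season):
    """Write path: store the raw event and update the aggregate now."""
    raw_events.append({"player": player, "season": season})
    goals_by_player_season[(player, season)] += 1


def goals(player, season):
    """Anticipated query: a single key lookup, no scan."""
    return goals_by_player_season.get((player, season), 0)


def goals_for_players_starting_with(letter):
    """Unanticipated query: nothing was precomputed, so scan every event."""
    return sum(1 for event in raw_events if event["player"].startswith(letter))
```

The second query is the kind that cannot be planned for in advance: it still works, but only by reading every raw record, which is the "pull a ton of information off the server" cost described above.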

Is the Google App Engine datastore simply wrong for what I want to do? Or am I just missing something? Thanks so much in advance!

Update: Thanks for the responses so far. I realize that I should also mention that a lot of these complex queries are queries that I would like the users to be able to run, which makes an offline database not really an option. For instance, users should be able to see various statistics on how well any two particular players play when they are on the field at the same time during specific games or seasons. While these queries aren't nearly as frequent as standard aggregate stats, they will still happen regularly.

Having a relational database as well as the GAE datastore would be great, but Django doesn't support multiple databases by default yet, and patching a solution together sounds difficult and messy. Eric Florenzano has a nice solution for two databases that both use the Django models, but if I were to use the GAE datastore, I would have to use App Engine's db model instead. And coming up with a nice solution like his for this complex problem is a bit beyond my skill level at this point.

Right now my two favorite options are using the GAE Task Queue to do the difficult queries, or going to a more standard web host like WebFaction and denormalizing my tables later, once my data grows and I need to increase performance.

Recommended answer

What you're describing is essentially OLAP - Online Analytical Processing. OLAP is one thing that 'traditional' RDBMSes are very good at, in part due to the flexibility and power of SQL, and that non-relational databases such as the App Engine datastore aren't. It sounds like your OLAP-type queries will be relatively infrequent compared to normal access, though, so I'd suggest one of two approaches:


  • Mirror all your data from your App Engine datastore to a relational database at intervals, and perform the analytical queries on the relational database. User-facing traffic is still served by the datastore, so you get all the advantages of that, but you have an offline copy you can run complex queries against.
  • Use App Engine's Task Queue support to execute queries that examine large datasets. You can write your query in Python or Java, then use the Task Queue to execute it across a very large dataset, and pick up the results asynchronously when they're done. Obviously there's a bit of infrastructure work required to make this easy (though keep an eye on my blog for a future project involving this ;).
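As a rough illustration of the first approach, the sketch below mirrors exported records into SQLite and answers the "two players on the field together" question with an ordinary SQL self-join. Everything here is an assumption made for the example: the export is a list of plain dicts rather than real datastore entities, and the `appearance` table, its columns, and both function names are hypothetical. A real setup would pull entities from the datastore on a schedule and would probably target a server-side RDBMS.

```python
import sqlite3

# Hypothetical export: each dict stands in for one datastore entity,
# one row per player per point played. "scored" is 1 if that player's
# team scored the point. All field names are assumptions.
exported_entities = [
    {"game_id": 1, "point_no": 1, "player": "alice", "scored": 1},
    {"game_id": 1, "point_no": 1, "player": "bob", "scored": 1},
    {"game_id": 1, "point_no": 2, "player": "alice", "scored": 0},
    {"game_id": 1, "point_no": 2, "player": "carol", "scored": 0},
    {"game_id": 2, "point_no": 1, "player": "alice", "scored": 0},
    {"game_id": 2, "point_no": 1, "player": "bob", "scored": 0},
    {"game_id": 2, "point_no": 2, "player": "alice", "scored": 1},
    {"game_id": 2, "point_no": 2, "player": "bob", "scored": 1},
]


def mirror_to_sql(conn, entities):
    """Copy exported entities into a relational table.

    INSERT OR REPLACE keyed on the primary key makes repeated runs
    (mirroring "at intervals") idempotent.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS appearance (
               game_id INTEGER, point_no INTEGER, player TEXT, scored INTEGER,
               PRIMARY KEY (game_id, point_no, player))"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO appearance "
        "VALUES (:game_id, :point_no, :player, :scored)",
        entities,
    )
    conn.commit()


def points_together(conn, player_a, player_b):
    """Return (points played together, points scored on those points)."""
    return conn.execute(
        """SELECT COUNT(*), COALESCE(SUM(a.scored), 0)
           FROM appearance a
           JOIN appearance b
             ON a.game_id = b.game_id AND a.point_no = b.point_no
           WHERE a.player = ? AND b.player = ?""",
        (player_a, player_b),
    ).fetchone()


conn = sqlite3.connect(":memory:")
mirror_to_sql(conn, exported_entities)
together, scored = points_together(conn, "alice", "bob")
```

The point of the sketch is that the ad hoc question becomes a single join once the data sits in a relational store, with no denormalized model designed ahead of time. The same function could be run from a background task if the query is too slow to serve inline.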
