比较谷歌应用程序引擎数据存储中的多个日期范围(多对多,Python) [英] Compare many date ranges in google app engine datastore (Many to many, Python)

查看:122
本文介绍了比较谷歌应用程序引擎数据存储中的多个日期范围(多对多,Python)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

  class First_Set(db.Model):
在Google App Engine数据存储中有两个数据集。 start_time = db.DateTimeProperty()
end_time = db.DateTimeProperty()
data1 = db.FloatProperty()
...

class Second_Set(db.Model ):
start_time = db.DateTimeProperty()
end_time = db.DateTimeProperty()
data2 = db.FloatProperty()
...


$ b

(他们有其他不同的数据,这就是为什么他们在不同的数据集中。)

p>我希望在两个数据集之间找到所有重叠的start_time和end_time的数据存储区标识符,理想情况下不需要从一个数据集中获取结果并将第一个结果迭代到另一个数据集中。



初始数据集的一个很好的可视化来自 here (它也有SQL中解决的问题):

  1 | ----- | 
2 | ----- |
3 | - |
4 | ----- |
5 | ----- |
6 | --- |
7 | --- |
8 | --- |
9 | ----- |

我需要的最终结果是(来自同一个示例):

  + ---- + --------------------- + ---- + ------------- -------- + 
| id |开始| id |结束|
+ ---- + --------------------- + ---- + ------------- -------- +
| 2 | 2008-09-01 15:02:00 | 1 | 2008-09-01 15:04:00 |
| 5 | 2008-09-01 16:19:00 | 4 | 2008-09-01 16:23:00 |
| 8 | 2008-09-01 16:20:00 | 4 | 2008-09-01 16:22:00 |
| 8 | 2008-09-01 16:20:00 | 5 | 2008-09-01 16:22:00 |
| 7 | 2008-09-01 18:18:00 | 9 | 2008-09-01 18:22:00 |
+ ---- + --------------------- + ---- + ------------- -------- +

下面的例子描述了SQL解决方案,但我不能由于缺少JOIN,不会在数据存储区执行此操作:

  SELECT v1.id,v1.start,v2.id,LEAST (v1.end,v2.end)AS结束
FROM访问v1
JOIN访问v2 ON v1.id<> v2.id和v1.start> = v2.start和v1.start< v2.end
ORDER BY v1.start;

据我所知,使用ListProperty()(from 这个问题)。



任何人都可以想出一个解决方案来找到重叠时间(理想情况下用Python)吗?

解决方案

谢谢Shay的指导,发布我的解决方案时不加入JOIN。应该能够通过少量编辑查找任意数量的数据集的重叠(至少这就是理论)。



我的Python并不好,但下面应该给出这个想法:

 来自操作员导入itemgetter 

class Find_Overlaps(webapp2.RequestHandler):
def get(self):
all_dates = []
first_dates = db.GqlQuery(SELECT * FROM First_Set)
for first_dates中的日期:
row = {'dataset': 'First_Set','dbkey':date.key(),'offset':date.start_time,'type':-1}
all_dates.append(row)
row = {'dataset': 'First_Set','dbkey':date.key(),'offset':date.end_time,'type':1}
all_dates.append(row)

second_dates = db。 gqlQuery(SELECT * FROM Second_Set)
for second_dates中的日期:
row = {'dataset':'Second_Set','dbkey':date.key(),'offset':date.start _time,'type':-1}
all_dates.append(row)
row = {'dataset':'Second_Set','dbkey':date.key(),'offset':date。 end_time,'type':1}
all_dates.append(row)
$ b $ newlist = sorted(all_dates,key = itemgetter('offset','type'))
number_datasets = 2#goal是在所有组中找到重叠,不仅是最好的重叠,这就是为什么这是需要的
loopcnt = 0
update_bestend = 0
overlaps = []
for row在newlist中:#Below主要来自Marzullo的alghorithm
loopcnt = loopcnt - row ['type']#这是为了跟踪整体计数
如果update_bestend == 1:
if loopcnt = =(number_datasets - 1):
bestend = row ['offset']
end_set = row ['dataset']
end_key = row ['dbkey']
overlaps.append ({'STA rt':beststart,'start_set':start_set,'start_key':start_key,'end':bestend,'end_set':end_set,'end_key':end_key})
update_bestend = 0
if loopcnt = = number_datasets:
beststart = row ['offset']
start_set = row ['dataset']
start_key = row ['dbkey']
update_bestend = 1

用于重叠的重叠:#只需查看结果是什么
self.response.out.write('start:%s,start_set:%s,end:%s,end_set:%s< br''%(overlap ['start'],overlap ['start_set'],overlap ['end'],overlap ['end_set']))


I have two data sets in google app engine datastore.

class First_Set(db.Model):
  start_time = db.DateTimeProperty()
  end_time = db.DateTimeProperty()
  data1 = db.FloatProperty()
  ...

class Second_Set(db.Model):
  start_time = db.DateTimeProperty()
  end_time = db.DateTimeProperty()
  data2 = db.FloatProperty()
  ...

(They have other different data that's why they're in different datasets.)

I'd like to find the datastore IDs all the overlapping start_time and end_time across two datasets, ideally without pulling results from one and iterating the first results over the other.

A great visualization of the initial dataset is from here (it also has the problem solved in SQL):

1     |-----| 
2        |-----| 
3                 |--| 
4                       |-----| 
5                          |-----| 
6                                  |---| 
7                                        |---|  
8                           |---| 
9                                       |-----|

End result I need is something in the tune of (from the same example):

+----+---------------------+----+---------------------+ 
| id | start               | id | end                 | 
+----+---------------------+----+---------------------+ 
|  2 | 2008-09-01 15:02:00 |  1 | 2008-09-01 15:04:00 | 
|  5 | 2008-09-01 16:19:00 |  4 | 2008-09-01 16:23:00 | 
|  8 | 2008-09-01 16:20:00 |  4 | 2008-09-01 16:22:00 | 
|  8 | 2008-09-01 16:20:00 |  5 | 2008-09-01 16:22:00 | 
|  7 | 2008-09-01 18:18:00 |  9 | 2008-09-01 18:22:00 | 
+----+---------------------+----+---------------------+ 

SQL solution is described in the example as below but I couldn't do this in datastore because of lack of JOIN:

SELECT v1.id, v1.start, v2.id, LEAST(v1.end,v2.end) AS end 
FROM visits v1 
JOIN visits v2 ON v1.id <> v2.id and v1.start >= v2.start and v1.start < v2.end  
ORDER BY v1.start;

I understand that one-to-many version of this is rather straightforward using a ListProperty() (from this question).

Can anyone think of a solution to find the overlapping times (ideally in Python)?

解决方案

Posting my solution with no JOINs, thanks to Shay's direction. Should be able to find overlaps over any number of datasets with minor edits (at least that's the theory).

My Python isn't that great but below should give the idea:

from operator import itemgetter

class Find_Overlaps(webapp2.RequestHandler):
    def get(self):
        all_dates = []
        first_dates = db.GqlQuery("SELECT * FROM First_Set")
        for date in first_dates:
            row = {'dataset':'First_Set', 'dbkey':date.key(), 'offset':date.start_time, 'type': -1}
            all_dates.append(row)
            row = {'dataset':'First_Set', 'dbkey':date.key(), 'offset':date.end_time, 'type': 1}
            all_dates.append(row)

        second_dates = db.GqlQuery("SELECT * FROM Second_Set")
        for date in second_dates:
            row = {'dataset':'Second_Set', 'dbkey':date.key(), 'offset':date.start_time, 'type': -1}
            all_dates.append(row)
            row = {'dataset':'Second_Set', 'dbkey':date.key(), 'offset':date.end_time, 'type': 1}
            all_dates.append(row)

        newlist = sorted(all_dates, key=itemgetter('offset','type'))
        number_datasets = 2 #goal is to find overlaps in all sets not only the best overlaps, that's why this is needed
        loopcnt = 0
        update_bestend = 0
        overlaps = []
        for row in newlist: #Below is mostly from Marzullo's alghorithm
            loopcnt = loopcnt - row['type']#this is to keep track of overall tally
            if update_bestend == 1:
                if loopcnt == (number_datasets - 1):
                    bestend = row['offset']
                    end_set = row['dataset']
                    end_key = row['dbkey']
                    overlaps.append({'start':beststart,'start_set':start_set,'start_key':start_key,'end':bestend,'end_set':end_set,'end_key':end_key})
                    update_bestend = 0
            if loopcnt == number_datasets:
                beststart = row['offset']
                start_set = row['dataset']
                start_key = row['dbkey']
                update_bestend = 1

        for overlap in overlaps: #just to see what the outcome is
            self.response.out.write('start: %s, start_set: %s, end: %s, end_set: %s<br>' % (overlap['start'], overlap['start_set'], overlap['end'], overlap['end_set']))

这篇关于比较谷歌应用程序引擎数据存储中的多个日期范围(多对多,Python)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆