Assigning a visit number to rows in a database?


Question

I tried searching around with no luck; maybe I'm just not using the right terminology. Anyway, I'm working with a PostgreSQL 8.1 database (Redshift) where a sample of the data would look like this:

User ID   Timestamp    Visit ID
1         05:20:20am   10
2         05:21:00am   11
1         05:22:14am   12
1         05:24:09am   13

Let's say this is a very basic sample of the data. I want to analyze how people do on other metrics when it's their 1st visit, 2nd visit, 3rd visit, etc. The easiest way I can think of to do this would be to create a separate table that looks like:

Visit ID    Visit Number
10          1
11          1
12          2 
13          3 

Or to somehow append a column to my original table:

User ID   Timestamp    Visit ID    Visit Number
1         05:20:20am   10          1
2         05:21:00am   11          1 
1         05:22:14am   12          2
1         05:24:09am   13          3

I know I can use GROUP BY and COUNT to analyze performance for visitors who came at least x times versus those who came fewer times. However, I don't know how to analyze performance on the first, second, or third visit, especially when users could have visited anywhere from 1 to 50,000 times.

An example question I'd like to answer: for users who visit at least 10 times, do they perform better on their first five visits or their second five?

Thanks very much.

Recommended answer

One (potentially slow) option is to use a correlated subquery:

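SELECT  t.UserID,
        t.Timestamp,
        t.VisitID,
        -- earlier visits by the same user, plus one for the current visit
        (   SELECT  COUNT(*) + 1
            FROM    T t2
            WHERE   t2.UserID = t.UserID
            AND     t2.Timestamp < t.Timestamp
        ) AS VisitNumber  -- explicit AS: releases before PostgreSQL 8.4 require it for column aliases
FROM    T;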

Example on SQL Fiddle
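To try the correlated subquery without a database server, here is a minimal sketch that runs it against the sample rows from the question using Python's built-in sqlite3 module. SQLite is only a stand-in for PostgreSQL/Redshift, and the table name `visits`, the snake_case column names, and the 24-hour time strings are assumptions made for the sketch. The extra tie-break on `visit_id` is also an addition, not part of the answer's query; it keeps two visits with identical timestamps from receiving the same number.

```python
import sqlite3

# Sample rows from the question: (user_id, visit_time, visit_id).
# Times are stored as sortable 24-hour strings (an assumption of this sketch).
ROWS = [
    (1, "05:20:20", 10),
    (2, "05:21:00", 11),
    (1, "05:22:14", 12),
    (1, "05:24:09", 13),
]


def visit_numbers(rows):
    """Return {visit_id: visit_number} computed with a correlated subquery."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE visits (user_id INTEGER, visit_time TEXT, visit_id INTEGER)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    query = """
        SELECT v.visit_id,
               (SELECT COUNT(*) + 1
                FROM   visits v2
                WHERE  v2.user_id = v.user_id
                AND    (v2.visit_time < v.visit_time
                        -- tie-break (added for this sketch): same time, lower visit_id first
                        OR (v2.visit_time = v.visit_time
                            AND v2.visit_id < v.visit_id))
               ) AS visit_number
        FROM   visits v
        ORDER  BY v.visit_id
    """
    result = dict(conn.execute(query).fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    print(visit_numbers(ROWS))
```

For the sample data this yields visit numbers 1, 1, 2, 3 for visit IDs 10, 11, 12, 13, matching the table the question asks for. Because the subquery is re-evaluated for every row, an index on (user_id, visit_time) would likely matter a great deal for users with tens of thousands of visits.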

The easiest (although possibly not viable) solution would be to upgrade to PostgreSQL 8.4 or later and take advantage of ROW_NUMBER:

SELECT  t.UserID, 
        t.Timestamp, 
        t.VisitID,
        ROW_NUMBER() OVER(PARTITION BY t.UserID ORDER BY t.Timestamp) AS VisitNumber
FROM    T;

Example on SQL Fiddle

Finally, if the first query is too slow and upgrading is not an option, the following article goes into some detail about a further alternative. I have no idea how it will perform, though.

http://www.depesz.com/2007/08/17/rownum-anyone-cumulative-sum-in-one-query/
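Once visit numbers are available, the example question from the post (for users with at least 10 visits, first five versus second five) reduces to a conditional average. The sketch below is one possible way to phrase it, not something taken from the answer: it reuses the correlated subquery, runs on SQLite through Python's sqlite3 module as a stand-in for PostgreSQL/Redshift, and assumes a hypothetical numeric `score` column as the performance metric, along with an assumed `visits` table and column names.

```python
import sqlite3


def first_vs_second_five(rows):
    """rows: (user_id, visit_time, visit_id, score).

    Returns (avg score on visits 1-5, avg score on visits 6-10),
    pooled over users who have at least 10 visits.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE visits "
        "(user_id INTEGER, visit_time TEXT, visit_id INTEGER, score REAL)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT AVG(CASE WHEN n.visit_number <= 5 THEN n.score END) AS first_five,
               AVG(CASE WHEN n.visit_number >  5 THEN n.score END) AS second_five
        FROM (
            -- number each visit per user, as in the correlated subquery above
            SELECT v.user_id,
                   v.score,
                   (SELECT COUNT(*) + 1
                    FROM   visits v2
                    WHERE  v2.user_id = v.user_id
                    AND    v2.visit_time < v.visit_time) AS visit_number
            FROM   visits v
        ) AS n
        WHERE n.visit_number <= 10
        AND   n.user_id IN (SELECT user_id
                            FROM   visits
                            GROUP  BY user_id
                            HAVING COUNT(*) >= 10)
    """
    row = conn.execute(query).fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    # User 1 has ten visits scored 1..10; user 2 has three visits and is excluded.
    demo = [(1, f"05:{i:02d}:00", i, float(i)) for i in range(1, 11)]
    demo += [(2, f"06:{i:02d}:00", 100 + i, 100.0) for i in range(1, 4)]
    print(first_vs_second_five(demo))
```

Note that this pools all qualifying visits into two averages; comparing per-user averages instead would need an additional GROUP BY on user_id in the outer query.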
