PostgreSQL - Getting statistical data

Question

I need to collect some statistical information in my application. I have a table of users (tb_user). Every time a new user accesses the application, a new record is added to this table, i.e. one row per user. The main fields are id and date_time (the timestamp of the user's first access to the application).

tb_user

id (bigint) | date_time (timestamp with time zone)
 1          |  2012-01-29 11:29:50.359-03
 2          |  2012-01-31 14:27:10.359-03

I need to get:

The average number of users per day, per week and per month.

For example:

Per day: 55.45

Per week: XX.XX

Per month: XX.XX

EDIT:

My best solution so far is:

WITH daily_count AS (SELECT COUNT(id) AS user_count FROM tb_user)
SELECT user_count, tbaux2.days, (user_count/tbaux2.days) FROM daily_count, 
    (SELECT EXTRACT(DAY FROM (t2.diff) ) + 1 AS days
     FROM
       (with tbaux AS(SELECT  min(date_time) AS min FROM tb_user)
       SELECT (now() - min) AS diff
       FROM tbaux) AS t2) AS tbaux2
GROUP BY user_count, tbaux2.days

But this solution only works with EXTRACT(DAY ...). It does not work for weeks or months.

Any help is welcome.

Or:

SELECT user_count, tbaux2.days, (user_count/tbaux2.days) AS userPerDay, ((user_count/tbaux2.days) * 7) AS userPerWeek, ((user_count/tbaux2.days) * 30) AS userPerMonth

EDIT 2:

Based on the responses from @Bruno, there are some considerations:

When I asked the question, I was really asking for a way to select data by day, month and year. I believe the query that I posted and @Bruno refined should be interpreted as an average "per day, per 7 days and per 30 days" rather than per day, week and month. Interpreted that way, there will be no problems of the kind quoted in the example (the 10% drop). This "every N days" approach is the answer I need at the moment, so I will accept this answer.

I suggest the following improvements to the post:


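  • Consider only closed (complete) days in the results: do not count the users of the current day, and do not include the current day in the division.

  • Results with two decimal digits.

  • A new query that considers the data by week and by month.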

Thanks.

Recommended answer

You should look into aggregate functions (min, max, count, avg), which go hand in hand with GROUP BY. For date-based aggregations, date_trunc is also useful.

For example, this will return the number of rows per day:

SELECT date_trunc('day', date_time) AS day_start,
       COUNT(id) AS user_count FROM tb_user
    GROUP BY date_trunc('day', date_time);

You can then compute the daily average using something like this (with a CTE):

WITH daily_count AS (SELECT date_trunc('day', date_time) AS day_start,
       COUNT(id) AS user_count FROM tb_user
    GROUP BY date_trunc('day', date_time))
SELECT AVG(user_count) FROM daily_count;

Use 'week' instead of 'day' for the weekly counts, and so on (see the date_trunc documentation).
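As a rough sketch of that substitution, assuming the same tb_user table and date_time column as above, the weekly and monthly versions could look like the queries below. The CTE and column names (weekly_count, monthly_count, avg_users_per_week, avg_users_per_month) are made up for illustration.

```sql
-- Average number of new users per calendar week.
-- date_trunc('week', ...) truncates to the Monday that starts the week.
WITH weekly_count AS (
    SELECT date_trunc('week', date_time) AS week_start,
           COUNT(id) AS user_count
    FROM tb_user
    GROUP BY date_trunc('week', date_time))
SELECT AVG(user_count) AS avg_users_per_week FROM weekly_count;

-- Average number of new users per calendar month.
WITH monthly_count AS (
    SELECT date_trunc('month', date_time) AS month_start,
           COUNT(id) AS user_count
    FROM tb_user
    GROUP BY date_trunc('month', date_time))
SELECT AVG(user_count) AS avg_users_per_month FROM monthly_count;
```

Note that a week or month with no new users produces no row in the CTE, so it is left out of the average rather than counted as zero. The first and last periods are usually partial as well.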

EDIT (following a comment): average up to and including 5 January 2012, i.e. before the 6th.

WITH daily_count AS (SELECT date_trunc('day', date_time) AS day_start,
       COUNT(id) AS user_count
    FROM tb_user
       WHERE date_time >= DATE('2012-01-01') AND date_time < DATE('2012-01-06') 
    GROUP BY date_trunc('day', date_time))
SELECT SUM(user_count)/(DATE('2012-01-06') - DATE('2012-01-01')) FROM daily_count;

What's above is over-complicated in this case. This should give you the same result:

-- COUNT returns bigint; cast to numeric so the division keeps its decimals.
SELECT COUNT(id)::numeric/(DATE('2012-01-06') - DATE('2012-01-01'))
    FROM tb_user
       WHERE date_time >= DATE('2012-01-01') AND date_time < DATE('2012-01-06');
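A note on the divisions in these queries: in PostgreSQL, dividing one integer value by another truncates the result. COUNT returns bigint and the difference between two dates is an integer, so an average written as COUNT(id) / days loses its fractional part unless one side is cast. A minimal illustration with literal values only:

```sql
SELECT 7 / 2                     AS integer_division,  -- truncated to 3
       7::numeric / 2            AS numeric_division,  -- keeps the fraction
       round(7::numeric / 2, 2)  AS two_decimals;      -- 3.50
```

SUM over a bigint column already returns numeric, so the SUM(user_count) form above is not affected.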

EDIT 2: After your edit, I guess what you're after is just a single global average for the entire period of existence of your database, rather than groups by month/week/day.

This should give you the average number of rows per day:

WITH total_min_max AS (SELECT
        COUNT(id) AS total_visits,
        MIN(date_time) AS first_date_time,
        MAX(date_time) AS last_date_time
    FROM tb_user)
-- Cast to numeric so the division is not truncated to an integer.
SELECT total_visits::numeric/((last_date_time::date-first_date_time::date)+1) AS users_per_day
    FROM total_min_max;

(I would replace last_date_time with NOW() to make the average cover the time until now, rather than until the last visit, in case there has been no recent visit.)
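One possible variant along those lines, offered only as a sketch: it averages up to the present but counts complete days only, and rounds to two decimals, as suggested in the question's second edit. It assumes the session time zone is the one in which days should be counted. The alias users_per_day is reused from above.

```sql
-- Average users per day over complete days only:
-- from the day of the first recorded user up to and including yesterday.
SELECT round(
         COUNT(id)::numeric
           / (CURRENT_DATE - MIN(date_time)::date),  -- number of complete days
         2) AS users_per_day
FROM tb_user
WHERE date_time < CURRENT_DATE;                      -- exclude today's users
```

If no user was recorded before today, MIN returns NULL and the query yields NULL rather than raising a division-by-zero error.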

Then, for daily, weekly, and "monthly":

WITH daily_avg AS (
    WITH total_min_max AS (SELECT
            COUNT(id) AS total_visits,
            MIN(date_time) AS first_date_time,
            MAX(date_time) AS last_date_time
        FROM tb_user)
    -- Cast to numeric so the division is not truncated to an integer.
    SELECT total_visits::numeric/((last_date_time::date-first_date_time::date)+1) AS users_per_day
        FROM total_min_max)
SELECT
         users_per_day,
         (users_per_day * 7) AS users_per_week,
         (users_per_day * 30) AS users_per_month
    FROM daily_avg;

This being said, conclusions you draw from such statistics might not be great, especially if you want to see how they change over time.

I would also normalise the data per day rather than assuming 30 days in a month (if not per hour, because not all days have 24 hours). Say you have 10 visits per day in Jan 2011 and 10 visits per day in Feb 2011. That gives you 310 visits in Jan and 280 visits in Feb. If you don't pay attention, you could think you've had almost a 10% drop in the number of visitors, so something went wrong in Feb, when really this isn't the case.
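To make the per-day normalisation concrete, here is a sketch, again assuming the tb_user table from the question, that divides each month's count by the actual number of days in that month. The names monthly, days_in_month and visits_per_day are illustrative.

```sql
WITH monthly AS (
    SELECT date_trunc('month', date_time) AS month_start,
           COUNT(id) AS visits
    FROM tb_user
    GROUP BY date_trunc('month', date_time))
SELECT month_start,
       visits,
       -- The day number of the month's last day is the month's length.
       EXTRACT(DAY FROM month_start + INTERVAL '1 month' - INTERVAL '1 day')
           AS days_in_month,
       round(visits::numeric
             / EXTRACT(DAY FROM month_start + INTERVAL '1 month'
                                            - INTERVAL '1 day')::numeric,
             2) AS visits_per_day
FROM monthly
ORDER BY month_start;
```

With the figures above, January gives 310 / 31 = 10.00 and February gives 280 / 28 = 10.00, so the apparent drop disappears. The current month is incomplete, so its value will be understated until the month ends.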
