Count active users using login timestamp in MySQL

Question

While preparing for an interview, I came across an SQL question, and I hope to get some insight into how to answer it better.

Given timestamps and user IDs, how do you determine the number of users who are active every day in a week?

There's very little to go on, but that's the question in front of me.

Recommended answer

I'll walk through the idea based on what makes the most sense to me, and on how I would answer if the question were presented exactly as it is here:

First, let's assume a data set like the following, in a table named logins:

+---------+---------------------+
| user_id |   login_timestamp   |
+---------+---------------------+
|       1 | 2015-09-29 14:05:05 |
|       2 | 2015-09-29 14:05:08 |
|       1 | 2015-09-29 14:05:12 |
|       4 | 2015-09-22 14:05:18 |
|   ...   |          ...        |
+---------+---------------------+

There may be other columns, but we don't care about those.

First of all, we should determine the boundaries of the week. For that we can use ADDDATE(), combined with the idea that today's date minus today's weekday offset is Sunday's date. MySQL's DAYOFWEEK() returns 1 for Sunday through 7 for Saturday, so adding 1 - DAYOFWEEK() days to a date lands on the Sunday of its week.

For instance: if today is Wednesday the 10th, DAYOFWEEK() is 4, so Sunday is 3 days earlier. Thus 10 - 3 = 7, and we can expect Sunday to be the 7th.
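
The arithmetic can be checked outside the database. The following is a minimal Python sketch, not part of the original answer; the helper names mysql_dayofweek and week_bounds are made up for illustration, and the first one only mimics the numbering that DAYOFWEEK() uses.

```python
from datetime import date, timedelta


def mysql_dayofweek(d: date) -> int:
    """Mimic MySQL's DAYOFWEEK(): 1 = Sunday, 2 = Monday, ..., 7 = Saturday."""
    # isoweekday() returns 1 = Monday ... 7 = Sunday, so shift it by one.
    return d.isoweekday() % 7 + 1


def week_bounds(d: date) -> tuple:
    """Return (Sunday, Saturday) of the week that contains d."""
    dow = mysql_dayofweek(d)
    week_start = d + timedelta(days=1 - dow)  # like INTERVAL 1-DAYOFWEEK(d) DAY
    week_end = d + timedelta(days=7 - dow)    # like INTERVAL 7-DAYOFWEEK(d) DAY
    return week_start, week_end


# 2015-06-10 was a Wednesday, so DAYOFWEEK is 4 and Sunday is 3 days earlier.
start, end = week_bounds(date(2015, 6, 10))
print(start, end)  # 2015-06-07 2015-06-13
```

This reproduces the "Wednesday the 10th, Sunday the 7th" example with a real date.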

We can get the WeekStart and WeekEnd timestamps this way:

SELECT
DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 1-DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 00:00:00") WeekStart, 
DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 7-DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 23:59:59") WeekEnd;

Note: PostgreSQL has a DATE_TRUNC() function that, given a date, returns the beginning of a specified time unit such as the week, month, or hour. That function is not available in MySQL.

Next, let's use WeekStart and WeekEnd to slice our data set. In this example I'll just show how to filter, using hard-coded dates:

SELECT *
FROM `logins`
WHERE login_timestamp BETWEEN '2015-09-29 14:05:07' AND '2015-09-29 14:05:13'

This should return our data set sliced down to only the relevant rows:

+---------+---------------------+
| user_id |   login_timestamp   |
+---------+---------------------+
|       2 | 2015-09-29 14:05:08 |
|       1 | 2015-09-29 14:05:12 |
+---------+---------------------+

We can then reduce the result set to only the user_ids, filter out duplicates, and count them, this way:

SELECT COUNT(DISTINCT user_id)
FROM `logins`
WHERE login_timestamp BETWEEN '2015-09-29 14:05:07' AND '2015-09-29 14:05:13'

DISTINCT filters out duplicates, and COUNT returns just the number of remaining values.
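
To see the difference between counting logins and counting users, here is a small sketch that runs the same kind of query against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here (an assumption made so the example is runnable), and the time window is widened slightly compared with the query above so that user 1 appears in it twice.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL `logins` table (same column names).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (user_id INTEGER, login_timestamp TEXT)")
conn.executemany(
    "INSERT INTO logins VALUES (?, ?)",
    [
        (1, "2015-09-29 14:05:05"),
        (2, "2015-09-29 14:05:08"),
        (1, "2015-09-29 14:05:12"),
        (4, "2015-09-22 14:05:18"),
    ],
)

query = """
    SELECT COUNT(*), COUNT(DISTINCT user_id)
    FROM logins
    WHERE login_timestamp BETWEEN ? AND ?
"""
# The window covers both logins of user 1 and the single login of user 2.
logins_in_window, users_in_window = conn.execute(
    query, ("2015-09-29 14:05:00", "2015-09-29 14:05:13")
).fetchone()

print(logins_in_window)  # 3 login rows fall inside the window
print(users_in_window)   # 2 distinct users (1 and 2)
```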

Combined, this becomes:

SELECT COUNT(DISTINCT user_id)
FROM `logins`
WHERE login_timestamp 
    BETWEEN DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 1- DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 00:00:00") 
        AND DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 7- DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 23:59:59")

Replace CURDATE() with any date to get the distinct-user count for that date's week.

"But I need to break this down by day," I hear you cry. Of course! Here is how:

First, let's reduce our over-informative timestamps to just the date. We add DISTINCT because we don't care whether the same user logged in twice on the same day; we count users, not logins, right? (Note that we take a step back here and leave out the week filter for now.)

SELECT DISTINCT user_id, DATE_FORMAT(login_timestamp, "%Y-%m-%d")
FROM `logins`

This yields:

+---------+-----------------+
| user_id | login_timestamp |
+---------+-----------------+
|       1 | 2015-09-29      |
|       2 | 2015-09-29      |
|       4 | 2015-09-22      |
|   ...   |        ...      |
+---------+-----------------+

We wrap this query in a second one, in order to count the occurrences of every date:

SELECT `login_timestamp`, count(*) AS 'count'
FROM (SELECT DISTINCT user_id, DATE_FORMAT(login_timestamp, "%Y-%m-%d") AS `login_timestamp` FROM `logins`) `loginsMod`
GROUP BY `login_timestamp`

We use COUNT and a GROUP BY to get the number of distinct users per date, which returns:

+-----------------+-------+
| login_timestamp | count |
+-----------------+-------+
| 2015-09-29      | 2     |
| 2015-09-22      | 1     |
+-----------------+-------+
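
As a cross-check of the per-day counts, the sketch below runs an equivalent query on the four visible sample rows, again using an in-memory SQLite database from Python as a stand-in for MySQL. Two things differ from the answer's query and are assumptions of this sketch: it uses SQLite's date() to drop the time part, and it folds the DISTINCT subquery into a single COUNT(DISTINCT user_id).

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL `logins` table (same column names).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (user_id INTEGER, login_timestamp TEXT)")
conn.executemany(
    "INSERT INTO logins VALUES (?, ?)",
    [
        (1, "2015-09-29 14:05:05"),
        (2, "2015-09-29 14:05:08"),
        (1, "2015-09-29 14:05:12"),
        (4, "2015-09-22 14:05:18"),
    ],
)

# date() is SQLite's way to strip the time part of an ISO timestamp string.
daily_counts = conn.execute(
    """
    SELECT date(login_timestamp) AS login_date,
           COUNT(DISTINCT user_id) AS active_users
    FROM logins
    GROUP BY login_date
    ORDER BY login_date
    """
).fetchall()

print(daily_counts)  # [('2015-09-22', 1), ('2015-09-29', 2)]
```

User 1 logged in twice on 2015-09-29 but is counted once for that day.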


And after all the hard work, both combined:

SELECT `login_timestamp`, COUNT(*)
FROM (
    SELECT DISTINCT user_id, DATE_FORMAT(login_timestamp, "%Y-%m-%d") AS `login_timestamp`
    FROM `logins`
    WHERE login_timestamp
        BETWEEN DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 1- DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 00:00:00")
            AND DATE_FORMAT(ADDDATE(CURDATE(), INTERVAL 7- DAYOFWEEK(CURDATE()) DAY), "%Y-%m-%d 23:59:59")
) `loginsMod`
GROUP BY `login_timestamp`;

This gives you a daily breakdown of the number of distinct users who logged in on each day of the current week. Again, replace CURDATE() to get a different week.

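As for the users themselves who logged in every day, let's combine the same pieces in a different order. Note that, as written, this query looks at the whole table; to restrict it to a single week, add the same WHERE clause as above to the innermost query: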

SELECT `user_id`
FROM (
    SELECT `user_id`, COUNT(*) AS `login_count`
    FROM (
        SELECT DISTINCT `user_id`, DATE_FORMAT(`login_timestamp`, "%Y-%m-%d")
        FROM `logins`) `logins`
    GROUP BY `user_id`) `logincounts`
WHERE `login_count` > 6

There are two inner queries here. The first, aliased logins, is:

SELECT DISTINCT `user_id`, DATE_FORMAT(`login_timestamp`, "%Y-%m-%d")
FROM `logins`

This provides the list of users and the days on which they logged in, without duplicates.

Then we have logincounts:

SELECT `user_id`, COUNT(*) AS `login_count`
FROM `logins` -- See previous subquery.
GROUP BY `user_id`

This returns one row per user, with a count of the distinct days on which that user logged in.

And lastly:

SELECT `user_id`
FROM `logincounts` -- See previous subquery.
WHERE `login_count` > 6

This filters out the users who did not log in on 7 distinct days, leaving only the user_id column.
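
For comparison, here is one possible way to express "logged in on all seven days of a given week" with the week filter included. It is a sketch under stated assumptions rather than the answer's own query: it runs on an in-memory SQLite database from Python, uses made-up sample data, replaces the nested subqueries with GROUP BY and HAVING, and uses a half-open date range instead of BETWEEN.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (user_id INTEGER, login_timestamp TEXT)")

# Made-up data for the week of Sunday 2015-09-27 to Saturday 2015-10-03:
# user 1 logs in on all seven days, user 2 skips the Saturday.
week_start = date(2015, 9, 27)
rows = []
for offset in range(7):
    day = week_start + timedelta(days=offset)
    rows.append((1, f"{day} 09:00:00"))
    rows.append((1, f"{day} 18:30:00"))  # second login on the same day
    if offset < 6:
        rows.append((2, f"{day} 12:00:00"))
conn.executemany("INSERT INTO logins VALUES (?, ?)", rows)

# Half-open range: from Sunday 00:00:00 up to, but excluding, the next Sunday.
week_end_exclusive = week_start + timedelta(days=7)
everyday_users = [
    user_id
    for (user_id,) in conn.execute(
        """
        SELECT user_id
        FROM logins
        WHERE login_timestamp >= ? AND login_timestamp < ?
        GROUP BY user_id
        HAVING COUNT(DISTINCT date(login_timestamp)) = 7
        ORDER BY user_id
        """,
        (f"{week_start} 00:00:00", f"{week_end_exclusive} 00:00:00"),
    )
]

print(everyday_users)  # [1]
```

User 1 has fourteen login rows but only seven distinct days, so counting distinct dates rather than rows is what makes the threshold meaningful.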

This got kind of long, but I think it's full of ideas, and I think it can definitely help you answer in an interesting way in a job interview. :)
