SQLite: How to SELECT "most recent record for each user" from single table with composite key?

Question

I'm not a database guru and feel like I'm missing some core SQL knowledge to grok a solution to this problem. Here's the situation as briefly as I can explain it.

Context:

I have a SQLite database table that contains timestamped user event records. The records can be uniquely identified by the combination of timestamp and user ID (i.e., when the event took place and who the event is about). I understand this situation is called a "composite primary key." The table looks something like this (with a bunch of other columns removed, of course):

sqlite> select Last_Updated,User_ID from records limit 4;

Last_Updated   User_ID
-------------  --------
1434003858430  1   
1433882146115  3   
1433882837088  3   
1433964103500  2   

Question: How do I SELECT a result set containing only the most recent record for each user?

Given the above example, what I'd like to get back is a table that looks like this:

Last_Updated   User_ID
-------------  --------
1434003858430  1   
1433882837088  3   
1433964103500  2   

(Note that the result set only includes user 3's most recent record.)

In reality, I have approximately 2.5 million rows in this table.

Bonus: I've been reading answers about JOINs, de-dupe procedures, and a bunch more, and I've been googling for tutorials/articles in the hopes that I would find what I'm missing. I have extensive programming background so I could de-dupe this dataset in procedural code like I've done a hundred times before, but I'm tired of writing scripts to do what I believe should be possible in SQL. That's what it's for, right?

So, what do you think is missing from my understanding of SQL, conceptually, that I need in order to understand why the solution you've provided to my question actually works? (A reference to a good article that actually explains the theory behind the practice would suffice.) I want to know WHY the solution actually works, not just that it does.

Thanks very much for stopping by!

Recommended answer

You can try this:

select user_id, max(last_updated) as latest
from records
group by user_id

This should give you the latest last_updated value for each user. Because (user_id, last_updated) is the composite key, that pair identifies each user's most recent record. I assume you have a composite index on user_id and last_updated.
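To check the query against the sample rows from the question, here is a minimal sketch using Python's standard sqlite3 module and an in-memory database. The `Event` column and its values are hypothetical stand-ins for the "bunch of other columns" the question mentions, and the ORDER BY clauses are added only to make the output order predictable. The second query joins the grouped result back to the table, which is one common way (not part of the answer above) to fetch the remaining columns of each user's latest row.

```python
import sqlite3

# In-memory database holding the four sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE records (
        Last_Updated INTEGER NOT NULL,
        User_ID      INTEGER NOT NULL,
        Event        TEXT,                  -- hypothetical extra column
        PRIMARY KEY (User_ID, Last_Updated) -- the composite key
    )
    """
)
conn.executemany(
    "INSERT INTO records (Last_Updated, User_ID, Event) VALUES (?, ?, ?)",
    [
        (1434003858430, 1, "a"),
        (1433882146115, 3, "b"),
        (1433882837088, 3, "c"),
        (1433964103500, 2, "d"),
    ],
)

# The query from the answer: one row per user with that user's newest timestamp.
latest = conn.execute(
    """
    SELECT user_id, MAX(last_updated) AS latest
    FROM records
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()

# Join the per-user maximum back to the table to recover the other columns.
full_rows = conn.execute(
    """
    SELECT r.User_ID, r.Last_Updated, r.Event
    FROM records AS r
    JOIN (
        SELECT User_ID, MAX(Last_Updated) AS latest
        FROM records
        GROUP BY User_ID
    ) AS m
      ON m.User_ID = r.User_ID AND m.latest = r.Last_Updated
    ORDER BY r.User_ID
    """
).fetchall()

print(latest)
print(full_rows)
```

The join matches on both key columns, so it returns exactly one row per user as long as the composite key is unique.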

Generally speaking, the query above asks the database to group the records by user_id. If there is more than one record for user_id 1, they are all grouped together. From that group, the maximum last_updated is picked for output. Then the next group is found and the same operation is applied to it.
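For readers who think procedurally, the grouping described above can be sketched as a plain loop. This is only a conceptual model of GROUP BY with MAX, not a description of how SQLite executes the query internally; the function name `latest_per_user` is made up for illustration.

```python
# Sample rows from the question as (Last_Updated, User_ID) pairs.
rows = [
    (1434003858430, 1),
    (1433882146115, 3),
    (1433882837088, 3),
    (1433964103500, 2),
]


def latest_per_user(rows):
    """Keep one entry per user_id, holding the largest last_updated seen."""
    latest = {}
    for last_updated, user_id in rows:
        # "GROUP BY user_id": each distinct user_id gets its own bucket.
        # "MAX(last_updated)": the bucket keeps only the largest timestamp.
        if user_id not in latest or last_updated > latest[user_id]:
            latest[user_id] = last_updated
    return latest


result = latest_per_user(rows)
print(result)
```

Each dictionary key plays the role of one group, and the stored value is the aggregate computed over that group's rows.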

If you have a composite index, SQLite will likely use just the index, because the index contains both fields referenced in the query. An index is usually smaller than the table itself, so scanning or seeking it is faster.
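One way to see whether the index is used is to ask SQLite for its query plan. The sketch below assumes a hypothetical index name, `idx_records_user_updated`, and a hypothetical `Event` column standing in for the table's other columns. The exact wording of the plan differs between SQLite versions, so the code only prints it rather than relying on a specific format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (Last_Updated INTEGER, User_ID INTEGER, Event TEXT)"
)
# Composite index with user_id first, so rows for one user are adjacent
# in the index and ordered by timestamp within each user.
conn.execute(
    "CREATE INDEX idx_records_user_updated ON records (User_ID, Last_Updated)"
)

plan = conn.execute(
    """
    EXPLAIN QUERY PLAN
    SELECT user_id, MAX(last_updated) AS latest
    FROM records
    GROUP BY user_id
    """
).fetchall()

# The human-readable description is the last column of each plan row.
plan_text = " | ".join(str(row[-1]) for row in plan)
print(plan_text)
```

If the plan mentions the index name (often alongside the words "COVERING INDEX"), the query is being answered from the index rather than from the full table.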
