更好的查询策略,按文件哈希频率和文件大小对文件进行排序 [英] Better query strategy to sort files by file hash frequency and file size

查看:68
本文介绍了更好的查询策略,按文件哈希频率和文件大小对文件进行排序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我没有多想就写了这个查询,但作为初学者,我几乎可以肯定它可以写得更好.

I've wrote this query without much thought but as a beginner I'm almost sure it could be written better.

这里是:

    SELECT filehash, filename, filesize, group_files
      FROM files
INNER JOIN (  SELECT filehash group_id,
                     COUNT(filehash) group_files
                FROM files
            GROUP BY filehash) groups
        ON files.filehash = groups.group_id
  ORDER BY group_files DESC,
           filesize DESC

表定义:

CREATE TABLE files (fileid INTEGER PRIMARY KEY AUTOINCREMENT,
                    filename TEXT,
                    filesize INTEGER,
                    filehash TEXT)

索引定义:

CREATE INDEX files_filehash_idx
          ON files(filehash)
CREATE UNIQUE INDEX files_filename_idx
                 ON files(filename)
CREATE INDEX files_filesize_idx
          ON files(filesize)

查询解释查询计划:

selectid    order   from    detail
1           0       0       SCAN TABLE files USING COVERING INDEX files_filehash_idx (~1000000 rows)
0           0       1       SCAN SUBQUERY 1 AS groups (~100 rows)
0           1       0       SEARCH TABLE files USING INDEX files_filehash_idx (filehash=?) (~10 rows)
0           0       0       USE TEMP B-TREE FOR ORDER BY

如果我错了,你能纠正我吗?提前致谢.

Could you correct me if I'm wrong? Thank you in advance.

推荐答案

你觉得这个版本怎么样?

What do you think about this version?

  select filehash, group_concat(filename), filesize, count(*) as group_files
    from files
group by filehash
order by group_files desc

看起来可能会运行得更快.它可以满足您的需求吗?

It seems like that will probably run faster. Does it do what you need?

这篇关于更好的查询策略,按文件哈希频率和文件大小对文件进行排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆