SQL:计数和子查询 [英] SQL: Count and subquery

查看:77
本文介绍了SQL:计数和子查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

再次使用count和sql

Again with count and sql

我有表


  • 论文:paper_id,doi,年份

  • 作者:paper_id,author_id,inst_id

  • 作者:author_id,姓名,名字

  • inst:inst_id,名称,see_id

  • papers: paper_id, doi, year
  • writtenby: paper_id, author_id, inst_id
  • authors: author_id, name, firstname
  • inst: inst_id, name, see_id

inst是一个机构表:大学等。
writebyby的每一行都会给出一篇论文,一位作者,那个作者当时所隶属的机构。
可以有一个以上的机构,并且每个机构都重复使用夫妇paper_id和author_id。
对于一个给定的作者,我想要一个列表以及papers.doi,papers.year以及他与之一起撰写论文的追随者数量。
我尝试过

inst is a table of Institutions: Universities and so on. Each line in writtenby gives a paper, an author, an institution this author was attached at that time. There can be more then one institution and the couple paper_id, author_id is repeated for each institution. For a given author, I want a list and of papers.doi, papers.year and the count of cohautors he has written a paper with. I tried

 SELECT  papers.doi, papers.year, count(*) as c
 FROM authors 
 INNER JOIN writtenby ON authors.author_id =  writtenby.author_id
 INNER JOIN writtenby AS writtenby_1 ON writtenby.paper_id =  
 writtenby_1.paper_id
 INNER JOIN papers on  writtenby_1.paper_id = papers.paper_id 
 WHERE authors.name ='Beck' AND authors.firstname= 'H P' 
 GROUP BY papers.doi, papers.year
 ORDER BY c DESC

我遇到的问题可能是,如果我正在寻找的作者针对给定的论文出现两次(因为两次)机构
的数量加倍。对于给定的论文,2890的预期结果由以下行给出:

The problem I'm having could be that if the author I'm searching with appears twice for a given paper (because of two institutions) the count is doubled. For a given paper the expected result of 2890, is given by the number of rows of

SELECT DISTINCT author_id
FROM writtenby 
WHERE paper_id = 4593 

(使用我的数据:2890行)
没有区别,我将有3023行,上面的第一个查询给出6046的计数。我试图在上面的Count子句中使用DISTINCT,但仍然无法正常工作。

(with my data: 2890 rows) Without distinct, I would have 3023 rows, and the first query above gives 6046 for count. I tried to use DISTINCT in the Count clause above but that still does not work.

我可以在子查询中使用count吗?谢谢您的帮助...

Can I use count with a subquery ? Thanks for any help...

示例数据:

-- Make the tables

CREATE TABLE 'authors' (name collate nocase, firstname collate nocase, see_id integer, 'author_id' INTEGER PRIMARY KEY NOT NULL );
CREATE TABLE 'inst' ('name' TEXT NOT NULL, 'country' TEXT NOT NULL , 'see_id' INTEGER, 'inst_id' INTEGER PRIMARY KEY NOT NULL );
CREATE TABLE 'papers' ('doi' TEXT NOT NULL,'year' TEXT NOT NULL, 'paper_id' INTEGER PRIMARY KEY NOT NULL );
CREATE TABLE 'writtenby' ('paper_id' INTEGER NOT NULL, 'author_id' INTEGER NOT NULL, 'inst_id' INTEGER NOT NULL, PRIMARY KEY ('paper_id', 'author_id', 'inst_id'));

-- Insert the data 

-- authors : 5 names, one with 2 variants

INSERT INTO 'authors' (name, firstname, see_id, author_id) VALUES ('Doe', 'J', 1, 1);
INSERT INTO 'authors' (name, firstname, see_id, author_id) VALUES ('Klein', 'K', 2, 2);
INSERT INTO 'authors' (name, firstname, see_id, author_id) VALUES ('Lang', 'F', 3, 3);
INSERT INTO 'authors' (name, firstname, see_id, author_id) VALUES ('Rue', 'A De La', 6, 4);
INSERT INTO 'authors' (name, firstname, see_id, author_id) VALUES ('La Rue', 'A De', 6, 5);
INSERT INTO 'authors' (name, firstname, see_id, author_id) VALUES ('De La Rue', 'A', 6, 6);
INSERT INTO 'authors' (name, firstname, see_id, author_id) VALUES ('Smith', 'S', 7, 7);

-- inst 4 name, 2 variants

INSERT INTO 'inst' (name, country, see_id, inst_id) VALUES ('Universite de Paris', 'France', 1, 1);
INSERT INTO 'inst' (name, country, see_id, inst_id) VALUES ('Paris University', 'France', 1, 2);
INSERT INTO 'inst' (name, country, see_id, inst_id) VALUES ('Universite de Lyon', 'France', 3, 3);
INSERT INTO 'inst' (name, country, see_id, inst_id) VALUES ('Univ Freiburg', 'Germany', 4, 4);
INSERT INTO 'inst' (name, country, see_id, inst_id) VALUES ('EPFZ', 'Switzerland', 5, 5);
INSERT INTO 'inst' (name, country, see_id, inst_id) VALUES ('Eidg Techn Hochschule', 'Switzerland', 5, 6);

-- papers: 3 papers

INSERT INTO 'papers' (doi, year, paper_id) VALUES ('doi1', '2017', 1);
INSERT INTO 'papers' (doi, year, paper_id) VALUES ('doi2', '2018', 2);
INSERT INTO 'papers' (doi, year, paper_id) VALUES ('doi3', '2018', 3);

-- paper 1: 4 authors

INSERT INTO 'writtenby' (paper_id, author_id,  inst_id) VALUES (1, 6, 1);
INSERT INTO 'writtenby' (paper_id, author_id,  inst_id) VALUES (1, 6, 3);
INSERT INTO 'writtenby' (paper_id, author_id,  inst_id) VALUES (1, 1, 5);
INSERT INTO 'writtenby' (paper_id, author_id,  inst_id) VALUES (1, 2, 4);
INSERT INTO 'writtenby' (paper_id, author_id,  inst_id) VALUES (1, 7, 1);

-- paper 2: 3 authors

INSERT INTO 'writtenby' (paper_id, author_id,  inst_id) VALUES (2, 6, 1);
INSERT INTO 'writtenby' (paper_id, author_id,  inst_id) VALUES (2, 6, 3);
INSERT INTO 'writtenby' (paper_id, author_id,  inst_id) VALUES (2, 1, 5);
INSERT INTO 'writtenby' (paper_id, author_id,  inst_id) VALUES (2, 2, 5);

-- paper 3: 3 authors

INSERT INTO 'writtenby' (paper_id, author_id,  inst_id) VALUES (3, 6, 1);
INSERT INTO 'writtenby' (paper_id, author_id,  inst_id) VALUES (3, 2, 4);
INSERT INTO 'writtenby' (paper_id, author_id,  inst_id) VALUES (3, 6, 3);
INSERT INTO 'writtenby' (paper_id, author_id,  inst_id) VALUES (3, 2, 1);
INSERT INTO 'writtenby' (paper_id, author_id,  inst_id) VALUES (3, 3, 4);
INSERT INTO 'writtenby' (paper_id, author_id,  inst_id) VALUES (3, 3, 5);
INSERT INTO 'writtenby' (paper_id, author_id,  inst_id) VALUES (3, 3, 1);

检查查询:

 SELECT  papers.doi, papers.year, count(*) as c
 FROM authors 
 INNER JOIN writtenby ON authors.author_id =  writtenby.author_id
 INNER JOIN writtenby AS writtenby_1 ON writtenby.paper_id =  
 writtenby_1.paper_id
 INNER JOIN papers on  writtenby_1.paper_id = papers.paper_id 
 WHERE authors.name ='De La Rue' AND authors.firstname= 'A' 
 GROUP BY papers.doi, papers.year
 ORDER BY c DESC


SELECT p.doi, p.year, COUNT(w2.author_id) AS cnt
FROM authors a
INNER JOIN writtenby w1
    ON a.author_id =  w1.author_id
INNER JOIN writtenby w2
    ON w1.paper_id = w2.paper_id AND w1.author_id <> w2.author_id
INNER JOIN papers p
    ON w2.paper_id = p.paper_id 
WHERE
    a.name = 'De La Rue' AND a.firstname = 'A'
GROUP BY
    p.doi, p.year
ORDER BY
    cnt DESC;

两个查询均给出错误的结果
第一个:

Both queries gives wrong results First one:

doi3|2018|14
doi1|2017|10
doi2|2018|8

第二个查询

doi3|2018|10
doi1|2017|6
doi2|2018|4

François

推荐答案

我看到的一个计数问题是您对 writeby 的自加入表。在这里,您没有检查匹配的行是否具有不同 author_id 。如果 author_id 是相同的,则您不应该将其计算在内。另外,您应该计算的共享作者数量是第二个 writeby 表。这样,如果给定作者没有任何共同作者,则计数将显示为零。

One counting problem I see is happening is in your self-join of the writtenby table. There, you are not checking that the matching row has a different author_id. If the author_id be the same, then you should not be counting it. Also, what you should be counting for the number of shared authors is the second writtenby table. This way, should a given author not have any coauthors, the count would show up as zero.

SELECT p.doi, p.year, COUNT(w2.author_id) AS cnt
FROM authors a
INNER JOIN writtenby w1
    ON a.author_id =  w1.author_id
INNER JOIN writtenby w2
    ON w1.paper_id = w2.paper_id AND w1.author_id <> w2.author_id
INNER JOIN papers p
    ON w2.paper_id = p.paper_id 
WHERE
    a.name = 'Beck' AND a.firstname = 'H P'
GROUP BY
    p.doi, p.year
ORDER BY
    cnt DESC;

这篇关于SQL:计数和子查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆