SQL:计数和子查询 [英] SQL: Count and subquery
问题描述
再次使用count和sql
Again with count and sql
我有表
- 论文:paper_id,doi,年份
- 作者:paper_id,author_id,inst_id
- 作者:author_id,姓名,名字
- inst:inst_id,名称,see_id
- papers: paper_id, doi, year
- writtenby: paper_id, author_id, inst_id
- authors: author_id, name, firstname
- inst: inst_id, name, see_id
inst是一个机构表:大学等。
writebyby的每一行都会给出一篇论文,一位作者,那个作者当时所隶属的机构。
可以有一个以上的机构,并且每个机构都重复使用夫妇paper_id和author_id。
对于一个给定的作者,我想要一个列表以及papers.doi,papers.year以及他与之一起撰写论文的追随者数量。
我尝试过
inst is a table of Institutions: Universities and so on. Each line in writtenby gives a paper, an author, an institution this author was attached at that time. There can be more then one institution and the couple paper_id, author_id is repeated for each institution. For a given author, I want a list and of papers.doi, papers.year and the count of cohautors he has written a paper with. I tried
SELECT papers.doi, papers.year, count(*) as c
FROM authors
INNER JOIN writtenby ON authors.author_id = writtenby.author_id
INNER JOIN writtenby AS writtenby_1 ON writtenby.paper_id =
writtenby_1.paper_id
INNER JOIN papers on writtenby_1.paper_id = papers.paper_id
WHERE authors.name ='Beck' AND authors.firstname= 'H P'
GROUP BY papers.doi, papers.year
ORDER BY c DESC
我遇到的问题可能是,如果我正在寻找的作者针对给定的论文出现两次(因为两次)机构
的数量加倍。对于给定的论文,2890的预期结果由以下行给出:
The problem I'm having could be that if the author I'm searching with appears twice for a given paper (because of two institutions) the count is doubled. For a given paper the expected result of 2890, is given by the number of rows of
SELECT DISTINCT author_id
FROM writtenby
WHERE paper_id = 4593
(使用我的数据:2890行)
没有区别,我将有3023行,上面的第一个查询给出6046的计数。我试图在上面的Count子句中使用DISTINCT,但仍然无法正常工作。
(with my data: 2890 rows) Without distinct, I would have 3023 rows, and the first query above gives 6046 for count. I tried to use DISTINCT in the Count clause above but that still does not work.
我可以在子查询中使用count吗?谢谢您的帮助...
Can I use count with a subquery ? Thanks for any help...
示例数据:
-- Make the tables
CREATE TABLE 'authors' (name collate nocase, firstname collate nocase, see_id integer, 'author_id' INTEGER PRIMARY KEY NOT NULL );
CREATE TABLE 'inst' ('name' TEXT NOT NULL, 'country' TEXT NOT NULL , 'see_id' INTEGER, 'inst_id' INTEGER PRIMARY KEY NOT NULL );
CREATE TABLE 'papers' ('doi' TEXT NOT NULL,'year' TEXT NOT NULL, 'paper_id' INTEGER PRIMARY KEY NOT NULL );
CREATE TABLE 'writtenby' ('paper_id' INTEGER NOT NULL, 'author_id' INTEGER NOT NULL, 'inst_id' INTEGER NOT NULL, PRIMARY KEY ('paper_id', 'author_id', 'inst_id'));
-- Insert the data
-- authors : 5 names, one with 2 variants
INSERT INTO 'authors' (name, firstname, see_id, author_id) VALUES ('Doe', 'J', 1, 1);
INSERT INTO 'authors' (name, firstname, see_id, author_id) VALUES ('Klein', 'K', 2, 2);
INSERT INTO 'authors' (name, firstname, see_id, author_id) VALUES ('Lang', 'F', 3, 3);
INSERT INTO 'authors' (name, firstname, see_id, author_id) VALUES ('Rue', 'A De La', 6, 4);
INSERT INTO 'authors' (name, firstname, see_id, author_id) VALUES ('La Rue', 'A De', 6, 5);
INSERT INTO 'authors' (name, firstname, see_id, author_id) VALUES ('De La Rue', 'A', 6, 6);
INSERT INTO 'authors' (name, firstname, see_id, author_id) VALUES ('Smith', 'S', 7, 7);
-- inst 4 name, 2 variants
INSERT INTO 'inst' (name, country, see_id, inst_id) VALUES ('Universite de Paris', 'France', 1, 1);
INSERT INTO 'inst' (name, country, see_id, inst_id) VALUES ('Paris University', 'France', 1, 2);
INSERT INTO 'inst' (name, country, see_id, inst_id) VALUES ('Universite de Lyon', 'France', 3, 3);
INSERT INTO 'inst' (name, country, see_id, inst_id) VALUES ('Univ Freiburg', 'Germany', 4, 4);
INSERT INTO 'inst' (name, country, see_id, inst_id) VALUES ('EPFZ', 'Switzerland', 5, 5);
INSERT INTO 'inst' (name, country, see_id, inst_id) VALUES ('Eidg Techn Hochschule', 'Switzerland', 5, 6);
-- papers: 3 papers
INSERT INTO 'papers' (doi, year, paper_id) VALUES ('doi1', '2017', 1);
INSERT INTO 'papers' (doi, year, paper_id) VALUES ('doi2', '2018', 2);
INSERT INTO 'papers' (doi, year, paper_id) VALUES ('doi3', '2018', 3);
-- paper 1: 4 authors
INSERT INTO 'writtenby' (paper_id, author_id, inst_id) VALUES (1, 6, 1);
INSERT INTO 'writtenby' (paper_id, author_id, inst_id) VALUES (1, 6, 3);
INSERT INTO 'writtenby' (paper_id, author_id, inst_id) VALUES (1, 1, 5);
INSERT INTO 'writtenby' (paper_id, author_id, inst_id) VALUES (1, 2, 4);
INSERT INTO 'writtenby' (paper_id, author_id, inst_id) VALUES (1, 7, 1);
-- paper 2: 3 authors
INSERT INTO 'writtenby' (paper_id, author_id, inst_id) VALUES (2, 6, 1);
INSERT INTO 'writtenby' (paper_id, author_id, inst_id) VALUES (2, 6, 3);
INSERT INTO 'writtenby' (paper_id, author_id, inst_id) VALUES (2, 1, 5);
INSERT INTO 'writtenby' (paper_id, author_id, inst_id) VALUES (2, 2, 5);
-- paper 3: 3 authors
INSERT INTO 'writtenby' (paper_id, author_id, inst_id) VALUES (3, 6, 1);
INSERT INTO 'writtenby' (paper_id, author_id, inst_id) VALUES (3, 2, 4);
INSERT INTO 'writtenby' (paper_id, author_id, inst_id) VALUES (3, 6, 3);
INSERT INTO 'writtenby' (paper_id, author_id, inst_id) VALUES (3, 2, 1);
INSERT INTO 'writtenby' (paper_id, author_id, inst_id) VALUES (3, 3, 4);
INSERT INTO 'writtenby' (paper_id, author_id, inst_id) VALUES (3, 3, 5);
INSERT INTO 'writtenby' (paper_id, author_id, inst_id) VALUES (3, 3, 1);
检查查询:
SELECT papers.doi, papers.year, count(*) as c
FROM authors
INNER JOIN writtenby ON authors.author_id = writtenby.author_id
INNER JOIN writtenby AS writtenby_1 ON writtenby.paper_id =
writtenby_1.paper_id
INNER JOIN papers on writtenby_1.paper_id = papers.paper_id
WHERE authors.name ='De La Rue' AND authors.firstname= 'A'
GROUP BY papers.doi, papers.year
ORDER BY c DESC
SELECT p.doi, p.year, COUNT(w2.author_id) AS cnt
FROM authors a
INNER JOIN writtenby w1
ON a.author_id = w1.author_id
INNER JOIN writtenby w2
ON w1.paper_id = w2.paper_id AND w1.author_id <> w2.author_id
INNER JOIN papers p
ON w2.paper_id = p.paper_id
WHERE
a.name = 'De La Rue' AND a.firstname = 'A'
GROUP BY
p.doi, p.year
ORDER BY
cnt DESC;
两个查询均给出错误的结果
第一个:
Both queries gives wrong results First one:
doi3|2018|14
doi1|2017|10
doi2|2018|8
第二个查询
doi3|2018|10
doi1|2017|6
doi2|2018|4
François
推荐答案
我看到的一个计数问题是您对 writeby
的自加入表。在这里,您没有检查匹配的行是否具有不同 author_id
。如果 author_id
是相同的,则您不应该将其计算在内。另外,您应该计算的共享作者数量是第二个 writeby
表。这样,如果给定作者没有任何共同作者,则计数将显示为零。
One counting problem I see is happening is in your self-join of the writtenby
table. There, you are not checking that the matching row has a different author_id
. If the author_id
be the same, then you should not be counting it. Also, what you should be counting for the number of shared authors is the second writtenby
table. This way, should a given author not have any coauthors, the count would show up as zero.
SELECT p.doi, p.year, COUNT(w2.author_id) AS cnt
FROM authors a
INNER JOIN writtenby w1
ON a.author_id = w1.author_id
INNER JOIN writtenby w2
ON w1.paper_id = w2.paper_id AND w1.author_id <> w2.author_id
INNER JOIN papers p
ON w2.paper_id = p.paper_id
WHERE
a.name = 'Beck' AND a.firstname = 'H P'
GROUP BY
p.doi, p.year
ORDER BY
cnt DESC;
这篇关于SQL:计数和子查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!