What are values starting with "t", and how can I ignore them when counting?
Question
I am trying to query the frequency of certain attributes in Wikidata, using SPARQL.
For example, to find out what the frequency of different values for gender is, I have the following query:
SELECT ?rid (COUNT(?rid) AS ?count)
WHERE { ?qid wdt:P21 ?rid.
BIND(wd:Q5 AS ?human)
?qid wdt:P31 ?human.
} GROUP BY ?rid
I get the following results:
wd:Q6581097 2752163
wd:Q6581072 562339
wd:Q1052281 223
wd:Q1097630 68
wd:Q2449503 67
wd:Q48270 36
wd:Q44148 8
wd:Q43445 4
t152990852 1
t152990762 1
t152990752 1
t152990635 1
t152775383 1
t152775370 1
t152775368 1
...
I have the following questions about this:
- What do those t152... values refer to?
- How can I ignore the tuples containing t152...? I tried FILTER ( !strstarts(str(?rid), "wd:") ), but it timed out.
- How can I count the distinct number of answers? I tried SELECT (COUNT(DISTINCT ?rid) AS ?count) with the above query, but again it timed out.
Answer
Values starting with t are "skolemized" unknown values, i.e. statements whose value is set to "unknown value" (see, e.g., Q2423351 for a person of unknown sex or gender).
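If you fetch results programmatically in the W3C SPARQL 1.1 Query Results JSON format (which the Wikidata Query Service can return), such values can also be separated on the client side: every bound term carries a "type" field, and only IRIs have type "uri". The sketch below is a hypothetical illustration: the function name split_bindings and the sample rows (modeled on the output in the question) are made up, and it assumes the t... values arrive as "bnode" terms.

```python
def split_bindings(bindings, var="rid"):
    """Split SPARQL JSON result rows by the RDF term type bound to `var`.

    Rows whose term has type "uri" are kept as IRIs (roughly what
    FILTER(isURI(?rid)) keeps); all other rows are returned separately.
    """
    iris, others = [], []
    for row in bindings:
        if row[var]["type"] == "uri":
            iris.append(row)
        else:
            others.append(row)
    return iris, others


# Hypothetical rows modeled on the output in the question; the t... values
# are ASSUMED to arrive as "bnode" terms.
sample = [
    {"rid": {"type": "uri", "value": "http://www.wikidata.org/entity/Q6581097"},
     "count": {"type": "literal", "value": "2752163"}},
    {"rid": {"type": "uri", "value": "http://www.wikidata.org/entity/Q6581072"},
     "count": {"type": "literal", "value": "562339"}},
    {"rid": {"type": "bnode", "value": "t152990852"},
     "count": {"type": "literal", "value": "1"}},
    {"rid": {"type": "bnode", "value": "t152990762"},
     "count": {"type": "literal", "value": "1"}},
]

iris, unknown = split_bindings(sample)
distinct_genders = len({row["rid"]["value"] for row in iris})
unknown_total = sum(int(row["count"]["value"]) for row in unknown)
print(distinct_genders, unknown_total)  # 2 2
```

This only post-processes rows the server has already returned, so it does not help if the query itself times out.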
To improve performance, I suggest splitting your query into three parts:
- All "normal" genders:
SELECT ?rid (COUNT(?qid) AS ?count)
WHERE {
?qid wdt:P31 wd:Q5.
?qid wdt:P21 ?rid.
?rid wdt:P31 wd:Q48264
} GROUP BY ?rid ORDER BY DESC(?count)
Note that, according to Wikidata, wd:Q746411 is a subclass of wd:Q48270, etc.
- All "non-normal" genders:
SELECT ?rid (COUNT(?qid) AS ?count)
WHERE {
?qid wdt:P31 wd:Q5.
?qid wdt:P21 ?rid.
FILTER (?rid NOT IN
(
wd:Q6581097,
wd:Q6581072,
wd:Q1052281,
wd:Q2449503,
wd:Q48270,
wd:Q746411,
wd:Q189125,
wd:Q1399232,
wd:Q3277905
)
).
FILTER (isURI(?rid))
} GROUP BY ?rid ORDER BY DESC(?count)
For performance reasons, I do not use FILTER NOT EXISTS {?rid wdt:P31 wd:Q48264 } here; the NOT IN list above excludes the "normal" genders instead.
- All "unknown" genders, counted as a single total:
SELECT (COUNT(?qid) AS ?count)
WHERE {
?qid wdt:P31 wd:Q5.
?qid wdt:P21 ?rid.
FILTER (!isURI(?rid))
}
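To make the three filters concrete, here is a toy, in-memory Python emulation over a handful of made-up triples. Everything in it is hypothetical: the subjects wd:A to wd:E, the value wd:Q_other, the BNode class and the is_uri helper are illustrative, and the NOT IN list is shortened to two entries. It only shows which solutions each query keeps, not how the query service evaluates them.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BNode:
    """Hypothetical stand-in for a non-IRI term such as an unknown value."""
    label: str


def is_uri(term):
    # Toy version of SPARQL isURI(): plain strings play the role of IRIs.
    return isinstance(term, str)


# Hypothetical toy data, not real Wikidata content.
triples = {
    ("wd:A", "wdt:P31", "wd:Q5"), ("wd:A", "wdt:P21", "wd:Q6581097"),
    ("wd:B", "wdt:P31", "wd:Q5"), ("wd:B", "wdt:P21", "wd:Q6581072"),
    ("wd:C", "wdt:P31", "wd:Q5"), ("wd:C", "wdt:P21", "wd:Q6581072"),
    ("wd:D", "wdt:P31", "wd:Q5"), ("wd:D", "wdt:P21", "wd:Q_other"),
    ("wd:E", "wdt:P31", "wd:Q5"), ("wd:E", "wdt:P21", BNode("t1")),
    ("wd:Q6581097", "wdt:P31", "wd:Q48264"),
    ("wd:Q6581072", "wdt:P31", "wd:Q48264"),
}

# Shared pattern: ?qid wdt:P31 wd:Q5 . ?qid wdt:P21 ?rid .
humans = {s for s, p, o in triples if p == "wdt:P31" and o == "wd:Q5"}
solutions = [(s, o) for s, p, o in triples if p == "wdt:P21" and s in humans]

# Part 1: ?rid wdt:P31 wd:Q48264, grouped by ?rid
gender_items = {s for s, p, o in triples if p == "wdt:P31" and o == "wd:Q48264"}
normal = Counter(rid for _, rid in solutions if rid in gender_items)

# Part 2: FILTER (?rid NOT IN (...)) plus FILTER (isURI(?rid)); list shortened
excluded = {"wd:Q6581097", "wd:Q6581072"}
non_normal = Counter(
    rid for _, rid in solutions if rid not in excluded and is_uri(rid)
)

# Part 3: FILTER (!isURI(?rid)), counted as a single total
unknown = sum(1 for _, rid in solutions if not is_uri(rid))

print(dict(normal), dict(non_normal), unknown)
```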
As for counting distinct answers: in your case it makes little difference whether you count distinct wd:Q5 instances or count them without DISTINCT, but the latter is preferable for performance reasons.
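Why DISTINCT changes nothing here: an RDF graph contains each triple at most once, so the pattern ?qid wdt:P31 wd:Q5 . ?qid wdt:P21 ?rid can produce each (?qid, ?rid) pair only once, and within each ?rid group COUNT(?qid) already equals COUNT(DISTINCT ?qid). A toy Python check of that argument, using made-up items (wd:A to wd:C are hypothetical) and a Python set to mimic the graph's set semantics:

```python
from collections import defaultdict

# Hypothetical toy triples; a Python set, like an RDF graph, holds each triple once.
triples = {
    ("wd:A", "wdt:P31", "wd:Q5"), ("wd:A", "wdt:P21", "wd:Q6581097"),
    ("wd:B", "wdt:P31", "wd:Q5"), ("wd:B", "wdt:P21", "wd:Q6581072"),
    ("wd:C", "wdt:P31", "wd:Q5"), ("wd:C", "wdt:P21", "wd:Q6581072"),
    ("wd:C", "wdt:P31", "wd:Q5"),  # duplicate triple collapses in the set
}

humans = {s for s, p, o in triples if p == "wdt:P31" and o == "wd:Q5"}
solutions = [(s, o) for s, p, o in triples if p == "wdt:P21" and s in humans]

plain = defaultdict(int)     # like COUNT(?qid) ... GROUP BY ?rid
distinct = defaultdict(set)  # like COUNT(DISTINCT ?qid) ... GROUP BY ?rid
for qid, rid in solutions:
    plain[rid] += 1
    distinct[rid].add(qid)

same = all(plain[rid] == len(distinct[rid]) for rid in plain)
print(same)  # True
```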