What are values starting with "t" and how to ignore them for counting


Question

I am trying to query the frequency of certain attributes in Wikidata, using SPARQL.

For example, to find out how frequent each value of sex or gender (P21) is among humans, I have the following query:

SELECT ?rid (COUNT(?rid) AS ?count)
WHERE { ?qid wdt:P21 ?rid.
  BIND(wd:Q5 AS ?human)
  ?qid wdt:P31 ?human.
} GROUP BY ?rid

I get the following results:

?rid         ?count
wd:Q6581097  2752163
wd:Q6581072  562339
wd:Q1052281  223
wd:Q1097630  68
wd:Q2449503  67
wd:Q48270    36
wd:Q44148    8
wd:Q43445    4
t152990852   1
t152990762   1
t152990752   1
t152990635   1
t152775383   1
t152775370   1
t152775368   1
...
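To make the aggregation concrete, here is a minimal Python sketch of what GROUP BY ?rid with COUNT(?rid) computes over the solutions of the WHERE clause. The bindings are a hypothetical toy stand-in (the ex: subjects are invented placeholders) and nothing here queries Wikidata.

```python
from collections import Counter

# Hypothetical solutions of { ?qid wdt:P21 ?rid . ?qid wdt:P31 wd:Q5 },
# as (qid, rid) pairs. The ex: subjects are invented placeholders.
bindings = [
    ("ex:person1", "wd:Q6581097"),
    ("ex:person2", "wd:Q6581072"),
    ("ex:person3", "wd:Q6581097"),
    ("ex:person4", "t152990852"),  # an unknown value, displayed by its t-label
]

# GROUP BY ?rid with COUNT(?rid): one row per distinct ?rid value,
# holding the number of solutions that carry that value.
counts = Counter(rid for _qid, rid in bindings)

# most_common() lists rows by descending count, like ORDER BY DESC(?count).
for rid, count in counts.most_common():
    print(rid, count)
```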

I have the following questions about this:

  • What do those t152... values refer to?
  • How can I ignore the tuples containing t152...?
    I tried FILTER ( !strstarts(str(?rid), "wd:") ) but it timed out.
  • How can I count the distinct number of answers?
    I tried SELECT (COUNT(DISTINCT ?rid) AS ?count) with the above query, but again it timed out.

Recommended answer

Values starting with t are "skolemized" unknown values, i.e. statements whose value is recorded as "unknown value" (see, e.g., Q2423351 for a person of unknown sex or gender).

To improve performance, I suggest splitting your query into three parts:

  1. All "normal" genders:

SELECT ?rid (COUNT(?qid) AS ?count) 
WHERE {
   ?qid wdt:P31 wd:Q5.
   ?qid wdt:P21 ?rid.
   ?rid wdt:P31 wd:Q48264 
} GROUP BY ?rid ORDER BY DESC(?count)

Please note that, according to Wikidata, wd:Q746411 is a subclass of wd:Q48270, etc.
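The third triple pattern (?rid wdt:P31 wd:Q48264) works as a join rather than a FILTER: only ?rid values that are themselves typed as wd:Q48264 survive, so in the toy graph below the unknown value, which has no such triple, drops out on its own. A minimal Python sketch, assuming a hypothetical graph whose ex: subjects are invented and where "_:" marks a blank node (a convention of this sketch only):

```python
from collections import Counter

# Hypothetical toy graph: a set of (subject, predicate, object) triples.
# The ex: subjects are invented; "_:" marks a blank node (sketch convention).
triples = {
    ("ex:person1", "wdt:P31", "wd:Q5"),
    ("ex:person1", "wdt:P21", "wd:Q6581097"),
    ("ex:person2", "wdt:P31", "wd:Q5"),
    ("ex:person2", "wdt:P21", "wd:Q6581072"),
    ("ex:person3", "wdt:P31", "wd:Q5"),
    ("ex:person3", "wdt:P21", "_:t152990852"),  # unknown value
    ("wd:Q6581097", "wdt:P31", "wd:Q48264"),
    ("wd:Q6581072", "wdt:P31", "wd:Q48264"),
}

def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in triples if p == predicate and o == obj}

humans = subjects("wdt:P31", "wd:Q5")             # ?qid wdt:P31 wd:Q5
gender_values = subjects("wdt:P31", "wd:Q48264")  # ?rid wdt:P31 wd:Q48264

# ?qid wdt:P21 ?rid, joined with both patterns above, then GROUP BY ?rid.
counts = Counter(
    o for s, p, o in triples
    if p == "wdt:P21" and s in humans and o in gender_values
)
print(counts)
```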

  2. All "non-normal" genders:

SELECT ?rid (COUNT(?qid) AS ?count) 
WHERE {
   ?qid wdt:P31 wd:Q5.
   ?qid wdt:P21 ?rid.
   FILTER (?rid NOT IN
           (
            wd:Q6581097,
            wd:Q6581072,
            wd:Q1052281,
            wd:Q2449503,
            wd:Q48270,
            wd:Q746411,
            wd:Q189125,
            wd:Q1399232,
            wd:Q3277905
           )
          ).
   FILTER (isURI(?rid))
} GROUP BY ?rid ORDER BY DESC(?count)

I do not use FILTER NOT EXISTS { ?rid wdt:P31 wd:Q48264 } here, for performance reasons.
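The two FILTERs in this query are a plain membership test and a term-type test, both applied to each ?rid before grouping. A minimal Python sketch with hypothetical input; ex:otherGender is an invented placeholder for a value missing from the NOT IN list, and the "_:" marker for blank nodes is an assumption of the sketch:

```python
from collections import Counter

# The values excluded by the NOT IN list of the query above.
NOT_IN = {
    "wd:Q6581097", "wd:Q6581072", "wd:Q1052281", "wd:Q2449503", "wd:Q48270",
    "wd:Q746411", "wd:Q189125", "wd:Q1399232", "wd:Q3277905",
}

def is_uri(term: str) -> bool:
    """Stand-in for SPARQL isURI(); this sketch marks blank nodes with "_:"."""
    return not term.startswith("_:")

# Hypothetical ?rid values of some humans; ex:otherGender is an invented
# placeholder for a value that is not in the NOT IN list.
rids = ["wd:Q6581097", "ex:otherGender", "_:t152990852", "ex:otherGender", "wd:Q6581072"]

# Both FILTERs must hold before a solution reaches GROUP BY ?rid.
counts = Counter(rid for rid in rids if rid not in NOT_IN and is_uri(rid))
print(counts)
```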

  3. All "unknown" genders (counted together as a single total):

SELECT (COUNT(?qid) AS ?count) 
WHERE {
   ?qid wdt:P31 wd:Q5.
   ?qid wdt:P21 ?rid.
   FILTER (!isURI(?rid))
} 
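This query deliberately has no GROUP BY. If each unknown value is its own node, as the count of 1 next to every t-value in the question's output suggests, grouping by ?rid yields one row per node, each with count 1; aggregating without grouping folds them into one number. A minimal Python sketch with hypothetical values, where "_:" marks a blank node (a convention of this sketch only):

```python
from collections import Counter

def is_uri(term: str) -> bool:
    """Stand-in for SPARQL isURI(); this sketch marks blank nodes with "_:"."""
    return not term.startswith("_:")

# Hypothetical ?rid values; each unknown value is a separate blank node.
rids = ["wd:Q6581097", "_:t152990852", "_:t152990762", "wd:Q6581072", "_:t152990752"]

unknown = [rid for rid in rids if not is_uri(rid)]  # FILTER (!isURI(?rid))

# With GROUP BY ?rid: one group per blank node, each with count 1.
per_node = Counter(unknown)

# Without GROUP BY: COUNT(?qid) over all remaining solutions is one total.
total = len(unknown)

print(per_node)
print(total)
```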

In fact, in your case it is not very important whether you count distinct wd:Q5 instances or count them without DISTINCT, but the latter is preferable for performance reasons.
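Why DISTINCT changes nothing for this pattern: an RDF graph is a set of triples, so each (?qid, ?rid) pair matched by ?qid wdt:P31 wd:Q5 and ?qid wdt:P21 ?rid occurs once, and no ?qid repeats inside a ?rid group. DISTINCT would only add deduplication work. A minimal Python sketch over a hypothetical toy graph (the ex: subjects are invented):

```python
from collections import defaultdict

# Hypothetical graph as a *set* of triples, so duplicates collapse, as in RDF.
triples = {
    ("ex:person1", "wdt:P31", "wd:Q5"),
    ("ex:person1", "wdt:P21", "wd:Q6581097"),
    ("ex:person2", "wdt:P31", "wd:Q5"),
    ("ex:person2", "wdt:P21", "wd:Q6581097"),
    ("ex:person3", "wdt:P31", "wd:Q5"),
    ("ex:person3", "wdt:P21", "wd:Q6581072"),
}

humans = {s for s, p, o in triples if p == "wdt:P31" and o == "wd:Q5"}

# Solutions of { ?qid wdt:P31 wd:Q5 . ?qid wdt:P21 ?rid } as (qid, rid) pairs.
solutions = [(s, o) for s, p, o in triples if p == "wdt:P21" and s in humans]

# GROUP BY ?rid, collecting the ?qid values of each group.
groups = defaultdict(list)
for qid, rid in solutions:
    groups[rid].append(qid)

count_all = {rid: len(qids) for rid, qids in groups.items()}            # COUNT(?qid)
count_distinct = {rid: len(set(qids)) for rid, qids in groups.items()}  # COUNT(DISTINCT ?qid)

print(count_all == count_distinct)
```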
