How to get all companies from DBPedia?


Question

I'm new to querying DBPedia. How can I get all companies from http://dbpedia.org/sparql?

This query returns only 50,000 organizations:

SELECT DISTINCT * WHERE {?company a dbpedia-owl:Company}


Recommended answer

You're right that your query isn't returning all the companies. The pattern is correct, though. Notice that this query, which only counts the companies, returns 88054:

prefix dbpedia-owl: <http://dbpedia.org/ontology/>

select (count(distinct ?company) as ?count)
where {
  ?company a dbpedia-owl:Company
}

I think this is a limit imposed by the DBpedia SPARQL endpoint for performance reasons. One thing you could do is download the data and run your query locally, but that's probably a bit more work than you want. Instead, you can order the results (it doesn't really matter how, so long as you always do it the same way) and use limit and offset to select within those results. For instance:

prefix dbpedia-owl: <http://dbpedia.org/ontology/>

select ?company
where {
  ?company a dbpedia-owl:Company
}
order by ?company
limit 10


prefix dbpedia-owl: <http://dbpedia.org/ontology/>

select ?company
where {
  ?company a dbpedia-owl:Company
}
order by ?company
limit 10
offset 5823

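As a rough sketch of how the limit/offset idea turns into a full set of page queries, the Python below only builds query strings and does not contact the endpoint. The function names and the page size of 1000 are assumptions for illustration; 88054 is the count reported above and will differ as DBpedia changes.

```python
import math

PREFIX = "prefix dbpedia-owl: <http://dbpedia.org/ontology/>"


def page_offsets(total, page_size):
    """Return the OFFSET value of every page needed to cover `total` rows."""
    return [i * page_size for i in range(math.ceil(total / page_size))]


def paged_query(offset, page_size):
    """Build one ORDER BY / LIMIT / OFFSET query for a single page.

    The ORDER BY clause is what makes the pages consistent: every page
    is cut from the same ordering of the results.
    """
    return (
        f"{PREFIX}\n\n"
        "select ?company\n"
        "where {\n"
        "  ?company a dbpedia-owl:Company\n"
        "}\n"
        "order by ?company\n"
        f"limit {page_size}\n"
        f"offset {offset}"
    )


# 88054 is the count from the answer; 1000 is an assumed page size.
offsets = page_offsets(88054, 1000)
queries = [paged_query(offset, 1000) for offset in offsets]
```

Each string in `queries` can then be sent to the endpoint with whatever SPARQL client you prefer.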

This is the general approach. However, it still runs into a problem on DBpedia because of a hard limit of 40000 results. There's a documentation article that mentions this:


Working with constraints DBpedia's SPARQL endpoint MaxSortedTopRows Limits via LIMIT & OFFSET

The DBpedia SPARQL endpoint is configured with the following INI setting:

MaxSortedTopRows = 40000

The setting above sets a threshold for sorted rows.

The proposed solution from that article is to use subqueries:


To prevent the problem outlined above you can leverage the use of subqueries which make better use of temporary storage associated with this kind of query. An example would take the form:

SELECT ?p ?s 
WHERE 
  {
    {
      SELECT DISTINCT ?p ?s 
      FROM <http://dbpedia.org> 
      WHERE   
        { 
          ?s ?p <http://dbpedia.org/resource/Germany> 
        } ORDER BY ASC(?p) 
    }
  } 
OFFSET 50000 
LIMIT 1000


I'm not entirely sure why this solves the problem. Perhaps the endpoint can sort more than 40000 rows as long as it doesn't have to return them all. At any rate, it does work. Your query would become:

prefix dbpedia-owl: <http://dbpedia.org/ontology/>

select ?company {{
  select ?company { 
    ?company a dbpedia-owl:Company
  }
  order by ?company
}} 
offset 88000
LIMIT 1000
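To collect every company rather than a single page, the subquery form can be wrapped in a loop that advances the offset until a short page comes back. This is only a sketch under stated assumptions: `fetch_page` is a hypothetical callable that you supply (it takes a query string and returns a list of rows), and the helper names are made up for illustration. Nothing here talks to the endpoint by itself.

```python
PREFIX = "prefix dbpedia-owl: <http://dbpedia.org/ontology/>"


def subquery_page(offset, page_size):
    """Build the subquery form: sort inside, apply OFFSET/LIMIT outside."""
    return (
        f"{PREFIX}\n\n"
        "select ?company {{\n"
        "  select ?company {\n"
        "    ?company a dbpedia-owl:Company\n"
        "  }\n"
        "  order by ?company\n"
        "}}\n"
        f"offset {offset}\n"
        f"limit {page_size}"
    )


def fetch_all(fetch_page, page_size):
    """Request page after page until one comes back shorter than page_size.

    `fetch_page` is a hypothetical callable (query string -> list of rows);
    plug in your own SPARQL client there.
    """
    results = []
    offset = 0
    while True:
        rows = fetch_page(subquery_page(offset, page_size))
        results.extend(rows)
        if len(rows) < page_size:
            # A short (or empty) page means the results are exhausted.
            break
        offset += page_size
    return results
```

Stopping on a short page means you don't need to run the count query first, at the cost of one extra request when the total happens to be an exact multiple of the page size.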
