Extract all parents of a given node
Question
I'm trying to extract all parents of each given GO ID (a node) using the EBI-RDF SPARQL endpoint. I based my query on two similar questions. Here are two examples illustrating the problem:
Example 1 (link to structure):
biological_process (GO:0008150)
    |__ metabolic process (GO:0008152)
            |__ methylation (GO:0032259)
In this example, using the following query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT (count(?mid) as ?depth)
       (group_concat(distinct ?midId ; separator = " / ") AS ?treePath)
FROM <http://rdf.ebi.ac.uk/dataset/go>
WHERE {
  obo:GO_0032259 rdfs:subClassOf* ?mid .
  ?mid rdfs:subClassOf* ?class .
  ?mid <http://www.geneontology.org/formats/oboInOwl#id> ?midId .
}
GROUP BY ?treePath
ORDER BY ?depth
I got the desired results without problems:
c | treePath
--|-------------------------------------
6 | GO:0008150 / GO:0008152 / GO:0032259
But when the term exists in multiple branches (e.g. GO:0007267), as in the case below, the previous approach doesn't work:
Example 2 (link to structure):
biological_process (GO:0008150)
    |__ cellular_process (GO:0009987)
    |       |__ cell communication (GO:0007154)
    |               |__ cell-cell signaling (GO:0007267)
    |
    |__ signaling (GO:0023052)
            |__ cell-cell signaling (GO:0007267)
Result:
c | treePath
--|---------------------------------------------------------------
15| GO:0007154 / GO:0007267 / GO:0008150 / GO:0009987 / GO:0023052
What I wanted to get is the following:
GO:0008150 / GO:0009987 / GO:0007154 / GO:0007267
GO:0008150 / GO:0023052 / GO:0007267
My understanding is that, under the hood, I'm calculating the depth of each level and using it to construct the path. This works fine when the element belongs to only one branch.
SELECT (count(?mid) as ?depth) ?midId
FROM <http://rdf.ebi.ac.uk/dataset/go>
WHERE {
  obo:GO_0032259 rdfs:subClassOf* ?mid .
  ?mid rdfs:subClassOf* ?class .
  ?mid <http://www.geneontology.org/formats/oboInOwl#id> ?midId .
}
GROUP BY ?midId
ORDER BY ?depth
Result:
depth | midId
------|------------
1 | GO:0008150
2 | GO:0008152
3 | GO:0032259
In the second example, things get messed up and I don't understand why. I'm sure that part of the problem is that some terms have the same depth/level, but I don't know how to solve this.
depth | midId
------|------------
2 | GO:0008150
2 | GO:0009987
2 | GO:0023052
3 | GO:0007154
6 | GO:0007267
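As far as the queries above go, the per-term count is the number of superclasses reachable from each term, so it stops identifying a single position once a term has more than one parent. The desired output instead corresponds to listing every root-to-term path in a graph where a node can have several parents. The following is a minimal Python sketch of that idea, not part of the original post: the `PARENTS` map is hard-coded from Example 2 and `root_paths` is a hypothetical helper name; in practice the direct `rdfs:subClassOf` edges would have to be fetched from the endpoint first.

```python
from typing import Dict, List

# Direct subClassOf edges (child -> parents), hard-coded from Example 2.
# Assumption: in real use these would come from the SPARQL endpoint.
PARENTS: Dict[str, List[str]] = {
    "GO:0007267": ["GO:0007154", "GO:0023052"],
    "GO:0007154": ["GO:0009987"],
    "GO:0009987": ["GO:0008150"],
    "GO:0023052": ["GO:0008150"],
    "GO:0008150": [],
}


def root_paths(term: str, parents: Dict[str, List[str]]) -> List[List[str]]:
    """Return every path from a root down to `term`, one list per branch."""
    direct = parents.get(term, [])
    if not direct:
        # A term without parents is a root, so the path is just the term.
        return [[term]]
    paths: List[List[str]] = []
    for parent in direct:
        for path in root_paths(parent, parents):
            paths.append(path + [term])
    return paths


for path in root_paths("GO:0007267", PARENTS):
    print(" / ".join(path))
```

With the edges above, each branch is reported as its own path instead of being merged into one concatenated list.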
Recommended answer
Thanks to @AKSW, I found a decent solution using HyperGraphQL (a GraphQL interface for querying and serving linked data on the Web).
I'll leave the detailed answer here; it may help someone.
The config.json file I used:
{
"name": "ebi-hgql",
"schema": "ebischema.graphql",
"server": {
"port": 8081,
"graphql": "/graphql",
"graphiql": "/graphiql"
},
"services": [
{
"id": "ebi-sparql",
"type": "SPARQLEndpointService",
"url": "http://www.ebi.ac.uk/rdf/services/sparql",
"graph": "http://rdf.ebi.ac.uk/dataset/go",
"user": "",
"password": ""
}
]
}
Here's what my ebischema.graphql file looks like (since I only needed Class, id, label and subClassOf):
type __Context {
Class: _@href(iri: "http://www.w3.org/2002/07/owl#Class")
id: _@href(iri: "http://www.geneontology.org/formats/oboInOwl#id")
label: _@href(iri: "http://www.w3.org/2000/01/rdf-schema#label")
subClassOf: _@href(iri: "http://www.w3.org/2000/01/rdf-schema#subClassOf")
}
type Class @service(id:"ebi-sparql") {
id: [String] @service(id:"ebi-sparql")
label: [String] @service(id:"ebi-sparql")
subClassOf: [Class] @service(id:"ebi-sparql")
}
I started testing some simple queries but kept getting empty responses; the answer to this question solved my problem.
Finally, I constructed the query to get the tree. Using this query:
{
Class_GET_BY_ID(uris:[
"http://purl.obolibrary.org/obo/GO_0032259",
"http://purl.obolibrary.org/obo/GO_0007267"]) {
id
label
subClassOf {
id
label
subClassOf {
id
label
}
}
}
}
I got some interesting results:
{
"extensions": {},
"data": {
"@context": {
"_type": "@type",
"_id": "@id",
"id": "http://www.geneontology.org/formats/oboInOwl#id",
"label": "http://www.w3.org/2000/01/rdf-schema#label",
"Class_GET_BY_ID": "http://hypergraphql.org/query/Class_GET_BY_ID",
"subClassOf": "http://www.w3.org/2000/01/rdf-schema#subClassOf"
},
"Class_GET_BY_ID": [
{
"id": [
"GO:0032259"
],
"label": [
"methylation"
],
"subClassOf": [
{
"id": [
"GO:0008152"
],
"label": [
"metabolic process"
],
"subClassOf": [
{
"id": [
"GO:0008150"
],
"label": [
"biological_process"
]
}
]
}
]
},
{
"id": [
"GO:0007267"
],
"label": [
"cell-cell signaling"
],
"subClassOf": [
{
"id": [
"GO:0007154"
],
"label": [
"cell communication"
],
"subClassOf": [
{
"id": [
"GO:0009987"
],
"label": [
"cellular process"
]
}
]
},
{
"id": [
"GO:0023052"
],
"label": [
"signaling"
],
"subClassOf": [
{
"id": [
"GO:0008150"
],
"label": [
"biological_process"
]
}
]
}
]
}
]
},
"errors": []
}
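To turn a nested response like this into the "A / B / C" paths asked for in the question, the `subClassOf` arrays can be walked recursively. This is a minimal Python sketch under stated assumptions: `paths_from_node` is a hypothetical helper, and `sample` is a trimmed copy of the GO:0007267 entry from the response above with the labels omitted.

```python
from typing import Any, Dict, List


def paths_from_node(node: Dict[str, Any]) -> List[List[str]]:
    """Walk nested subClassOf entries and return ID paths, topmost term first."""
    node_id = node["id"][0]
    parents = node.get("subClassOf", [])
    if not parents:
        # No deeper level was returned for this node.
        return [[node_id]]
    return [
        path + [node_id]
        for parent in parents
        for path in paths_from_node(parent)
    ]


# Trimmed copy of the GO:0007267 entry from the response (labels omitted).
sample = {
    "id": ["GO:0007267"],
    "subClassOf": [
        {"id": ["GO:0007154"], "subClassOf": [{"id": ["GO:0009987"]}]},
        {"id": ["GO:0023052"], "subClassOf": [{"id": ["GO:0008150"]}]},
    ],
}

for path in paths_from_node(sample):
    print(" / ".join(path))
```

Each path only reaches as far up as the query was nested, so the branch through GO:0007154 stops at GO:0009987 here rather than at the root.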
Edit
This was exactly what I wanted, but I noticed that I can't add another sublevel like this:
{
Class_GET_BY_ID(uris:[
"http://purl.obolibrary.org/obo/GO_0032259",
"http://purl.obolibrary.org/obo/GO_0007267"]) {
id
label
subClassOf {
id
label
subClassOf {
id
label
subClassOf { # <--- 4th sublevel
id
label
}
}
}
}
}
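Because a GraphQL selection has a fixed depth, every extra ancestor level has to be written out by hand. A small helper can generate the nesting for a chosen depth. This is only a Python sketch with hypothetical function names (`nested_selection`, `build_query`), and it does not address the failure seen when the extra level is added against this particular service.

```python
from typing import List


def nested_selection(depth: int, indent: int = 4) -> str:
    """Build an id/label selection with `depth` nested subClassOf levels."""
    pad = " " * indent
    lines = [pad + "id", pad + "label"]
    if depth > 0:
        lines.append(pad + "subClassOf {")
        lines.append(nested_selection(depth - 1, indent + 2))
        lines.append(pad + "}")
    return "\n".join(lines)


def build_query(uris: List[str], depth: int) -> str:
    """Wrap the nested selection in a Class_GET_BY_ID query for the given URIs."""
    uri_list = ", ".join('"%s"' % uri for uri in uris)
    return "{\n  Class_GET_BY_ID(uris: [%s]) {\n%s\n  }\n}" % (
        uri_list,
        nested_selection(depth),
    )


print(build_query(["http://purl.obolibrary.org/obo/GO_0007267"], 3))
```

Calling `build_query` with depth 3 produces the same shape as the query above, with three nested `subClassOf` selections.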
I created a new question: Endpoint returned Content-Type: text/html which is not recognized for SELECT queries