How to parse .ttl files with RDFLib?
Question
I have a file in .ttl format. It has 4 attributes/columns containing quadruples of the following form:
- (id, student_name, student_address, student_phoneno)
- (id, faculty_name, faculty_address, faculty_phoneno)
I know how to parse triples in .n3 form with RDFLib:
from rdflib import Graph
g = Graph()
g.parse("demo.nt", format="nt")
but I am not sure how to parse these quadruples.
My intent is to parse and extract all the information pertaining to a particular id. The id can be the same for both a student and a faculty member.
How can I use RDFLib to process these quadruples and aggregate them by id?
Sample snippet from the .ttl file:
#@ <id1>
<Alice> <USA> <12345>
#@ <id1>
<Jane> <France> <78900>
Recommended answer
Turtle is a subset of Notation 3 syntax, so rdflib should be able to parse it using format='n3'.
Check whether rdflib preserves comments (in your sample the ids are specified in comments, i.e. the #... lines). If it does not, and the input format is as simple as shown in your example, then you could parse it manually:
import re
from collections import namedtuple
from itertools import takewhile

Entry = namedtuple('Entry', 'id name address phone')

def get_entries(path):
    with open(path) as file:
        # an entry starts with `#@` line and ends with a blank line
        for line in file:
            if line.startswith('#@'):
                buf = [line]
                buf.extend(takewhile(str.strip, file))  # read until blank line
                yield Entry(*re.findall(r'<([^>]+)>', ''.join(buf)))

print("\n".join(map(str, get_entries('example.ttl'))))
Output:
Entry(id='id1', name='Alice', address='USA', phone='12345')
Entry(id='id1', name='Jane', address='France', phone='78900')
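The generator above relies on a blank line ending each entry. As a minimal sketch, assuming instead that a new `#@` line (or the end of the input) ends the previous entry, the same idea can be written without depending on blank lines. The helper name `iter_entries` and the in-memory sample are illustrative and not part of rdflib; an entry that does not contain exactly four `<...>` fields would raise a `TypeError` here.

```python
import io
import re
from collections import namedtuple

Entry = namedtuple('Entry', 'id name address phone')
ANGLE = re.compile(r'<([^>]+)>')  # captures the text inside <...>

def iter_entries(lines):
    """Yield one Entry per `#@ <id>` header line (hypothetical helper)."""
    fields = []
    for line in lines:
        if line.startswith('#@'):
            if fields:                      # a new header closes the previous entry
                yield Entry(*fields)
            fields = ANGLE.findall(line)    # start a new entry with its id
        elif fields:
            fields.extend(ANGLE.findall(line))  # name, address, phone
    if fields:                              # flush the last entry at end of input
        yield Entry(*fields)

# In-memory copy of the sample snippet from the question (no blank lines).
sample = io.StringIO(
    "#@ <id1>\n"
    "<Alice> <USA> <12345>\n"
    "#@ <id1>\n"
    "<Jane> <France> <78900>\n"
)
entries = list(iter_entries(sample))
for entry in entries:
    print(entry)
```

Because `iter_entries` accepts any iterable of lines, an open file object can be passed in place of the `StringIO` sample.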
To save entries to a db:
import sqlite3

with sqlite3.connect('example.db') as conn:
    conn.execute('''CREATE TABLE IF NOT EXISTS entries
                 (id text, name text, address text, phone text)''')
    conn.executemany('INSERT INTO entries VALUES (?,?,?,?)',
                     get_entries('example.ttl'))
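Once the entries are in a table, simple per-id aggregation can be left to SQLite itself. The following is only a sketch under assumptions: it uses an in-memory database and hard-codes the two rows from the sample instead of reading example.ttl, and the choice of `COUNT(*)` and `group_concat` as aggregates is illustrative.

```python
import sqlite3

# The two rows from the sample snippet, hard-coded for illustration.
rows = [
    ('id1', 'Alice', 'USA', '12345'),
    ('id1', 'Jane', 'France', '78900'),
]

conn = sqlite3.connect(':memory:')  # throwaway database, nothing is written to disk
conn.execute('CREATE TABLE entries (id text, name text, address text, phone text)')
conn.executemany('INSERT INTO entries VALUES (?,?,?,?)', rows)

# One result row per id: the id, the number of entries, and the names joined by commas.
summary = conn.execute(
    'SELECT id, COUNT(*), group_concat(name) '
    'FROM entries GROUP BY id ORDER BY id'
).fetchall()
conn.close()

for entry_id, count, names in summary:
    print(entry_id, count, names)
```

SQLite does not guarantee the order of the names inside `group_concat`, so code that depends on that order should sort the names in Python instead.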
To group by id if you need some postprocessing in Python:
import sqlite3
from itertools import groupby
from operator import itemgetter

with sqlite3.connect('example.db') as c:
    rows = c.execute('SELECT * FROM entries ORDER BY id LIMIT ?', (10,))
    for id, group in groupby(rows, key=itemgetter(0)):
        print("%s:\n\t%s" % (id, "\n\t".join(map(str, group))))
Output:
id1:
	('id1', 'Alice', 'USA', '12345')
	('id1', 'Jane', 'France', '78900')
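Note that `itertools.groupby` only merges adjacent rows, which is why the query sorts with ORDER BY id. If a database is not needed, a rough alternative is to group the parsed entries directly with a dictionary, which does not require sorted input. This is a sketch, assuming the entries fit in memory; the hard-coded `entries` list stands in for the output of the parser.

```python
from collections import defaultdict, namedtuple

Entry = namedtuple('Entry', 'id name address phone')

# Stand-in for the parsed entries; values are taken from the sample snippet.
entries = [
    Entry('id1', 'Alice', 'USA', '12345'),
    Entry('id1', 'Jane', 'France', '78900'),
]

# Map each id to the list of entries that share it (input order is preserved).
by_id = defaultdict(list)
for entry in entries:
    by_id[entry.id].append(entry)

for entry_id, group in by_id.items():
    print("%s:" % entry_id)
    for entry in group:
        print("\t%s" % (entry,))  # wrap in a tuple so %-formatting sees one argument
```

Looking up everything for a single id is then just `by_id['id1']`.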