How to parse .ttl files with RDFLib?


Question

I have a file in .ttl form. It has 4 attributes/columns and contains quadruples of the following forms:

  1. (id, student_name, student_address, student_phoneno).
  2. (id, faculty_name, faculty_address, faculty_phoneno).

I know how to parse triples in N-Triples (.nt) form with RDFLib:

from rdflib import Graph
g = Graph()
g.parse("demo.nt", format="nt")

However, I am not sure how to parse these quadruples.

My intent is to parse the file and extract all the information pertaining to a particular id. The same id can be used for both a student and a faculty member.

How can I use RDFLib to process these quadruples and aggregate them by id?

Sample snippet from the .ttl file:

#@ <id1>
<Alice> <USA> <12345>

#@ <id1>
<Jane> <France> <78900>

Recommended answer

Turtle is a subset of Notation 3 syntax, so rdflib should be able to parse it using format='n3' (rdflib also accepts format='turtle'). Note that the ids in your sample are specified in comments (#...), so check whether rdflib preserves comments; Turtle parsers normally discard them. If it does not, and the input format is as simple as shown in your example, you could parse it manually:

import re
from collections import namedtuple
from itertools import takewhile

Entry = namedtuple('Entry', 'id name address phone')

def get_entries(path):
    with open(path) as file:
        # an entry starts with `#@` line and ends with a blank line
        for line in file:
            if line.startswith('#@'):
                buf = [line]
                buf.extend(takewhile(str.strip, file)) # read until blank line
                yield Entry(*re.findall(r'<([^>]+)>', ''.join(buf)))

print("\n".join(map(str, get_entries('example.ttl'))))

Output:

Entry(id='id1', name='Alice', address='USA', phone='12345')
Entry(id='id1', name='Jane', address='France', phone='78900')

To save entries to a db:

import sqlite3

with sqlite3.connect('example.db') as conn:
    conn.execute('''CREATE TABLE IF NOT EXISTS entries
             (id text, name text, address text, phone text)''')
    conn.executemany('INSERT INTO entries VALUES (?,?,?,?)',
                     get_entries('example.ttl'))
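Since the stated goal is to extract everything recorded for a particular id, a parameterized query may be all that is needed once the rows are stored. The following is a minimal sketch, not part of the original answer. It assumes the same entries table layout, uses an in-memory database with hard-coded sample rows (taken from the question's snippet) so that it runs on its own, and the helper name rows_for_id is hypothetical.

```python
import sqlite3

# Sample rows standing in for the output of get_entries(); the values are
# copied from the question's snippet and are only illustrative.
SAMPLE_ROWS = [
    ('id1', 'Alice', 'USA', '12345'),
    ('id1', 'Jane', 'France', '78900'),
]

def rows_for_id(conn, entry_id):
    """Return all (id, name, address, phone) rows stored for entry_id.

    Hypothetical helper: the id is passed as a query parameter rather than
    being interpolated into the SQL string.
    """
    cursor = conn.execute(
        'SELECT id, name, address, phone FROM entries '
        'WHERE id = ? ORDER BY name',
        (entry_id,))
    return cursor.fetchall()

conn = sqlite3.connect(':memory:')  # in-memory db, nothing is written to disk
conn.execute('''CREATE TABLE IF NOT EXISTS entries
             (id text, name text, address text, phone text)''')
conn.executemany('INSERT INTO entries VALUES (?,?,?,?)', SAMPLE_ROWS)

for row in rows_for_id(conn, 'id1'):
    print(row)
```

With a file-backed database, the same helper could be called on the connection opened with sqlite3.connect('example.db').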

To group by id, if you need some postprocessing in Python:

import sqlite3
from itertools import groupby
from operator import itemgetter

with sqlite3.connect('example.db') as c:
    rows = c.execute('SELECT * FROM entries ORDER BY id LIMIT ?', (10,))
    for id, group in groupby(rows, key=itemgetter(0)):
        print("%s:\n\t%s" % (id, "\n\t".join(map(str, group))))

Output:

id1:
    ('id1', 'Alice', 'USA', '12345')
    ('id1', 'Jane', 'France', '78900')
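If a database is not required, the same aggregation could also be done in memory. This is a minimal sketch under the assumption that all entries fit in memory. The helper name group_by_id is hypothetical, and the sample entries are hard-coded from the question's snippet in place of real parser output. Unlike itertools.groupby, a dict-based grouping does not require the input to be sorted by id first.

```python
from collections import defaultdict, namedtuple

Entry = namedtuple('Entry', 'id name address phone')

def group_by_id(entries):
    """Collect entries into a dict mapping each id to a list of its entries.

    Hypothetical helper; input order within each id is preserved.
    """
    groups = defaultdict(list)
    for entry in entries:
        groups[entry.id].append(entry)
    return dict(groups)

# Illustrative data copied from the question's snippet.
sample_entries = [
    Entry('id1', 'Alice', 'USA', '12345'),
    Entry('id1', 'Jane', 'France', '78900'),
]

grouped = group_by_id(sample_entries)
for entry_id, group in grouped.items():
    print("%s:\n\t%s" % (entry_id, "\n\t".join(map(str, group))))
```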
