Text file of all titles / topic titles in Freebase
Problem description
I need a text file that contains the title of every topic / item in Freebase, each on its own line.
How can I make this if I have already downloaded a Freebase RDF dump?
If possible, I also need a separate text file with each topic's / item's description, one description per line.
How can I do that?
I would greatly appreciate it if someone could help me make either of these files from a Freebase RDF dump.
Thanks in advance!
Answer
Filter the RDF dump on the predicate/property ns:type.object.name. If you only want a particular language, also filter by that language, e.g. @en.
EDIT: I missed the second part, about descriptions being wanted as well. Here's a three-part regex that will get you all the lines with:
- English names
- English descriptions
- a type of /common/topic
Combining the three is left as an exercise for the reader.
zegrep $'\tns:(((type\\.object\\.name|common\\.topic\\.description)\t.*@en)|type\\.object\\.type\tns:common\\.topic)\\.$' freebase-rdf-2013-06-30-00-00.gz | gzip > freebase-rdf-2013-06-30-00-00-names-descriptions.gz
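One way to do the combining step is sketched below under stated assumptions: the output of the zegrep command above has been decompressed to a plain file, and each line is tab-separated as subject, predicate, object followed by a terminating "." (which is what the regex implies). The function name split_names_descriptions and the output file names are hypothetical. It reads the filtered file twice with awk: the first pass remembers every subject typed as ns:common.topic, the second writes the English names and descriptions of those subjects to two separate files, one value per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the answer):
#   split_names_descriptions FILTERED_FILE TITLES_OUT DESCRIPTIONS_OUT
# Assumes FILTERED_FILE is the decompressed output of the zegrep command,
# i.e. tab-separated "subject<TAB>predicate<TAB>object." lines.
split_names_descriptions() {
  local in="$1" titles="$2" descs="$3"
  : > "$titles"   # create/truncate both outputs even if nothing matches
  : > "$descs"
  awk -F'\t' -v titles="$titles" -v descs="$descs" '
    # Pass 1 (first copy of the file): remember every subject typed as a topic.
    NR == FNR {
      if ($2 == "ns:type.object.type" && $3 == "ns:common.topic.") topic[$1] = 1
      next
    }
    # Pass 2 (second copy): emit names and descriptions of remembered topics only.
    ($1 in topic) && ($2 == "ns:type.object.name" || $2 == "ns:common.topic.description") {
      val = $3
      sub(/@en\.$/, "", val)    # drop the language tag and the final "."
      sub(/^"/, "", val)        # strip the opening quote
      sub(/"$/, "", val)        # strip the closing quote
      gsub(/\\"/, "\"", val)    # turn \" back into "; escapes such as \n stay literal
      if ($2 == "ns:type.object.name") print val > titles
      else print val > descs
    }
  ' "$in" "$in"
}
```

For example: gzip -dc freebase-rdf-2013-06-30-00-00-names-descriptions.gz > names-descriptions.tsv, then split_names_descriptions names-descriptions.tsv titles.txt descriptions.txt. Reading the file twice means it does not matter whether a topic's type line comes before or after its name. Note that not every topic has a description, so line N of titles.txt does not necessarily correspond to line N of descriptions.txt; if you need them aligned, print $1 as a first column as well.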
It seems to have a performance issue that I'll have to look into. A simple grep of the entire file takes ~11 min on my laptop, but this one has been running several times as long. I'll have to look at it later, though...
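A common remedy for this kind of slowdown, not something the answer itself tested, is to put a cheap fixed-string grep -F pass in front of the regex so that the expensive pattern only sees candidate lines, and to run both greps in the C locale (LC_ALL=C), which avoids multibyte-locale overhead in GNU grep. The sketch below assumes bash (for the $'...' quoting the answer already uses); the helper name fast_filter is hypothetical. Each fixed string is a literal substring that the regex requires, and the second stage is the answer's own regex, so the output should contain the same lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the answer): fast_filter IN_GZ OUT_GZ
# Stage 1: fixed-string prefilter (-F). Each pattern is a substring the regex
#          requires, so no wanted line is dropped, but most lines are discarded cheaply.
# Stage 2: the exact regex from the answer, now run on far fewer lines.
# Both stages use the C locale so matching is byte-wise.
fast_filter() {
  local in="$1" out="$2"
  gzip -dc "$in" \
    | LC_ALL=C grep -F \
        -e $'\tns:type.object.name\t' \
        -e $'\tns:common.topic.description\t' \
        -e $'\tns:type.object.type\tns:common.topic.' \
    | LC_ALL=C grep -E $'\tns:(((type\\.object\\.name|common\\.topic\\.description)\t.*@en)|type\\.object\\.type\tns:common\\.topic)\\.$' \
    | gzip > "$out"
}
```

Usage would be fast_filter freebase-rdf-2013-06-30-00-00.gz freebase-rdf-2013-06-30-00-00-names-descriptions.gz. Whether this closes the gap to the ~11 min plain grep depends on the grep version and locale, so it is worth timing both on a slice of the dump first.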