Text file of all titles / topic titles in Freebase

Problem description




I need a .txt file containing the title of every topic (item) in Freebase, one title per line.

How can I create this file, given that I have already downloaded a Freebase RDF dump?

If possible, I also need a separate text file with the description of every topic (item), one description per line.

How can I do that?

I would greatly appreciate it if someone could help me make either of these files from a Freebase RDF dump.

Thanks in advance!

Solution

Filter the RDF dump on the predicate/property ns:type.object.name. If you only want a particular language, also filter by that language, e.g. @en.
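As a minimal sketch of that filter (not part of the original answer), the shell function below keeps only the English values of one predicate and prints the bare text, one value per line. It assumes each line of the dump is tab-separated as subject, predicate, object, that the object looks like "text"@en, and that the terminating dot either follows the object directly or sits in its own tab-separated field. The function name extract_en and the output file names are hypothetical.

```shell
# Hypothetical helper: print the English literal values of one predicate,
# one per line. Reads decompressed triples on stdin.
# Assumed line layout: subject <TAB> predicate <TAB> "text"@en.
# (the final dot may also be a separate tab-separated field).
extract_en() {
    # $1 = predicate to keep, e.g. ns:type.object.name
    awk -F '\t' -v pred="$1" '
        $2 == pred && $3 ~ /^".*"@en *\.?$/ {
            value = $3
            sub(/^"/, "", value)          # drop the opening quote
            sub(/"@en *\.?$/, "", value)  # drop closing quote, @en tag and dot
            print value                   # escapes such as \" are left untouched
        }'
}
```

Under those assumptions, titles and descriptions could be written to separate files like this:

```shell
gzip -dc freebase-rdf-2013-06-30-00-00.gz | extract_en ns:type.object.name > names.txt
gzip -dc freebase-rdf-2013-06-30-00-00.gz | extract_en ns:common.topic.description > descriptions.txt
```

Each value stays on a single line only if newlines inside literals are escaped in the dump, which is worth checking on a small sample first.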

EDIT: I missed the second part of the question, about descriptions being wanted as well. Here's a three-part regex which will get you all the lines with:

1. English names
2. English descriptions
3. a type of /common/topic

Combining the three is left as an exercise for the reader.

    # bash $'...' quoting: \t becomes a literal tab and \\. becomes \. (a literal dot) in the regex
    zegrep $'\tns:(((type\\.object\\.name|common\\.topic\\.description)\t.*@en)|type\\.object\\.type\tns:common\\.topic)\\.$' freebase-rdf-2013-06-30-00-00.gz | gzip > freebase-rdf-2013-06-30-00-00-names-descriptions.gz
    

This command seems to have a performance issue. A simple grep of the entire file takes ~11 min on my laptop, but this one has already been running for several times that long. I'll have to look into it later.
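One thing that may be worth trying, sketched here as an assumption rather than a confirmed fix: run the same regex with LC_ALL=C, so that grep matches raw bytes instead of decoding UTF-8 characters. The original answer does not say what caused the slowdown, so whether the locale is the bottleneck is a guess. The function name filter_triples is hypothetical, and the tab is built with printf so the function also works in shells without $'...' quoting.

```shell
# Hypothetical wrapper around the same three-part regex.
# Reads decompressed triples on stdin, writes matching lines to stdout.
filter_triples() {
    tab=$(printf '\t')   # literal tab, portable to plain sh
    # Assumption: byte-wise matching under the C locale is faster here.
    LC_ALL=C grep -E "${tab}ns:(((type\.object\.name|common\.topic\.description)${tab}.*@en)|type\.object\.type${tab}ns:common\.topic)\.\$"
}
```

It would slot into the original pipeline like this:

```shell
gzip -dc freebase-rdf-2013-06-30-00-00.gz | filter_triples | gzip > freebase-rdf-2013-06-30-00-00-names-descriptions.gz
```

Timing both variants on the first few million lines of the dump would show whether the change helps before committing to a full run.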
