How can I get a list of all film ids from Freebase?

Question

On a project I was working on a couple of years back, I was building a set of data about movies from Freebase. A simple shell script downloaded the "film.tsv" file (from http://download.freebase.com/datadumps/latest/browse/film/film.tsv). I then used the "id" field in that file to build the necessary MQL requests for each of the films (retrieving the other properties I was interested in e.g. actors, genres).

After looking at the developer's guide today, I realise that Freebase has moved on a fair bit. Significantly, the dump file I used before is no longer available. I also see that the dump file format is now RDF and, from what I can tell, the dumps are now only available as a single 22GB archive.

If at all possible, I would like to avoid downloading a 22GB file each time I want to rebuild my data set. Is it still possible to retrieve individual dump files, e.g. like the film.tsv file?

If not, is there an alternative way to obtain a full list of movie ids?

Recommended answer

There's no replacement planned for film.tsv right now. You can get the current list of film IDs from the RDF dump like this:

zgrep $'\ttype\.object\.type\tfilm\.film' freebase-rdf.gz
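If you would rather pull out just the IDs in a script, the same filter can be sketched in Python. This is only a sketch under assumptions: it takes the line layout implied by the zgrep pattern (tab-separated fields with the subject first, `type.object.type` second and `film.film` third), and the function names `film_ids` and `film_ids_from_dump` are hypothetical. Check a few lines of your own dump before relying on it, since the real layout may differ (for example, it may carry namespace prefixes).

```python
import gzip


def film_ids(lines):
    """Yield the subject field of every line that types its subject as film.film.

    Assumes tab-separated fields: subject, predicate, object (layout inferred
    from the zgrep pattern above, not confirmed against a real dump).
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed or short lines
        predicate = fields[1]
        # Drop an optional trailing " ." or "." terminator before comparing.
        obj = fields[2].rstrip(" .")
        if predicate == "type.object.type" and obj == "film.film":
            yield fields[0]


def film_ids_from_dump(path):
    """Stream a gzipped dump from disk without decompressing it first."""
    with gzip.open(path, "rt", encoding="utf-8") as dump:
        yield from film_ids(dump)
```

Note that this compares the object field exactly, so it is slightly stricter than the zgrep pattern, which would also match longer type names that merely begin with `film.film`.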

Then, when you need to update the list, query the MQL Read API for a list of new films that have been added since your last update:

[{
  "type": "/film/film",
  "id": null,
  "name": null,
  "timestamp": null,
  "timestamp>=": "2013-12",
  "sort": "-timestamp"
}]

Since the API returns 200 results at a time you'll need to use a cursor to get the full list of results.
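The cursor loop could look roughly like the Python sketch below. Several things here are assumptions rather than facts from the answer: the endpoint URL, the `query` and `cursor` request parameters, and a response envelope with `result` and `cursor` keys where a false cursor marks the last page. The helper names `fetch_page` and `iter_results` are hypothetical. Confirm all of these against the API documentation that applies to you, and note that an API key or other parameters may also be required.

```python
import json
import urllib.parse
import urllib.request

# Assumption: endpoint of the MQL Read API; verify before use.
MQLREAD_URL = "https://www.googleapis.com/freebase/v1/mqlread"

# The query from the answer above, written as a Python structure.
NEW_FILMS_QUERY = [{
    "type": "/film/film",
    "id": None,
    "name": None,
    "timestamp": None,
    "timestamp>=": "2013-12",
    "sort": "-timestamp",
}]


def fetch_page(query, cursor):
    """Fetch one page of results and return the decoded JSON envelope."""
    params = urllib.parse.urlencode({"query": json.dumps(query), "cursor": cursor})
    with urllib.request.urlopen(MQLREAD_URL + "?" + params) as resp:
        return json.load(resp)


def iter_results(query, fetch=fetch_page):
    """Yield every result, following cursors until the API reports no more pages."""
    cursor = ""  # assumption: an empty cursor asks for the first page
    while True:
        page = fetch(query, cursor)
        yield from page.get("result", [])
        cursor = page.get("cursor")
        if not cursor:  # False, None or missing: treated as the last page
            break
```

Passing the fetch function as a parameter keeps the paging logic separate from the HTTP call, so the loop can be exercised with canned pages before it is pointed at a live service.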