The order of records in a regularly updated BigQuery database


Question

I am going to maintain a local copy of a database hosted on BigQuery, using the API and tabledata:list. The database is not my own; its maintainers update it regularly by appending new data (say, every hour).

  1. First, can I assume that appended data will always be added to the end of the database?

  2. Now, let's assume the database currently has 1,000,000 rows and I am downloading all of them by paging through tabledata:list. Let's also assume the database is updated partway through the download (with 10,000 new rows). By using the page tokens, can I be assured that I will download only the 1,000,000 rows that were present when I started, in the order they appear in the database?

  3. Finally, let's say I come to update my copy. If I call tabledata:list with a startIndex of 1,000,000 and a maxResults of 1000, will I get 10 pages containing the newly appended data that I am expecting?

I suppose all these questions boil down to whether BigQuery respects the order the data is in, whether tabledata:list uses this order, and whether appended data is guaranteed to follow previous data.

There is a column whose values are unique, and I can run a simple select count(1) from table to get the length of the table, so I can of course check that my local copy is complete by comparing its length with that of the remote table. However, if the above were not guaranteed and I ended up with holes in my data, it would be quite impractical to remedy: the primary key is not sequential (otherwise I could just fill in the missing rows) and the database is very large.

Recommended answer

  1. When you append data, we append it to the end of the table data list. However, BigQuery may periodically coalesce data, and coalescing does not respect ordering. We have been discussing being able to preserve the ordering, or at least providing a way to access the most recent data, but this is not yet implemented or designed. If it is an important feature for you, let us know and we'll prioritize it accordingly.

  2. If you use page tokens, you are assured of a stable listing. If the table is updated while you are paging through the data, you will still see only the data that was in the table when the page token was created. Note that because of this, page tokens are only valid for 24 hours.

  3. This should work as long as no coalesce has occurred since you last updated your copy of the table.
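The stable listing described in point 2 can be sketched as a plain token-following loop. This is only an illustration under stated assumptions: fetch_page is a hypothetical callable standing in for a tabledata.list request, and it is assumed to return a dict shaped like that method's response (a "rows" list plus a "pageToken" that is absent on the last page). make_fake_fetch is an in-memory stand-in so the loop can be run without credentials.

```python
from typing import Callable, Iterator, Optional

# Hypothetical signature: fetch_page(page_token, max_results) -> dict shaped
# like a tabledata.list response, i.e. {"rows": [...], "pageToken": "..."}.
FetchPage = Callable[[Optional[str], int], dict]


def iter_rows(fetch_page: FetchPage, max_results: int = 1000) -> Iterator[dict]:
    """Yield every row of one listing by following page tokens."""
    page_token: Optional[str] = None
    while True:
        response = fetch_page(page_token, max_results)
        yield from response.get("rows", [])
        page_token = response.get("pageToken")
        if not page_token:  # no token means this was the last page
            break


def make_fake_fetch(snapshot: list) -> FetchPage:
    """In-memory stand-in for the API; tokens are just stringified offsets."""
    def fetch(page_token: Optional[str], max_results: int) -> dict:
        start = int(page_token) if page_token else 0
        end = start + max_results
        response = {"rows": snapshot[start:end]}
        if end < len(snapshot):
            response["pageToken"] = str(end)
        return response
    return fetch
```

Because the answer notes that page tokens expire after 24 hours, a full download built on a loop like this would need to finish, or restart from scratch, within that window.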

You can get the number of rows in the table by calling tables.get, which is usually simpler and faster than running a query.
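As a rough sketch of how that row count could drive the incremental update from the question, the helper below turns a table resource and a local row count into the (startIndex, maxResults) pairs to request. plan_catch_up is a hypothetical name, and the table_resource dict is assumed to be the parsed tables.get response, whose numRows field is returned as a string. The plan is only meaningful if no coalesce has reordered the table since the local copy was last updated, as the answer cautions.

```python
from typing import List, Tuple


def plan_catch_up(table_resource: dict, local_rows: int,
                  max_results: int = 1000) -> List[Tuple[int, int]]:
    """Return (startIndex, maxResults) pairs covering rows not yet copied."""
    remote_rows = int(table_resource["numRows"])  # numRows arrives as a string
    if remote_rows < local_rows:
        raise ValueError("remote table is shorter than the local copy")
    # Step through the uncopied range; the final request may be a partial page.
    return [(start, min(max_results, remote_rows - start))
            for start in range(local_rows, remote_rows, max_results)]
```

With the numbers from the question (1,000,000 local rows, 1,010,000 remote rows, pages of 1000), this yields 10 requests, the first starting at index 1,000,000.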
