Move data from Hive tables in Google Dataproc to BigQuery
Problem description
We are doing our data transformations with Google Dataproc, and all of our data resides in Dataproc Hive tables. How do I transfer/move this data to BigQuery?
Transferring data from Hive to BigQuery seems to follow a standard pattern:
- Dump your Hive tables into Avro files
- Load those files into BigQuery
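The two steps above could look roughly like this on Dataproc. This is a minimal Python sketch that only builds the command lines: step 1 submits a Hive job (`gcloud dataproc jobs submit hive --execute=...`) that copies a table into a new Avro-backed table whose files live in Cloud Storage, and step 2 loads those files with `bq load --source_format=AVRO`. It assumes your Hive version supports `STORED AS AVRO` and accepts a `LOCATION` clause in `CREATE TABLE ... AS SELECT`; if it does not, create the Avro table first and `INSERT OVERWRITE` into it. The helper names and all cluster, region, table, dataset, and bucket names are hypothetical.

```python
import shlex
from typing import List


def hive_export_job(cluster: str, region: str, src_table: str,
                    avro_table: str, gcs_dir: str) -> List[str]:
    """Step 1: a Dataproc Hive job that copies src_table into an Avro table.

    Assumes CTAS with a LOCATION clause is accepted by your Hive version,
    so the Avro files end up under gcs_dir in Cloud Storage.
    """
    hql = (
        f"CREATE TABLE {avro_table} STORED AS AVRO "
        f"LOCATION '{gcs_dir}' "
        f"AS SELECT * FROM {src_table};"
    )
    return [
        "gcloud", "dataproc", "jobs", "submit", "hive",
        f"--cluster={cluster}",
        f"--region={region}",
        f"--execute={hql}",
    ]


def bq_load_job(dataset_table: str, gcs_dir: str) -> List[str]:
    """Step 2: load every Avro file under gcs_dir into a BigQuery table."""
    return [
        "bq", "load",
        "--source_format=AVRO",
        # Map Avro logical types (date, timestamp-millis) to DATE/TIMESTAMP
        # instead of loading them as plain INTEGER.
        "--use_avro_logical_types",
        dataset_table,
        gcs_dir.rstrip("/") + "/*",
    ]


if __name__ == "__main__":
    # Hypothetical names: replace with your own cluster, tables, and bucket.
    export = hive_export_job("my-cluster", "europe-west1", "sales.orders",
                             "sales.orders_avro", "gs://my-bucket/export/orders")
    load = bq_load_job("my_dataset.orders", "gs://my-bucket/export/orders")
    print(shlex.join(export))
    print(shlex.join(load))
```

Printing with `shlex.join` lets you review the commands before running them (for example with `subprocess.run(export, check=True)`). Make sure the export directory holds only the Avro data files before loading, since the `/*` wildcard picks up every object under that prefix.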
See an example here: Migrate hive table to Google BigQuery
As mentioned there, take care with type compatibility between Hive, Avro, and BigQuery.
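To make the type caveat concrete, the helper below is a hedged sketch of the BigQuery column type you would typically end up with for each Hive type after going through the Hive AvroSerDe and a BigQuery Avro load. The mapping is an assumption based on the Hive AvroSerDe and BigQuery Avro-loading conversion rules, not something the answer states, so check it against the versions you run; `expected_bq_type` is a hypothetical helper name. Points it encodes: small integer types widen to INTEGER, a 32-bit FLOAT becomes BigQuery's 64-bit FLOAT, CHAR/VARCHAR lose their declared length, and DATE/TIMESTAMP only arrive as DATE/TIMESTAMP when Avro logical types are enabled (otherwise they load as INTEGER).

```python
import re

# Assumed result of Hive -> Avro (Hive AvroSerDe) -> BigQuery (Avro load).
# Verify against the documentation for the versions you actually run.
_PLAIN = {
    "boolean": "BOOLEAN",
    "tinyint": "INTEGER",   # stored as Avro int, widened
    "smallint": "INTEGER",  # stored as Avro int, widened
    "int": "INTEGER",
    "bigint": "INTEGER",    # Avro long
    "float": "FLOAT",       # 32-bit in Hive, FLOAT64 in BigQuery
    "double": "FLOAT",
    "string": "STRING",
    "char": "STRING",       # declared length is not kept
    "varchar": "STRING",    # declared length is not kept
    "binary": "BYTES",
    "decimal": "NUMERIC",   # Avro decimal; may become BIGNUMERIC for large precision/scale
}

# Avro logical types: mapped only when logical types are enabled on the load,
# otherwise BigQuery stores the underlying Avro int/long as INTEGER.
_LOGICAL = {
    "date": "DATE",
    "timestamp": "TIMESTAMP",
}


def expected_bq_type(hive_type: str, use_logical_types: bool = True) -> str:
    """Return the BigQuery type a Hive column is expected to load as."""
    t = hive_type.strip().lower()
    if t.startswith("array<") and t.endswith(">"):
        inner = t[len("array<"):-1].strip()
        if inner.startswith("array<"):
            # BigQuery has no ARRAY<ARRAY<...>>; such columns need reshaping.
            raise ValueError(f"nested array {hive_type!r} needs manual handling")
        return "REPEATED " + expected_bq_type(inner, use_logical_types)
    if t.startswith("map<"):
        return "REPEATED RECORD"  # Avro map -> repeated (key, value) records
    if t.startswith("struct<"):
        return "RECORD"
    base = re.sub(r"\(.*\)$", "", t)  # drop parameters such as (10,2) or (20)
    if base in _LOGICAL:
        return _LOGICAL[base] if use_logical_types else "INTEGER"
    if base in _PLAIN:
        return _PLAIN[base]
    raise ValueError(f"no mapping for Hive type {hive_type!r}; check it by hand")


if __name__ == "__main__":
    for col_type in ["bigint", "varchar(20)", "timestamp", "array<struct<a:int>>"]:
        print(col_type, "->", expected_bq_type(col_type))
```

Running such a check over the `DESCRIBE` output of each Hive table is a cheap way to spot columns (timestamps, dates, nested arrays) that need attention before the first load.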
And, at least for the first transfer, it would not hurt to validate the result by checking that the tables in Hive and BigQuery contain the same data: https://github.com/bolcom/hive_compared_bq
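One simple way to do such a check, sketched here as an assumption rather than as what the linked tool does: compute a row count plus an order-independent checksum for each side and compare them. The rows are assumed to have been fetched already (by whatever Hive and BigQuery clients you use) with the same column order; `table_fingerprint` and `tables_match` are hypothetical names. Values are compared through `str()`, so both sides must first be normalized to the same text (timestamps, decimals, and floats are the usual suspects), or you will get false mismatches.

```python
import hashlib
from typing import Iterable, Sequence, Tuple

_NULL = "\x00"  # marker so that NULL and '' hash differently
_SEP = "\x1f"   # unit separator between columns


def table_fingerprint(rows: Iterable[Sequence[object]]) -> Tuple[int, str]:
    """Order-independent (row count, checksum) for a table's rows.

    Each row is rendered to text and hashed with SHA-256; the first 8 bytes
    of each hash are summed modulo 2**64. Summing (rather than XOR-ing)
    means row order does not matter but duplicated rows still count.
    """
    count = 0
    total = 0
    for row in rows:
        text = _SEP.join(_NULL if v is None else str(v) for v in row)
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        total = (total + int.from_bytes(digest[:8], "big")) % (1 << 64)
        count += 1
    return count, format(total, "016x")


def tables_match(hive_rows: Iterable[Sequence[object]],
                 bq_rows: Iterable[Sequence[object]]) -> bool:
    """True if both sides have the same row count and checksum."""
    return table_fingerprint(hive_rows) == table_fingerprint(bq_rows)


if __name__ == "__main__":
    hive = [(1, "a", None), (2, "b", 3.5)]
    bq = [(2, "b", 3.5), (1, "a", None)]  # same rows, different order
    print(tables_match(hive, bq))
```

Pulling whole tables to a client only makes sense for small tables or samples; for large tables you would want the aggregation to run inside Hive and BigQuery instead.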