Load Google Cloud Storage data into Bigtable


Question

Is there an easy way, or an example, to load Google Cloud Storage data into Bigtable?

I have lots of JSON files generated by PySpark, and I wish to load the data into Bigtable.

But I cannot find an easy way to do that!

I have tried the Python code from google-cloud-python, and it worked fine, but it just reads the data and writes it into Bigtable row by row, which seemed strange to me.
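
For context, the row-at-a-time pattern described above looks roughly like the sketch below. This is not the asker's actual code; the project, instance, table, bucket, prefix, column family ("cf1"), and the "id" field used as the row key are all hypothetical placeholders.

```python
import json

from google.cloud import bigtable
from google.cloud import storage

storage_client = storage.Client()
bigtable_client = bigtable.Client(project="my-project")
table = bigtable_client.instance("my-instance").table("my-table")

# Read newline-delimited JSON files from GCS and commit one row per record.
bucket = storage_client.bucket("my-bucket")
for blob in bucket.list_blobs(prefix="pyspark-output/"):
    for line in blob.download_as_bytes().splitlines():
        record = json.loads(line)
        # "id" is a hypothetical field used here as the row key.
        row = table.direct_row(record["id"].encode("utf-8"))
        row.set_cell("cf1", b"payload", json.dumps(record).encode("utf-8"))
        row.commit()  # one RPC per row -- slow for large datasets
```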

Any help would be greatly appreciated.

Answer

There is no simple tool for loading data into Cloud Bigtable. Here are some options:

  1. Import the files using Dataflow. This requires Java development and learning the Dataflow programming model.
  2. Use Python (possibly with PySpark) to read those JSON files, and write to Cloud Bigtable using a method called mutate_rows, which writes to Bigtable in bulk (see the sketch after this list).
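
A minimal sketch of option 2 with the google-cloud-bigtable client follows. As in the earlier sketch, the project, instance, table, column family, batch size, and the "id" row-key field are hypothetical placeholders.

```python
import json

from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("my-table")

def flush(rows):
    # mutate_rows sends the whole batch in one call and returns one
    # status per row, so check each for per-row failures.
    for status in table.mutate_rows(rows):
        if status.code != 0:  # gRPC status code 0 means OK
            raise RuntimeError("row mutation failed: %s" % status.message)

def load_json_lines(lines, batch_size=1000):
    """Batch JSON records into DirectRows and write them in bulk."""
    rows = []
    for line in lines:
        record = json.loads(line)
        # "id" is a hypothetical field used here as the row key.
        row = table.direct_row(record["id"].encode("utf-8"))
        row.set_cell("cf1", b"payload", json.dumps(record).encode("utf-8"))
        rows.append(row)
        if len(rows) >= batch_size:
            flush(rows)
            rows = []
    if rows:  # flush any remaining partial batch
        flush(rows)
```

With PySpark, a function like this could be applied per partition (e.g. via mapPartitions), so each executor batches its own writes instead of committing one row per record.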

FYI, I work on the Cloud Bigtable team. I'm a Java developer, so I opted for #1. Our team has been working to improve our Python experience, and the extended team recently added some reliability improvements to make sure that mutate_rows is resilient for large jobs. We do not yet have good examples of integrating with PySpark or Apache Beam's Python SDK, but they are on our radar.
