google-cloud-dataflow: How to read data from a database and write to BigQuery
Question
I need to set up a data pipeline from source databases such as Oracle and MySQL and load the data into BigQuery.
How can I use google-cloud-dataflow to read data from a database (JDBC connection) and write to BigQuery tables using Python?
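For the JDBC part of this question, one possible approach (an assumption, not an answer from this thread) is the Apache Beam Python SDK, which Dataflow runs: `apache_beam.io.jdbc.ReadFromJdbc` reads over JDBC (it is a cross-language transform that needs Java available), and `beam.io.WriteToBigQuery` writes the rows out. All connection strings, table names, column names, and project IDs below are hypothetical placeholders:

```python
# A minimal sketch, assuming the Apache Beam Python SDK with the
# cross-language JDBC connector. Every identifier below (project,
# bucket, table, columns, credentials) is a placeholder.

def row_to_bq(row, columns):
    """Map a database row (an iterable of values) to a dict keyed by
    column name, which is the shape WriteToBigQuery expects."""
    return dict(zip(columns, row))

if __name__ == "__main__":
    import apache_beam as beam
    from apache_beam.io.jdbc import ReadFromJdbc
    from apache_beam.options.pipeline_options import PipelineOptions

    columns = ["id", "name"]  # hypothetical source columns
    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
    )
    with beam.Pipeline(options=options) as p:
        (p
         | "ReadFromMySQL" >> ReadFromJdbc(
               table_name="my_table",
               driver_class_name="com.mysql.cj.jdbc.Driver",
               jdbc_url="jdbc:mysql://db-host:3306/my_db",
               username="user",
               password="secret")
         | "ToDicts" >> beam.Map(row_to_bq, columns)
         | "WriteToBQ" >> beam.io.WriteToBigQuery(
               "my-project:my_dataset.my_table",
               schema="id:INTEGER,name:STRING",
               write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```

The pipeline body is guarded by `__main__` because it only runs where Beam and the Dataflow service are available; `row_to_bq` is the only part that runs standalone.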
I also have some Hive tables in an on-premises Hadoop cluster; how do I transfer that data to BigQuery?
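For the Hive part, one common route (an assumption, not taken from this thread) is: export the Hive table to a columnar format such as Parquet, copy the files to Google Cloud Storage with `hadoop distcp`, then load them with the `bq load` CLI. The helper below only builds that command sequence as strings; all table names, HDFS paths, and bucket names are hypothetical:

```python
def hive_to_bq_commands(hive_table, hdfs_dir, gcs_dir, bq_table):
    """Build the (hypothetical) command sequence for a
    Hive -> HDFS -> GCS -> BigQuery transfer.

    Returns the three shell commands as strings; running them
    requires hive, hadoop, and the bq CLI on the cluster.
    """
    return [
        # 1. Export the Hive table as Parquet files under an HDFS directory.
        'hive -e "INSERT OVERWRITE DIRECTORY \'{d}\' STORED AS PARQUET '
        'SELECT * FROM {t};"'.format(d=hdfs_dir, t=hive_table),
        # 2. Copy the exported files from HDFS to Cloud Storage.
        "hadoop distcp {d} {g}".format(d=hdfs_dir, g=gcs_dir),
        # 3. Load the Parquet files into a BigQuery table.
        "bq load --source_format=PARQUET {b} '{g}/*'".format(b=bq_table, g=gcs_dir),
    ]

for cmd in hive_to_bq_commands("mydb.events", "/tmp/export/events",
                               "gs://my-bucket/export/events",
                               "mydataset.events"):
    print(cmd)
```

An alternative, if the cluster can reach GCS directly, is pointing the Hive export at a `gs://` path via the GCS connector and skipping the `distcp` step.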
I couldn't find the right documentation or examples to achieve this. Can you please point me in the right direction?
Answer
I implemented a solution for exactly this in one of my projects; you need to follow these steps: