How to convert a 500GB SQL table into Apache Parquet?


Question

Perhaps this is well documented, but I am getting very confused how to do this (there are many Apache tools).

When I create an SQL table, I create the table using the following commands:

CREATE TABLE table_name(
   column1 datatype,
   column2 datatype,
   column3 datatype,
   .....
   columnN datatype,
   PRIMARY KEY( one or more columns )
);

How does one convert this existing table into Parquet? Is the resulting file written to disk? If the original data is several GB, how long does one have to wait?

Could I format the original raw data into Parquet format instead?

Answer

Apache Spark can be used to do this:

1. Load your table from MySQL via JDBC.
2. Save it as a Parquet file.

Example:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Load the table over JDBC; the connection string, table name, and
# credentials are placeholders.
df = spark.read.jdbc(
    "YOUR_MYSQL_JDBC_CONN_STRING",
    "YOUR_TABLE",
    properties={"user": "YOUR_USER", "password": "YOUR_PASSWORD"},
)

# Write it out as Parquet. For a table this large, consider read.jdbc's
# column/lowerBound/upperBound/numPartitions options so the load is split
# into parallel partitioned reads instead of a single huge query.
df.write.parquet("YOUR_HDFS_FILE")

