How to copy and convert parquet files to csv

Published: 2020/9/4 7:01:07 | Tags: python, hadoop, apache-spark, pyspark, parquet

Question

I have access to an HDFS file system and can see parquet files with

hadoop fs -ls /user/foo

How can I copy those parquet files to my local system and convert them to csv so I can use them? The files should be simple text files with a number of fields per row.

Answer

Try:

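from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # in the pyspark shell, this returns the existing `spark` session
df = spark.read.parquet("/path/to/infile.parquet")
df.write.csv("/path/to/outfile.csv")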

Relevant API documentation:

pyspark.sql.DataFrameReader.parquet
pyspark.sql.DataFrameWriter.csv

Both /path/to/infile.parquet and /path/to/outfile.csv should be locations on the HDFS filesystem. You can specify hdfs://... explicitly, or omit it, since HDFS is usually the default scheme (configured by fs.defaultFS).
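To make the "default scheme" point concrete, here is a minimal Python model of how a scheme-less path resolves against the cluster's fs.defaultFS setting. It is only a sketch of the rule, not Hadoop's own code; the `qualify` helper, the `hdfs://namenode:8020` address and the `/user/foo` working directory are hypothetical values chosen for illustration.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def qualify(path: str, default_fs: str, working_dir: str) -> str:
    """Model how a path is resolved against fs.defaultFS (illustrative sketch).

    default_fs: value of fs.defaultFS, e.g. a hypothetical "hdfs://namenode:8020".
    working_dir: directory that relative paths are resolved against.
    """
    # A path that already names a scheme (hdfs://, file://, ...) is used as-is.
    if urlsplit(path).scheme:
        return path
    fs = urlsplit(default_fs)
    # Relative paths are resolved against the working directory first.
    absolute = path if path.startswith("/") else posixpath.join(working_dir, path)
    # Borrow the scheme and authority (host:port) from the default file system.
    return urlunsplit((fs.scheme, fs.netloc, posixpath.normpath(absolute), "", ""))


print(qualify("/path/to/infile.parquet", "hdfs://namenode:8020", "/user/foo"))
# hdfs://namenode:8020/path/to/infile.parquet
```

An explicit file:///... path is returned unchanged by this model, which is why it bypasses HDFS entirely.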

You should avoid using file://..., because a local file means a different file to every machine in the cluster. Output to HDFS instead, then transfer the results to your local disk using the command line. Note that Spark writes /path/to/outfile.csv as a directory of part files, so -get copies a directory (hdfs dfs -getmerge with the same arguments would concatenate the parts into a single local file):

hdfs dfs -get /path/to/outfile.csv /path/to/localfile.csv
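Spark's output directory holds one or more part-* files and typically an empty _SUCCESS marker. To combine the copied part files into a single CSV from Python, a minimal standard-library sketch follows. The `merge_part_files` name is hypothetical, and it assumes the data was written without a header, as in the code above (with header=True every part file would carry its own header line).

```python
import glob
import os
import shutil


def merge_part_files(src_dir: str, dest_file: str) -> int:
    """Concatenate the part-* files of a Spark CSV output directory into one file.

    Files that do not match part-* (such as _SUCCESS or hidden .crc checksums)
    are skipped. Returns the number of part files merged.
    """
    # Part names are zero-padded (part-00000, part-00001, ...), so a plain
    # sort keeps them in partition order.
    parts = sorted(glob.glob(os.path.join(src_dir, "part-*")))
    with open(dest_file, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # byte-for-byte append
    return len(parts)
```

For example, merge_part_files("/path/to/localfile.csv", "/path/to/merged.csv") would write a single hypothetical merged.csv next to the copied directory.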

Or display the part files directly from HDFS:

hdfs dfs -cat /path/to/outfile.csv/part-*
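To use the rows from Python once the output is on local disk (for example after the hdfs dfs -get step), each part file is ordinary CSV text, so the standard csv module is enough. This is a minimal sketch with a hypothetical `read_rows` helper; it assumes Spark's default comma separator and UTF-8 encoding, and values without embedded double quotes, because Spark escapes those with a backslash by default while the csv module expects doubled quotes.

```python
import csv
import glob
import os
from typing import Iterator, List


def read_rows(output_dir: str) -> Iterator[List[str]]:
    """Yield every row, as a list of field strings, from the part-* files in output_dir."""
    for part in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        # newline="" lets the csv module handle line endings, including
        # newlines inside quoted fields.
        with open(part, newline="", encoding="utf-8") as f:
            yield from csv.reader(f)
```

Every field comes back as a string, so numeric columns need converting after reading.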
