How to read a Parquet file from S3 without Spark? (Java)
Question
Currently, I am using the Apache ParquetReader for reading local parquet files, which looks something like this:
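Path path = new Path("userdata1.parquet");
// try-with-resources closes the reader even if reading fails
try (ParquetReader<GenericData.Record> reader =
         AvroParquetReader.<GenericData.Record>builder(path).withConf(new Configuration()).build()) {
    GenericData.Record record;
    // read() returns null once the end of the file is reached
    while ((record = reader.read()) != null) {
        System.out.println(record);
    }
} catch (IOException e) {
    e.printStackTrace();
}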
However, I am trying to access a Parquet file on S3 without downloading it. Is there a way to parse an InputStream directly with the Parquet reader?
Recommended answer
Yes, recent versions of Hadoop include support for the S3 filesystem. Use the s3a client from the hadoop-aws library to access the S3 filesystem directly.
The HadoopInputFile path should be constructed as s3a://bucket-name/prefix/key, with the authentication credentials (access_key and secret_key) configured using the properties:
fs.s3a.access.key
fs.s3a.secret.key
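The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation. It assumes that parquet-avro and hadoop-aws are on the classpath, that the credentials are supplied through the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables (an assumption made here, not something the answer specifies), and that bucket-name and prefix/key are placeholders for a real object. The class name S3ParquetExample and the helper s3aUri are hypothetical names introduced for this example.

```java
import java.io.IOException;

import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;
import org.apache.parquet.io.InputFile;

public class S3ParquetExample {

    // Hypothetical helper: builds an s3a:// URI from a bucket name and an object key.
    static String s3aUri(String bucket, String key) {
        return "s3a://" + bucket + "/" + key;
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Assumption: credentials are read from environment variables.
        // Configuration.set rejects null values, so both variables must be set.
        conf.set("fs.s3a.access.key", System.getenv("AWS_ACCESS_KEY_ID"));
        conf.set("fs.s3a.secret.key", System.getenv("AWS_SECRET_ACCESS_KEY"));

        // Placeholder bucket and key; replace with a real object location.
        Path path = new Path(s3aUri("bucket-name", "prefix/key"));

        // HadoopInputFile resolves the s3a:// scheme through the hadoop-aws client.
        InputFile inputFile = HadoopInputFile.fromPath(path, conf);

        // The reader is closed automatically when the block exits.
        try (ParquetReader<GenericRecord> reader =
                 AvroParquetReader.<GenericRecord>builder(inputFile).build()) {
            GenericRecord record;
            while ((record = reader.read()) != null) {
                System.out.println(record);
            }
        }
    }
}
```

With this approach the reader fetches the byte ranges it needs from S3 through the Hadoop filesystem layer, so the file does not have to be copied to local disk first.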
Additionally, you will need these dependent libraries:
hadoop-common JAR
aws-java-sdk-bundle JAR
Read more: related configuration properties