Read specific column from Parquet without using Spark

Question
I am trying to read Parquet files without using Apache Spark, and I am able to do it, but I am finding it hard to read specific columns. I could not find any good resource on Google, as almost every post is about reading the whole Parquet file. Below is my code:
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.avro.generic.GenericRecord
import org.apache.parquet.hadoop.ParquetReader
import org.apache.parquet.avro.AvroParquetReader
object parquetToJson {
  def main(args: Array[String]): Unit = {
    // case class Customer(key: Int, name: String, sellAmount: Double, profit: Double, state: String)
    val parquetFilePath = new Path("data/parquet/Customer/")
    val reader = AvroParquetReader.builder[GenericRecord](parquetFilePath).build()
    val iter = Iterator.continually(reader.read).takeWhile(_ != null)
    val list = iter.toList
    list.foreach(record => println(record))
  }
}
The commented-out case class represents the schema of my file, and right now the above code reads all the columns from the file. I want to read only specific columns.
Answer
If you just want to read specific columns, then you need to set a read schema on the Configuration that the ParquetReader builder accepts. (This is also known as a projection.)
In your case, you should be able to call .withConf(conf) on the AvroParquetReader builder, and in the conf you pass in, invoke conf.set(ReadSupport.PARQUET_READ_SCHEMA, schema), where schema is an Avro schema in String form.
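Putting that together with the code from the question, a sketch might look like the following. The projection schema below is an assumption based on the commented-out Customer case class, keeping only the key and name columns; adjust the field names and types to match your actual file. This also assumes parquet-avro and the Hadoop client libraries are on the classpath.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.avro.generic.GenericRecord
import org.apache.parquet.avro.AvroParquetReader
import org.apache.parquet.hadoop.api.ReadSupport

object ParquetColumnReader {
  def main(args: Array[String]): Unit = {
    val parquetFilePath = new Path("data/parquet/Customer/")

    // Avro schema describing only the columns we want to read back.
    // Field names/types here are assumed from the Customer case class.
    val projection =
      """{
        |  "type": "record",
        |  "name": "Customer",
        |  "fields": [
        |    {"name": "key",  "type": "int"},
        |    {"name": "name", "type": "string"}
        |  ]
        |}""".stripMargin

    // Pass the projection to the reader via PARQUET_READ_SCHEMA.
    val conf = new Configuration()
    conf.set(ReadSupport.PARQUET_READ_SCHEMA, projection)

    val reader = AvroParquetReader.builder[GenericRecord](parquetFilePath)
      .withConf(conf)
      .build()

    // Each record now contains only the projected columns.
    Iterator.continually(reader.read)
      .takeWhile(_ != null)
      .foreach(record => println(record))

    reader.close()
  }
}
```

Because Parquet is a columnar format, this projection means the reader only scans the requested column chunks on disk, rather than reading every column and discarding the rest.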