How to create a DataFrame from a fixed-length text file given field lengths?

Views: 38 | Published: 2021/11/14 22:13:06 | Tags: scala, apache-spark, apache-spark-sql

Question

I am reading a fixed-position file. The final result of reading the file is stored in a string. I would like to convert that string into a DataFrame for further processing. Please help me with this. Below is my data:

Input data:

+---------+----------------------+
|PRGREFNBR|value                 |
+---------+----------------------+
|01       |11 apple     TRUE 0.56|
|02       |12 pear      FALSE1.34|
|03       |13 raspberry TRUE 2.43|
|04       |14 plum      TRUE  .31|
|05       |15 cherry    TRUE 1.4 |
+---------+----------------------+

Data positions (field lengths): "3,10,5,4"

Expected result, with default headers in the DataFrame:

+-----+-----+----------+-----+-----+
|SeqNo|col_0|     col_1|col_2|col_3|
+-----+-----+----------+-----+-----+
|  01 |  11 |apple     |TRUE | 0.56|
|  02 |  12 |pear      |FALSE| 1.34|
|  03 |  13 |raspberry |TRUE | 2.43|
|  04 |  14 |plum      |TRUE | 1.31|
|  05 |  15 |cherry    |TRUE | 1.4 |
+-----+-----+----------+-----+-----+

Recommended answer

Given the fixed-position file (say input.txt):

11 apple     TRUE 0.56
12 pear      FALSE1.34
13 raspberry TRUE 2.43
14 plum      TRUE 1.31
15 cherry    TRUE 1.4

and the length of every field in the input file (say lengths):

3,10,5,4

you could create a DataFrame as follows:

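// Needed for the String encoders and the $"..." column syntax
// (already in scope in spark-shell)
import spark.implicits._

// Read the text file as is
// and filter out empty lines
val lines = spark.read.textFile("input.txt").filter(!_.isEmpty)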

// define a helper function to do the split per fixed lengths
// Home exercise: should be part of a case class that describes the schema
def parseLinePerFixedLengths(line: String, lengths: Seq[Int]): Seq[String] = {
  lengths.indices.foldLeft((line, Array.empty[String])) { case ((rem, fields), idx) =>
    val len = lengths(idx)
    val fld = rem.take(len)
    (rem.drop(len), fields :+ fld)
  }._2
}

// Split the lines using parseLinePerFixedLengths method
val lengths = Seq(3,10,5,4)
val fields = lines.
  map(parseLinePerFixedLengths(_, lengths)).
  withColumnRenamed("value", "fields") // <-- it'd be unnecessary if a case class were used

scala> fields.show(truncate = false)
+------------------------------+
|fields                        |
+------------------------------+
|[11 , apple     , TRUE , 0.56]|
|[12 , pear      , FALSE, 1.34]|
|[13 , raspberry , TRUE , 2.43]|
|[14 , plum      , TRUE , 1.31]|
|[15 , cherry    , TRUE , 1.4 ]|
+------------------------------+

That's probably what you already had, so let's unroll (destructure) the nested sequence of fields into columns:

val answer = lengths.indices.foldLeft(fields) { case (result, idx) =>
  result.withColumn(s"col_$idx", $"fields".getItem(idx))
}
// drop the unnecessary/interim column
scala> answer.drop("fields").show
+-----+----------+-----+-----+
|col_0|     col_1|col_2|col_3|
+-----+----------+-----+-----+
|  11 |apple     |TRUE | 0.56|
|  12 |pear      |FALSE| 1.34|
|  13 |raspberry |TRUE | 2.43|
|  14 |plum      |TRUE | 1.31|
|  15 |cherry    |TRUE | 1.4 |
+-----+----------+-----+-----+
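
The code comments above leave the case class as a home exercise. Here is a minimal sketch of what that could look like in plain Scala. `Fruit`, `FruitParser`, and the field names `id`, `name`, `ripe`, and `price` are hypothetical, chosen only to match the sample rows; the original answer does not define them.

```scala
import scala.util.Try

// Hypothetical schema for the sample rows; the field names are assumptions.
final case class Fruit(id: Int, name: String, ripe: Boolean, price: Double)

object FruitParser {
  // Field widths from the answer: 3, 10, 5, 4
  val lengths: Seq[Int] = Seq(3, 10, 5, 4)

  // Fixed-width split expressed with precomputed start offsets.
  // String.slice tolerates short lines and returns a shorter or empty field.
  private def split(line: String): Seq[String] = {
    val starts = lengths.scanLeft(0)(_ + _) // 0, 3, 13, 18, 22
    starts.zip(lengths).map { case (start, len) => line.slice(start, start + len) }
  }

  // Trims each field and converts it to its target type.
  // Returns None when any conversion fails (e.g. a malformed line).
  def parse(line: String): Option[Fruit] = split(line).map(_.trim) match {
    case Seq(id, name, ripe, price) =>
      Try(Fruit(id.toInt, name, ripe.toBoolean, price.toDouble)).toOption
    case _ => None
  }
}
```

With `spark.implicits._` in scope and `Fruit` defined at the top level, something like `lines.flatMap(l => FruitParser.parse(l).toSeq)` should give a typed Dataset, with no interim `fields` column to rename or drop.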

Done!
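
The input in the question is already a DataFrame with `PRGREFNBR` and `value` columns, and the expected result keeps the first column as `SeqNo`. One possible way to get there without the interim array column is to build one `substring` expression per field. This is a sketch under assumptions: it starts a local SparkSession for illustration, and `FixedWidthColumns` and `fieldColumns` are hypothetical names, not part of the original answer.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, substring}

object FixedWidthColumns {
  // Builds one substring expression per field, named col_0, col_1, ...
  // Spark's substring uses 1-based positions, hence scanLeft(1).
  def fieldColumns(source: String, lengths: Seq[Int]): Seq[Column] = {
    val starts = lengths.scanLeft(1)(_ + _) // 1, 4, 14, 19, 23 for 3,10,5,4
    lengths.indices.map { idx =>
      substring(col(source), starts(idx), lengths(idx)).as(s"col_$idx")
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only for demonstration
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("fixed-width")
      .getOrCreate()
    import spark.implicits._

    // Two sample rows shaped like the question's input
    val input = Seq(
      ("01", "11 apple     TRUE 0.56"),
      ("02", "12 pear      FALSE1.34")
    ).toDF("PRGREFNBR", "value")

    // Keep the key column as SeqNo and append the fixed-width fields
    val result = input.select(
      col("PRGREFNBR").as("SeqNo") +: fieldColumns("value", Seq(3, 10, 5, 4)): _*
    )
    result.show(truncate = false)

    spark.stop()
  }
}
```

The fields keep their padding, as in the answer's output; applying `trim` from `org.apache.spark.sql.functions` to each expression would remove it if that is preferred.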
