Get WrappedArray row value and convert it into a string in Scala

Question

I have a data frame that looks like this:

+---------------------------------------------------------------------+
|value                                                                |
+---------------------------------------------------------------------+
|[WrappedArray(LineItem_organizationId, LineItem_lineItemId)]         |
|[WrappedArray(OrganizationId, LineItemId, SegmentSequence_segmentId)]|
+---------------------------------------------------------------------+

From the two rows above, I want to create one string per row in this format:

"LineItem_organizationId", "LineItem_lineItemId"
"OrganizationId", "LineItemId", "SegmentSequence_segmentId"

I want this to be dynamic: if a third value is present in the first row, the resulting string should contain one more comma-separated value.

How can I do this in Scala?

This is what I am doing to create the data frame:

import org.apache.spark.sql.functions.explode
import sqlContext.implicits._

val xmlFiles = "C://Users//u6034690//Desktop//SPARK//trfsmallfffile//XML"
val discriptorFileLOcation = "C://Users//u6034690//Desktop//SPARK//trfsmallfffile//FinancialLineItem//REFXML"

// Load the descriptor XML, one row per FlatFileDescriptor element
val dfDiscriptor = sqlContext.read.format("com.databricks.spark.xml").option("rowTag", "FlatFileDescriptor").load(discriptorFileLOcation)
dfDiscriptor.printSchema()

// Pull the name of the first column out of the field descriptors
val firstColumn = dfDiscriptor.select($"FFFileType.FFRecord.FFField").as("FFField")
val FirstColumnOfHeaderFile = firstColumn.select(explode($"FFField")).as("ColumnsDetails").select(explode($"col")).first.get(0).toString().split(",")(5)
println(FirstColumnOfHeaderFile)

// Primary-key columns: this is the data frame shown above
val primaryKeyColumnsFinancialLineItem = dfDiscriptor.select(explode($"FFFileType.FFRecord.FFPrimKey.FFPrimKeyCol"))
primaryKeyColumnsFinancialLineItem.show(false)

Adding the full schema:

root
 |-- FFColumnDelimiter: string (nullable = true)
 |-- FFContentItem: struct (nullable = true)
 |    |-- _VALUE: string (nullable = true)
 |    |-- _ffMajVers: long (nullable = true)
 |    |-- _ffMinVers: double (nullable = true)
 |-- FFFileEncoding: string (nullable = true)
 |-- FFFileType: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- FFPhysicalFile: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- FFFileName: string (nullable = true)
 |    |    |    |    |-- FFRowCount: long (nullable = true)
 |    |    |-- FFRecord: struct (nullable = true)
 |    |    |    |-- FFField: array (nullable = true)
 |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |-- FFColumnNumber: long (nullable = true)
 |    |    |    |    |    |-- FFDataType: string (nullable = true)
 |    |    |    |    |    |-- FFFacets: struct (nullable = true)
 |    |    |    |    |    |    |-- FFMaxLength: long (nullable = true)
 |    |    |    |    |    |    |-- FFTotalDigits: long (nullable = true)
 |    |    |    |    |    |-- FFFieldIsOptional: boolean (nullable = true)
 |    |    |    |    |    |-- FFFieldName: string (nullable = true)
 |    |    |    |    |    |-- FFForKey: struct (nullable = true)
 |    |    |    |    |    |    |-- FFForKeyCol: string (nullable = true)
 |    |    |    |    |    |    |-- FFForKeyRecord: string (nullable = true)
 |    |    |    |-- FFPrimKey: struct (nullable = true)
 |    |    |    |    |-- FFPrimKeyCol: array (nullable = true)
 |    |    |    |    |    |-- element: string (containsNull = true)
 |    |    |    |-- FFRecordType: string (nullable = true)
 |-- FFHeaderRow: boolean (nullable = true)
 |-- FFId: string (nullable = true)
 |-- FFRowDelimiter: string (nullable = true)
 |-- FFTimeStamp: string (nullable = true)
 |-- _env: string (nullable = true)
 |-- _ffMajVers: long (nullable = true)
 |-- _ffMinVers: double (nullable = true)
 |-- _ffPubstyle: string (nullable = true)
 |-- _schemaLocation: string (nullable = true)
 |-- _sr: string (nullable = true)
 |-- _xmlns: string (nullable = true)
 |-- _xsi: string (nullable = true)

Recommended answer

Looking at the given dataframe:

+---------------------------------------------------------------------+
|value                                                                |
+---------------------------------------------------------------------+
|[WrappedArray(LineItem_organizationId, LineItem_lineItemId)]         |
|[WrappedArray(OrganizationId, LineItemId, SegmentSequence_segmentId)]|
+---------------------------------------------------------------------+

it must have the following schema:

 |-- value: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: string (containsNull = true)

If the above assumption is true, then you can write a udf function that flattens the nested array and joins its elements:

import org.apache.spark.sql.functions._
def arrayToString = udf((arr: collection.mutable.WrappedArray[collection.mutable.WrappedArray[String]]) => arr.flatten.mkString(", "))
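To see what the body of this udf does independently of Spark, here is a minimal plain-Scala sketch. The object and function names (`FlattenJoinSketch`, `joinPlain`, `joinQuoted`) are hypothetical and not part of the original answer, and `Seq[Seq[String]]` is used in place of `WrappedArray` on the assumption that the nested array behaves as an ordinary Scala sequence. `joinQuoted` is a variant that wraps each name in double quotes, which is the format the question asks for.

```scala
object FlattenJoinSketch {
  // Same logic as the udf body: flatten the nested sequences, then join with ", ".
  def joinPlain(arr: Seq[Seq[String]]): String =
    arr.flatten.mkString(", ")

  // Hypothetical variant: wrap each name in double quotes before joining.
  def joinQuoted(arr: Seq[Seq[String]]): String =
    arr.flatten.map(name => "\"" + name + "\"").mkString(", ")

  def main(args: Array[String]): Unit = {
    val row1 = Seq(Seq("LineItem_organizationId", "LineItem_lineItemId"))
    val row2 = Seq(Seq("OrganizationId", "LineItemId", "SegmentSequence_segmentId"))

    println(joinPlain(row1))  // LineItem_organizationId, LineItem_lineItemId
    println(joinQuoted(row1)) // "LineItem_organizationId", "LineItem_lineItemId"
    println(joinQuoted(row2)) // "OrganizationId", "LineItemId", "SegmentSequence_segmentId"
  }
}
```

Because both functions work on however many elements the row contains, a third (or fourth) value in any row simply produces one more comma-separated entry. Passing `joinQuoted _` to `udf` instead of the lambda above should give the quoted form in the dataframe.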

Then apply the `arrayToString` udf to the dataframe:

df.withColumn("value", arrayToString($"value"))

You should get:

+-----------------------------------------------------+
|value                                                |
+-----------------------------------------------------+
|LineItem_organizationId, LineItem_lineItemId         |
|OrganizationId, LineItemId, SegmentSequence_segmentId|
+-----------------------------------------------------+

 |-- value: string (nullable = true)
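As a possible alternative to a udf, the same result can be sketched with Spark's built-in `flatten` and `concat_ws` column functions. This assumes Spark 2.4 or later (where `flatten` was added), which the original question does not confirm. The object name `BuiltinFlattenSketch`, the helper `flattenToString`, and the local `SparkSession` setup are hypothetical and exist only to make the example self-contained.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat_ws, flatten}

object BuiltinFlattenSketch {
  // Flatten an array<array<string>> column and join its elements with ", ".
  // Assumes Spark 2.4+ for `flatten`.
  def flattenToString(df: DataFrame, column: String): DataFrame =
    df.withColumn(column, concat_ws(", ", flatten(col(column))))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("flatten-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample data with the same array<array<string>> shape as in the question
    val df = Seq(
      Seq(Seq("LineItem_organizationId", "LineItem_lineItemId")),
      Seq(Seq("OrganizationId", "LineItemId", "SegmentSequence_segmentId"))
    ).toDF("value")

    flattenToString(df, "value").show(false)
    spark.stop()
  }
}
```

Built-in functions stay visible to Spark's optimizer, whereas a udf is opaque to it, so this form may be preferable where the Spark version allows it.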
