How to change the header of a data frame to another data frame's header?

Problem description

I have a data set that looks like this:

LineItem.organizationId|^|LineItem.lineItemId|^|StatementTypeCode|^|LineItemName|^|LocalLanguageLabel|^|FinancialConceptLocal|^|FinancialConceptGlobal|^|IsDimensional|^|InstrumentId|^|LineItemSequence|^|PhysicalMeasureId|^|FinancialConceptCodeGlobalSecondary|^|IsRangeAllowed|^|IsSegmentedByOrigin|^|SegmentGroupDescription|^|SegmentChildDescription|^|SegmentChildLocalLanguageLabel|^|LocalLanguageLabel.languageId|^|LineItemName.languageId|^|SegmentChildDescription.languageId|^|SegmentChildLocalLanguageLabel.languageId|^|SegmentGroupDescription.languageId|^|SegmentMultipleFundbDescription|^|SegmentMultipleFundbDescription.languageId|^|IsCredit|^|FinancialConceptLocalId|^|FinancialConceptGlobalId|^|FinancialConceptCodeGlobalSecondaryId|^|FFAction|!|
Japan|^|1507101869432|^|4295876606|^|1|^|BAL|^|Cash And Deposits|^|null|^|null|^|ACAE|^|false|^|null|^|null|^|null|^|null|^|false|^|null|^|null|^|null|^|null|^|505126|^|505074|^|null|^|null|^|null|^|null|^|null|^|null|^|null|^|3018759|^|null|^|I|!|
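In this sample, each field is separated by the three-character sequence |^| and each record ends with |!|. As an illustration only, here is a minimal plain-Scala sketch (no Spark involved) of splitting one such line into fields. The object name RecordSplit and the helper splitRecord are hypothetical and not part of the original code:

```scala
import java.util.regex.Pattern

object RecordSplit {
  // Hypothetical helper: drop the trailing "|!|" record terminator,
  // then split on the literal "|^|" field separator.
  // The -1 limit keeps trailing empty fields instead of discarding them.
  def splitRecord(line: String): Array[String] =
    line.stripSuffix("|!|").split(Pattern.quote("|^|"), -1)

  def main(args: Array[String]): Unit = {
    val header = "LineItem.organizationId|^|LineItem.lineItemId|^|StatementTypeCode|!|"
    splitRecord(header).foreach(println)
  }
}
```

The separator goes through Pattern.quote because String.split takes a regular expression, and both | and ^ are regex metacharacters.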

And this is how I load the data with an automatically inferred schema and clean up the column names:

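// Replace "." in column names with "_" so they can be referenced without backticks
val df1With_ = df.toDF(df.columns.map(_.replace(".", "_")): _*)
// Keep only the real columns: drop names containing "^" or "!" (presumably leftovers of the |^| and |!| separators)
// and Spark's auto-generated "_c<N>" names
val column_to_keep = df1With_.columns.filter(v => !v.contains("^") && !v.contains("!") && !v.contains("_c")).toSeq
val df1result = df1With_.select(column_to_keep.head, column_to_keep.tail: _*)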
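The cleanup rules above can be tried on a plain list of names without a Spark session. A minimal sketch, assuming the raw names look roughly like the hypothetical list below (the question does not show the exact names Spark generates for the separator residue):

```scala
object ColumnCleanup {
  // Same rules as the Spark code above, applied to a plain list of names:
  // "." becomes "_", then any name containing "^", "!" or "_c" is dropped.
  def cleanColumns(columns: Seq[String]): Seq[String] =
    columns
      .map(_.replace(".", "_"))
      .filter(v => !v.contains("^") && !v.contains("!") && !v.contains("_c"))

  def main(args: Array[String]): Unit = {
    // Hypothetical raw column names, for illustration only
    val raw = Seq("LineItem.organizationId", "^1", "LineItem.lineItemId", "^3", "FFAction", "!57", "_c58")
    println(cleanColumns(raw).mkString(", "))
  }
}
```

One side effect of the "_c" test: a legitimate column such as a hypothetical Account.code becomes Account_code after the first step and is then dropped, so cleanColumns(Seq("Account.code")) returns an empty list.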

Now I have another data frame on which I do a join operation, and finally I create a data frame that is written out to a CSV file.

The final data frame looks like this:

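import org.apache.spark.sql.functions.{col, concat_ws, regexp_replace}
import spark.implicits._ // for the $"..." syntax; assumes a SparkSession value named spark is in scope

// Keep the partition columns and join every other column into one "|^|"-delimited string
val dfMainOutputFinal = dfMainOutput.select($"DataPartition", $"StatementTypeCode",
  concat_ws("|^|", dfMainOutput.schema.fieldNames.filter(_ != "DataPartition").map(c => col(c)): _*).as("concatenated"))

// Blank out literal "null" strings (note: this also removes "null" inside longer values)
val dfMainOutputFinalWithoutNull = dfMainOutputFinal.withColumn("concatenated", regexp_replace(col("concatenated"), "null", ""))

// partitionBy turns DataPartition and StatementTypeCode into directories, so they are not written into the files;
// the only column left in each CSV file is "concatenated"
dfMainOutputFinalWithoutNull.write.partitionBy("DataPartition", "StatementTypeCode")
  .format("csv")
  .option("nullValue", "")
  .option("header", "true")
  .option("codec", "gzip")
  .save("output")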
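Two behaviours of the functions used above matter when the concatenated string has to line up with a header: concat_ws skips NULL inputs instead of leaving an empty slot, and replacing "null" removes every occurrence of that substring, including inside longer values. A plain-Scala model of both (the object and helper names are hypothetical, and Option stands in for a SQL NULL):

```scala
object ConcatSketch {
  // Models concat_ws(sep, cols: _*): None plays the role of a SQL NULL and is skipped,
  // so the remaining values shift left instead of leaving an empty field.
  def concatWs(sep: String, values: Seq[Option[String]]): String =
    values.flatten.mkString(sep)

  // Models regexp_replace(col, "null", ""): every occurrence of the substring "null" is removed.
  def blankNulls(s: String): String = s.replace("null", "")

  def main(args: Array[String]): Unit = {
    println(concatWs("|^|", Seq(Some("a"), None, Some("c")))) // prints a|^|c
    println(blankNulls("a|^|null|^|nullable"))                // prints a|^||^|able
  }
}
```

If real NULLs can occur, wrapping each column as coalesce(col(c), lit("")) before concat_ws would keep one slot per column; this is a suggestion, not part of the original code.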

Now, in my output file, the only header is concatenated, which is expected.

Now my question is: is there any way to change the header of my final output to the header of the df1result data frame?

Recommended answer

I believe the simplest way to solve this would be to rename the concatenated column. As the column names already exist in the column_to_keep variable, you can simply do:

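// Build one "|^|"-delimited header string from the cleaned df1result column names
// (assumes the fields inside "concatenated" line up, in order, with column_to_keep)
val header = column_to_keep.mkString("|^|")
val dfMainOutputFinalWithoutNull = dfMainOutputFinal
  .withColumn("concatenated", regexp_replace(col("concatenated"), "null", ""))
  // Rename the single data column so the CSV header line carries all the names
  .withColumnRenamed("concatenated", header)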
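Because the renamed header is just one more |^|-joined string, it only describes the data correctly if every concatenated row has the same number of fields, in the same order, as column_to_keep. A plain-Scala sketch of building the header and checking the field count; HeaderCheck, buildHeader and fieldCount are hypothetical names, and the row is placeholder data:

```scala
import java.util.regex.Pattern

object HeaderCheck {
  val Sep = "|^|"

  // Same construction as the answer: join the kept column names with the data separator
  def buildHeader(columns: Seq[String]): String = columns.mkString(Sep)

  // Hypothetical sanity check: count the fields in one concatenated line (empty fields included)
  def fieldCount(line: String): Int = line.split(Pattern.quote(Sep), -1).length

  def main(args: Array[String]): Unit = {
    val columnToKeep = Seq("LineItem_organizationId", "LineItem_lineItemId", "StatementTypeCode")
    val row = "v1|^|v2|^|v3" // placeholder concatenated row
    println(buildHeader(columnToKeep))
    println(fieldCount(row) == columnToKeep.size)
  }
}
```

Note that this header uses the underscored names from column_to_keep (for example LineItem_organizationId), not the original dotted names from the input file.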

This will result in an extremely long column name, so I wouldn't advise it for anything other than saving to a CSV file.
