Does spark-sql support multiple delimiters in the input data?

Views: 152 | Published: 2021/11/14 22:06:24 | Tags: apache-spark, apache-spark-sql


Question

My input data contains multiple single-character delimiters, as follows:

col1data1"col2data1;col3data1"col4data1
col1data2"col2data2;col3data2"col4data2
col1data3"col2data3;col3data3"col4data3

In the data above, ["] and [;] are my delimiters.

Is there any way in sparkSQL to convert the input data (which is in a file) directly into a table with the column names col1, col2, col3, col4?

Recommended answer

The answer is no: spark-sql does not support multiple delimiters. One way around this is to read the file into an RDD and then parse each line using regular string-splitting methods:

val rdd : RDD[String] = ???
val s = rdd.first()
// res1: String = "This is one example. This is another"

Let's say that you want to split on both spaces and periods.

We can apply the split to our s value as follows:

s.split(" |\\.")
// res2: Array[String] = Array(This, is, one, example, "", This, is, another)
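The argument to split is a regular expression, so an alternation of single characters can also be written as a character class. The empty string in the result above appears because a period is immediately followed by a space. Here is a minimal sketch of one way to drop such empty tokens; the object name SplitSketch and the helper tokens are hypothetical names used only for illustration:

```scala
object SplitSketch {
  // Character class equivalent to the alternation " |\\." used above:
  // inside [...] the period is a literal character, so no escaping is needed.
  val delimiters = "[ .]"

  // Hypothetical helper: split on any delimiter and discard empty tokens
  // produced by two delimiters in a row.
  def tokens(line: String): Array[String] =
    line.split(delimiters).filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    val s = "This is one example. This is another"
    println(tokens(s).mkString(", "))
    // prints: This, is, one, example, This, is, another
  }
}
```

Whether to drop empty tokens depends on the data: for column-oriented input, an empty token may be a legitimately empty field and should probably be kept.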

Now we can apply the function to the whole RDD:

rdd.map(_.split(" |\\."))

Example with the data from the question:

scala> val s = "col1data1\"col2data1;col3data1\"col4data1"
scala> s.split(";|\"")
res4: Array[String] = Array(col1data1, col2data1, col3data1, col4data1)
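To get from the split arrays to a table with the column names col1 to col4, the RDD can be converted to a DataFrame and registered as a temporary view. The following is a sketch under stated assumptions, not a definitive implementation: the path input.txt, the view name my_table, the object name MultiDelimiterToTable, the helper splitLine, and the local master setting are all hypothetical, and Spark SQL is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object MultiDelimiterToTable {
  // Split one line on either delimiter from the question (; or ").
  // The limit -1 keeps trailing empty fields, which split drops by default.
  def splitLine(line: String): Array[String] = line.split("[;\"]", -1)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("multi-delimiter-sketch")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()
    import spark.implicits._

    // Assumption: the input file lives at this hypothetical path.
    val lines = spark.sparkContext.textFile("input.txt")

    val df = lines
      .map(splitLine)
      // Keep only lines that yield exactly four fields; others are skipped.
      .collect { case Array(c1, c2, c3, c4) => (c1, c2, c3, c4) }
      .toDF("col1", "col2", "col3", "col4")

    // Register the DataFrame so it can be queried with SQL.
    df.createOrReplaceTempView("my_table")
    spark.sql("SELECT col1, col4 FROM my_table").show()

    spark.stop()
  }
}
```

Lines that do not split into exactly four fields are silently dropped here; in practice you may prefer to log or count them instead.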

