Can I read a CSV represented as a string into Apache Spark using spark-csv?


Question

I know how to read a CSV file into Spark using spark-csv (https://github.com/databricks/spark-csv), but I already have the CSV contents as a string and would like to convert this string directly into a DataFrame. Is this possible?

Answer

Update: starting with Spark 2.2.0 there is finally a proper way to do this. DataFrameReader.csv accepts a Dataset[String] holding one CSV line per element, so you only need to split the string into lines.

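import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

val spark = SparkSession.builder().appName("CsvExample").master("local").getOrCreate()
import spark.implicits._

// The CSV content as a single string. Avoid spaces after the commas: they end up
// in the column names, and a space before a quoted field stops it being parsed as quoted.
val csvString =
  """id,date,timedump
    |1,"2014/01/01 23:00:01",1499959917383
    |2,"2014/11/31 12:40:32",1198138008843""".stripMargin

// One Dataset element per CSV line. split("\n") is used instead of .lines, which
// resolves to Java 11's String.lines() (a Stream) on newer JDKs.
val csvData: Dataset[String] = csvString.split("\n").toSeq.toDS()

// DataFrameReader.csv(Dataset[String]) is available since Spark 2.2.0
val frame: DataFrame = spark.read
  .option("header", true)
  .option("inferSchema", true)
  .csv(csvData)

frame.show()
frame.printSchema()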
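As a sketch that is not part of the original answer, the Spark 2.2+ approach can be wrapped in a reusable helper. The object CsvFromString and the method csvStringToDataFrame are hypothetical names. This version passes an explicit schema instead of inferSchema, which avoids the extra pass over the data that schema inference needs, and splits on \r?\n so Windows line endings also work. Because it still splits the string into lines itself, a quoted field that contains an embedded newline would be broken apart.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Hypothetical helper object (illustrative name, not from any library).
object CsvFromString {

  // Parse CSV text held in memory into a DataFrame.
  // Requires Spark 2.2+ for DataFrameReader.csv(Dataset[String]).
  def csvStringToDataFrame(spark: SparkSession, csv: String, schema: StructType): DataFrame = {
    import spark.implicits._
    // One element per line; handles \n and \r\n and drops blank lines.
    // Caveat: quoted fields containing newlines are split incorrectly.
    val lines: Seq[String] = csv.split("\r?\n").filter(_.trim.nonEmpty).toSeq
    spark.read
      .option("header", true) // skip the header line; names come from the schema
      .schema(schema)         // explicit schema: no inference pass over the data
      .csv(lines.toDS())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CsvFromString").master("local[*]").getOrCreate()
    val schema = StructType(Seq(
      StructField("id", LongType),
      StructField("date", StringType),
      StructField("timedump", LongType)))
    val csv = "id,date,timedump\n1,\"2014/01/01 23:00:01\",1499959917383\n"
    val df = csvStringToDataFrame(spark, csv, schema)
    df.show()
    df.printSchema()
    spark.stop()
  }
}
```

If you do not know the column types in advance, drop the schema call and use option("inferSchema", true) as in the answer's example.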

Older Spark versions

Actually you can, though it relies on library internals and is not widely advertised: just create and use your own CsvParser instance. The example below works for me on Spark 1.6.0 with spark-csv_2.10-1.4.0.

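import com.databricks.spark.csv.CsvParser
import org.apache.spark.sql.DataFrame

// Assumes spark-shell, where sc (SparkContext) and sqlContext (SQLContext) are predefined.
val csvData = """
|userid,organizationid,userfirstname,usermiddlename,userlastname,usertitle
|1,1,user1,m1,l1,mr
|2,2,user2,m2,l2,mr
|3,3,user3,m3,l3,mr
|""".stripMargin

// One RDD element per CSV line; drop the blank lines created by the
// newlines at the start and end of the string literal.
val rdd = sc.parallelize(csvData.split("\n").filter(_.trim.nonEmpty).toList)

val csvParser = new CsvParser()
  .withUseHeader(true)   // first line holds the column names
  .withInferSchema(true) // infer column types instead of reading everything as strings

val csvDataFrame: DataFrame = csvParser.csvRdd(sqlContext, rdd)
csvDataFrame.show()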
