DataFrame to JSON Array in Spark

Question

I am writing a Spark application in Java that reads a Hive table and stores the output in HDFS in JSON format.

I read the Hive table using HiveContext, which returns a DataFrame. Below is the code snippet.

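import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

SparkConf conf = new SparkConf().setAppName("App");
JavaSparkContext sc = new JavaSparkContext(conf);
HiveContext hiveContext = new HiveContext(sc);

// Run the query against Hive; the result comes back as a DataFrame
DataFrame data1 = hiveContext.sql("select * from tableName");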

Now I want to convert the DataFrame to a JSON array. For example, the data in data1 looks like this:

|  A  |     B     |
-------------------
|  1  | test      |
|  2  | mytest    |

I need output like the following:

[{1:"test"},{2:"mytest"}]

I tried using data1.schema.json(), and it gives me output like the following, not an array:

{1:"test"}
{2:"mytest"}

What is the right approach or function to convert the DataFrame to a JSON array without using any third-party libraries?

Recommended answer

data1.schema.json returns a JSON string that describes the schema of the DataFrame, not the actual data. You will get:

String = {"type":"struct",
          "fields":
                  [{"name":"A","type":"integer","nullable":false,"metadata":{}},
                  {"name":"B","type":"string","nullable":true,"metadata":{}}]}

To convert the DataFrame to an array of JSON, use the DataFrame's toJSON method, as in this Scala example:

val df = sc.parallelize(Array( (1, "test"), (2, "mytest") )).toDF("A", "B")
df.show()

+---+------+
|  A|     B|
+---+------+
|  1|  test|
|  2|mytest|
+---+------+

df.toJSON.collect.mkString("[", "," , "]" )
String = [{"A":1,"B":"test"},{"A":2,"B":"mytest"}]
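
The question is written in Java while the answer above is in Scala, so the joining step may be easier to follow as a small Java sketch. It assumes the JSON rows have already been collected to the driver as a List<String>; in Spark 1.x that list would presumably come from something like data1.toJSON().toJavaRDD().collect(), but that call chain is an assumption to verify against your Spark version. JsonArrayJoiner and toJsonArray are hypothetical names used only for illustration, not part of Spark.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Spark): joins JSON row strings into one JSON array string.
final class JsonArrayJoiner {
    private JsonArrayJoiner() {}

    // jsonRows is assumed to hold one complete JSON object per element,
    // for example the strings collected on the driver from the DataFrame's toJSON output.
    static String toJsonArray(List<String> jsonRows) {
        // Same effect as Scala's mkString("[", ",", "]"): prefix "[", separator ",", suffix "]"
        return jsonRows.stream().collect(Collectors.joining(",", "[", "]"));
    }
}
```

Calling JsonArrayJoiner.toJsonArray with the two row strings from the example yields the same array string as the Scala mkString call. Note that collecting pulls every row onto the driver, so this approach only suits results small enough to fit in driver memory.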
