Get JSON into Apache Spark from a web source in Java

Question

I have a web server which returns JSON data that I would like to load into an Apache Spark DataFrame. Right now I have a shell script that uses wget to write the JSON data to a file and then runs a Java program that looks something like this:

DataFrame df = sqlContext.read().json("example.json");

I have looked at the Apache Spark documentation and there doesn't seem to be a way to combine these two steps automatically. There must be a way of requesting the JSON data in Java, storing it as an object and then converting it to a DataFrame, but I haven't been able to figure it out. Can anyone help?

Answer

You could store the JSON data in a list of Strings, like:

final String JSON_STR0 = "{\"name\":\"0\",\"address\":{\"city\":\"0\",\"region\":\"0\"}}";
final String JSON_STR1 = "{\"name\":\"1\",\"address\":{\"city\":\"1\",\"region\":\"1\"}}";
List<String> jsons = Arrays.asList(JSON_STR0, JSON_STR1);

where each String represents a JSON object.

Then you could transform the list into an RDD:

JavaRDD<String> jsonRDD = sc.parallelize(jsons);

Once you've got the RDD, it's easy to get a DataFrame:

DataFrame data = sqlContext.read().json(jsonRDD);
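
A minimal end-to-end sketch that ties the two steps together is shown below. It assumes the Spark 1.x API used in the answer (JavaSparkContext/SQLContext), a placeholder URL http://example.com/data.json, and that the server returns a single JSON object; if it returns a JSON array or JSON lines instead, you would split the body into one object per String first.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class JsonFromWeb {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("json-from-web").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // Fetch the JSON body over HTTP (this replaces the wget step).
        // http://example.com/data.json is a placeholder URL.
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new URL("http://example.com/data.json").openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
        }

        // One complete JSON object per String element; here the whole response is one object.
        List<String> jsons = Collections.singletonList(body.toString());
        JavaRDD<String> jsonRDD = sc.parallelize(jsons);

        // Same as the answer above: read the RDD of JSON strings into a DataFrame.
        DataFrame df = sqlContext.read().json(jsonRDD);
        df.show();

        sc.stop();
    }
}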
