Get JSON into Apache Spark from a web source in Java
Question
I have a web server which returns JSON data that I would like to load into an Apache Spark DataFrame. Right now I have a shell script that uses wget to write the JSON data to a file and then runs a Java program that looks something like this:
DataFrame df = sqlContext.read().json("example.json");
I have looked at the Apache Spark documentation and there doesn't seem to be a way to combine these two steps automatically. There must be a way of requesting the JSON data in Java, storing it as an object, and then converting it to a DataFrame, but I haven't been able to figure it out. Can anyone help?
Answer
You could store the JSON data in a list of Strings, for example:
final String JSON_STR0 = "{\"name\":\"0\",\"address\":{\"city\":\"0\",\"region\":\"0\"}}";
final String JSON_STR1 = "{\"name\":\"1\",\"address\":{\"city\":\"1\",\"region\":\"1\"}}";
List<String> jsons = Arrays.asList(JSON_STR0, JSON_STR1);
where each String represents a JSON object.
Then you could transform the list to an RDD:
JavaRDD<String> jsonRDD = sc.parallelize(jsons);
Once you have the RDD, it's easy to obtain a DataFrame:
DataFrame data = sqlContext.read().json(jsonRDD);
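That covers turning in-memory JSON strings into a DataFrame, but the question also asks how to request the data from the web server in Java in the first place. Here is a minimal sketch of that step using the standard `java.net.HttpURLConnection` API; the class name, method names, and URL are placeholders, and the server is assumed to return one JSON object per line (JSON-lines format, which is what `sqlContext.read().json(...)` expects per record):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class JsonFetcher {

    // Read every non-empty line of the stream into a list; each line is
    // assumed to be one complete JSON object.
    static List<String> readJsonLines(InputStream in) throws IOException {
        List<String> jsons = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    jsons.add(line);
                }
            }
        }
        return jsons;
    }

    // Fetch the JSON payload over HTTP; the URL is a placeholder for
    // whatever endpoint your web server exposes.
    static List<String> fetchJson(String urlString) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(urlString).openConnection();
        conn.setRequestProperty("Accept", "application/json");
        try (InputStream in = conn.getInputStream()) {
            return readJsonLines(in);
        } finally {
            conn.disconnect();
        }
    }
}
```

The resulting `List<String>` can then be handed straight to `sc.parallelize(...)` and `sqlContext.read().json(...)` as shown above, replacing the wget-to-file step entirely.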