Is there a way to get the column names by order from a JSON file in SparkSQL?
Problem description
I have a JSON file whose keys become my columns when I load it into Spark SQL. When I retrieve the column names, they come back in alphabetical order, but I want them in the order in which they appear in the file.
My input data is:
{"id":1,"name":"Judith","email":"jknight0@google.co.uk","city":"Évry","country":"France","ip":"199.63.123.157"}
Below is how I retrieve the column names and build a single string:
val dataframe = sqlContext.read.json("/virtual/home/587635/users.json")
val columns = dataframe.columns
var query = columns.apply(0) + " STRING"
for (a <- 1 to (columns.length - 1)) {
  query = query + "," + columns.apply(a) + " STRING"
}
println(query)
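The loop above works, but the same string can be built more idiomatically with a single `map`/`mkString`. A minimal sketch, using a hardcoded array standing in for `dataframe.columns` with the keys from the example file:

```scala
// Stand-in for dataframe.columns, in the order the keys appear in users.json
val columns = Array("id", "name", "email", "city", "country", "ip")

// Build "col STRING" for each column and join with commas,
// replacing the var-based loop above
val query = columns.map(c => s"$c STRING").mkString(",")

println(query)
// id STRING,name STRING,email STRING,city STRING,country STRING,ip STRING
```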
This gives me the following output:
city STRING,country STRING,email STRING,id STRING,ip STRING,name STRING
But I want my output to be:
id STRING,name STRING,email STRING,city STRING,country STRING,ip STRING
Answer
Add a select with the columns in the correct order:
val dataframe =
sqlContext
.read
.json("/tmp/test.jsn")
.select("id", "name", "email", "city", "country", "ip")
If you try this in the shell, you will notice the correct order:
dataframe: org.apache.spark.sql.DataFrame = [id: bigint, name: string, email: string, city: string, country: string, ip: string]
Executing the rest of your script then produces the expected output:
id STRING,name STRING,email STRING,city STRING,country STRING,ip STRING
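The alphabetical order comes from Spark's JSON schema inference, which sorts field names while merging the schemas it sees across records. If you would rather not hardcode a `select`, another option is to supply an explicit schema, which skips inference and fixes the column order. A sketch, assuming the Spark 1.x `sqlContext` from the question and the same `/tmp/test.jsn` path:

```scala
import org.apache.spark.sql.types._

// Explicit schema in the desired column order;
// field names must match the JSON keys exactly
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("name", StringType),
  StructField("email", StringType),
  StructField("city", StringType),
  StructField("country", StringType),
  StructField("ip", StringType)
))

// Reading with an explicit schema keeps the columns in this order
val dataframe = sqlContext.read.schema(schema).json("/tmp/test.jsn")
```

This also spares Spark a pass over the data to infer the schema, which can matter for large files.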