PySpark - converting JSON string to DataFrame

Published: 2021/12/22 21:28:48. Tags: python, apache-spark, pyspark, jupyter-notebook


Problem description

I have a test2.json file that contains simple JSON:

{  "Name": "something",  "Url": "https://stackoverflow.com",  "Author": "jangcy",  "BlogEntries": 100,  "Caller": "jangcy"}

I uploaded the file to blob storage and created a DataFrame from it:

df = spark.read.json("/example/data/test2.json")

Then I can display it without any problems:

df.show()
+------+-----------+------+---------+--------------------+
|Author|BlogEntries|Caller|     Name|                 Url|
+------+-----------+------+---------+--------------------+
|jangcy|        100|jangcy|something|https://stackover...|
+------+-----------+------+---------+--------------------+

Second scenario: I have exactly the same JSON string declared in my notebook:

newJson = '{  "Name": "something",  "Url": "https://stackoverflow.com",  "Author": "jangcy",  "BlogEntries": 100,  "Caller": "jangcy"}'

I can print it and so on. But now, if I try to create a DataFrame from it:

df = spark.read.json(newJson)

I get the 'Relative path in absolute URI' error:

'java.net.URISyntaxException: Relative path in absolute URI: {  "Name":%20%22something%22,%20%20%22Url%22:%20%22https:/stackoverflow.com%22,%20%20%22Author%22:%20%22jangcy%22,%20%20%22BlogEntries%22:%20100,%20%20%22Caller%22:%20%22jangcy%22%7D'
Traceback (most recent call last):
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/readwriter.py", line 249, in json
    return self._df(self._jreader.json(self._spark._sc._jvm.PythonUtils.toSeq(path)))
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: 'java.net.URISyntaxException: Relative path in absolute URI: {  "Name":%20%22something%22,%20%20%22Url%22:%20%22https:/stackoverflow.com%22,%20%20%22Author%22:%20%22jangcy%22,%20%20%22BlogEntries%22:%20100,%20%20%22Caller%22:%20%22jangcy%22%7D'

Should I apply additional transformations to the newJson string? If so, what should they be? Please forgive me if this is too trivial, as I am very new to Python and Spark.

I am using a Jupyter notebook with the PySpark3 kernel.

Thanks in advance.
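For context on the error: when spark.read.json receives a plain Python string, it treats that string as a file path, which is consistent with the URISyntaxException in the traceback. The reader also accepts an RDD of strings, where each element is one JSON record. Below is a minimal sketch of that approach, not a definitive answer. The helper names to_json_records and json_string_to_df are hypothetical and chosen only for illustration, and spark is assumed to be the SparkSession that the notebook kernel already provides.

```python
import json


def to_json_records(json_text):
    """Return a list of compact single-line JSON strings, one per record."""
    parsed = json.loads(json_text)
    # A top-level array is treated as several records, anything else as one.
    records = parsed if isinstance(parsed, list) else [parsed]
    return [json.dumps(record) for record in records]


def json_string_to_df(spark, json_text):
    """Build a DataFrame from JSON text held in memory.

    `spark` is assumed to be an existing SparkSession (hypothetical helper,
    not part of the PySpark API).
    """
    # Passing an RDD of strings makes the reader parse the content itself
    # instead of interpreting the string as a file path.
    rdd = spark.sparkContext.parallelize(to_json_records(json_text))
    return spark.read.json(rdd)


newJson = '{  "Name": "something",  "Url": "https://stackoverflow.com",  "Author": "jangcy",  "BlogEntries": 100,  "Caller": "jangcy"}'

# The Spark-free part can be checked on its own.
records = to_json_records(newJson)

# In the notebook, with a live session:
#   df = json_string_to_df(spark, newJson)
#   df.show()
```

Normalizing through json.loads and json.dumps is optional for the string in the question, which is already a single line. It is included here because the JSON reader, by default, expects each record to fit on one line, so a pretty-printed string would otherwise need extra handling.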
