Convert HTML source code to a JSON object
Question
I am fetching the HTML source code of many pages from one website. I need to convert it into a JSON object and combine it with other elements in a JSON document. I have seen many questions on the same topic, but none of them were helpful.
My code:
import json
import requests

url = "https://totalhash.cymru.com/analysis/?1ce201cf28c6dd738fd4e65da55242822111bd9f"
htmlContent = requests.get(url, verify=False)  # verify=False disables TLS certificate checks
data = htmlContent.text
print("data", data)
jsonD = json.dumps(htmlContent.text)
jsonL = json.loads(jsonD)
# urls, uniqueID and finalDate are defined elsewhere (not shown); the JSON is built by string concatenation
ContentUrl='{ \"url\" : \"'+str(urls)+'\" ,'+"\n"+' \"uid\" : \"'+str(uniqueID)+'\" ,\n\"page_content\" : \"'+jsonL+'\" , \n\"date\" : \"'+finalDate+'\"}'
The above code gives me a unicode type; however, when I put that output into JSONLint, it gives me an invalid JSON error. Can somebody help me understand how I can convert the complete HTML into a JSON object?
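To see why JSONLint rejects the output, here is a minimal offline reproduction. It assumes a small hypothetical HTML snippet in place of the fetched page and a placeholder URL; no request is made. Any double quote (or raw newline) inside the HTML ends up unescaped in the hand-built string, so the result is not valid JSON.

```python
import json

# Hypothetical stand-in for htmlContent.text; real pages also contain quotes and newlines.
html = '<div class="title">Analysis</div>\n<p>done</p>'

# Hand-built JSON in the same style as the ContentUrl string (placeholder URL).
hand_built = '{ "url" : "' + "https://example.com/page" + '" ,\n"page_content" : "' + html + '"}'

try:
    json.loads(hand_built)
    hand_built_ok = True
except json.JSONDecodeError as err:
    # The unescaped " in class="title" closes the JSON string early, so parsing fails.
    hand_built_ok = False
    print("invalid JSON:", err.msg)
```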
Recommended answer
jsonD = json.dumps(htmlContent.text)
converts the raw HTML content into a JSON string representation.
jsonL = json.loads(jsonD)
parses the JSON string back into a regular string/unicode object. This is a no-op, because any escaping done by dumps() is reverted by loads(): jsonL contains the same data as htmlContent.text.
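The round trip can be checked directly with a short sketch, assuming a hypothetical HTML string in place of htmlContent.text. json.dumps() produces an escaped JSON string literal, and json.loads() turns it back into exactly the original text, so nothing has been gained by the time jsonL is concatenated into the hand-built document.

```python
import json

html = '<p class="x">line1\nline2</p>'  # hypothetical stand-in for htmlContent.text

encoded = json.dumps(html)     # JSON string literal: surrounding quotes added, " and newline escaped
decoded = json.loads(encoded)  # parsing it back undoes all of that escaping

print(encoded)
print(decoded == html)
```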
Try using json.dumps to generate your final JSON instead of building the JSON by hand:
import json

# json.dumps escapes the quotes, backslashes and newlines inside the HTML for you
ContentUrl = json.dumps({
    'url': str(urls),
    'uid': str(uniqueID),
    'page_content': htmlContent.text,
    'date': finalDate
})
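Since the question mentions many pages that must be combined with other elements, here is one possible way to extend the same idea, sketched under assumptions: the pages dict, the placeholder URLs, the sequential uid scheme and the use of today's date are all hypothetical stand-ins for the real urls, uniqueID and finalDate. Each page becomes a plain dict, and a single json.dumps call serializes the whole list, so all escaping is handled in one place.

```python
import json
from datetime import date

# Hypothetical fetched pages: URL -> HTML text (in the question this comes from requests.get(url).text).
pages = {
    "https://example.com/a": '<html><body><h1 class="t">A</h1></body></html>',
    "https://example.com/b": "<html>\n<body>B</body>\n</html>",
}

def make_record(url, uid, html, fetched_on):
    """Build one document as a plain dict; json.dumps does all escaping later."""
    return {"url": url, "uid": uid, "page_content": html, "date": fetched_on}

records = [
    # Sequential uids and today's date are placeholders for uniqueID and finalDate.
    make_record(url, str(i), html, date.today().isoformat())
    for i, (url, html) in enumerate(pages.items(), start=1)
]

# Combine everything into one JSON document: a list of objects.
document = json.dumps(records, indent=2)

# Round-trip check: the output parses back to the same data.
assert json.loads(document) == records
print(document)
```

To write the combined document to a file instead of a string, json.dump(records, f) accepts an open text file object.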