How to extract data from JavaScript in a JSON format?

Question

I am having a hard time extracting the data. I need to extract the post's title and the date it was posted from the URL below.

URL: https://cheddar.com/media/safety-concerns-over-teslas-autopilot-from-consumer-reports-as-wall-street-turns-bearish

In the page's view-source there is a script holding JSON-formatted data, and it contains the data I need.

It looks something like this (I cropped the rest of the text to save space):

<script>
      window.__RELAY_STORE__ = {"public_at":"2019-05-22T11:02:43-04:00","updated_at":"2019-05-22T15:25:20-04:00","thumbnail_attribution":null,"body":null,"title":"Safety Concerns Over Tesla's Autopilot from Consumer Reports as Wall Street Turns Bearish"
</script>

I only need to get the "public_at" and "title" values.

This is what I tried:

# Locate the script
data = response.xpath("//script[contains(., 'window.__RELAY_STORE__')]/text()")

# Extract the script text
datatxt = data.extract_first()

# Find the start and end of the data
start = datatxt.find('client:') - 2
end = datatxt.find('window.__REDUX_STATE__')

json_string = datatxt[start:end]

But when I load it to convert it to a Python dictionary:

data = json.loads(json_string)

I get an error like this:

Extra data: line 1 column 27284 (char 27283)

Any idea how I can get this data?

Recommended answer

Try getting the data this way:

txt = response.xpath("//script[contains(., 'window.__RELAY_STORE__')]/text()").re_first('window.__RELAY_STORE__ = (.*);')

This strips the name of the JS variable and the trailing ;. When I then call json.loads(txt), it parses as valid JSON.
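As a rough sketch of the same idea outside Scrapy, the snippet below applies the answer's pattern with Python's standard re module and then reads the two fields. The SCRIPT_TEXT sample, the "Video:1" record key, the shortened title, and the find_record helper are all hypothetical: the real store is cropped in the question, so its nesting is an assumption. The "Extra data" error generally means valid JSON was followed by leftover text, which in the original slice was likely the trailing ; before window.__REDUX_STATE__.

```python
import json
import re

# Hypothetical stand-in for the <script> text. The real store is far larger,
# and its layout (records keyed by ids such as "Video:1") is an assumption.
SCRIPT_TEXT = (
    'window.__RELAY_STORE__ = {"client:root": {"__id": "client:root"}, '
    '"Video:1": {"public_at": "2019-05-22T11:02:43-04:00", '
    '"title": "Safety Concerns Over Tesla\'s Autopilot"}};\n'
    'window.__REDUX_STATE__ = {"other": true};\n'
)


def extract_relay_store(script_text):
    """Capture the object assigned to window.__RELAY_STORE__ and parse it."""
    # Same pattern as the answer, with the literal dots escaped.
    match = re.search(r"window\.__RELAY_STORE__ = (.*);", script_text)
    if match is None:
        raise ValueError("window.__RELAY_STORE__ assignment not found")
    return json.loads(match.group(1))


def find_record(node, required_keys):
    """Depth-first search for the first dict that has all required_keys."""
    if isinstance(node, dict):
        if all(key in node for key in required_keys):
            return node
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_record(child, required_keys)
        if found is not None:
            return found
    return None


store = extract_relay_store(SCRIPT_TEXT)
record = find_record(store, ("public_at", "title"))
if record is not None:
    print(record["public_at"], record["title"])
```

Because . does not match newlines by default, the greedy (.*); stops at the last ; on the assignment's own line. This assumes the whole object sits on a single line, as it does in the sample; if the page laid the script out differently, the pattern would need adjusting.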
