How can I parse JavaScript variables using Python?


Question

The problem: a website I am trying to gather data from uses JavaScript to produce a graph. I'd like to pull out the data that the graph is built from, but I am not sure where to start. For example, the data might look like this:

var line1=
[["Wed, 12 Jun 2013 01:00:00 +0000",22.4916114807,"2 sold"],
["Fri, 14 Jun 2013 01:00:00 +0000",27.4950008392,"2 sold"],
["Sun, 16 Jun 2013 01:00:00 +0000",19.5499992371,"1 sold"],
["Tue, 18 Jun 2013 01:00:00 +0000",17.25,"1 sold"],
["Sun, 23 Jun 2013 01:00:00 +0000",15.5420341492,"2 sold"],
["Thu, 27 Jun 2013 01:00:00 +0000",8.79045295715,"3 sold"],
["Fri, 28 Jun 2013 01:00:00 +0000",10,"1 sold"]];

This is pricing data (date, price, volume). I found another question here, "Parsing variable data out of a js tag using python", which suggests using JSON and BeautifulSoup, but I am unsure how to apply that to this particular problem because the formatting is slightly different. In fact, the code in this problem looks more like a Python list than any kind of JSON dictionary format.

I suppose I could read it in as a string and then use XPath and some funky string editing to convert it, but that seems like too much work for something that is already formatted as a JavaScript variable.

So, what can I do here to pull this kind of organized data out of this variable using Python? (I am most familiar with Python and BS4.)

Recommended answer

Okay, so there are a few ways to do it, but I ended up simply using a regular expression to find everything between line1= and the following semicolon:

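import re

# Read the page data as a string (sock is the page handle opened earlier;
# if read() returns bytes, decode them first)
pageData = sock.read()
# Match everything between "line1=" and the first ";"
# (non-greedy, and DOTALL so "." also spans the line breaks inside the array)
p = re.compile(r'(?<=line1=)(.*?)(?=;)', re.DOTALL)
# Find all matches of the pattern in pageData
parsed = p.findall(pageData)
# Evaluate the matched text as Python code, which turns it into a Python list
newParsed = eval(parsed[0])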

A regex works well when the page's code is consistently formatted, but is this method better (or worse!) than any of the other answers here?
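One point worth weighing in that comparison is that eval runs whatever text the page happens to contain. The following is a minimal sketch, not part of the original answer, of how the standard library's ast.literal_eval treats the same kind of input: it accepts plain literals and refuses anything else. The captured and hostile strings are made-up samples used only for illustration.

```python
import ast

# Hypothetical sample of the text the regex might capture from the page.
captured = (
    '[["Wed, 12 Jun 2013 01:00:00 +0000", 22.4916114807, "2 sold"], '
    '["Fri, 28 Jun 2013 01:00:00 +0000", 10, "1 sold"]]'
)

# literal_eval only accepts Python literals (lists, strings, numbers, ...).
rows = ast.literal_eval(captured)

# Hypothetical hostile input: a function call is rejected, not executed.
hostile = "__import__('os').getcwd()"
try:
    ast.literal_eval(hostile)
    rejected = False
except ValueError:
    rejected = True

print(len(rows), rejected)
```

This only works while the JavaScript array also happens to be a valid Python literal; values such as true or null would not parse this way.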

I ultimately used the following:

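import json
import re

# Read the page data as a string (sock is the page handle opened earlier;
# if read() returns bytes, decode them first)
pageData = sock.read()
# Match everything between "line1=" and the first ";"
# (non-greedy, and DOTALL so "." also spans the line breaks inside the array)
p = re.compile(r'(?<=line1=)(.*?)(?=;)', re.DOTALL)
# Find all matches of the pattern in pageData
parsed = p.findall(pageData)
# Parse as JSON instead of using eval, to avoid executing unknown code
newParsed = json.loads(parsed[0])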
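For readers who want to try this end to end without a live page, here is a self-contained sketch of the same regex-plus-JSON approach. The page_data string, the to_record helper, and the field names are assumptions made for illustration; only the line1 data shape comes from the question. The date conversion with email.utils.parsedate_to_datetime is an extra step beyond the original answer, added on the assumption that the dates are in the RFC 2822 style shown above.

```python
import json
import re
from email.utils import parsedate_to_datetime

# Hypothetical page fragment, modelled on the data in the question.
page_data = '''<script>
var line1=
[["Wed, 12 Jun 2013 01:00:00 +0000",22.4916114807,"2 sold"],
["Tue, 18 Jun 2013 01:00:00 +0000",17.25,"1 sold"],
["Fri, 28 Jun 2013 01:00:00 +0000",10,"1 sold"]];
</script>'''

# Non-greedy match up to the first ";"; DOTALL lets "." cross line breaks.
match = re.search(r'(?<=line1=)(.*?)(?=;)', page_data, re.DOTALL)
raw_rows = json.loads(match.group(1))


def to_record(row):
    """Convert one [date, price, volume] row into typed values."""
    date_text, price, volume_text = row
    return {
        "date": parsedate_to_datetime(date_text),
        "price": float(price),
        # "2 sold" -> 2; assumes the count is always the first token.
        "sold": int(volume_text.split()[0]),
    }


records = [to_record(row) for row in raw_rows]
for record in records:
    print(record["date"].date(), record["price"], record["sold"])
```

This relies on the array being valid JSON, which holds for the sample because it contains only double-quoted strings and numbers. A page that used single quotes or trailing commas would need a different parser.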
