How to fetch JavaScript contents in Python
Question
I have a website where the data I want to fetch is stored inside a JavaScript block. How do I fetch it?
The code is here: http://pastebin.com/zhdWT5HM
I want to fetch from the "var playersData" line. Specifically, I want the value in "playerId":"showsPlayer" (without the quotes, obviously). How do I do so?
I've tried Beautiful Soup. My current script looks like this:
q = requests.get('websitelink')
soup = BeautifulSoup(q.text)
searching = soup.findAll('script', {'type': 'text/javascript'})
for playerId in searching:
    x = playerId.find_all('var playersData', limit=1)
    print(x)
I'm getting [] as my output. I can't seem to figure out my problem here. Please help out, guys and gals :)
Recommended answer
BeautifulSoup only helps with locating the desired script tag; it does not parse the JavaScript inside it. From there you have multiple options: you can extract the desired data with a JavaScript parser, like slimit, or use regular expressions:
import re
from bs4 import BeautifulSoup

# Sample page standing in for the real response body.
page = """
<script type="text/javascript">
    var logged = true;
    var video_id = 59374;
    var item_type = 'official';
    var debug = false;
    var baseUrl = 'http://www.example.com';
    var base_url = 'http://www.example.com/';
    var assetsBaseUrl = 'http://www.example.com/assets';
    var apiBaseUrl = 'http://www.example.com/common';
    var playersData = [{"playerId":"showsPlayer","userId":true,"solution":"flash","playlist":[{"itemId":"5090","itemAK":"Movie"}]}];
</script><script type="text/javascript" >
"""

# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(page, "html.parser")
pattern = re.compile(r'"playerId":"(.*?)"', re.MULTILINE | re.DOTALL)
# Find the first script tag whose text matches the pattern
# (string= is the current name for the older text= argument).
script = soup.find("script", string=pattern)
print(pattern.search(script.string).group(1))
Prints:
showsPlayer
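If more than one field is needed, a possible variation is to capture the whole playersData array with a regular expression and hand it to json.loads. The following is only a sketch under stated assumptions: the helper name parse_players_data and the trimmed script_text are hypothetical, the assignment is assumed to sit on a single line ending in "];", and the array literal is assumed to be valid JSON (json.loads raises ValueError otherwise).

```python
import json
import re

# Hypothetical script text, as it might come back from script.string.
# The array literal must be well-formed JSON for json.loads to accept it.
script_text = """
    var debug = false;
    var playersData = [{"playerId":"showsPlayer","userId":true,"solution":"flash","playlist":[{"itemId":"5090","itemAK":"Movie"}]}];
"""

# Assumes the whole assignment sits on one line and ends with "];".
PLAYERS_DATA = re.compile(r"var\s+playersData\s*=\s*(\[.*\])\s*;\s*$", re.MULTILINE)


def parse_players_data(text):
    """Return the playersData array as Python objects, or None if absent."""
    match = PLAYERS_DATA.search(text)
    if match is None:
        return None
    return json.loads(match.group(1))


players = parse_players_data(script_text)
print(players[0]["playerId"])  # showsPlayer
```

Once the array is a list of dictionaries, nested values such as players[0]["playlist"][0]["itemId"] can be read directly instead of writing one regular expression per field.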