Parsing JS with Beautiful Soup


Question

I have a page parsed with Beautiful Soup, but it contains this JS code:

<script type="text/javascript">   


var utag_data = {
            customer_id   : "_PHL2883198554", 
            customer_type : "New",
            loyalty_id : "N",
            declined_loyalty_interstitial : "false",
            site_version  : "Desktop Site",
            site_currency: "de_DE_EURO",
            site_region: "uk",
            site_language: "en-GB",


            customer_address_zip : "",
            customer_email_hash :  "",
            referral_source :  "",
            page_type : "product",
            product_category_name : ["Lingerie"],
            product_category_id :[jQuery("meta[name=defaultParent]").attr("content")],
            product_id : ["5741462261401"],
            product_image_url : ["http://images.urbanoutfitters.com/is/image/UrbanOutfitters/5741462261401_001_b?$detailmain$"],
            product_brand : ["Pretty Polly"],
            product_selling_price : ["20.0"],
            promo_id : "6",
            product_referral : ["WOMENS-SHAPEWEAR-LINGERIE-SOLUTIONS-EU"],
            product_name : ["Pretty Polly Shape It Up Tummy Shaping Camisole"],
            is_online_only : true,
            is_back_in_stock : false
}
</script>

How can I get some values out of this input? Should I treat it as plain text, that is, write it to a variable, split it, and then pick out the data?

Thanks

Recommended answer

Once you have the script text,

js_text = soup.find('script', type="text/javascript").text

for example, you can use a regex to find the data. I'm sure there is an easier way to do this, but a regex shouldn't be hard either.

import re
# One "name : value" pair per line; the trailing comma, if any, is left out of the value.
regex = re.compile(r'^[ \t]*(\w+)[ \t]*:[ \t]*(.*?)[ \t]*,?[ \t]*$', re.MULTILINE)
pairs = re.findall(regex, js_text)  # list of (name, value) tuples
js_text = [item.strip() for pair in pairs for item in pair]  # flatten to name, value, name2, value2, ...

This returns a flat list of names and values in name, value, name2, value2, ... order, which you can work with directly or convert to a dictionary later on.
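As a rough sketch of the dictionary conversion mentioned above: the snippet below runs a line-based regex over a shortened copy of the `utag_data` text from the question and builds a dict from the matches. The names `pair_re`, `utag`, and `unquote` are made up for illustration, and the pattern assumes one `name : value` pair per line, as in the question's input. Values are kept as raw JS source text (quotes and brackets included), so anything more structured would need further parsing.

```python
import re

# Shortened sample of the script text from the question (stands in for js_text).
js_text = '''
var utag_data = {
            customer_id   : "_PHL2883198554",
            site_region: "uk",
            product_id : ["5741462261401"],
            product_selling_price : ["20.0"],
            is_online_only : true,
            is_back_in_stock : false
}
'''

# Assumption: one "name : value" pair per line; a trailing comma is dropped.
pair_re = re.compile(r'^[ \t]*(\w+)[ \t]*:[ \t]*(.*?)[ \t]*,?[ \t]*$', re.MULTILINE)

# findall returns (name, value) tuples, which dict() accepts directly.
utag = dict(pair_re.findall(js_text))


def unquote(value):
    """Hypothetical helper: strip one pair of surrounding double quotes."""
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return value


print(unquote(utag["customer_id"]))  # _PHL2883198554
print(utag["product_id"])            # ["5741462261401"]
print(utag["is_online_only"])        # true
```

If you already have the flat name, value, name2, value2, ... list instead of tuples, `dict(zip(flat[::2], flat[1::2]))` should give the same mapping.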
