Parsing JS with Beautiful Soup

Question

I have parsed a page with Beautiful Soup, but it contains this JS code:

<script type="text/javascript">   


var utag_data = {
            customer_id   : "_PHL2883198554", 
            customer_type : "New",
            loyalty_id : "N",
            declined_loyalty_interstitial : "false",
            site_version  : "Desktop Site",
            site_currency: "de_DE_EURO",
            site_region: "uk",
            site_language: "en-GB",


            customer_address_zip : "",
            customer_email_hash :  "",
            referral_source :  "",
            page_type : "product",
            product_category_name : ["Lingerie"],
            product_category_id :[jQuery("meta[name=defaultParent]").attr("content")],
            product_id : ["5741462261401"],
            product_image_url : ["http://images.urbanoutfitters.com/is/image/UrbanOutfitters/5741462261401_001_b?$detailmain$"],
            product_brand : ["Pretty Polly"],
            product_selling_price : ["20.0"],
            promo_id : "6",
            product_referral : ["WOMENS-SHAPEWEAR-LINGERIE-SOLUTIONS-EU"],
            product_name : ["Pretty Polly Shape It Up Tummy Shaping Camisole"],
            is_online_only : true,
            is_back_in_stock : false
}
</script>

How can I get some values from this input? Should I treat it as plain text, i.e. store it in a variable, split it, and then pick out the data?

Thanks

Recommended answer

Once you have the script contents, for example via

js_text = soup.find('script', type="text/javascript").text

you can use a regex to find the data. There is probably an easier way to do this, but a regex shouldn't be hard either.
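If you only need one or two specific values, a targeted search can be simpler than parsing every line. This is a minimal sketch that assumes each key appears once, on its own line, in the `key : value` form shown in the question. `get_js_value` is a hypothetical helper name and is not part of Beautiful Soup or `re`.

```python
import re

def get_js_value(js_text, key):
    """Return the raw text after `key :` up to the end of that line (hypothetical helper).

    Assumes one `key : value` pair per line, as in the utag_data object above.
    Returns None if the key is not found.
    """
    # ^ and $ match per line thanks to re.MULTILINE; the key must start the line
    # (after indentation), and an optional trailing comma/whitespace is dropped.
    pattern = re.compile(
        r'^[ \t]*' + re.escape(key) + r'[ \t]*:[ \t]*(.*?),?[ \t\r]*$',
        re.MULTILINE,
    )
    match = pattern.search(js_text)
    return match.group(1) if match else None

sample = '''
var utag_data = {
            customer_id   : "_PHL2883198554",
            page_type : "product",
            product_id : ["5741462261401"],
            is_back_in_stock : false
}
'''
print(get_js_value(sample, "product_id"))   # ["5741462261401"]
print(get_js_value(sample, "customer_id"))  # "_PHL2883198554"
```

The returned value is still JavaScript source text (note the quotes and brackets), so you may want to decode it further.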

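import re
# MULTILINE makes ^ and $ match at every line, so each "name : value" line is one match:
# group 1 = everything before the first colon, group 2 = the rest of the line.
regex = re.compile(r'^(.*?):(.*?)$', re.MULTILINE)
pairs = regex.findall(js_text)  # findall returns a list of (name, value) tuples
# Flatten to [name, value, name2, value2, ...], stripping whitespace and trailing commas.
js_text = [part.strip(' \t\r,') for pair in pairs for part in pair]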

This returns a flat list of names and values in name, value, name2, value2, ... order, which you can process further or convert to a dictionary later on.
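One way to do that conversion is sketched below. It assumes you already have the flat `[name, value, name2, value2, ...]` list described above. Items are paired up with slicing and `zip`, and then `json.loads` is tried on each value so that JSON-compatible strings, lists and `true`/`false` become Python objects. Values that are not valid JSON, such as the `jQuery(...)` expression for `product_category_id`, are kept as raw strings. `to_dict` is an illustrative name, not a library function.

```python
import json

def to_dict(flat):
    """Turn [name, value, name2, value2, ...] into a dict (illustrative helper).

    Each value is decoded with json.loads when it is valid JSON
    (e.g. '"New"', '["Lingerie"]', 'true'); otherwise the raw string is kept.
    """
    result = {}
    for name, raw in zip(flat[::2], flat[1::2]):
        try:
            result[name] = json.loads(raw)
        except json.JSONDecodeError:
            result[name] = raw  # e.g. a JavaScript expression such as jQuery(...)
    return result

flat = [
    'customer_type', '"New"',
    'product_category_name', '["Lingerie"]',
    'product_category_id', '[jQuery("meta[name=defaultParent]").attr("content")]',
    'is_online_only', 'true',
]
data = to_dict(flat)
print(data['product_category_name'][0])  # Lingerie
print(data['is_online_only'])            # True
```

This only works because most values in `utag_data` happen to be valid JSON literals. JavaScript-only syntax, such as single-quoted strings, would fall through to the raw-string branch.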
