Extract multi-line JavaScript content from <script> tag using Scrapy

Views: 87 | Published: 2019/5/27 13:44:06 | Tags: javascript python regex scrapy

Problem description

I'm trying to extract data from this script tag using Scrapy:

<script>
        var hardwareTemplateFunctions;
        var storefrontContextUrl = '';

        jq(function() {
            var data = new Object();
            data.hardwareProductCode = '9054832';
            data.offeringCode = 'SMART_BASIC.TLF12PLEAS';
            data.defaultTab = '';
            data.categoryId = 10001;

            data.bundles = new Object();
                            data.bundles['SMART_SUPERX.TLF12PLEAS'] = {
                    signupFee: parsePrice('0'),
                    newMsisdnFee: parsePrice('199'),
                    upfrontPrice: parsePrice('1099'),
                    monthlyPrice: parsePrice('499'),
                    commitmentTime: parsePrice('12'),
                    offeringTitle: 'SMART Super',
                    offeringType: 'VOICE',
                    monthlyPrice: parsePrice('499'),
                    commitmentTime: 12
                };
                            data.bundles['SMART_PLUSS.TLF12PLEAS'] = {
                    signupFee: parsePrice('0'),
                    newMsisdnFee: parsePrice('199'),
                    upfrontPrice: parsePrice('1599'),
                    monthlyPrice: parsePrice('399'),
                    commitmentTime: parsePrice('12'),
                    offeringTitle: 'SMART Pluss',
                    offeringType: 'VOICE',
                    monthlyPrice: parsePrice('399'),
                    commitmentTime: 12
                };
                            data.bundles['SMART_BASIC.TLF12PLEAS'] = {
                    signupFee: parsePrice('0'),
                    newMsisdnFee: parsePrice('199'),
                    upfrontPrice: parsePrice('2199'),
                    monthlyPrice: parsePrice('299'),
                    commitmentTime: parsePrice('12'),
                    offeringTitle: 'SMART Basis',
                    offeringType: 'VOICE',
                    monthlyPrice: parsePrice('299'),
                    commitmentTime: 12
                };
                            data.bundles['SMART_MINI.TLF12PLEAS'] = {
                    signupFee: parsePrice('0'),
                    newMsisdnFee: parsePrice('199'),
                    upfrontPrice: parsePrice('2999'),
                    monthlyPrice: parsePrice('199'),
                    commitmentTime: parsePrice('12'),
                    offeringTitle: 'SMART Mini',
                    offeringType: 'VOICE',
                    monthlyPrice: parsePrice('199'),
                    commitmentTime: 12
                };
                            data.bundles['KONTANT_KOMPLETT.REGULAR'] = {
                    signupFee: parsePrice('0'),
                    newMsisdnFee: parsePrice('0'),
                    upfrontPrice: parsePrice('3499'),
                    monthlyPrice: parsePrice('0'),
                    commitmentTime: parsePrice('0'),
                    offeringTitle: 'SMART Kontant',
                    offeringType: 'PREPAID',
                    monthlyPrice: parsePrice('0'),
                    commitmentTime: 0
                };

            data.reviewJson = new Object();


            hardwareTemplateFunctions = hardwareTemplateFunctions(data);
            hardwareTemplateFunctions.init();

            data.reviewSummaryBox = hardwareTemplateFunctions.reviewSummaryBox;

            accessoryFunctions(data).init();
            additionalServiceFunctions(data).init();
        });

        function parsePrice(str) {
            var price = parseFloat(str);
            return isNaN(price) ? 0 : price;
        }

        var offerings = {};
    </script>

I want to get the data from each section that looks like this:

 data.bundles['SMART_SUPERX.TLF12PLEAS'] = {
                signupFee: parsePrice('0'),
                newMsisdnFee: parsePrice('199'),
                upfrontPrice: parsePrice('1099'),
                monthlyPrice: parsePrice('499'),
                commitmentTime: parsePrice('12'),
                offeringTitle: 'SMART Super',
                offeringType: 'VOICE',
                monthlyPrice: parsePrice('499'),
                commitmentTime: 12
            };

and then read each field to get the final value, for example upfrontPrice (1099 in this example).

I have tried fetching each object using this:

items = response.xpath('//script/text()').re("data.bundles\[.*\](.*)")

However, that only gives me the first line of data (= {). So how should I do this? Is there a better way of extracting this data from the script tag?
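The first-line-only result is consistent with how Python regular expressions treat the dot: by default `.` does not match a newline, so `(.*)` stops at the end of the line that contains `data.bundles[...]`. The sketch below reproduces this with plain `re` on a shortened, hypothetical stand-in for the script text (the `script_text` value and the variable names are assumptions for illustration, not taken from the real page):

```python
import re

# Hypothetical, shortened stand-in for the <script> text in the question.
script_text = """
data.bundles['SMART_SUPERX.TLF12PLEAS'] = {
    signupFee: parsePrice('0'),
    upfrontPrice: parsePrice('1099')
};
"""

# Same idea as the pattern in the question, with the literal dot escaped.
pattern = r"data\.bundles\[.*\](.*)"

# Without re.DOTALL, '.' stops at the newline, so the group holds only ' = {'.
single_line = re.findall(pattern, script_text)

# With re.DOTALL, '.' also matches newlines, so the group runs to the end of the text.
multi_line = re.findall(pattern, script_text, re.DOTALL)

print(single_line)
print(multi_line)
```

Turning on `re.DOTALL` alone is not enough for several blocks, because the greedy `.*` then swallows everything up to the end of the text.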

Edit: When I use items = response.xpath('//script/text()').re("data.bundles\[.*\] = {((?s).*) };") I seem to get only the last block (the one with data.bundles['KONTANT_KOMPLETT.REGULAR']).

How do I get a list of all of them?
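One possible approach, offered as a sketch and not as the accepted answer: the "last block only" result most likely comes from the greedy `.*` inside `\[.*\]`, which (once the dot matches newlines) stretches from the first `data.bundles[` to the last `]` in the script. A non-greedy `.*?` combined with `re.DOTALL` stops at the first `};` after each opening brace, so each bundle becomes its own match. The code below assumes the script text has already been pulled out of the response (for example with `response.xpath('//script/text()')`); `script_text`, `BLOCK_RE`, `FIELD_RE` and `parse_bundles` are hypothetical names, and the sample text is shortened from the question.

```python
import re

# Hypothetical, shortened stand-in for the text of the <script> tag.
script_text = """
data.bundles['SMART_SUPERX.TLF12PLEAS'] = {
    signupFee: parsePrice('0'),
    upfrontPrice: parsePrice('1099'),
    offeringTitle: 'SMART Super',
    commitmentTime: 12
};
data.bundles['KONTANT_KOMPLETT.REGULAR'] = {
    signupFee: parsePrice('0'),
    upfrontPrice: parsePrice('3499'),
    offeringTitle: 'SMART Kontant',
    commitmentTime: 0
};
"""

# Non-greedy '.*?' stops at the first '};' after each opening brace,
# so every bundle is captured separately as (code, body).
BLOCK_RE = re.compile(r"data\.bundles\['([^']+)'\]\s*=\s*\{(.*?)\};", re.DOTALL)

# A field is a name followed by parsePrice('N'), a quoted string, or a bare integer.
FIELD_RE = re.compile(r"(\w+):\s*(?:parsePrice\('([^']*)'\)|'([^']*)'|(\d+))")


def parse_bundles(text):
    """Return {bundle_code: {field_name: value}} for every data.bundles block."""
    bundles = {}
    for code, body in BLOCK_RE.findall(text):
        fields = {}
        for name, price, string, number in FIELD_RE.findall(body):
            if price:
                fields[name] = float(price)   # value wrapped in parsePrice('...')
            elif number:
                fields[name] = int(number)    # bare integer literal
            else:
                fields[name] = string         # quoted string literal
        bundles[code] = fields                # repeated keys: the last one wins
    return bundles


bundles = parse_bundles(script_text)
print(bundles["SMART_SUPERX.TLF12PLEAS"]["upfrontPrice"])  # prints 1099.0
```

Regex parsing like this depends on the exact layout of the script and will break if the site changes its formatting. Placing the inline `(?s)` flag in the middle of a pattern, as in the edit above, raises an error on Python 3.11 and later, so passing `re.DOTALL` explicitly is the safer form.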
