Extract multi-line JavaScript content from a <script> tag using Scrapy

Views: 21  Published: 2022/1/4 20:59:54  Tags: javascript python regex scrapy


Problem description

I'm trying to extract data from this script tag using Scrapy:

<script>
        var hardwareTemplateFunctions;
        var storefrontContextUrl = '';

        jq(function() {
            var data = new Object();
            data.hardwareProductCode = '9054832';
            data.offeringCode = 'SMART_BASIC.TLF12PLEAS';
            data.defaultTab = '';
            data.categoryId = 10001;

            data.bundles = new Object();
                            data.bundles['SMART_SUPERX.TLF12PLEAS'] = {
                    signupFee: parsePrice('0'),
                    newMsisdnFee: parsePrice('199'),
                    upfrontPrice: parsePrice('1099'),
                    monthlyPrice: parsePrice('499'),
                    commitmentTime: parsePrice('12'),
                    offeringTitle: 'SMART Super',
                    offeringType: 'VOICE',
                    monthlyPrice: parsePrice('499'),
                    commitmentTime: 12
                };
                            data.bundles['SMART_PLUSS.TLF12PLEAS'] = {
                    signupFee: parsePrice('0'),
                    newMsisdnFee: parsePrice('199'),
                    upfrontPrice: parsePrice('1599'),
                    monthlyPrice: parsePrice('399'),
                    commitmentTime: parsePrice('12'),
                    offeringTitle: 'SMART Pluss',
                    offeringType: 'VOICE',
                    monthlyPrice: parsePrice('399'),
                    commitmentTime: 12
                };
                            data.bundles['SMART_BASIC.TLF12PLEAS'] = {
                    signupFee: parsePrice('0'),
                    newMsisdnFee: parsePrice('199'),
                    upfrontPrice: parsePrice('2199'),
                    monthlyPrice: parsePrice('299'),
                    commitmentTime: parsePrice('12'),
                    offeringTitle: 'SMART Basis',
                    offeringType: 'VOICE',
                    monthlyPrice: parsePrice('299'),
                    commitmentTime: 12
                };
                            data.bundles['SMART_MINI.TLF12PLEAS'] = {
                    signupFee: parsePrice('0'),
                    newMsisdnFee: parsePrice('199'),
                    upfrontPrice: parsePrice('2999'),
                    monthlyPrice: parsePrice('199'),
                    commitmentTime: parsePrice('12'),
                    offeringTitle: 'SMART Mini',
                    offeringType: 'VOICE',
                    monthlyPrice: parsePrice('199'),
                    commitmentTime: 12
                };
                            data.bundles['KONTANT_KOMPLETT.REGULAR'] = {
                    signupFee: parsePrice('0'),
                    newMsisdnFee: parsePrice('0'),
                    upfrontPrice: parsePrice('3499'),
                    monthlyPrice: parsePrice('0'),
                    commitmentTime: parsePrice('0'),
                    offeringTitle: 'SMART Kontant',
                    offeringType: 'PREPAID',
                    monthlyPrice: parsePrice('0'),
                    commitmentTime: 0
                };

            data.reviewJson = new Object();


            hardwareTemplateFunctions = hardwareTemplateFunctions(data);
            hardwareTemplateFunctions.init();

            data.reviewSummaryBox = hardwareTemplateFunctions.reviewSummaryBox;

            accessoryFunctions(data).init();
            additionalServiceFunctions(data).init();
        });

        function parsePrice(str) {
            var price = parseFloat(str);
            return isNaN(price) ? 0 : price;
        }

        var offerings = {};
    </script>

I want to get the data from each section that looks like this:

 data.bundles['SMART_SUPERX.TLF12PLEAS'] = {
                signupFee: parsePrice('0'),
                newMsisdnFee: parsePrice('199'),
                upfrontPrice: parsePrice('1099'),
                monthlyPrice: parsePrice('499'),
                commitmentTime: parsePrice('12'),
                offeringTitle: 'SMART Super',
                offeringType: 'VOICE',
                monthlyPrice: parsePrice('499'),
                commitmentTime: 12
            };

Then I want to read each field and get its final value, for example the value of upfrontPrice (1099 in this example).

I have tried fetching each object using this:

items = response.xpath('//script/text()').re("data.bundles\[.*\](.*)")

However, that only gives me the first line of data (= {). How should I do this? Is there a better way of extracting this data from the script tag?

When I use items = response.xpath('//script/text()').re("data.bundles\[.*\] = {((?s).*) };") I seem to get only the last block (the one with data.bundles['KONTANT_KOMPLETT.REGULAR']).
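A likely explanation, offered as a sketch rather than a confirmed diagnosis: once (?s) lets . match newlines, every .* is greedy across the whole script. Assuming the brackets in the pattern are escaped (\[ and \]), which the reported output suggests, \[.*\] stretches from the first [ to the last ], so only the final block is captured. The snippet below reproduces this with Python's re module on a shortened, made-up sample (the keys A.ONE and B.TWO are hypothetical) and compares it with non-greedy .*? quantifiers.

```python
import re

# Shortened, made-up stand-in for the script text in the question.
sample = """
data.bundles['A.ONE'] = {
    upfrontPrice: parsePrice('1099')
 };
data.bundles['B.TWO'] = {
    upfrontPrice: parsePrice('3499')
 };
"""

# Greedy: with DOTALL, \[.*\] runs from the first "[" to the last "]",
# so the single match captures only the body of the last block.
greedy = re.findall(r"data\.bundles\[.*\] = \{(.*) \};", sample, re.DOTALL)

# Non-greedy: each .*? stops at the earliest possible closing delimiter,
# so every block produces its own (key, body) pair.
lazy = re.findall(r"data\.bundles\[(.*?)\] = \{(.*?)\};", sample, re.DOTALL)

print(len(greedy), len(lazy))
print([key for key, _ in lazy])
```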

How do I get a list of all of them?
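One possible approach, sketched under assumptions and not taken from an accepted answer: pull the script text out with Scrapy, then use plain Python regular expressions with non-greedy matching and re.DOTALL to collect every block, and a second pattern to split each block into fields. The names BUNDLE_RE, FIELD_RE and parse_bundles are hypothetical helpers, and the sample string is a trimmed copy of the script shown in the question.

```python
import re

# One bundle block: data.bundles['KEY'] = { ...body... };
# The non-greedy (.*?) with DOTALL stops at the first "};" after each key.
BUNDLE_RE = re.compile(r"data\.bundles\['([^']+)'\]\s*=\s*\{(.*?)\};", re.DOTALL)

# One field: name: parsePrice('value'), name: 'value', or name: 12
FIELD_RE = re.compile(r"(\w+):\s*(?:parsePrice\()?'?([^',)\n]*)'?\)?")


def parse_bundles(script_text):
    """Return {bundle_key: {field_name: raw_string_value}}."""
    bundles = {}
    for key, body in BUNDLE_RE.findall(script_text):
        # Repeated field names (e.g. commitmentTime) keep the last value,
        # as they would in the JavaScript object literal.
        bundles[key] = {name: value.strip() for name, value in FIELD_RE.findall(body)}
    return bundles


# Trimmed sample of the script text from the question.
sample = """
data.bundles['SMART_SUPERX.TLF12PLEAS'] = {
    signupFee: parsePrice('0'),
    upfrontPrice: parsePrice('1099'),
    offeringTitle: 'SMART Super',
    commitmentTime: 12
};
data.bundles['KONTANT_KOMPLETT.REGULAR'] = {
    upfrontPrice: parsePrice('3499'),
    offeringType: 'PREPAID',
    commitmentTime: 0
};
"""

bundles = parse_bundles(sample)
for key, fields in bundles.items():
    print(key, float(fields["upfrontPrice"]))
```

Inside a spider callback, the input could be built with something like script_text = "\n".join(response.xpath('//script/text()').getall()) and passed to parse_bundles. The field pattern is deliberately narrow: it assumes values contain no commas, quotes, or parentheses, which holds for the script shown here but may not hold for other pages.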
