How to Parse this HTML with Web::Scraper?

Views: 167 Published: 2017/6/25 5:11:40 Tags: html perl dom web-scraping scraper


Question

I am trying to use Web::Scraper to parse the following HTML:

<div>
<p><strong>TITLE1</strong>
<br>
DESCRIPTION1
</p>
<p><strong>TITLE2</strong>
<br>
DESCRIPTION2
</p>
<p><strong>TITLE3</strong>
<br>
DESCRIPTION3
</p>
</div>

into

      'test' => [
                  {
                    'name' => 'TITLE1',
                    'desc' => 'DESCRIPTION1 '
                  },
                  {
                    'name' => 'TITLE2',
                    'desc' => 'DESCRIPTION2 '
                  },
                  {
                    'name' => 'TITLE3',
                    'desc' => 'DESCRIPTION3 '
                  }
                ]

I have the following code, but I'm not having much luck. Using 'TEXT' when processing 'p' returns both the description text and the content of the strong element, for example:

      'test' => [
                  {
                    'name' => 'TITLE1',
                    'desc' => 'TITLE1 DESCRIPTION1 '
                  }
                ]

Plus, it only returns the first item.

This is my code:

use strict;
use Web::Scraper;
use Data::Dumper;

my $html = q[<div>
            <p><strong>TITLE1</strong>
            <br>
            DESCRIPTION1
            </p>
            <p><strong>TITLE2</strong>
            <br>
            DESCRIPTION2
            </p>
            <p><strong>TITLE3</strong>
            <br>
            DESCRIPTION3
            </p>
           </div>
           ];

my $test = scraper {
    process 'div', 'test[]' => scraper {
        process 'p strong', 'name' => 'TEXT';
        process 'p', 'desc' => 'TEXT';
    };
};

my $res = $test->scrape(\$html);
print Dumper($res);

Thanks.

Recommended answer

There are two points in your code that need changing.

To get only the description text, use XPath. The expression //p/text() selects the text nodes that are direct children of a p, so the text inside the strong element is not included.

To make all p blocks show up in the array, and not only the first one, run the outer process on div p instead of div. That way it matches every p inside the div and produces one array element per p, rather than matching the single div once.

my $test = scraper {
    process 'div p', 'test[]' => scraper {
        process 'p strong',           'name' => 'TEXT';
        process '//p/text()', 'desc' => ['TEXT', sub { s/^\s+|\s+$//g } ];
    };
};

Output (using Data::Printer):
\ {
    test   [
        [0] {
            desc   "DESCRIPTION1",
            name   "TITLE1"
        },
        [1] {
            desc   "DESCRIPTION2",
            name   "TITLE2"
        },
        [2] {
            desc   "DESCRIPTION3",
            name   "TITLE3"
        }
    ]
}
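
For reference, the question's HTML and the corrected scraper can be combined into a single script. This is only a sketch, assuming Web::Scraper is installed from CPAN; the heredoc and the printing loop at the end are added for illustration and are not part of the original answer. The filter sub trims leading and trailing whitespace from each description.

```perl
use strict;
use warnings;
use Web::Scraper;

# The HTML from the question, held in a heredoc for readability.
my $html = <<'HTML';
<div>
<p><strong>TITLE1</strong>
<br>
DESCRIPTION1
</p>
<p><strong>TITLE2</strong>
<br>
DESCRIPTION2
</p>
<p><strong>TITLE3</strong>
<br>
DESCRIPTION3
</p>
</div>
HTML

my $test = scraper {
    # One array element per <p> inside the <div>.
    process 'div p', 'test[]' => scraper {
        # Title: the text of the <strong> element.
        process 'p strong', 'name' => 'TEXT';
        # Description: a text node directly under <p>, with surrounding
        # whitespace trimmed by the filter sub.
        process '//p/text()', 'desc' => [ 'TEXT', sub { s/^\s+|\s+$//g } ];
    };
};

my $res = $test->scrape(\$html);

# Illustrative loop: print each title with its description.
for my $item (@{ $res->{test} }) {
    printf "%s => %s\n", $item->{name}, $item->{desc};
}
```

Per the answer's output above, $res->{test} should hold three hashes, each with a name and a trimmed desc.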

