Getting additional information from a <div> using PHP web scraping


Problem description

I am trying to scrape some data from a website. I am relatively new to this, so I am open to any suggestions. I have looked at several Stack Overflow posts but can't find a similar problem or solution.

First, I use DOM to find all the divs in the page (https://stackoverflow.com/ is used here as an example). Then I can easily get any information contained in 'class=' or 'id='. However, this page uses some additional, non-standard tags containing links. I would like to scrape this link information. For example:

<div class="made-up-class" additional-link="https://www.google.com/">

Ideally I would get all the information from the additional link.

My code so far, which doesn't work, is:

<?php
require 'simple_html_dom.php';

$html = file_get_html('https://stackoverflow.com/');

foreach($html->find('div') as $element)
        $element->find('additional-link');
                echo $element;
?>

Thanks in advance.

Solution

If I understood your question correctly, you can scrape the value of additional-link using the following approach. It shows how to parse a single element; from there, you can always create a loop to get them all.

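<?php
    require('simple_html_dom.php');
    $html = "https://stackoverflow.com/";

    // Fetch and parse the page.
    $htmldoc = file_get_html($html);
    // Take the first element whose class attribute is "made-up-class".
    $item = $htmldoc->find('[class="made-up-class"]',0);
    // find() returns null when nothing matches, so guard before reading.
    if ($item !== null) {
        // additional-link is an attribute, not a tag, so read it with getAttribute().
        echo $item->getAttribute("additional-link");
    }
?>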
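
The loop mentioned above can be sketched as follows. This is a minimal illustration rather than the answerer's own code: it swaps simple_html_dom for PHP's built-in DOM extension (DOMDocument and DOMXPath) so that it runs without any extra files, and it parses an inline sample string instead of fetching a live page. The sample markup and the helper name collectDivAttribute are assumptions made for the example.

```php
<?php
// Sample markup standing in for a fetched page (an assumption for illustration).
$markup = <<<HTML
<html><body>
<div class="made-up-class" additional-link="https://www.google.com/">A</div>
<div class="made-up-class" additional-link="https://stackoverflow.com/">B</div>
<div class="other">C</div>
</body></html>
HTML;

/**
 * Hypothetical helper: collect one attribute's value from every <div> that has it.
 */
function collectDivAttribute(string $markup, string $attribute): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $values = [];
    // The XPath predicate [@name] keeps only the divs that carry the attribute.
    foreach ($xpath->query('//div[@' . $attribute . ']') as $div) {
        $values[] = $div->getAttribute($attribute);
    }
    return $values;
}

// Print every additional-link value found in the sample markup.
foreach (collectDivAttribute($markup, 'additional-link') as $link) {
    echo $link, PHP_EOL;
}
```

With simple_html_dom the shape would be the same: iterate over the elements returned by find() and call getAttribute("additional-link") on each one, skipping elements that do not have the attribute.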
