How do I extract a keyword from a webpage using PHP DOM


Question

Here is a sample of code I have extracted from a webpage:

        <div class="user-details-narrow">
            <div class="profileheadtitle">
                <span class=" headline txtBlue size15">
                    Profession
                </span>
            </div>
            <div class="profileheadcontent-narrow">
                <span class="txtGrey size15">
                    administration
                </span>
            </div>
        </div>

When displayed on the webpage, it shows as "Profession administration". What I want to do is extract the profession, in this case "administration". However, it's not as simple as it might seem, because the same markup is repeated many times for other fields, such as:

        <div class="user-details-narrow">
            <div class="profileheadtitle">
                <span class=" headline txtBlue size15">
                    Industry
                </span>
            </div>
            <div class="profileheadcontent-narrow">
                <span class="txtGrey size15">
                    banking
                </span>
            </div>
        </div>

Any ideas on a good solution?

Recommended answer

Please do not use regular expressions to get node values from a page.

PHP has a very nice class named DOMDocument. You can load a page into a DOMDocument and query it with DOMXPath:

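$dom = new DOMDocument;
// Keep warnings about malformed real-world HTML out of the output
libxml_use_internal_errors(true);
$dom->loadHTMLFile("http://test.de/page.html");
$finder = new DOMXPath($dom);
// Select every element whose class attribute contains "size15"
$spaner = $finder->query("//*[contains(@class, 'size15')]");
// item(0) is the label span ("Profession"), item(1) is its value span
echo trim($spaner->item(0)->nodeValue) . "/" . trim($spaner->item(1)->nodeValue);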
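
Because the same markup repeats for every field, picking items by position gets fragile once there are many of them. The following is a minimal sketch of one way to pair each label with its value instead. It assumes every pair sits inside its own "user-details-narrow" block, as in the question's samples. The function name extractProfileFields and the inline sample HTML are hypothetical and only serve the illustration.

```php
<?php
// Hypothetical helper, not part of the original answer: maps each label
// (e.g. "Profession") to its value (e.g. "administration").
function extractProfileFields(string $html): array
{
    $dom = new DOMDocument();
    // Scraped HTML is rarely valid, so keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $fields = [];
    // Each label/value pair lives in its own "user-details-narrow" block.
    foreach ($xpath->query("//div[contains(@class, 'user-details-narrow')]") as $block) {
        // Relative queries (leading ".") search only inside the current block.
        $title = $xpath->query(".//div[contains(@class, 'profileheadtitle')]", $block)->item(0);
        $content = $xpath->query(".//div[contains(@class, 'profileheadcontent-narrow')]", $block)->item(0);
        if ($title !== null && $content !== null) {
            $fields[trim($title->textContent)] = trim($content->textContent);
        }
    }
    return $fields;
}

// Sample markup modeled on the question (assumed, shortened for the example).
$html = <<<HTML
<div class="user-details-narrow">
  <div class="profileheadtitle"><span class=" headline txtBlue size15"> Profession </span></div>
  <div class="profileheadcontent-narrow"><span class="txtGrey size15"> administration </span></div>
</div>
<div class="user-details-narrow">
  <div class="profileheadtitle"><span class=" headline txtBlue size15"> Industry </span></div>
  <div class="profileheadcontent-narrow"><span class="txtGrey size15"> banking </span></div>
</div>
HTML;

$fields = extractProfileFields($html);
echo $fields['Profession'], "\n"; // administration
```

To run this against a live page, the loadHTML call could be swapped for loadHTMLFile with the page URL, as in the answer above. Looking values up by label rather than by index means the result should stay correct even if the site reorders its fields.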
