Excluding URLs from path links?

Problem description

In the function below, I'd like to specify a list of domains to exclude from the results. What are my options? Would an array of domains to exclude work?

class KeywordSearch
{
    const GOOGLE_SEARCH_XPATH = "//a[@class='l']";
    public $searchQuery;
    public $numResults;
    public $sites;
    public $finalPlainText = '';
    public $finalWordList = array();
    public $finalKeywordList = array();

    public function __construct($query, $numres = 7)
    {
        $this->searchQuery = $query;
        $this->numResults = $numres;
        $this->sites = array();
    }

    protected static $_excludeUrls = array('wikipedia.com', 'amazon.com', 'youtube.com', 'zappos.com'); //JSB NEW

    private function getResults($searchHtml)
    {
        $results = array();
        $dom = new DOMDocument();
        $dom->preserveWhiteSpace = false;
        $dom->formatOutput = false;
        @$dom->loadHTML($searchHtml); // @ suppresses warnings from malformed HTML
        $xpath = new DOMXpath($dom);
        $links = $xpath->query(self::GOOGLE_SEARCH_XPATH);

        // Collect the href of every result link
        foreach ($links as $link) {
            $results[] = $link->getAttribute('href');
        }

        $results = array_filter($results, array(__CLASS__, 'kwFilter')); //JSB NEW
        return $results;
    }

    // Keep a link only if its href is not in the exclusion list
    protected static function kwFilter($value)
    {
        return !in_array($value, self::$_excludeUrls);
    }
}
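To make the behavior of this filter concrete, here is a self-contained sketch that runs the same DOMDocument/XPath extraction and the same in_array() test on a small HTML snippet. The markup and URLs are hypothetical, invented for illustration; real Google result pages look different. Because in_array() compares whole strings, the full href http://www.amazon.com/dp/123 is kept even though amazon.com is in the exclusion list; only the href that is literally the bare string amazon.com is removed.

```php
<?php
// Hypothetical result markup, for illustration only
$html = '<html><body>'
    . '<a class="l" href="http://www.amazon.com/dp/123">Amazon</a>'
    . '<a class="l" href="http://example.org/page">Example</a>'
    . '<a class="l" href="amazon.com">Bare domain</a>'
    . '</body></html>';

$excludeUrls = array('wikipedia.com', 'amazon.com', 'youtube.com', 'zappos.com');

// Same extraction steps as KeywordSearch::getResults()
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$hrefs = array();
foreach ($xpath->query("//a[@class='l']") as $link) {
    $hrefs[] = $link->getAttribute('href');
}

// Same test as kwFilter(): in_array() compares the whole href string
$kept = array_values(array_filter($hrefs, function ($value) use ($excludeUrls) {
    return !in_array($value, $excludeUrls);
}));

print_r($kept);
```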


Recommended answer

protected static $_banUrls = array('foo.com', 'bar.com');

private function getResults($searchHtml)
{
    $results = array();
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;
    $dom->formatOutput = false;
    @$dom->loadHTML($searchHtml);
    $xpath = new DOMXpath($dom);
    $links = $xpath->query(self::GOOGLE_SEARCH_XPATH);

    // Collect every result link; the banned ones are filtered out below
    foreach ($links as $link) {
        $results[] = $link->getAttribute('href');
    }

    // Keep only the links that myFilter() accepts
    $results = array_filter($results, array(__CLASS__, 'myFilter'));

    return $results;
}

protected static function myFilter($value)
{
    // true means "keep": the href is not in the ban list
    return !in_array($value, self::$_banUrls);
}
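Because the collected hrefs are full URLs, comparing them to bare domain names with in_array() only removes a link whose href is exactly "foo.com". A sketch of an alternative, assuming the hrefs are absolute URLs: extract the host with parse_url() and compare it against each banned domain, also matching subdomains such as www.bar.com. The helper names isExcludedHost() and filterLinks() are hypothetical, not part of the original code.

```php
<?php
// Hypothetical helper: true if $url's host is a banned domain or one of its subdomains.
// Assumes absolute URLs; relative or unparsable hrefs have no host and are never excluded.
function isExcludedHost($url, array $bannedDomains)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false;
    }
    $host = strtolower($host);

    foreach ($bannedDomains as $domain) {
        $domain = strtolower($domain);
        // Exact match, or a suffix of ".domain" (the leading dot stops notfoo.com matching foo.com)
        if ($host === $domain || substr($host, -strlen($domain) - 1) === '.' . $domain) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: drop banned links and reindex the array
function filterLinks(array $links, array $bannedDomains)
{
    return array_values(array_filter($links, function ($url) use ($bannedDomains) {
        return !isExcludedHost($url, $bannedDomains);
    }));
}

$banUrls = array('foo.com', 'bar.com');
$links = array(
    'http://foo.com/index.html',
    'https://www.bar.com/page?x=1',
    'http://notfoo.com/',
    'http://example.org/foo.com',
);
print_r(filterLinks($links, $banUrls));
```

Here the first two links are removed and http://notfoo.com/ and http://example.org/foo.com are kept, since only the host is compared. Inside the class, myFilter() could delegate to the same check, for example `return !isExcludedHost($value, self::$_banUrls);`.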
