Parse HTML data into array data in PHP

Question

I am trying to parse HTML-formatted data into arrays using the classes of the a tags, but I have not been able to get the desired format. Below is my data:

$text ='<div class="result results_links results_links_deep web-result ">
  <div class="links_main links_deep result__body">
    <h2 class="result__title">
      <a rel="nofollow" class="result__a" href="">Text1</a> 
    </h2>
    <a class="result__snippet" href="">Text1</a> 
    <a class="result__url" href="">
    example.com
    </a>
  </div>
</div>

<div class="result results_links results_links_deep web-result ">
  <div class="links_main links_deep result__body">
    <h2 class="result__title">
      <a rel="nofollow" class="result__a" href="">text3</a> 
    </h2>
    <a class="result__snippet" href="">text23</a> 
    <a class="result__url" href="">
    text.com
    </a>
  </div>
</div>';

I am trying to get the result using the code below:

$lines = explode("\n", $text);
$out = array();
foreach ($lines as $line) {
    $parts = explode(" > ", $line);
    $ref = &$out;
    while (count($parts) > 0) {
        if (isset($ref[$parts[0]]) === false) {
            $ref[$parts[0]] = array();
        }
        $ref = &$ref[$parts[0]];
        array_shift($parts);
    }
}
print_r($out);

But I need the result to look exactly like this:

array:2 [
  0 => array:3 [
    0 => "Text1"
    1 => "Text1"
    2 => "example.com"
  ]
  1 => array:3 [
    0 => "text3"
    1 => "text23"
    2 => "text.com"
  ]
]

Demo: https://eval.in/746170

I also tried DOMDocument in Laravel, like this:

$dom = new DOMDocument;
$dom->loadHTML($text);
foreach($dom->getElementsByTagName('a') as $node)
{
    $array[] = $dom->saveHTML($node);
}

print_r($array);

So how can I use the classes to separate the data the way I want? Any suggestions, please. Thank you.

Recommended answer

I would do it using DOMDocument and DOMXPath to target the interesting parts more easily. To be more precise, I register a function that checks whether a class attribute contains a given set of classes:

function hasClasses($attrValue, $requiredClasses) {
    $requiredClasses = explode(' ', $requiredClasses);
    $classes = preg_split('~\s+~', $attrValue, -1, PREG_SPLIT_NO_EMPTY);
    return array_diff($requiredClasses, $classes) ? false : true;
}

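$dom = new DOMDocument;
// Collect libxml warnings about imperfect HTML instead of printing them, then restore the previous setting.
$state = libxml_use_internal_errors(true);
$dom->loadHTML($text); // $text is the HTML string from the question
libxml_use_internal_errors($state);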

$xp = new DOMXPath($dom);
$xp->registerNamespace('php', 'http://php.net/xpath');
$xp->registerPhpFunctions('hasClasses');

$mainDivClasses = 'result results_links results_links_deep web-result';
$childDivClasses = 'links_main links_deep result__body';

$divNodeList = $xp->query('//div[php:functionString("hasClasses", @class, "' . $mainDivClasses . '")]
                           /div[php:functionString("hasClasses", @class, "' . $childDivClasses . '")]');

$results = [];
foreach ($divNodeList as $divNode) {
    $results[] = [
        trim($xp->evaluate('string(./h2/a[@class="result__a"])', $divNode)),
        trim($xp->evaluate('string(.//a[@class="result__snippet"])', $divNode)),
        trim($xp->evaluate('string(.//a[@class="result__url"])', $divNode))
    ];
}

print_r($results);
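
To make the behaviour of the hasClasses check more concrete, here is a small standalone sketch. The helper is repeated from the answer so the snippet runs on its own; the attribute values passed to it are made up for illustration and do not come from the question's data.

```php
<?php
// Helper repeated from the answer: true only if every required class
// appears as a whole token in the class attribute value.
function hasClasses($attrValue, $requiredClasses) {
    $requiredClasses = explode(' ', $requiredClasses);
    $classes = preg_split('~\s+~', $attrValue, -1, PREG_SPLIT_NO_EMPTY);
    return array_diff($requiredClasses, $classes) ? false : true;
}

// Hypothetical attribute values (assumptions for this demo only).
var_dump(hasClasses('web-result  result extra', 'result web-result')); // bool(true): order and extra whitespace do not matter
var_dump(hasClasses('web-results', 'web-result'));                     // bool(false): a longer class name is not a match
var_dump(hasClasses('result', 'result web-result'));                   // bool(false): one required class is missing
```

The second call is the case where a plain substring test would give a different answer, which is why the registered function is the more precise option.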


Without registering a function, you can also use the XPath function contains in your predicates. It is less precise, since it only checks whether a substring occurs in a larger string (and not whether the class attribute has a specific class, as the hasClasses function does), but it should be enough:

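$dom = new DOMDocument;
// Collect libxml warnings about imperfect HTML instead of printing them, then restore the previous setting.
$state = libxml_use_internal_errors(true);
$dom->loadHTML($text); // $text is the HTML string from the question
libxml_use_internal_errors($state);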

$xp = new DOMXPath($dom);

$divNodeList = $xp->query('//div[contains(@class, "results_links_deep")]
                                [contains(@class, "web-result")]
                           /div[contains(@class, "links_main")]
                               [contains(@class, "links_deep")]
                               [contains(@class, "result__body")]');

$results = [];
foreach ($divNodeList as $divNode) {
    $results[] = [
        trim($xp->evaluate('string(./h2/a[@class="result__a"])', $divNode)),
        trim($xp->evaluate('string(.//a[@class="result__snippet"])', $divNode)),
        trim($xp->evaluate('string(.//a[@class="result__url"])', $divNode))
    ];
}

print_r($results);
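
If the substring matching of contains ever turns out to be too loose, a common XPath 1.0 idiom is to pad the normalized class attribute with spaces and search for the class name surrounded by spaces. The sketch below is only an illustration of that idiom, not part of the original answer, and its markup is hypothetical: the second div has a class that merely contains the text "web-result".

```php
<?php
// Hypothetical markup (an assumption for this demo, not the question's data).
$html = '<div class="result web-result">A</div><div class="web-results-banner">B</div>';

$dom = new DOMDocument;
$state = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors($state);

$xp = new DOMXPath($dom);

// Plain substring test: both divs match, because "web-results-banner" contains "web-result".
$loose = $xp->query('//div[contains(@class, "web-result")]');

// Whole-token test: " result web-result " contains " web-result ", " web-results-banner " does not.
$strict = $xp->query('//div[contains(concat(" ", normalize-space(@class), " "), " web-result ")]');

echo $loose->length, "\n";  // 2
echo $strict->length, "\n"; // 1
```

This keeps everything in pure XPath, at the cost of a longer predicate for each class you want to test.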
