How to figure out the location of a keyword in an HTML document?


Problem description

I have an HTML document as a string.

I want to search for a keyword in this document and figure out where it appears.

Specifically, I want to know which tag it appears in: an H1, an H2, or the TITLE tag, for example.

Let's say my document is:

        $string = "<html>
                   <head> 
                   <title>bar , this is an example</title> 
                   </head> 
                   <body> 
                   <h1>latest news</h1>
                   foo <strong>bar</strong> 
                   </body>
                   </html>";


                   $arr = find_term("bar",$string);
                   print_r($arr);

I expect the result to look like this:

                   [0]=> title
                   [1]=> strong

because "bar" appears once in the TITLE tag and once in the STRONG tag.

I know it is a complicated question, which is why I am asking whether someone knows the answer :)

thanks

What I have so far is:

        function find_term($term,$string){
               $arr = explode($term, $string);
               return $arr;
        }
        $arr = find_term("bar",$string);
        print_r($arr);

Now we have an array with the following value:

             Array
             (
             [0] => <html>
               <head>
               <title>

             [1] =>  , this is an example</title>
               </head>
               <body>
               <h1>latest news</h1>
               foo <strong>

             [2] => </strong>
               </body>
               </html>
             )

You can see that every element of the array except the final one ends with the opening tag that contains "bar". The question now is how to find the last tag that appears in each element.

Thanks

Solution

You can use DOMDocument and XPath for that.

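<?php
// Parse the HTML string into a DOM tree.
$doc = new DOMDocument;
$doc->loadHTML('<html>
  <head> 
    <title>bar , this is an example</title> 
  </head> 
  <body> 
    <h1>latest news</h1>
    foo <strong>bar</strong> 
    <i>foobar</i>
   </body>
</html>');
$xpath = new DOMXPath($doc);
// Select every element whose text child contains the substring "bar".
// Note: in XPath 1.0, contains() tests only the first text child of each element.
foreach($xpath->query('//*[contains(child::text(),"bar")]') as $e) {
  echo $e->tagName, "\n";
}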

This prints:

title
strong
i

Note the i element. It contains "foobar", not "bar" as a single word, yet it still matches the XPath query. So this solution may or may not suffice.
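
If whole-word matches are what you need, one possible variation is to walk every text node and test it with a word-boundary regular expression rather than XPath's substring `contains()`. The following is only a sketch under a few assumptions: the function name `find_term` and its argument order are taken from the question, the markup is assumed to be parseable by `DOMDocument::loadHTML`, and "word" is assumed to mean whatever PCRE's `\b` treats as a word boundary. Visiting `//text()` also covers elements where the term sits in a later text child, which `contains(child::text(), ...)` would miss in XPath 1.0.

```php
<?php
// Hypothetical helper (name borrowed from the question): returns the tag
// names of the elements that directly contain $term as a whole word.
function find_term(string $term, string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // \b requires a word boundary on both sides, so "foobar" does not match "bar".
    $pattern = '/\b' . preg_quote($term, '/') . '\b/u';

    $tags = [];
    // Visit every text node in document order, not only first text children.
    foreach ($xpath->query('//text()') as $textNode) {
        if (preg_match($pattern, $textNode->nodeValue) === 1) {
            $tags[] = $textNode->parentNode->nodeName;
        }
    }
    return $tags;
}

$string = '<html>
  <head>
    <title>bar , this is an example</title>
  </head>
  <body>
    <h1>latest news</h1>
    foo <strong>bar</strong>
    <i>foobar</i>
  </body>
</html>';

print_r(find_term('bar', $string));
```

For the sample document this should list `title` and `strong` but not `i`. A tag name is added once per matching text node, so an element with several matching text children appears more than once; pass the result through `array_unique()` if that is not wanted.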
