Counting words with the DOMDocument class
Question
How can I count the words in an HTML page with DOMDocument?
For example, if the input is something like:
<div> Hello something open. <a href="open.php">click</a>
lorem ipsum <a href="open.php">here></a>
The output should be:
Number Word
1 Hello
2 something
3 open
4 click
5 lorem
6 ipsum
7 here.
And what if I need only the link text?
click 4
here 7
Recommended answer
If you need this for the entire document, it is likely easier to run strip_tags on the markup and then run str_word_count on the result.
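A minimal sketch of that approach, assuming the markup is already available as a string. The variable names $html and $words are illustrative, and the sample input is the fragment from the question:

```php
<?php
// Sample markup from the question; $html is a hypothetical variable name.
$html = <<<HTML
<div> Hello something open. <a href="open.php">click</a>
lorem ipsum <a href="open.php">here></a></div>
HTML;

// Remove all tags, then split the remaining text into words.
// Format 1 makes str_word_count return the words as an array.
$words = str_word_count(strip_tags($html), 1);

// Print a 1-based number next to each word.
foreach ($words as $i => $word) {
    echo ($i + 1), ' ', $word, PHP_EOL;
}
```

Note that strip_tags only removes the tags themselves, so words separated by nothing but markup (for example `<p>one</p><p>two</p>`) end up joined together. If that matters for your pages, the DOM approach is the safer choice.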
If you have to do this with DOM, you can do:
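$str = <<<HTML
<div> Hello something open. <a href="open.php">click</a>
lorem ipsum <a href="open.php">here></a></div>
HTML;

$dom = new DOMDocument;
$dom->loadHTML($str);

// Select every text node in the document.
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//text()');

// Join the text nodes with a space so words from adjacent nodes do not run together.
$textNodeContent = '';
foreach ($nodes as $node) {
    $textNodeContent .= " $node->nodeValue";
}

// Format 1 makes str_word_count return the words as an array.
print_r(str_word_count($textNodeContent, 1));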
Using //text() as the XPath expression gives you only the text nodes in the document. To return just the link texts, use //a/text() as the expression instead.
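Querying //a/text() alone yields the link words but not their position in the whole document, which the question also asks for. One possible way to get both, sketched here as an assumption rather than as part of the original answer, is to walk all text nodes in document order, keep a running word counter, and record the words whose text node has an a element as an ancestor. The names $position and $linkWords are illustrative:

```php
<?php
$html = <<<HTML
<div> Hello something open. <a href="open.php">click</a>
lorem ipsum <a href="open.php">here></a></div>
HTML;

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$position  = 0;   // running 1-based word number across the document
$linkWords = [];  // position => word, for words found inside <a> elements

foreach ($xpath->query('//text()') as $textNode) {
    // Check whether this text node is nested inside a link,
    // using the text node itself as the XPath context node.
    $inLink = $xpath->query('ancestor::a', $textNode)->length > 0;

    foreach (str_word_count($textNode->nodeValue, 1) as $word) {
        $position++;
        if ($inLink) {
            $linkWords[$position] = $word;
        }
    }
}

// Print each link word followed by its position in the document.
foreach ($linkWords as $number => $word) {
    echo $word, ' ', $number, PHP_EOL;
}
```

For the sample input this should list click at position 4 and here at position 7. Because each text node is counted separately, a word that is split by markup (such as `un<b>usual</b>`) would be counted as two words in this sketch.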