How to extract blocks of text from an HTML page?

Views: 309 Published: 2018/6/25 18:34:34 Tags: php html html-content-extraction


Question

I would like to extract blocks of text with more than 100 words from a large HTML page using PHP. Whether the text is contained in <p>...</p> doesn't matter. I only care about the number of words that make up a coherent text block, so text outside of HTML paragraphs should also be taken into consideration.

How can this be done?

Recommended answer

I use phpQuery. Are you familiar with jQuery? They share the same syntax. You might be concerned about installing a new library, but trust me, this library is well worth the extra overhead.

You can then access it like this:

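// Assumes phpQuery is already loaded and $html holds the page markup.
$doc = phpQuery::newDocumentHTML($html);
foreach ($doc->find('p') as $element) {
    $element = pq($element);  // wrap the raw DOMElement in a phpQuery object
    echo str_word_count($element->text()), "\n";  // word count of this <p>
}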
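
The loop above only counts words inside <p> elements, while the question also asks about text outside paragraphs. Below is a minimal sketch of one way to cover that case. It assumes PHP's bundled DOM extension instead of phpQuery, and it treats a "coherent text block" as the text found between block-level element boundaries, which is an interpretation rather than something the question defines. The function names (extractTextBlocks, collectBlocks, flushBuffer) and the tag lists are hypothetical and incomplete.

```php
<?php
// Sketch: split a page into text blocks at block-level element boundaries,
// then keep only the blocks with more than $minWords words.

function flushBuffer(array &$blocks, string &$buffer): void
{
    // Collapse whitespace runs and store the buffered text as one block.
    $text = trim(preg_replace('/\s+/', ' ', $buffer));
    if ($text !== '') {
        $blocks[] = $text;
    }
    $buffer = '';
}

function collectBlocks(DOMNode $node, array &$blocks, string &$buffer): void
{
    // Assumed list of tags that end one text block and start another.
    static $blockTags = ['p', 'div', 'br', 'li', 'ul', 'ol', 'td', 'th', 'tr',
        'table', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'pre', 'blockquote',
        'section', 'article', 'header', 'footer', 'body'];
    // Tags whose content is not visible page text.
    static $skipTags = ['script', 'style', 'head'];

    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            // Text inside inline tags (<b>, <a>, ...) joins the current block.
            $buffer .= $child->nodeValue;
        } elseif ($child instanceof DOMElement) {
            $tag = strtolower($child->nodeName);
            if (in_array($tag, $skipTags, true)) {
                continue;
            }
            $isBlock = in_array($tag, $blockTags, true);
            if ($isBlock) {
                flushBuffer($blocks, $buffer);  // boundary before the element
            }
            collectBlocks($child, $blocks, $buffer);
            if ($isBlock) {
                flushBuffer($blocks, $buffer);  // boundary after the element
            }
        }
    }
}

function extractTextBlocks(string $html, int $minWords = 100): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);  // tolerate malformed real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $blocks = [];
    $buffer = '';
    collectBlocks($doc, $blocks, $buffer);
    flushBuffer($blocks, $buffer);

    // Keep blocks with strictly more than $minWords words.
    return array_values(array_filter($blocks, function ($block) use ($minWords) {
        return str_word_count($block) > $minWords;
    }));
}
```

Calling extractTextBlocks($html) returns an array of strings, one per qualifying block. Note that str_word_count() is locale-dependent and geared toward ASCII text, so a different counter may be needed for other languages.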
