What algorithms could I use to identify content on a web page

Views: 293 Published: 2015/11/30 16:12:26 Tags: algorithm webpage html-content-extraction

Problem description

I have a web page loaded in the browser (i.e. both its DOM and the positioning of its elements are accessible to me), and I want to find the block element (or a sorted list of such elements) that most likely contains the main content (as in a continuous block of text). The goal is to exclude things like menus, headers, footers, and so on.
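
To make the task concrete, here is a minimal sketch of one naive heuristic. It is not an answer taken from the original thread. It assumes that main content shows up as a lot of non-link text sitting close to one container, while menus are mostly link text. `BlockNode`, `CONTAINER_TAGS` and `rankContentBlocks` are hypothetical names, and `BlockNode` is a simplified stand-in for a DOM element so that the sketch runs without a browser. Element positioning, which the question says is also available, is ignored here.

```typescript
// Hypothetical, simplified stand-in for a DOM element.
interface BlockNode {
  tag: string;
  id?: string;
  text?: string;          // text directly inside this node
  children?: BlockNode[];
}

interface Candidate {
  node: BlockNode;
  score: number;          // non-link characters attributed to this container
}

// Assumption: these tags act as candidate containers; <p>, <li>, <span>
// and other tags pass their text up to the nearest container ancestor.
const CONTAINER_TAGS = new Set([
  "body", "div", "article", "section", "main",
  "nav", "header", "footer", "aside", "td", "ul", "ol",
]);

function rankContentBlocks(root: BlockNode): Candidate[] {
  const candidates: Candidate[] = [];

  // Returns the number of non-link characters in this subtree that have
  // not yet been attributed to a container.
  function visit(node: BlockNode, insideLink: boolean): number {
    const isLink = insideLink || node.tag === "a";
    let owned = isLink ? 0 : (node.text ?? "").trim().length;
    for (const child of node.children ?? []) {
      owned += visit(child, isLink);
    }
    if (CONTAINER_TAGS.has(node.tag)) {
      candidates.push({ node, score: owned });
      return 0; // this text now belongs to the container just recorded
    }
    return owned;
  }

  visit(root, false);
  // Highest score first: the sorted list of likely content blocks.
  return candidates.sort((a, b) => b.score - a.score);
}

// Usage on a tiny hand-built page.
const page: BlockNode = {
  tag: "body",
  children: [
    { tag: "nav", id: "menu", children: [
      { tag: "a", text: "Home" },
      { tag: "a", text: "About" },
    ] },
    { tag: "div", id: "article", children: [
      { tag: "p", text: "First paragraph of the article body." },
      { tag: "p", text: "Second paragraph with even more text in it." },
    ] },
    { tag: "footer", id: "footer", text: "Copyright 2015" },
  ],
};

const ranked = rankContentBlocks(page);
console.log(ranked[0]?.node.id); // logs "article"
```

In a real browser the same traversal could walk actual elements instead of `BlockNode` objects. The container list and the choice to count link text as zero are guesses that would need tuning against real pages.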
