Find Similarities between Blocks of Text between Many HTML Documents?

Views: 85 Posted: 2020/10/22 0:32:45 ruby diff

Problem description

If I have, say, 20 HTML pages and want to extract the shared or similar portions of the documents, what are some efficient ways to do that?

For example, comparing 10 StackOverflow pages, I'd find that the top bar and the main menu bar are the same on every page, so I could extract them.

It seems like I'd need either a diff program or some complex regexps, but assume I have no prior knowledge of the page, text, or HTML structure.
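To make the goal concrete, here is a minimal sketch of one naive way to find shared portions without knowing the structure in advance: treat every non-empty line as a block and keep the blocks that occur in every page. This is only an illustration, not an answer from the original thread. It assumes each page is already loaded as a string, and the helper names `blocks` and `shared_blocks` are hypothetical. It only catches exact line matches, so near-identical blocks would need a real diff or fuzzy comparison.

```ruby
require "set"

# Hypothetical helper: split a page into stripped, non-empty lines.
# Assumption: one line is a reasonable "block" for comparison.
def blocks(html)
  html.lines.map(&:strip).reject(&:empty?)
end

# Return the lines that occur in every page, in the order they
# appear in the first page.
def shared_blocks(pages)
  return [] if pages.empty?

  sets = pages.map { |page| blocks(page).to_set }
  common = sets.reduce { |acc, set| acc & set } # intersection of all pages
  blocks(pages.first).uniq.select { |line| common.include?(line) }
end

# Made-up sample pages that share a top bar but differ in content.
pages = [
  "<div id=\"topbar\">Menu</div>\n<p>Question one</p>\n",
  "<div id=\"topbar\">Menu</div>\n<p>Question two</p>\n",
  "<div id=\"topbar\">Menu</div>\n<p>Question three</p>\n"
]

puts shared_blocks(pages)
```

With real pages, splitting on lines is fragile because markup can be wrapped differently from page to page, so parsing the HTML first and comparing elements may work better.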
