URL path similarity/string similarity algorithm

Views: 565 Published: 2020/6/3 20:51:54 Tags: algorithm data-mining classification levenshtein-distance text-mining


Problem description

My problem is that I need to compare URL paths and decide whether they are similar. Below is example data to process, already split into the groups I expect:

# GROUP 1
/robots.txt

# GROUP 2
/bot.html

# GROUP 3
/phpMyAdmin-2.5.6-rc1/scripts/setup.php
/phpMyAdmin-2.5.6-rc2/scripts/setup.php
/phpMyAdmin-2.5.6/scripts/setup.php
/phpMyAdmin-2.5.7-pl1/scripts/setup.php
/phpMyAdmin-2.5.7/scripts/setup.php
/phpMyAdmin-2.6.0-alpha/scripts/setup.php
/phpMyAdmin-2.6.0-alpha2/scripts/setup.php

# GROUP 4
//phpMyAdmin/

I tried comparing the paths with Levenshtein distance, but it is not accurate enough for me. I do not need a 100% accurate algorithm, but I think 90% or above is a must.
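For reference, the Levenshtein comparison mentioned above can be sketched as follows. This is a minimal illustration, not the asker's actual code: the function names `levenshtein` and `normalized_similarity`, and the choice to normalize by the longer string's length, are assumptions made here.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1]; 1.0 means identical (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    pairs = [
        ("/phpMyAdmin-2.5.6-rc1/scripts/setup.php",
         "/phpMyAdmin-2.5.6-rc2/scripts/setup.php"),
        ("/robots.txt", "/bot.html"),
        ("//phpMyAdmin/", "/phpMyAdmin-2.5.6/scripts/setup.php"),
    ]
    for left, right in pairs:
        print(f"{normalized_similarity(left, right):.3f}  {left}  {right}")
```

Because the raw distance grows with string length, short unrelated paths and long related ones can end up with comparable raw distances, which is one reason a normalized score is used in this sketch.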

I think I need some sort of classifier, but the problem is that each new batch of data can contain paths that should be assigned to a new, previously unknown class.
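To make the "new unknown class" requirement concrete, here is one possible sketch of threshold-based incremental grouping: a path joins the most similar existing group, or opens a new group when nothing is similar enough. It is an illustration under assumptions, not a recommended answer: the similarity measure (`difflib.SequenceMatcher` from the Python standard library), the use of each group's first path as its representative, the names `similarity` and `assign_groups`, and the threshold value `0.75` are all choices made for this example.

```python
from difflib import SequenceMatcher

# Sample paths taken from the question, in their original order.
SAMPLE_PATHS = [
    "/robots.txt",
    "/bot.html",
    "/phpMyAdmin-2.5.6-rc1/scripts/setup.php",
    "/phpMyAdmin-2.5.6-rc2/scripts/setup.php",
    "/phpMyAdmin-2.5.6/scripts/setup.php",
    "/phpMyAdmin-2.5.7-pl1/scripts/setup.php",
    "/phpMyAdmin-2.5.7/scripts/setup.php",
    "/phpMyAdmin-2.6.0-alpha/scripts/setup.php",
    "/phpMyAdmin-2.6.0-alpha2/scripts/setup.php",
    "//phpMyAdmin/",
]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a, b).ratio()


def assign_groups(paths, threshold=0.75):
    """Greedily group paths; the threshold is an assumed, hand-picked value."""
    groups = []
    for path in paths:
        best_group, best_score = None, 0.0
        for group in groups:
            # Compare against the group's first member (its representative).
            score = similarity(path, group[0])
            if score > best_score:
                best_group, best_score = group, score
        if best_group is not None and best_score >= threshold:
            best_group.append(path)
        else:
            # Nothing is similar enough: treat the path as a new class.
            groups.append([path])
    return groups


if __name__ == "__main__":
    for number, group in enumerate(assign_groups(SAMPLE_PATHS), start=1):
        print(f"# GROUP {number}")
        for path in group:
            print(path)
        print()
```

The threshold is the main tuning knob in this sketch, and the result depends on input order because groups are represented by their first member; both would need validating against real data before relying on it.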

Could you please point me in the right direction?

Thanks.
