Trying to parse links in an HTML directory listing using Java

Views: 133 | Posted: 2018/12/10 10:45:20 | Tags: java, html, parsing, href

Problem description

Could someone please help me parse these links from an HTML page:

http://nemertes.lis.upatras.gr/dspace/handle/123456789/2299
http://nemertes.lis.upatras.gr/dspace/handle/123456789/3154
http://nemertes.lis.upatras.gr/dspace/handle/123456789/3158

I want to pick them out using the word "handle", which is common to all of these links.

I'm using the command [Pattern pattern = Pattern.compile("<a.+href=\"(.+?)\"");] but it matches every href link on the page, not just these.

Any suggestions?
Thanks

Recommended answer

Your regular expression matches every <a href...> tag. In these links, "handle" always appears as part of the path "/dspace/handle/", so you can put that prefix inside the capture group to scrape only the URLs you're looking for:

Pattern pattern = Pattern.compile("<a.+href=\"(/dspace/handle/.+?)\"");
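
To show how such a pattern might be applied, here is a minimal sketch rather than a definitive implementation. It assumes the page writes its hrefs as double-quoted relative paths beginning with /dspace/handle/, which the answer implies but the question does not confirm. The class name HandleLinkExtractor, the method extractHandleLinks, and the sample HTML string are hypothetical. The sketch also swaps the greedy ".+" for "[^>]*?" so that a match cannot run across several tags on the same line; that is a change from the pattern above, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HandleLinkExtractor {

    // Assumption: hrefs are double-quoted and start with /dspace/handle/.
    // [^>]*? keeps the match inside a single <a ...> tag.
    private static final Pattern HANDLE_LINK =
            Pattern.compile("<a\\s[^>]*?href=\"(/dspace/handle/[^\"]+)\"");

    // Returns the captured href values in the order they appear in the HTML.
    static List<String> extractHandleLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = HANDLE_LINK.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical sample input; the real page markup is not shown in the question.
        String html = "<a href=\"/dspace/\">Home</a> "
                + "<a href=\"/dspace/handle/123456789/2299\">A</a>"
                + "<a class=\"x\" href=\"/dspace/handle/123456789/3154\">B</a>";

        // Assumption: relative hrefs resolve against the host used in the question's URLs.
        String host = "http://nemertes.lis.upatras.gr";
        for (String link : extractHandleLinks(html)) {
            System.out.println(host + link);
        }
    }
}
```

With the sample string above, the "Home" link is skipped and the two handle links are printed with the host prepended. If the real page uses single quotes or absolute URLs in its href attributes, the pattern would need adjusting.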
