How to replace plain URLs with hyperlinks in an HTML document

Views: 141 | Posted: 2018/6/20 16:10:59 | Tags: java html regex hyperlink regex-negation


Question

I am trying to replace plain links with hyperlinks in an HTML document. My logic is:

    private static final Pattern WEB_URL_PROTOCOL = Pattern.compile("(?i)http|https://");

    StringBuffer sb = new StringBuffer();
    if (text != null) {
        // Escape any inadvertent HTML in the text message
        text = EmailHtmlUtil.escapeCharacterToDisplay(text);

        // Find any embedded URLs and linkify them
        Matcher m = Patterns.WEB_URL.matcher(text);
        while (m.find()) {
            int start = m.start();
            if (start == 0 || text.charAt(start - 1) != '@') {
                String url = m.group();
                Matcher proto = WEB_URL_PROTOCOL.matcher(url);
                String link;
                if (proto.find()) {
                    // Lower-case the protocol part of the link.
                    link = proto.group().toLowerCase() + url.substring(proto.end());
                } else {
                    link = "http://" + url;
                }
                String href = String.format("<a href=\"%s\">%s</a>", link, url);
                m.appendReplacement(sb, href);
            } else {
                m.appendReplacement(sb, "$0");
            }
        }
        m.appendTail(sb);
    }

This code successfully finds all links in an HTML document, but the problem is that it also finds URLs that are already hyperlinks. I want to exclude existing hyperlinks and match only plain links. For example, it should exclude

    <p class="MsoNormal"><a href="awbs://www.google.com" target="_BLANK">https://www.google.com</a> normal address https</p>

but a plain link such as https://www.google.com should be replaced by a hyperlink.

Edit: if the document contains text like this:

    1. https://www.yahoo.com
    2. https://www.google.com normal address https

then I want to replace https://www.yahoo.com with <a href="https://www.yahoo.com">https://www.yahoo.com</a>, and item 2 should not be affected at all.

Solution

I would recommend using jsoup here.

Sample code

    import java.util.List;
    import java.util.regex.Matcher;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.nodes.TextNode;
    import org.jsoup.parser.Tag;
    // Patterns.WEB_URL is the same pattern used in the question
    // (presumably android.util.Patterns).

    String text = "<html><head></head><body><a href='http://google.com'>Don't change this link</a> Change this: http://yahoo.com foo.com</body></html>";

    Document d = Jsoup.parse(text);
    String newHtmlCode = "";
    String oldHtmlCode = d.outerHtml();

    // Only text nodes are scanned, so URLs inside existing <a> elements are never touched
    List<TextNode> textNodes = d.body().textNodes();

    Matcher m = Patterns.WEB_URL.matcher("");
    for (TextNode textNode : textNodes) {
        m.reset(textNode.text());

        String fragment = "";
        while (m.find()) {
            // Prefix every new href with a temporary "***" marker
            fragment = m.replaceAll("<a href=\"\\*\\*\\*$1\">$1</a>");
            textNode.replaceWith(new Element(Tag.valueOf("span"), "").html(fragment));
        }

        // Marker not followed by a protocol: insert "http://".
        // Marker followed by a protocol: just drop the marker.
        newHtmlCode = d.outerHtml()
                .replaceAll("\"\\Q***\\E(?!https?://)", "\"http://")
                .replaceAll("\"\\Q***\\E(https?://)", "\"$1");
    }

    System.out.println("BEFORE:\n\n" + oldHtmlCode);
    System.out.println("----------------------------");
    System.out.println("AFTER:\n\n" + newHtmlCode);

Output

    BEFORE:

    <html>
    <head></head>
    <body>
    <a href="http://google.com">Don't change this link</a> Change this: http://yahoo.com foo.com
    </body>
    </html>
    ----------------------------
    AFTER:

    <html>
    <head></head>
    <body>
    <a href="http://google.com">Don't change this link</a> Change this: <a href="http://yahoo.com">http://yahoo.com</a> <a href="http://foo.com">foo.com</a>
    </body>
    </html>
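Where adding jsoup is not an option, a regex-only alternative is the "match and skip" idea: let one pattern consume whole existing <a>...</a> elements and other tags first, and linkify only what the last alternative captures. The following is a minimal sketch of that idea, not a replacement for a real HTML parser. The names PlainUrlLinkifier, TOKEN and linkify are hypothetical, and the URL sub-pattern is a deliberately simplified stand-in for Patterns.WEB_URL: it only recognises URLs that start with http:// or https://, so bare hosts such as foo.com are not matched, and trailing punctuation is not trimmed. Comments, script blocks and nested markup are not handled specially.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlainUrlLinkifier {
    // Alternatives are tried left to right at each position:
    //   1. an entire existing <a ...>...</a> element
    //   2. any other tag (so URLs inside attributes such as src="..." are skipped)
    //   3. a plain URL in text, captured in the named group "url"
    private static final Pattern TOKEN = Pattern.compile(
            "<a\\b[^>]*>.*?</a>"
            + "|<[^>]*>"
            + "|(?<url>\\bhttps?://[^\\s<>\"']+)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String linkify(String html) {
        Matcher m = TOKEN.matcher(html);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String url = m.group("url");
            // Alternatives 1 and 2 leave "url" null: copy the match through unchanged.
            String replacement = (url == null)
                    ? m.group()
                    : "<a href=\"" + url + "\">" + url + "</a>";
            // quoteReplacement keeps '$' and '\' in the text from being read as group references.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<p class=\"MsoNormal\"><a href=\"awbs://www.google.com\" target=\"_BLANK\">"
                + "https://www.google.com</a> normal address https</p>"
                + "<p>1. https://www.yahoo.com</p>";
        System.out.println(linkify(html));
    }
}
```

Run on the example from the question, the existing google.com anchor is copied through as-is and only the plain https://www.yahoo.com is wrapped in a new <a> element. The jsoup approach above remains the more robust choice for arbitrary HTML.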
