jsoup escaping ampersand in link href

Question

jsoup is escaping the ampersand in the query portion of a URL in a link href. Given the sample below:

    String l_input = "<html><body>before <a href=\"http://a.b.com/ct.html\">link text</a> after</body></html>";
    org.jsoup.nodes.Document l_doc = org.jsoup.Jsoup.parse(l_input);
    org.jsoup.select.Elements l_html_links = l_doc.getElementsByTag("a");
    for (org.jsoup.nodes.Element l : l_html_links) {
      l.attr("href", "http://a.b.com/ct.html?a=111&b=222");
    }
    String l_output = l_doc.outerHtml();

The output is:

    <html>
    <head></head>
    <body>
    before 
    <a href="http://a.b.com/ct.html?a=111&amp;b=222">link text</a> after
    </body>
    </html>

The single & is being escaped to &amp;. Shouldn't it stay as &?

Recommended answer

It seems you can't do that. I went through the source and found the place where the escaping happens.

It is in Attribute.java:

/**
 Get the HTML representation of this attribute; e.g. {@code href="index.html"}.
 @return HTML
 */
public String html() {
    return key + "=\"" + Entities.escape(value, (new Document("")).outputSettings()) + "\"";
}

There you can see that it calls Entities.escape (from Entities.java), and that jsoup passes it the default outputSettings of a new Document(""). That's why you can't override this setting.
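To make the mechanism concrete, here is a minimal, self-contained sketch of the same pattern. AttributeSketch and its escape helper are hypothetical stand-ins written for this illustration, not jsoup's classes, and the escaping is reduced to two characters. The point it tries to show is that escaping is applied only when the attribute is serialized, while the stored value is left untouched.

```java
// Simplified stand-in for illustration only; this is NOT jsoup's actual code.
final class AttributeSketch {
    private final String key;
    private final String value; // stored exactly as passed in, never escaped

    AttributeSketch(String key, String value) {
        this.key = key;
        this.value = value;
    }

    // Returns the raw, unescaped value.
    String getValue() {
        return value;
    }

    // Serializes to HTML; escaping happens only here.
    String html() {
        return key + "=\"" + escape(value) + "\"";
    }

    // Minimal escaping for a double-quoted attribute value (assumption: only & and " are handled).
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        AttributeSketch href = new AttributeSketch("href", "http://a.b.com/ct.html?a=111&b=222");
        System.out.println(href.getValue()); // prints http://a.b.com/ct.html?a=111&b=222
        System.out.println(href.html());     // prints href="http://a.b.com/ct.html?a=111&amp;b=222"
    }
}
```

In this sketch the raw URL is still available through getValue(); only the serialized form carries &amp;.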

Maybe you should post a feature request for that.

By the way, the default escape mode is set to base.

Document.java creates a default OutputSettings object, and that is where the escape mode is defined. See:

/**
 * A HTML Document.
 *
 * @author Jonathan Hedley, jonathan@hedley.net 
 */
public class Document extends Element {
    private OutputSettings outputSettings = new OutputSettings();
    // ...
}


/**
 * A Document's output settings control the form of the text() and html() methods.
 */
public static class OutputSettings implements Cloneable {
    private Entities.EscapeMode escapeMode = Entities.EscapeMode.base;
    // ...
}

Workaround (unescape as XML):

With StringEscapeUtils from the Apache Commons Lang project you can unescape those things easily. See:

    String unescapedXml = StringEscapeUtils.unescapeXml(l_output);
    System.out.println(unescapedXml);

This will print:

<html>
 <head></head>
 <body>
  before 
  <a href="http://a.b.com/ct.html?a=111&b=222">link text</a> after
 </body>
</html>

But of course, it will replace every &amp; in the output, not only the one in the href.
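If replacing every entity is too broad, one narrower option is to unescape &amp; only inside href values. The following is a rough sketch using only java.util.regex. The class and method names (HrefAmpersandFix, unescapeHrefAmpersands) are made up for this example, and it assumes double-quoted href attributes as in the output shown earlier. It is a string-level patch, not a general HTML rewriter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HrefAmpersandFix {
    // Assumption: href values are double-quoted, as in the serialized output above.
    private static final Pattern HREF = Pattern.compile("(href=\")([^\"]*)(\")");

    // Replaces &amp; with & inside href="..." values and leaves the rest of the markup alone.
    static String unescapeHrefAmpersands(String html) {
        Matcher m = HREF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String fixed = m.group(1) + m.group(2).replace("&amp;", "&") + m.group(3);
            // quoteReplacement keeps any '$' or '\' in the URL from being read as a group reference.
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p>Tom &amp; Jerry</p>"
                + "<a href=\"http://a.b.com/ct.html?a=111&amp;b=222\">link text</a>";
        System.out.println(unescapeHrefAmpersands(html));
    }
}
```

With this input, the &amp; in the paragraph text is kept and only the one inside the href is turned back into &. Note that the pattern also matches any other attribute whose name ends in href, such as data-href.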
