Jsoup selector on RSS <link> tag returns empty string with .text() method
Problem description
I'm using jsoup to parse an RSS feed in Java. I can't get a result when I select the first <link> element in the document.
When I use title.text(), I get the expected result with this code:
Document doc = Jsoup.connect(BLOG_URL).get();
Element title = doc.select("rss channel title").first();
System.out.println(title.text()); // print the blog title...
However, link.text() doesn't work the same way:
Element link = doc.select("rss channel link").first();
System.out.println(link.text()); // prints empty string
When I inspect doc.select("rss channel link"), the Element link object is populated, but the println() statement prints only an empty string.
What makes .select("rss channel link") so dang special that I can't figure out how to use it?
The RSS response begins as follows:
<?xml version="1.0" encoding="UTF-8"?>
<rss>
<channel>
<title>The Blog Title</title>
<link>http://www.the.blog/category</link>
Recommended answer
Your RSS feed is XML, not HTML. For this to work, you must tell jsoup to use its XML parser (Parser.xmlParser()). This will work:
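import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

String rss = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
+"<rss><channel>"
+ "<title>The Blog Title</title>"
+ "<link>http://www.the.blog/category</link>"
+"</channel></rss>";
// Parse as XML so that <link> is treated as an ordinary element with text content
Document doc = Jsoup.parse(rss, "", Parser.xmlParser());
Element link = doc.select("rss channel link").first();
System.out.println(link.text()); // prints http://www.the.blog/category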
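The question fetches the feed with Jsoup.connect(...) rather than parsing a string, so the same idea can be applied to the connection. The following is a minimal sketch, not a definitive implementation: the class name FeedLink and its helper methods are made up for illustration, and the feed URL is taken from the command line as a stand-in for BLOG_URL. It relies on Connection.parser(Parser) to set the XML parser before the response is parsed.

```java
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

// Hypothetical helper class, named here only for illustration.
public class FeedLink {

    // Fetches a feed and parses the response body as XML instead of HTML.
    static Document fetchFeed(String url) throws IOException {
        return Jsoup.connect(url)
                .parser(Parser.xmlParser())
                .get();
    }

    // Parses an in-memory feed as XML; useful for trying the selector offline.
    static Document parseFeed(String xml) {
        return Jsoup.parse(xml, "", Parser.xmlParser());
    }

    // Returns the text of the first <link> under rss/channel, or null if none exists.
    static String firstChannelLink(Document doc) {
        Element link = doc.select("rss channel link").first();
        return link == null ? null : link.text();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: FeedLink <feed-url>");
            return;
        }
        // args[0] stands in for BLOG_URL from the question.
        System.out.println(firstChannelLink(fetchFeed(args[0])));
    }
}
```

Setting the parser explicitly keeps the behaviour the same regardless of the content type the server reports. The null check matters because first() returns null when the selector matches nothing.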
Explanation:
In HTML, <link> is an empty (void) element: it has no content and no closing tag. Jsoup's default HTML parser interprets the <link> of your RSS as that kind of HTML tag, so the URL is not stored as the element's text and .text() returns an empty string.
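To see the difference directly, the same feed string can be run through both parsers. This is a small sketch under the assumption that the feed looks like the sample above; the class name LinkParsingDemo and its methods are hypothetical and exist only for this comparison.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

// Hypothetical demo class comparing jsoup's HTML and XML parsers on one feed.
public class LinkParsingDemo {

    static final String RSS =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<rss><channel>"
            + "<title>The Blog Title</title>"
            + "<link>http://www.the.blog/category</link>"
            + "</channel></rss>";

    // Text of the first matching <link>, or null if the selector finds nothing.
    static String linkText(Document doc) {
        Element link = doc.select("rss channel link").first();
        return link == null ? null : link.text();
    }

    // Default parser: <link> is handled as an HTML void element.
    static String withHtmlParser() {
        return linkText(Jsoup.parse(RSS));
    }

    // XML parser: <link> is an ordinary element that owns its text.
    static String withXmlParser() {
        return linkText(Jsoup.parse(RSS, "", Parser.xmlParser()));
    }

    public static void main(String[] args) {
        System.out.println("HTML parser: [" + withHtmlParser() + "]"); // HTML parser: []
        System.out.println("XML parser:  [" + withXmlParser() + "]");  // XML parser:  [http://www.the.blog/category]
    }
}
```

Only the parser argument differs between the two calls; the selector and the .text() call are identical.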