Jsoup - CSS Query selector issue (?)


Question

I've run into an odd issue. I've been using Jsoup 1.7.2 for a while with no problems, but now, when I try to retrieve the main headlines from this website: www.jornaldamarinha.pt, using this code:

// Connecting...
Document doc = Jsoup.connect("http://www.jornaldamarinha.pt")
                    .timeout(0)
                    .get();

// In Jsoup selector syntax, "*[class*=zincontent-wrap]" means:
// select every element whose class attribute contains "zincontent-wrap".
Elements elems = doc.select("*[class*=zincontent-wrap]"); // Retrieves 0 results!

int t = elems.size();
Log.w("INFO", "Total Headlines: " + t);

// Loop through all retrieved headlines:
for (Element e : elems) {
   String headline = e.select("a").text(); // text() already returns a String
   Log.w("HEADLINE", headline);
}

It fails!... 0 results retrieved. (It should retrieve ~8.)

The possible causes, as I see them:

  1. Aliens... (Similar to androids, but uglier...)
  2. Website encoding. (I tried reading the incoming HTML as ISO-8859-15 to handle Portuguese special characters, but the issue remains.)
  3. Malformed incoming HTML. (I doubt this is the issue, since the selector works fine on the "Try jsoup" online page, and Jsoup usually handles broken HTML very well.)
  4. The minus sign ("-") in the class name is confusing Jsoup. (This seems to me to be the main cause, or at least one of them.)
  5. Something else... (Very probably!)

BUT... at http://try.jsoup.org, if I fetch the URL http://www.jornaldamarinha.pt using this CSS query:

*[class*=zincontent-wrap]

Everything works fine there! (It retrieves all ~8 correct results!)

So, to sum up, all I need is to do exactly what that webpage does, but in code.

Thanks in advance for any light or workaround on this! :)

Recommended answer

SOLUTION! After all, everything in the code above was working correctly, as I suspected, except that the CSS query breaks with Android's "default user agent". I just figured out that setting "userAgent" on Jsoup's connection is VERY important! So I edited my code as follows, and it works like a charm now! (Exactly the same results as on the http://try.jsoup.org page.)

Document doc = Jsoup.connect("http://www.jornaldamarinha.pt")
                    .userAgent("Mozilla/5.0 Gecko/20100101 Firefox/21.0")
                    .timeout(0)
                    .get();
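
Why would the user agent change what a selector finds? One plausible reading (an assumption, the thread does not confirm what this site actually does) is that the server sends different markup depending on the User-Agent header, so the class simply is not present in the HTML that the Android client receives. Here is a minimal, JDK-only sketch of that mechanism using a throwaway local server. The rule in markupFor, the "mobile-item" class, and the Dalvik user-agent string are all made up for illustration; only "zincontent-wrap" and the Firefox string come from the code above.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class UserAgentDemo {

    // Hypothetical server-side rule (NOT the real site's logic):
    // browser-like agents get the "zincontent-wrap" markup, everything else gets other markup.
    static String markupFor(String userAgent) {
        boolean looksLikeBrowser = userAgent != null && userAgent.contains("Mozilla");
        return looksLikeBrowser
                ? "<div class=\"zincontent-wrap\"><a href=\"#\">Headline</a></div>"
                : "<div class=\"mobile-item\"><a href=\"#\">Headline</a></div>";
    }

    // Starts a local HTTP server on a free port that answers according to markupFor().
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            byte[] body = markupFor(ua).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Fetches the page from the local server while sending the given User-Agent header.
    static String fetch(int port, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create("http://127.0.0.1:" + port + "/").toURL().openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        try {
            int port = server.getAddress().getPort();
            String asBrowser = fetch(port, "Mozilla/5.0 Gecko/20100101 Firefox/21.0");
            // Illustrative stand-in for an Android default user agent (assumed format).
            String asAndroid = fetch(port, "Dalvik/1.6.0 (Linux; U; Android 4.2.2)");
            System.out.println("Firefox UA sees zincontent-wrap: " + asBrowser.contains("zincontent-wrap"));
            System.out.println("Dalvik UA sees zincontent-wrap:  " + asAndroid.contains("zincontent-wrap"));
        } finally {
            server.stop(0);
        }
    }
}
```

If something like this is what the site does, then the selector itself is fine and so is the minus sign in the class name: the query returns 0 results because the matching elements never arrive. A quick way to check against the real site would be to log doc.html() with and without userAgent(...) and compare the two.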

Hope this helps someone else too! :)
