How to fetch a URL containing non-ASCII characters (ą, ś ...) with Jsoup?
Question
I am using jsoup to parse some Polish sites, but I have a problem with special characters such as "ą" and "ś" in the URL itself (!). For example, example.com/kąt
is read as example.com/k
Every query without these special characters works perfectly.
I have tried Document doc = Jsoup.parse(new URL(url).openStream(), "ISO-8859-1", url)
but it does not work.
Any other suggestions?
Recommended answer
You want to encode your URL before passing it to Jsoup.
Sample code
String url = "http://sjp.pl/maść";
System.out.println("BEFORE " + url);
String encodedURL = URI.create(url).toASCIIString();
System.out.println("AFTER " + encodedURL);
System.out.println("Title: " + Jsoup.connect(encodedURL).get().title());
Output
BEFORE http://sjp.pl/maść
AFTER http://sjp.pl/ma%C5%9B%C4%87
Title: maść - Słownik SJP
French locale
Jsoup 1.8.3
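The encoding step can also be tried on its own, without any network access. The following is a minimal sketch using only java.net.URI from the standard library; the class name UrlEncodingSketch and the helper name toAsciiUrl are hypothetical and chosen for illustration. The string literals use Unicode escapes (\u0105 is ą) so the result does not depend on the encoding of the source file. The second helper is an assumption about a related case the question does not mention: URI.create rejects raw spaces, whereas the multi-argument URI constructor quotes them first.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UrlEncodingSketch {

    // Same call as in the answer: non-ASCII characters are replaced by
    // UTF-8 percent-escapes. URI.create throws IllegalArgumentException
    // if the string contains characters that are illegal in a URI,
    // such as a raw space.
    public static String toAsciiUrl(String url) {
        return URI.create(url).toASCIIString();
    }

    // Variant for paths that may also contain spaces (hypothetical helper).
    // The multi-argument constructor quotes illegal characters first,
    // then toASCIIString() escapes the remaining non-ASCII characters.
    public static String toAsciiUrl(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // "k\u0105t" is "kąt"
        System.out.println(toAsciiUrl("http://example.com/k\u0105t"));
        // prints http://example.com/k%C4%85t

        System.out.println(toAsciiUrl("http", "example.com", "/k\u0105t 1"));
        // prints http://example.com/k%C4%85t%201
    }
}
```

The resulting ASCII string is what gets passed to Jsoup.connect(...) in the sample code above. Note that the charset argument of Jsoup.parse(InputStream, String, String) controls how the response body is decoded, not how the request URL is written, which is probably why changing it to "ISO-8859-1" had no effect on the URL.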