How to fetch a URL containing non-ASCII characters (ą, ś ...) with Jsoup?

Problem description

I am using jsoup to parse some Polish sites, but I have a problem with special characters such as "ą" and "ś" in the URL (!). For example, example.com/kąt is read as example.com/k

Every query without these special characters works perfectly.

I have tried Document doc = Jsoup.parse(new URL(url).openStream(), "ISO-8859-1", url), but it does not work.

Any other hints?

Recommended answer

You want to encode your URL before passing it to Jsoup.

Sample code

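import java.net.URI;
import org.jsoup.Jsoup;

// Raw URL with non-ASCII characters in the path
String url = "http://sjp.pl/maść";
System.out.println("BEFORE " + url);

// Percent-encode the non-ASCII characters as UTF-8
String encodedURL = URI.create(url).toASCIIString();
System.out.println("AFTER " + encodedURL);

// Fetch the page using the encoded URL
System.out.println("Title: " + Jsoup.connect(encodedURL).get().title());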

Output

 BEFORE http://sjp.pl/maść
 AFTER http://sjp.pl/ma%C5%9B%C4%87
 Title: maść - Słownik SJP

French locale
Jsoup 1.8.3
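
URI.create(url) rejects URLs that contain characters illegal in a URI, such as spaces. If that is a concern, one possible approach is to split the URL into its components and let the multi-argument java.net.URI constructor quote them before calling toASCIIString(). The following is only a sketch: the class name AsciiUrl and the method encode are hypothetical and not part of Jsoup, and it assumes the input is not already percent-encoded (an existing "%" would be encoded again as "%25") and that the host name is plain ASCII.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper, not part of Jsoup.
final class AsciiUrl {
    private AsciiUrl() {}

    // Assumption: rawUrl is NOT already percent-encoded and has an ASCII host.
    static String encode(String rawUrl) {
        try {
            // URL is used here only to split the string into its components.
            URL u = new URL(rawUrl);
            // The multi-argument constructor quotes illegal characters (e.g. spaces).
            URI uri = new URI(u.getProtocol(), u.getUserInfo(), u.getHost(), u.getPort(),
                    u.getPath(), u.getQuery(), u.getRef());
            // toASCIIString() encodes the remaining non-ASCII characters as UTF-8 escapes.
            return uri.toASCIIString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot encode URL: " + rawUrl, e);
        }
    }

    public static void main(String[] args) {
        // Same word as in the sample code, written with Unicode escapes
        // so the source file stays ASCII-only.
        System.out.println(encode("http://sjp.pl/ma\u015b\u0107"));
        // prints http://sjp.pl/ma%C5%9B%C4%87

        // A path containing both a non-ASCII letter and a space.
        System.out.println(encode("http://example.com/k\u0105t page"));
        // prints http://example.com/k%C4%85t%20page
    }
}
```

The resulting string can then be passed to Jsoup.connect(...) in the same way as encodedURL in the sample code.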
