Decode URL Unix/Bash Command Line (without sed)
Problem description
I am scraping a website with curl and parsing out what I need.
The URLs are returned with ASCII-encoded characters, like
GET v2.12/...?fields=&#123;fieldname_of_type_Tab&#125; HTTP/1.1
How can I convert this to UTF-8 characters directly from the command line (ideally with something I can pipe | to), so that the result is:
GET v2.12/...?fields={fieldname_of_type_Tab} HTTP/1.1
There are a number of solutions with sed, but the regex that goes along with them is quite ugly. Since the provided answer leveraging perl is very clean, I hope we can leave this question open.
Recommended answer
Those are HTML entities.
Decode them like this using perl:
$ echo 'http://domain.tld/?fields=&#123;fieldname_of_type_Tab&#125;' |
    perl -MHTML::Entities -pe 'decode_entities($_)'
Output:
http://domain.tld/?fields={fieldname_of_type_Tab}
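If the HTML::Entities module is not installed, a similar pipeable filter can be sketched with Python's standard library instead. This is only an alternative under assumptions, not part of the answer above: it assumes python3 is on the PATH, and the function name html_decode is made up for illustration.

```shell
# Hypothetical helper (the name html_decode is an assumption, not a standard tool).
# Reads stdin line by line and decodes HTML entities such as &#123; or &amp;
# using html.unescape from the Python 3 standard library.
html_decode() {
  python3 -c 'import html, sys
for line in sys.stdin:
    sys.stdout.write(html.unescape(line))'
}

# Example: pipe a request line containing numeric entities through the filter.
printf '%s\n' 'GET v2.12/...?fields=&#123;fieldname_of_type_Tab&#125; HTTP/1.1' | html_decode
```

Because the function only reads stdin and writes stdout, it can sit at the end of a curl pipeline the same way the perl one-liner does.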