linux curl另存为utf-8 [英] linux curl save as utf-8

查看:201
本文介绍了linux curl另存为utf-8的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

尝试使用linux curl从网址下载xml文件。



确定xml是以UTF-8编码的,



怀疑curl -o不保存为UTF-8。



有没有用curl强制保存到UTF-8?

因为xml feed是动态的,不是所有的时候它包含任何utf-8字符。
有时它在整个内容中没有utf-8字符,即使它在xml编码和标题内容类型中设置为utf-8:charset = utf-8。当它至少包含一个utf-8字符时,它将保存为utf-8。



当发生这种情况时,curl不会下载为utf-8,这是有意义的,因为没有utf-8字符,为什么需要存储为utf -8。



这是一个棘手的问题,一些验证器必须对utf-8有效,因此我仍然需要一个解决方案强制它utf8,因为默认情况下所有我的xml shld在utf8 -encoding。



尝试建议使用iconv f iso8859-1 utf-8不工作在这种情况下,因为我怀疑它不是在iso8859-1。 / p>

仍然需要更好的解决方案。

解决方案

curl转换其下载的文件。如果HTTP服务器为您提供另一个编码(例如ISO8859-1)的XML,那么他的curl如何将它保存到磁盘。



要解决您的问题,您可以使用iconv如下:

  curl URL | iconv -f iso8859-1 -t utf-8> output.xml 

希望有帮助。


Trying to use linux curl to download an xml file from an url.

Pretty sure that the xml is encoded in UTF-8,

suspecting curl -o doesnt save as UTF-8.

Is there anyway to force save to UTF-8 with curl ?


Thanks for the suggestion, what i found out:

Because the xml feed is dynamic, not all the time it contain any utf-8 characters. Sometimes it doesnt have utf-8 character in the whole content at all even though it is set as utf-8 in the xml encoding and header content type: charset=utf-8. When it contain a utf-8 character at least, it will be save as utf-8.

When this happen, curl doesn't download as utf-8, which makes sense as there are no utf-8 chars, why is there a need to store as utf-8.

This is damn tricky, some validator has to valid against utf-8 hence i still need a solution to force it to utf8 because by default all my xml shld be in utf8-encoding.

tried the suggested by using iconv f iso8859-1 utf-8 doesnt work for this case as i am suspecting it is not in iso8859-1 either.

Still need a better solution.

解决方案

curl does not do any conversion of the file it downloads. If the HTTP server serves you the XML in another encoding (e.g., ISO8859-1) that his how curl will save it to disk too.

To workaround your problem, you can use "iconv" as follows:

curl URL | iconv -f iso8859-1 -t utf-8 > output.xml

Hope this help.

这篇关于linux curl另存为utf-8的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆