linux curl另存为utf-8 [英] linux curl save as utf-8
问题描述
尝试使用linux curl从网址下载xml文件。
确定xml是以UTF-8编码的,
怀疑curl -o不保存为UTF-8。
有没有用curl强制保存到UTF-8?感谢您的建议,我发现:
因为xml feed是动态的,不是所有的时候它包含任何utf-8字符。
有时它在整个内容中没有utf-8字符,即使它在xml编码和标题内容类型中设置为utf-8:charset = utf-8。当它至少包含一个utf-8字符时,它将保存为utf-8。
当发生这种情况时,curl不会下载为utf-8,这是有意义的,因为没有utf-8字符,为什么需要存储为utf -8。
这是一个棘手的问题,一些验证器必须对utf-8有效,因此我仍然需要一个解决方案强制它utf8,因为默认情况下所有我的xml shld在utf8 -encoding。
尝试建议使用iconv f iso8859-1 utf-8不工作在这种情况下,因为我怀疑它不是在iso8859-1。 / p>
仍然需要更好的解决方案。
curl转换其下载的文件。如果HTTP服务器为您提供另一个编码(例如ISO8859-1)的XML,那么他的curl如何将它保存到磁盘。
要解决您的问题,您可以使用iconv如下:
curl URL | iconv -f iso8859-1 -t utf-8> output.xml
希望有帮助。
Trying to use linux curl to download an xml file from an url.
Pretty sure that the xml is encoded in UTF-8,
suspecting curl -o doesnt save as UTF-8.
Is there anyway to force save to UTF-8 with curl ?
Thanks for the suggestion, what i found out:
Because the xml feed is dynamic, not all the time it contain any utf-8 characters. Sometimes it doesnt have utf-8 character in the whole content at all even though it is set as utf-8 in the xml encoding and header content type: charset=utf-8. When it contain a utf-8 character at least, it will be save as utf-8.
When this happen, curl doesn't download as utf-8, which makes sense as there are no utf-8 chars, why is there a need to store as utf-8.
This is damn tricky, some validator has to valid against utf-8 hence i still need a solution to force it to utf8 because by default all my xml shld be in utf8-encoding.
tried the suggested by using iconv f iso8859-1 utf-8 doesnt work for this case as i am suspecting it is not in iso8859-1 either.
Still need a better solution.
curl does not do any conversion of the file it downloads. If the HTTP server serves you the XML in another encoding (e.g., ISO8859-1) that his how curl will save it to disk too.
To workaround your problem, you can use "iconv" as follows:
curl URL | iconv -f iso8859-1 -t utf-8 > output.xml
Hope this help.
这篇关于linux curl另存为utf-8的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!