KRL RSS parser: Handle encoding issues?
Problem description
I'm importing an RSS feed from Tumblr into a Kynetx app. The feed appears to have some encoding issues, as apostrophes appear like this:
The feed (which you can find here) claims to be encoded in UTF-8.
Is there a way to specify the encoding, or else replace those characters with regular apostrophes?
Recommended answer
While not optimal, you could try to catch these mis-encoded characters and replace them with a standard apostrophe:
newstring = oldstring.replace(re/’/\'/);
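The same replacement idea can be sketched outside KRL. The following is a minimal Python illustration, not KRL, and it assumes the feed text has already been decoded correctly, so the only remaining task is mapping typographic single quotes to the ASCII apostrophe. The names `CURLY_TO_ASCII` and `to_ascii_apostrophes` are hypothetical.

```python
# Hypothetical helper: map the typographic single quotes (U+2018, U+2019)
# to the plain ASCII apostrophe.
CURLY_TO_ASCII = {0x2018: "'", 0x2019: "'"}


def to_ascii_apostrophes(text: str) -> str:
    """Return text with curly single quotes replaced by a plain apostrophe."""
    return text.translate(CURLY_TO_ASCII)


print(to_ascii_apostrophes("It\u2019s a feed"))  # It's a feed
```

A translation table keeps the mapping in one place, so other "smart" punctuation could be added later without changing the function.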
This appears to be a case of a service that specifies UTF-8 but doesn't explicitly enforce it. I uploaded an image of the RSS feed that you provided. For comparison, I cut and pasted the text into a Notepad document and then typed in the same text from my keyboard.
I don't know if you can tell from the image, but the mangled apostrophe is different from the apostrophe generated by my UTF-8 browser.
I suspect that this post was submitted via a Windows client. If you look at your encoding options, you will see an option for Western (Windows-1252).
Windows-1252 is a legacy Windows encoding that resembles ISO 8859-1, but it assigns printable characters of its own (such as curly quotes) to the 0x80 to 0x9F range, which ISO 8859-1 reserves for control characters.
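The difference between the two encodings can be checked at the byte level. This is a small Python sketch, not KRL, using only the standard `cp1252`, `latin-1`, and `utf-8` codecs. It illustrates one plausible way an apostrophe gets mangled (UTF-8 bytes read as Windows-1252); whether that is what happened in this particular feed is an assumption.

```python
# Byte 0x92 is the right single quotation mark in Windows-1252 ...
cp1252_char = b"\x92".decode("cp1252")
# ... but an unprintable C1 control character in ISO 8859-1.
latin1_char = b"\x92".decode("latin-1")

print(hex(ord(cp1252_char)))  # 0x2019
print(hex(ord(latin1_char)))  # 0x92

# The same apostrophe encoded as UTF-8 takes three bytes.
utf8_bytes = "\u2019".encode("utf-8")
print(utf8_bytes)  # b'\xe2\x80\x99'

# Reading those three UTF-8 bytes as Windows-1252 yields three
# unrelated characters instead of one apostrophe.
mangled = utf8_bytes.decode("cp1252")
print(len(mangled))  # 3
```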
A couple of quotes from the Wikipedia page that I cite above:
It is very common to mislabel Windows-1252 text data with the charset label ISO-8859-1. Many web browsers and e-mail clients treat the MIME charset ISO-8859-1 as Windows-1252 characters in order to accommodate such mislabeling.
Many Microsoft programs, such as Word will automatically substitute Windows-1252 characters when standard ASCII characters are entered, such as for "smart quotes" (e.g. substituting ’ for the apostrophe in a contraction) or substituting © for the three characters '(c)'.
KRL supports all of the language charsets supported by UTF-8, so it supports multi-byte international characters natively; however, that comes at the expense of the ability to fudge encodings, which is possible when you only have ISO-8859-1 or Windows-1252 to choose from.
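If the feed can be preprocessed before it reaches the app, that kind of encoding fudge can be approximated by hand. The sketch below is Python, not KRL, and rests on a loud assumption the source does not confirm: that the damage came from UTF-8 bytes being decoded as Windows-1252. The function name `repair_mojibake` is hypothetical.

```python
def repair_mojibake(text: str) -> str:
    """Try to undo UTF-8 text that was wrongly decoded as Windows-1252.

    Assumption: the input was produced by decoding UTF-8 bytes as cp1252.
    If the round trip fails, the text is returned unchanged.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # Either the text has characters outside cp1252, or the
        # recovered bytes are not valid UTF-8: leave it alone.
        return text


# "\u00e2\u20ac\u2122" is the three-character sequence produced when the
# UTF-8 bytes of U+2019 are read as Windows-1252.
fixed = repair_mojibake("don\u00e2\u20ac\u2122t")
print(fixed == "don\u2019t")  # True
```

Falling back to the unchanged input keeps correctly decoded text, such as plain ASCII or accented Latin letters, from being altered.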