KRL RSS parser: Handle encoding issues?

Question

I'm importing an RSS feed from Tumblr into a Kynetx app. It appears that the RSS feed has some encoding issues, as apostrophes appear like this:

The feed (which you can find here) claims to be encoded in UTF-8.

Is there a way to specify the encoding or else replace those characters with regular apostrophes?

Answer

While not optimal, you could try to catch these mangled characters and replace them with a plain ASCII apostrophe:

// Replace every curly apostrophe with a plain ASCII apostrophe
newstring = oldstring.replace(re/’/g, "'");
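
To show one common way an apostrophe gets mangled, and how the damage can sometimes be undone instead of patched character by character, here is a minimal sketch in Python (used purely for illustration, not KRL). It assumes the text is UTF-8 that was wrongly decoded as Windows-1252; that is only one possible scenario for this feed, and `repair_mojibake` is a hypothetical helper name.

```python
def repair_mojibake(text: str) -> str:
    """Reverse one round of UTF-8 text being mis-decoded as Windows-1252.

    Re-encoding with cp1252 recovers the original bytes, which are then
    decoded as UTF-8. Plain ASCII passes through unchanged; most other text
    that was not mangled this way fails one of the two steps and is
    returned as-is.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


original = "It\u2019s here"                           # U+2019 RIGHT SINGLE QUOTATION MARK
mangled = original.encode("utf-8").decode("cp1252")  # bytes E2 80 99 read as cp1252

# Targeted fix in the spirit of the answer: swap the mangled sequence for "'".
patched = mangled.replace("\u00e2\u20ac\u2122", "'")

# General fix: undo the wrong decoding and keep the curly apostrophe.
repaired = repair_mojibake(mangled)

print(ascii(mangled))   # 'It\xe2\u20ac\u2122s here'
print(patched)          # It's here
print(ascii(repaired))  # 'It\u2019s here'
```

The targeted replacement only fixes the one sequence you know about; reversing the decoding fixes every character damaged the same way, but only if the damage really happened in that direction.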

This appears to be a case of a service that specifies UTF-8 but doesn't explicitly enforce it. I uploaded an image of the RSS feed that you provided. For comparison, I cut and pasted the text into a Notepad document and then typed the same text in from my keyboard.

I don't know if you can tell from the image, but the apostrophe that is mangled is different from the apostrophe that is generated by my UTF-8 browser.

I suspect that this post was submitted via a Windows client. If you look at your browser's encoding options, you will see an option for Western (Windows-1252).

Windows-1252 is a legacy Windows encoding that resembles ISO 8859-1, but it uses the 0x80-0x9F range, where ISO 8859-1 has control characters, for printable characters of its own, such as curly quotes, dashes, and the euro sign.
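
As a quick check of that difference, this Python sketch (using Python's built-in `latin-1` and `cp1252` codecs as an illustration, nothing KRL-specific) decodes a few bytes from the 0x80-0x9F range both ways, then confirms that the two encodings agree everywhere else.

```python
import unicodedata

# Bytes 0x80-0x9F are C1 control characters in ISO 8859-1, but Windows-1252
# assigns printable characters (curly quotes, dashes, the euro sign) to most of them.
for b in (0x80, 0x91, 0x92, 0x93, 0x94, 0x96):
    raw = bytes([b])
    latin1 = raw.decode("latin-1")
    cp1252 = raw.decode("cp1252")
    print(f"0x{b:02X}: ISO-8859-1 -> U+{ord(latin1):04X} (control), "
          f"Windows-1252 -> U+{ord(cp1252):04X} {unicodedata.name(cp1252)}")

# In 0x00-0x7F and 0xA0-0xFF the two encodings agree byte for byte.
others = bytes(range(0x00, 0x80)) + bytes(range(0xA0, 0x100))
same_elsewhere = others.decode("latin-1") == others.decode("cp1252")
print("identical outside 0x80-0x9F:", same_elsewhere)  # True
```

Byte 0x92 is the interesting one here: Windows-1252 decodes it as a right single quotation mark, the character Word uses for apostrophes in contractions.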

A couple of quotes from the Wikipedia article on Windows-1252 cited above:


It is very common to mislabel Windows-1252 text data with the charset label ISO-8859-1. Many web browsers and e-mail clients treat the MIME charset ISO-8859-1 as Windows-1252 characters in order to accommodate such mislabeling.

Many Microsoft programs, such as Word, will automatically substitute Windows-1252 characters when standard ASCII characters are entered, such as for "smart quotes" (e.g. substituting ’ for the apostrophe in a contraction) or substituting © for the three characters '(c)'.
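
The two quotes suggest a two-step cleanup, sketched here in Python under stated assumptions: first decode text labeled ISO-8859-1 as Windows-1252, as browsers do, and then, only if plain ASCII punctuation is actually wanted (as the question asks), map the smart characters back. The names `decode_feed`, `to_ascii_punctuation`, `WINDOWS_1252_LABELS`, and `SMART_TO_ASCII` are hypothetical, and the label set is a partial subset of the labels the WHATWG Encoding Standard maps to windows-1252.

```python
# A partial, illustrative subset of the labels that the WHATWG Encoding
# Standard (followed by browsers) maps to windows-1252.
WINDOWS_1252_LABELS = {
    "ascii", "us-ascii", "iso-8859-1", "iso8859-1", "latin1",
    "cp1252", "windows-1252",
}

# Word-style "smart" characters mapped back to plain ASCII.
SMART_TO_ASCII = str.maketrans({
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK (apostrophe in contractions)
    "\u201c": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u00a9": "(c)",  # COPYRIGHT SIGN
})


def decode_feed(body: bytes, declared_charset: str) -> str:
    """Decode bytes with the declared charset, treating Latin-1 labels as Windows-1252."""
    label = declared_charset.strip().lower()
    if label in WINDOWS_1252_LABELS:
        # Python's cp1252 codec leaves five bytes undefined; errors="replace"
        # turns those into U+FFFD instead of raising.
        return body.decode("cp1252", errors="replace")
    return body.decode(label, errors="replace")


def to_ascii_punctuation(text: str) -> str:
    """Replace smart quotes and the copyright sign with ASCII equivalents."""
    return text.translate(SMART_TO_ASCII)


body = b"Don\x92t panic \xa9 Example Corp"  # Windows-1252 bytes labeled ISO-8859-1
text = decode_feed(body, "ISO-8859-1")
print(to_ascii_punctuation(text))            # Don't panic (c) Example Corp
```

The second step is optional: once the text is decoded correctly, ’ is a perfectly valid UTF-8 character, so flattening it is only worth doing when the output really must be ASCII.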

KRL supports all of the language charsets covered by UTF-8, so it handles multi-byte international characters natively; however, that comes at the expense of the encoding fudging that is possible when you only have ISO-8859-1 or Windows-1252 to choose from.
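
The trade-off can be seen at the byte level. This Python sketch says nothing about how KRL's runtime handles bytes internally; it only illustrates the general point: a single-byte encoding accepts nearly any byte and silently produces the wrong text, while a strict UTF-8 decoder rejects a stray Windows-1252 byte outright.

```python
apostrophe = "\u2019"  # RIGHT SINGLE QUOTATION MARK

utf8_bytes = apostrophe.encode("utf-8")     # b'\xe2\x80\x99': three bytes
cp1252_bytes = apostrophe.encode("cp1252")  # b'\x92': one byte

# A single-byte decoder maps nearly every byte to some character, so a wrong
# guess still "works" and simply yields the wrong text.
print(ascii(utf8_bytes.decode("cp1252")))   # '\xe2\u20ac\u2122'

# A strict UTF-8 decoder refuses the lone Windows-1252 byte instead of guessing.
try:
    cp1252_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)   # not valid UTF-8: invalid start byte
```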
