Clean up strange encoding in Ruby

Views: 244  Posted: 2016/11/19 14:59:27  Tags: ruby json encoding character-encoding couchdb

Problem description

I'm currently playing a bit with couchdb.
I'm trying to migrate some blog data from redis (key value store) to couchdb (key value store).
Seeing as I probably migrated this data a gazillion times from and to different blogging engines (everybody has got to have a hobby :) ), there seem to be some encoding snafus.
I'm using CouchREST to access CouchDB from ruby and I'm getting this:

<JSON::GeneratorError: source sequence is illegal/malformed>

the problem seems to be the body_html part of the object:

<Post:0x00000000e9ee18 @body_html="[.....]Wie Sie bereits wissen, m\xF6chte EUserv k\xFCnftig seine  [...]

Those are supposed to be Umlauts ("möchte" and "künftig").

Any idea how to get rid of those problems? I tried some conversions using the ruby 1.9 encoding feature or iconv before inserting, but haven't got any luck yet :(

If I try to e.g. convert that stuff to ISO-8859-1 using the .encode() method of ruby 1.9, this is what happens (different text, same problem):

#<Encoding::UndefinedConversionError: "\xC6\x92" from UTF-8 to ISO-8859-1>

Solution

"I try to e.g. convert that stuff to ISO-8859-1"

Close. You actually want to do it the other way around: you've got ISO-8859-1(*), you want UTF-8(**). So str.encode('utf-8', 'iso-8859-1') would be more likely to do the trick.
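To make that concrete, here is a minimal sketch of the conversion, assuming the stored bytes really are ISO-8859-1. The sample text comes from the question; the variable names (`raw`, `utf8`, `json`) are made up for illustration, and the last step uses the standard `json` library directly rather than going through CouchREST.

```ruby
require 'json'

# Bytes as they sit in the old data: ISO-8859-1 umlauts (0xF6, 0xFC).
# Ruby tags the literal as UTF-8, but these bytes are not valid UTF-8.
raw = "m\xF6chte EUserv k\xFCnftig"
raw.valid_encoding?    # false, which is what trips up the JSON generator

# Tell encode what the bytes really are (second argument) and what
# they should become (first argument).
utf8 = raw.encode('utf-8', 'iso-8859-1')
utf8.encoding          # Encoding::UTF_8
utf8.valid_encoding?   # true

# A valid UTF-8 string serialises without complaint.
json = JSON.generate('body_html' => utf8)
```

The two-argument form matters here: the source encoding is stated explicitly, so it does not depend on whatever encoding Ruby happened to tag the string with.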

*: actually you might well have Windows code page 1252, which is like ISO-8859-1, but with extra smart-quotes and things in the range 0x80-0x9F which ISO-8859-1 uses for control codes. If so, use 'cp1252' instead.
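A quick way to see the difference is to decode the same bytes both ways. This is only a sketch: the sample bytes are invented (a word wrapped in Windows-style smart quotes), not taken from the actual blog data.

```ruby
# 0x93 and 0x94 are curly double quotes in Windows-1252,
# but C1 control codes in ISO-8859-1.
raw = "\x93k\xFCnftig\x94"

as_latin1 = raw.encode('utf-8', 'iso-8859-1')
as_cp1252 = raw.encode('utf-8', 'cp1252')

# ISO-8859-1 turns the quote bytes into invisible control characters.
as_latin1.codepoints.first.to_s(16)   # "93"   (U+0093, a control code)

# cp1252 turns them into the intended typographic quotes.
as_cp1252.codepoints.first.to_s(16)   # "201c" (U+201C, left double quote)
```

If the converted text ends up with stray invisible characters where quotes or dashes should be, that is a hint the data was cp1252 rather than ISO-8859-1.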

**: well, you probably do. Working with UTF-8 is the best way forward so you can store all possible characters. If you really want to keep working in ISO-8859-1/cp1252, then presumably the problem is just that Ruby has mis-guessed the character set in use and you can fix it by calling str.force_encoding('iso-8859-1').
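For comparison, a small sketch of what force_encoding does, again with an illustrative sample string and made-up variable names. Unlike encode, it leaves the bytes alone and only changes the label Ruby attaches to them.

```ruby
raw = "m\xF6chte"          # tagged UTF-8 by default, but not valid UTF-8
raw.valid_encoding?        # false

# dup first so the original string is left untouched;
# force_encoding modifies its receiver in place.
latin1 = raw.dup.force_encoding('iso-8859-1')

latin1.encoding            # Encoding::ISO_8859_1
latin1.valid_encoding?     # true
latin1.bytes == raw.bytes  # true: same bytes, only the label changed

# Once the label is right, a plain encode can transcode to UTF-8 later.
utf8 = latin1.encode('utf-8')
```

So force_encoding is the fix when the bytes are fine and only Ruby's guess is wrong, while encode is the one that actually rewrites the bytes.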
