Clean up strange encoding in Ruby

Problem description

I'm currently playing a bit with CouchDB.
I'm trying to migrate some blog data from Redis (a key-value store) to CouchDB (a document store).
Seeing as I have probably migrated this data a gazillion times to and from different blogging engines (everybody has got to have a hobby :) ), there seem to be some encoding snafus.
I'm using CouchRest to access CouchDB from Ruby, and I'm getting this:

<JSON::GeneratorError: source sequence is illegal/malformed>

The problem seems to be the body_html part of the object:

<Post:0x00000000e9ee18 @body_html="[.....]Wie Sie bereits wissen, m\xF6chte EUserv k\xFCnftig seine  [...]

Those are supposed to be Umlauts ("möchte" and "künftig").
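One way to confirm that diagnosis is to ask Ruby whether the bytes are valid for the encoding it has attached to the string. The sketch below rebuilds a fragment from the bytes shown in the inspected object and assumes the string arrives labelled UTF-8, which is what the JSON error suggests. The variable name `raw` is illustrative, and the code assumes Ruby 2.0+, where source files are UTF-8 by default.

```ruby
# Bytes as shown in the inspected Post: \xF6 and \xFC are the single-byte
# ISO-8859-1/cp1252 codes for "ö" and "ü", but the string is labelled UTF-8.
raw = "m\xF6chte k\xFCnftig"

p raw.encoding          # => #<Encoding:UTF-8>
p raw.valid_encoding?   # => false: 0xF6 and 0xFC never start a valid UTF-8 sequence

# In ISO-8859-1 each byte value equals the Unicode code point,
# so byte 0xF6 is U+00F6 ("ö") and byte 0xFC is U+00FC ("ü").
p "ö".ord.to_s(16)      # => "f6"
p "ü".ord.to_s(16)      # => "fc"
```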

Any idea how to get rid of these problems? I tried some conversions with Ruby 1.9's encoding features and with iconv before inserting, but haven't had any luck yet :(

If I try to, e.g., convert that text to ISO-8859-1 using Ruby 1.9's String#encode method, this is what happens (different text, same problem):

#<Encoding::UndefinedConversionError: "\xC6\x92" from UTF-8 to ISO-8859-1>
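The `\xC6\x92` in that error is the UTF-8 encoding of U+0192 "ƒ" (LATIN SMALL LETTER F WITH HOOK), a character that ISO-8859-1 does not contain. Because Ruby raises `UndefinedConversionError` here rather than `InvalidByteSequenceError`, those particular bytes did decode as UTF-8; the failure is on the target side. A minimal reproduction, assuming Ruby 2.0+ (`f_hook` is an illustrative name):

```ruby
f_hook = "\u0192"                        # "ƒ", LATIN SMALL LETTER F WITH HOOK
p f_hook.bytes.map { |b| b.to_s(16) }    # => ["c6", "92"]

# The bytes are valid UTF-8; the conversion fails because ISO-8859-1
# has no code for U+0192, so there is nothing to map it to.
error = begin
  f_hook.encode("ISO-8859-1")
  nil
rescue Encoding::UndefinedConversionError => e
  e
end
p error.class                            # => Encoding::UndefinedConversionError
```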

Solution

"I try to e.g. convert that stuff to ISO-8859-1"

Close. You actually want to do it the other way around: you've got ISO-8859-1(*), you want UTF-8(**). So str.encode('utf-8', 'iso-8859-1') would be more likely to do the trick.
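Applied to the bytes from the question, the two-argument form `encode(dst, src)` ignores the label Ruby had put on the string, reads the bytes as ISO-8859-1 and writes UTF-8. The sketch wraps that call in a hypothetical helper, `latin1_to_utf8`, with one added assumption that is not part of the answer: bytes that already form valid UTF-8 are left alone, so correctly stored posts are not double-encoded. That check is a heuristic, and per the footnote on cp1252 you may want `"cp1252"` as the source instead. Assumes Ruby 2.0+.

```ruby
# Hypothetical helper: return a UTF-8 string, treating invalid input as ISO-8859-1.
def latin1_to_utf8(str)
  # Heuristic: if the bytes already form valid UTF-8, assume they are UTF-8.
  # Re-decoding real UTF-8 as Latin-1 would turn "ö" (C3 B6) into "Ã¶".
  as_utf8 = str.dup.force_encoding(Encoding::UTF_8)
  return as_utf8 if as_utf8.valid_encoding?

  # encode(dst, src): read the bytes as ISO-8859-1 and write them out as UTF-8.
  str.encode(Encoding::UTF_8, Encoding::ISO_8859_1)
end

fixed = latin1_to_utf8("m\xF6chte k\xFCnftig")
puts fixed                               # möchte künftig
p fixed.valid_encoding?                  # => true
p latin1_to_utf8("möchte") == "möchte"   # => true: valid UTF-8 passes through
```

The dup-and-relabel check also covers strings that come back labelled as binary (ASCII-8BIT), which a plain `str.valid_encoding?` on the original label would not.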

*: Actually, you might well have Windows code page 1252, which is like ISO-8859-1 but adds smart quotes and other printable characters in the range 0x80-0x9F, where ISO-8859-1 has control codes. If so, use 'cp1252' instead.
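The distinction only matters for bytes 0x80-0x9F, and it is easy to miss because decoding them as ISO-8859-1 never raises: they silently become invisible C1 control characters. The bytes below (0x93/0x94 for curly quotes, 0xE9 for "é", 0x83 for "ƒ") are made-up sample data, not taken from the question. Ruby accepts `cp1252` as an alias for `Windows-1252`. Assumes Ruby 2.0+.

```ruby
# Illustrative cp1252 bytes: curly quotes around "Café", then "ƒ".
# The two-argument encode ignores the string's own label.
raw = "\x93Caf\xE9\x94 \x83"

as_latin1 = raw.encode("UTF-8", "ISO-8859-1")
as_cp1252 = raw.encode("UTF-8", "cp1252")

# ISO-8859-1 maps 0x93 to U+0093, a C1 control character (no error raised).
p as_latin1.codepoints.first.to_s(16)   # => "93"
# cp1252 maps the same byte to U+201C LEFT DOUBLE QUOTATION MARK.
puts as_cp1252                          # “Café” ƒ
```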

**: well, you probably do. Working with UTF-8 is the best way forward so you can store all possible characters. If you really want to keep working in ISO-8859-1/cp1252, then presumably the problem is just that Ruby has mis-guessed the character set in use and you can fix it by calling str.force_encoding('iso-8859-1').
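The two calls differ in what they touch: `force_encoding` only changes the label and leaves the bytes alone, while `encode` produces new bytes. `force_encoding` also mutates its receiver, so the sketch works on a copy. Assumes Ruby 2.0+; `raw` is an illustrative fragment of the question's data.

```ruby
raw = "m\xF6chte"                              # Latin-1 bytes mislabelled as UTF-8

latin1 = raw.dup.force_encoding("ISO-8859-1")  # relabel only, bytes unchanged
p latin1.valid_encoding?                       # => true
p latin1.bytesize                              # => 6 (same bytes as raw)

utf8 = latin1.encode("UTF-8")                  # transcode, using the new label as source
p utf8.bytesize                                # => 7 ("ö" takes two bytes in UTF-8)
p raw.encoding                                 # => #<Encoding:UTF-8> (original untouched)
```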
