Universal newline support in Ruby that includes \r (CR) line endings


Problem description



In a Rails app, I'm accepting and parsing CSV files that may come formatted with any of three possible line termination characters: \n (LF), \r\n (CR+LF), or \r (CR). Ruby's File and CSV libraries seem to handle the first two cases just fine, but the last case ("Mac classic" \r line endings) isn't handled as a newline. It's important to be able to accept this format as well as the others, since Microsoft Excel for Mac (running on OS X) seems to use it when exporting to "Comma Separated Values" (although exporting to "Windows Comma Separated" produces the easier-to-handle \r\n).

Python has "universal newline support" and will handle any of these three formats without a problem. Is there something similar in Ruby that will accept all three without knowing the format in advance?

Solution

You could use :row_sep => :auto:

:row_sep
The String appended to the end of each row. This can be set to the special :auto setting, which requests that CSV automatically discover this from the data. Auto-discovery reads ahead in the data looking for the next "\r\n", "\n", or "\r" sequence.

There are some caveats, of course; see the Ruby CSV documentation for details.
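As a rough illustration of the :row_sep option described above, here is a minimal sketch. The sample data and the helper name parse_any_eol are made up for this example, and it assumes a Ruby version whose standard csv library accepts keyword options.

```ruby
require "csv"

# Hypothetical helper (not part of the CSV library): parse CSV text and let
# CSV work out the row separator from the data itself.
def parse_any_eol(text)
  CSV.parse(text, row_sep: :auto)
end

# Invented sample data, one string per line-ending style.
lf_data   = "name,qty\napple,3\npear,5\n"       # Unix (LF)
crlf_data = "name,qty\r\napple,3\r\npear,5\r\n" # Windows (CR+LF)
cr_data   = "name,qty\rapple,3\rpear,5\r"       # classic Mac (CR)

[lf_data, crlf_data, cr_data].each do |data|
  p parse_any_eol(data)
end
```

Note that auto-discovery settles on one separator for the whole input, so this sketch assumes each file uses a single line-ending style throughout.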

You could also manually clean up the EOLs with a bit of gsubing before handing the data to CSV for parsing. I'd probably take this route and manually convert all \r\ns and \rs to single \ns before attempting to parse the CSV. OTOH, this won't work that well if there is embedded binary data in your CSV where \rs mean something. On the gripping hand, this is CSV we're dealing with so who knows what sort of crazy broken nonsense you'll end up dealing with.
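The gsub approach might look something like this. It is only a sketch: normalize_eols is a hypothetical helper name and the sample string is invented. As noted above, it rewrites every \r, including any that sit inside quoted fields.

```ruby
require "csv"

# Hypothetical helper: turn every CR+LF pair and every lone CR into a single LF.
# The pattern /\r\n?/ matches a CR optionally followed by an LF, so a CR+LF
# pair becomes one LF rather than two.
def normalize_eols(text)
  text.gsub(/\r\n?/, "\n")
end

# Invented sample that mixes all three line-ending styles in one string.
mixed = "a,b\r\nc,d\re,f\n"

rows = CSV.parse(normalize_eols(mixed))
p rows
```

Unlike relying on auto-discovery alone, normalizing first also copes with input that mixes line endings, at the cost of altering any \r characters that were meant to be data.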
