Properly detect line-endings of a file in Perl?

Views: 345 · Posted: 2020/5/17 19:47:37 · Tags: perl, newline

Problem description

Problem: I have data (mostly in CSV format) produced on both Windows and *nix, and processed mostly on *nix. Windows uses CRLF for line endings and Unix uses LF. For any particular file, I don't know whether it has Windows or *nix line endings. Up until now, I've been writing something like this to handle the difference:

while (<$fh>){
    tr/\r\n//d;
    my @fields = split /,/, $_;
    # ...
}

On *nix, the \n part is equivalent to chomping, and the \r part additionally gets rid of the CR if it's a Windows-produced file.

But now I want to use Text::CSV_XS because I'm starting to get weirder data files with quoted data, potentially with embedded line-breaks, etc. In order to get this module to read such files, Text::CSV_XS::getline() requires that you specify the end-of-line characters. (I can't read each line as above, apply tr/\n\r//d, and then parse it with Text::CSV, because that wouldn't handle embedded line-breaks properly.) How do I properly detect whether an arbitrary file uses Windows or *nix style line endings, so I can tell Text::CSV_XS::eol() how to chomp()?

I couldn't find a module on CPAN that simply detects line endings. I don't want to first convert all my data files via dos2unix, because the files are huge (hundreds of gigabytes), and spending 10+ minutes on each file to deal with something so simple seems silly. I thought about writing a function that reads the first several hundred bytes of a file and counts LFs vs. CRLFs, but I refuse to believe this doesn't have a better solution.
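
For reference, the byte-counting idea mentioned above could look roughly like the sketch below. This is only an illustration, not a recommended solution: `detect_eol` and its `$sample_size` parameter are hypothetical names, the 4096-byte default is an arbitrary assumption, and it relies on the file not mixing line-ending styles within the sampled bytes.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any CPAN module): guess a file's line
# ending by sampling its first $sample_size bytes.
sub detect_eol {
    my ($path, $sample_size) = @_;
    $sample_size //= 4096;    # assumed default, tune as needed

    # Open in raw mode so no I/O layer translates CRLF behind our back.
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $read = read($fh, my $buf, $sample_size);
    die "Cannot read $path: $!" unless defined $read;
    close $fh;

    # Count CRLF pairs, then LFs that are not preceded by a CR.
    my $crlf = () = $buf =~ /\r\n/g;
    my $lf   = () = $buf =~ /(?<!\r)\n/g;

    return if $crlf == 0 && $lf == 0;    # no line ending in the sample
    return $crlf > $lf ? "\r\n" : "\n";
}
```

The returned string could then be handed to whatever needs it, for example as the `eol` option when constructing a Text::CSV_XS object. Only the sampled prefix is read, so the cost does not grow with file size.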

Any help?

Note: all files have either entirely Windows line endings or entirely *nix line endings, i.e., the two are never mixed in a single file.
