Detecting type of line breaks

Question

What is the most efficient way (fast, and reliable enough) in JavaScript to determine which type of line break a text uses: Unix (\n) or Windows (\r\n)?

In my Node app I have to read in large UTF-8 text files and then process them according to whether they use Unix or Windows line breaks.

When the type of line break cannot be determined with certainty, I want to settle on whichever one is most likely.

Update

See my answer below for the code I ended up using.

Recommended answer

In the end I used my own solution for this, based on simple statistics:

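const {EOL} = require('os');

// Returns the dominant line break in `text`: '\n' (Unix) or '\r\n' (Windows).
function getEOL(text) {
    const m = text.match(/\r\n|\n/g) || []; // every line break, in either style
    const u = m.filter(a => a === '\n').length; // Unix count
    const w = m.length - u; // Windows count
    if (u === w) {
        return EOL; // no line breaks, or a tie: use the OS default
    }
    return u > w ? '\n' : '\r\n';
}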

When there are no line breaks, or the two counts happen to be equal, it returns the OS's default EOL.
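Since the question mentions large files, here is a minimal sketch of the same majority rule that walks the string with indexOf instead of building an array of every match. The name detectEOL and its fallback parameter are hypothetical and not part of the original answer, and whether this is actually faster for a given input is an assumption that would need measuring.

```javascript
const { EOL } = require('os');

// Hypothetical helper: same majority rule as getEOL, but it counts line
// breaks in a single pass without allocating an array of matches.
function detectEOL(text, fallback = EOL) {
    let unix = 0;
    let windows = 0;
    let i = text.indexOf('\n');
    while (i !== -1) {
        // A '\n' directly preceded by '\r' is a Windows line break.
        if (i > 0 && text[i - 1] === '\r') {
            windows++;
        } else {
            unix++;
        }
        i = text.indexOf('\n', i + 1);
    }
    if (unix === windows) {
        return fallback; // no line breaks, or a tie
    }
    return unix > windows ? '\n' : '\r\n';
}

console.log(JSON.stringify(detectEOL('a\r\nb\r\nc\n')));   // "\r\n"
console.log(JSON.stringify(detectEOL('a\nb\nc\r\n')));     // "\n"
console.log(JSON.stringify(detectEOL('no breaks', '\n'))); // "\n"
```

Passing an explicit fallback makes the tie case predictable across platforms, which can be convenient in tests.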

Update

Later on, I found through further practice that if you want to process text the same way regardless of whether it uses Unix or Windows line breaks, the most efficient approach is simply to replace every Windows line break with a Unix one and skip detection altogether:

text = text.replace(/\r\n/g, '\n'); // replace every \r\n with \n
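As an illustration of how that normalization might fit into a processing step, the sketch below normalizes the input, applies a per-line transform, and joins the result with a line break chosen by the caller. The helper name processLines and its parameters are assumptions made for this example, not something from the original answer.

```javascript
// Hypothetical helper: normalize to '\n', transform each line, then join the
// lines with whichever line break the caller asks for.
function processLines(text, transform, outputEOL = '\n') {
    const normalized = text.replace(/\r\n/g, '\n'); // Windows -> Unix
    return normalized.split('\n').map(transform).join(outputEOL);
}

const input = 'one\r\ntwo\nthree';
const result = processLines(input, line => line.toUpperCase(), '\r\n');
console.log(JSON.stringify(result)); // "ONE\r\nTWO\r\nTHREE"
```

If the output should keep the file's original style, the value returned by a detector such as getEOL could be passed as outputEOL.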
