获取Python中文本文件的换行统计 [英] Get newline stats for a text file in Python

查看:340
本文介绍了获取Python中文本文件的换行统计的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在git文件中遇到了讨厌的CRLF/LF冲突,该冲突可能是Windows计算机提交的.是否有跨平台的方法(最好在Python中)来检测文件中占主导地位的换行类型?

I had a nasty CRLF / LF conflict in git file that was probably committed from Windows machine. Is there a cross-platform way (preferably in Python) to detect what type of newlines is dominant through the file?

我有此代码(基于 https://stackoverflow.com/a/10562258/239247):

import sys
if not sys.argv[1:]:
  sys.exit('usage: %s <filename>' % sys.argv[0])

with open(sys.argv[1],"rb") as f:
  d = f.read()
  crlf, lfcr = d.count('\r\n'), d.count('\n\r')
  cr, lf = d.count('\r'), d.count('\n')
  print('crlf: %s' % crlf)
  print('lfcr: %s' % lfcr)
  print('cr: %s' % cr)
  print('lf: %s' % lf)
  print('\ncr-crlf-lfcr: %s' % (cr - crlf - lfcr))
  print('lf-crlf-lfcr: %s' % (lf - crlf - lfcr))
  print('\ntotal (lf+cr-2*crlf-2*lfcr): %s\n' % (lf + cr - 2*crlf - 2*lfcr))

但是它给出了错误的统计信息(对于此文件):

But it gives the stats wrong (for this file):

crlf: 1123
lfcr: 58
cr: 1123
lf: 1123

cr-crlf-lfcr: -58
lf-crlf-lfcr: -58

total (lf+cr-2*crlf-2*lfcr): -116

推荐答案

import sys


def calculate_line_endings(path):
    # order matters!
    endings = [
        b'\r\n',
        b'\n\r',
        b'\n',
        b'\r',
    ]
    counts = dict.fromkeys(endings, 0)

    with open(path, 'rb') as fp:
        for line in fp:
            for x in endings:
                if line.endswith(x):
                    counts[x] += 1
                    break
    print(counts)


if __name__ == '__main__':
    if len(sys.argv) == 2:
        calculate_line_endings(sys.argv[1])

    sys.exit('usage: %s <filepath>' % sys.argv[0])

为您的文件提供输出

crlf: 1123
lfcr: 0
cr: 0
lf: 0

够了吗?

这篇关于获取Python中文本文件的换行统计的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆