Get newline stats for a text file in Python
Problem description
I had a nasty CRLF/LF conflict in a git file that was probably committed from a Windows machine. Is there a cross-platform way (preferably in Python) to detect which type of newline is dominant in the file?
I have this code (based on https://stackoverflow.com/a/10562258/239247):
import sys

if not sys.argv[1:]:
    sys.exit('usage: %s <filename>' % sys.argv[0])

with open(sys.argv[1], "rb") as f:
    d = f.read()

# bytes literals, because the file is opened in binary mode
crlf, lfcr = d.count(b'\r\n'), d.count(b'\n\r')
cr, lf = d.count(b'\r'), d.count(b'\n')
print('crlf: %s' % crlf)
print('lfcr: %s' % lfcr)
print('cr: %s' % cr)
print('lf: %s' % lf)
print('\ncr-crlf-lfcr: %s' % (cr - crlf - lfcr))
print('lf-crlf-lfcr: %s' % (lf - crlf - lfcr))
print('\ntotal (lf+cr-2*crlf-2*lfcr): %s\n' % (lf + cr - 2*crlf - 2*lfcr))
But it gives the wrong stats (for this file):
crlf: 1123
lfcr: 58
cr: 1123
lf: 1123
cr-crlf-lfcr: -58
lf-crlf-lfcr: -58
total (lf+cr-2*crlf-2*lfcr): -116
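The negative totals are consistent with overlapping substring matches: since cr, lf and crlf are all 1123, every \r and \n belongs to a CRLF pair, so each of the 58 "lfcr" hits must be the \n of one CRLF followed by the \r of the next (a blank line). A small sketch on made-up sample data (the `data` bytes below are an illustration, not the file from the question) reproduces the effect:

```python
# Three CRLF endings; the second one terminates an empty line.
data = b"first\r\n\r\nsecond\r\n"

crlf = data.count(b"\r\n")   # 3
lfcr = data.count(b"\n\r")   # 1: the b"\n" of one CRLF plus the b"\r" of the next
cr = data.count(b"\r")       # 3, all of them already counted inside CRLF
lf = data.count(b"\n")       # 3, all of them already counted inside CRLF

# Same formula as in the question: the phantom LFCR is subtracted twice.
total = lf + cr - 2 * crlf - 2 * lfcr
print(crlf, lfcr, cr, lf, total)
```

Every blank CRLF-terminated line adds one phantom LFCR, which the formula then subtracts twice from the total.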
Recommended answer
import sys


def calculate_line_endings(path):
    # order matters! Longer endings must be tested before their one-byte suffixes.
    endings = [
        b'\r\n',
        b'\n\r',
        b'\n',
        b'\r',
    ]
    counts = dict.fromkeys(endings, 0)

    with open(path, 'rb') as fp:
        # In binary mode, iteration splits lines on b'\n' only.
        for line in fp:
            for x in endings:
                if line.endswith(x):
                    counts[x] += 1
                    break
    print(counts)


if __name__ == '__main__':
    if len(sys.argv) == 2:
        calculate_line_endings(sys.argv[1])
        sys.exit(0)
    sys.exit('usage: %s <filepath>' % sys.argv[0])
It gives this output for your file:
crlf: 1123
lfcr: 0
cr: 0
lf: 0
Is that enough?
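To get from per-type counts to the "dominant" newline type asked about in the question, one possible approach is a single regular-expression scan that consumes each line ending exactly once. This is a minimal sketch, not part of the original answer; the names `NEWLINE_RE`, `count_line_endings` and `dominant_line_ending` are hypothetical, and the sample bytes are made up. One assumption to be aware of: because LFCR is tried before a lone LF, a mixed sequence such as b'\n\r\n' is read as LFCR followed by LF; dropping the `\n\r` alternative avoids this if LFCR files are not a concern.

```python
import re
from collections import Counter

# Two-byte endings come first in the alternation, so b"\r\n" is consumed
# as one unit and its bytes are never recounted as a lone b"\r" or b"\n".
NEWLINE_RE = re.compile(rb"\r\n|\n\r|\r|\n")


def count_line_endings(data):
    """Return a Counter of non-overlapping line-ending sequences in data (bytes)."""
    return Counter(m.group() for m in NEWLINE_RE.finditer(data))


def dominant_line_ending(data):
    """Return the most frequent line ending, or None if data has none."""
    counts = count_line_endings(data)
    return counts.most_common(1)[0][0] if counts else None


# Made-up sample: three CRLF endings (one blank line), one LF, one trailing CR.
sample = b"one\r\n\r\ntwo\r\nthree\nfour\r"
counts = count_line_endings(sample)
print(dict(counts))
print(dominant_line_ending(sample))
```

For a real file, the bytes can be read with `open(path, 'rb')` and passed to `dominant_line_ending`; binary mode matters, because text mode translates newlines by default.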