How to convert CRLF to LF on a Windows machine in Python

Problem description

So I got those templates. They all end in LF, and I can fill in some terms with format and still get LF files by opening the output with "wb".

Those templates are used in a deployment script on a Windows machine to deploy to a Unix server.

The problem is that a lot of people are going to edit those templates, and I'm 100% sure that some of them will put CRLF line endings in them.

How could I, using Python, convert all the CRLF to LF?

Thanks.

Well, my bad, I had a bug in my code. Opening the output in "wb" always puts LF at the end of the lines, even if the template was using CRLF before.

Here is the code I'm using, if you are wondering:

#!/usr/bin/env python
# --*-- encoding: iso-8859-1 --*--

import string

def formatFile(templatePath, filledFilePath, params, target):
    openingMode = 'w'
    if target == 'linux':
        openingMode += 'b'

    with open(templatePath, 'r') as infile, open(filledFilePath, openingMode) as outfile:
        for line in infile:
            template = string.Template(line.decode('UTF-8'))
            outfile.write(template.substitute(**params).encode('UTF-8'))

So no problem, everything is working fine :x
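The code above relies on Python 2 behavior (str.decode does not exist in Python 3). For readers on Python 3, roughly the same idea can be sketched with the newline argument of open(). This is a hedged port, not the asker's actual script: the name fill_template and its newline parameter are assumptions made for illustration.

```python
import string


def fill_template(template_path, filled_path, params, newline="\n"):
    """Fill a string.Template file and write it with the given line ending.

    Hypothetical Python 3 port of formatFile; `newline` is an assumed parameter.
    """
    # Default text-mode reading translates \r\n and \r to \n.
    with open(template_path, "r", encoding="utf-8") as infile:
        template = string.Template(infile.read())

    # newline="\n" writes \n unchanged, even on Windows;
    # newline="\r\n" would write every \n as \r\n instead.
    with open(filled_path, "w", encoding="utf-8", newline=newline) as outfile:
        outfile.write(template.substitute(params))
```

With this sketch, a template that someone saved with CRLF still produces an LF-only output file, because the translation happens on read and is not undone on write.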

Recommended answer

Convert line endings in-place (with Python 3)

Windows to Linux/Unix

Here is a short script for converting Windows line endings (\r\n, also called CRLF) to Linux/Unix line endings (\n, also called LF) in place (without creating an extra output file):

# replacement strings
WINDOWS_LINE_ENDING = b'\r\n'
UNIX_LINE_ENDING = b'\n'

# relative or absolute file path, e.g.:
file_path = r"c:\Users\Username\Desktop\file.txt"

with open(file_path, 'rb') as open_file:
    content = open_file.read()
    
content = content.replace(WINDOWS_LINE_ENDING, UNIX_LINE_ENDING)

with open(file_path, 'wb') as open_file:
    open_file.write(content)
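Since the question is about a whole set of templates, the script could be wrapped in a function and applied to a directory tree. The following is only a sketch under assumptions: the names crlf_to_lf and convert_tree and the "*.tpl" pattern are hypothetical and not part of the original answer.

```python
from pathlib import Path

WINDOWS_LINE_ENDING = b"\r\n"
UNIX_LINE_ENDING = b"\n"


def crlf_to_lf(file_path):
    """Convert one file in place; return True if the file was changed."""
    path = Path(file_path)
    content = path.read_bytes()
    converted = content.replace(WINDOWS_LINE_ENDING, UNIX_LINE_ENDING)
    if converted != content:
        path.write_bytes(converted)
        return True
    return False


def convert_tree(root, pattern="*.tpl"):
    """Convert every matching file under root; return the changed paths.

    The default pattern is an assumption; adjust it to the real file names.
    """
    return [
        path
        for path in sorted(Path(root).rglob(pattern))
        if path.is_file() and crlf_to_lf(path)
    ]
```

Files that already use LF only are left untouched, so running the conversion before every deployment does not rewrite unchanged templates.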

Linux/Unix to Windows

Just swap the line-ending constants in the replace() call, like so: content.replace(UNIX_LINE_ENDING, WINDOWS_LINE_ENDING). Since content is a bytes object, this is bytes.replace() rather than str.replace(). The swap assumes the file contains only LF endings; any \r\n already present would become \r\r\n.
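If a file might contain a mix of both line endings, one way to keep the reverse conversion safe is to normalize to LF first and only then expand to CRLF. A minimal sketch, with to_crlf as a hypothetical helper name:

```python
def to_crlf(content: bytes) -> bytes:
    """Return content with every line ending written as \\r\\n."""
    # Collapse existing \r\n to \n first, so they are not expanded twice.
    normalized = content.replace(b"\r\n", b"\n")
    return normalized.replace(b"\n", b"\r\n")


mixed = b"a\nb\r\nc"
naive = mixed.replace(b"\n", b"\r\n")  # b'a\r\nb\r\r\nc' (stray \r)
safe = to_crlf(mixed)                  # b'a\r\nb\r\nc'
```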

Important: We need to make sure that we open the file both times in binary mode (mode='rb' and mode='wb') for the conversion to work.

When a file is opened in text mode (mode='r' or mode='w' without b) with the default newline setting, Python translates line endings. On reading, \r\n and \r are converted to Python's Unix-style \n. On writing, \n is converted to the platform's native line ending (\r\n on Windows). So in text mode the call to content.replace() would find no \r\n to replace, and on Windows the written file would end up with \r\n again.
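The difference between the modes can be checked with a small experiment. The sketch below writes a throwaway temporary file (the variable names are made up for this demonstration) and reads it back three ways:

```python
import os
import tempfile

# Write a small file with Windows line endings, byte for byte.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"first\r\nsecond\r\n")

# Text mode with the default newline=None: \r\n is translated to \n.
with open(demo_path, "r", encoding="ascii") as f:
    text_default = f.read()

# Text mode with newline="": line endings are passed through untranslated.
with open(demo_path, "r", encoding="ascii", newline="") as f:
    text_untranslated = f.read()

# Binary mode: the raw bytes, with no translation at all.
with open(demo_path, "rb") as f:
    raw = f.read()

os.remove(demo_path)

print(repr(text_default))       # 'first\nsecond\n'
print(repr(text_untranslated))  # 'first\r\nsecond\r\n'
print(repr(raw))                # b'first\r\nsecond\r\n'
```

The newline argument of open() is therefore an alternative to binary mode when the content should be handled as text.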

In binary mode, no such conversion is done, so the call to replace() sees the \r\n bytes exactly as they are stored in the file and can do its work.

In Python 3, string literals are Unicode text (str) unless declared otherwise. But we open our files in binary mode, where read() returns a bytes object. Therefore we need to add b in front of our replacement strings to make them bytes literals as well.
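To illustrate why the b prefix matters, here is a small sketch (the sample content is made up) showing that mixing str arguments into a bytes replacement fails, while bytes literals work:

```python
content = b"line one\r\nline two\r\n"

# Passing str arguments to bytes.replace() raises a TypeError in Python 3.
try:
    content.replace("\r\n", "\n")
except TypeError as error:
    mixed_types_error = error
else:
    mixed_types_error = None

# With bytes literals on both sides the replacement works as intended.
converted = content.replace(b"\r\n", b"\n")
```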

On Windows the path separator is a backslash \, which we would need to escape as \\ in a normal Python string. By adding r in front of the string we create a so-called "raw string", in which backslashes are not treated as escape characters. So you can copy and paste the path from Windows Explorer directly into your script.
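As a quick check, the raw string and the escaped string below are the same value. The forward-slash variant is an addition not mentioned in the answer: pathlib's PureWindowsPath treats it as the same Windows path, which can be verified on any platform.

```python
from pathlib import PureWindowsPath

raw_path = r"c:\Users\Username\Desktop\file.txt"
escaped_path = "c:\\Users\\Username\\Desktop\\file.txt"
forward_path = "c:/Users/Username/Desktop/file.txt"

# The raw string and the manually escaped string are identical.
same_string = raw_path == escaped_path

# PureWindowsPath normalizes separators, so both spellings compare equal.
same_path = PureWindowsPath(raw_path) == PureWindowsPath(forward_path)

file_name = PureWindowsPath(raw_path).name
```

Writing the same path as a normal string without doubling the backslashes would not even compile in Python 3, because \U starts a Unicode escape.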

(Hint: Inside Windows Explorer, press CTRL+L to automatically select the path in the address bar.)

We open the file twice to avoid having to reposition the file pointer. We could also have opened the file once with mode='rb+', but then we would have needed to move the pointer back to the start after reading the content (open_file.seek(0)) and truncate the original content before writing the new one (open_file.truncate(0)).

Simply opening the file again in write mode does both automatically for us.
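For comparison, the single-open variant described above might look like the following sketch. The function name convert_with_single_open is hypothetical; the seek and truncate calls follow the steps named in the text.

```python
def convert_with_single_open(file_path):
    """Convert CRLF to LF in place using one 'rb+' file handle."""
    with open(file_path, "rb+") as open_file:
        content = open_file.read()

        # Move the pointer back to the start and drop the old content.
        open_file.seek(0)
        open_file.truncate(0)

        open_file.write(content.replace(b"\r\n", b"\n"))
```

Both variants read the whole file into memory, so the choice between them is mostly a matter of taste for files of template size.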

Cheers and happy programming,
winklerrr
