sed command run using os.system() or subprocess.call() leaves csv file without a delimiter

Problem description

I am running a Python script which takes the dump of CSVs from a Postgres database and then I want to escape double quotes in all these files. So I am using sed to do so.
In my Python code:

sed_for_quotes = 'sed -i s/\\"//g /home/ubuntu/PTOR/csvdata1/'+table+'.csv'  
subprocess.call(sed_for_quotes, shell=True)  

The process completes without any error, but when I load these tables into Redshift I get the error No delimiter found. On checking the CSV, I find that one of the rows is only half-loaded: for example, if it is a timestamp column, only half of it is loaded, and there is no data after that in the table (while the actual CSV had that data before running sed). That leads to the No delimiter found error.

But when I run sed -i s/\"//g filename.csv on these files in the shell, it works fine and the csv has all its rows after running sed. I have checked that there is no problem with the data in the files.

Why does this not work from a Python program? I have also tried using sed -i.bak in the Python program, but that makes no difference.

Please note that I am using an extra backslash (\) in the Python code because I need to escape the other backslash.
Other approaches tried:

  • Using subprocess.Popen without any buffer size and with positive buffer size, but that didn't help
  • Using subprocess.Popen(sed_for_quotes,bufsize=-4096) (negative buffer size) worked for one of the files which was giving the error, but then encountered the same problem in another file.

Recommended answer

Do not use an intermediate shell when you do not need one. Also check the return code of the subprocess to make sure it completed successfully (check_call does this for you):

import subprocess

path_to_file = ... # e.g. '/home/ubuntu/PTOR/csvdata1/' + table + '.csv'
# Arguments are passed to sed as a list, so no shell parses or re-quotes them.
subprocess.check_call(['sed', '-i', 's/"//g', path_to_file])
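
The difference between call and check_call can be seen in a minimal sketch. It is not part of the original answer: the failing command here is a stand-in (a child Python process that exits with status 3), chosen only so the example runs without sed or any input file.

```python
import subprocess
import sys

# Hypothetical stand-in for a failing command: a child Python process
# that exits with status 3.
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]

# subprocess.call only returns the exit status, so a failure is easy to miss
# unless the caller inspects the value.
status = subprocess.call(failing_cmd)

# subprocess.check_call raises CalledProcessError on a non-zero exit status.
try:
    subprocess.check_call(failing_cmd)
    raised = False
except subprocess.CalledProcessError as exc:
    raised = True
    check_status = exc.returncode
```

With call, the script in the question would carry on even if sed had failed; with check_call, the failure surfaces as an exception at the point where it happens.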

By "intermediate" shell I mean the shell process started by subprocess that parses the command line (roughly, it splits the line on whitespace, though it does more than that) and then runs it (runs sed in this example). Since you know precisely what arguments sed should be invoked with, you do not need any of this, and it is best to avoid it.
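
As a rough illustration of that parsing step, the sketch below uses shlex.split, which approximates POSIX shell word splitting and quote removal (it is not the shell itself, and it does no globbing or variable expansion). The file path is a hypothetical placeholder, not one from the question.

```python
import shlex

# The command string as Python sees it after its own escape processing:
#   sed -i s/\"//g /tmp/example.csv
# (/tmp/example.csv is a hypothetical path used only for illustration.)
shell_command = 'sed -i s/\\"//g /tmp/example.csv'

# Approximate what a POSIX shell does: split into words and drop the
# backslash that escapes the double quote.
tokens = shlex.split(shell_command)

# The same arguments written out explicitly, as in the list form passed to
# check_call; no backslash is needed because no shell is involved.
explicit_args = ['sed', '-i', 's/"//g', '/tmp/example.csv']

same = tokens == explicit_args
```

Writing the list by hand removes one layer of quoting to reason about, which is the point of skipping the shell.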
