How to read a large TSV file in Python and convert it to CSV


Question

I have a large TSV file (around 12 GB) that I want to convert to a CSV file. For smaller TSV files, I use the following code, which works but is slow:

import pandas as pd

# The two path strings below are placeholders for the real file locations.
table = pd.read_table('path_of_tsv_file', sep='\t')
table.to_csv('path_and_name_of_csv_file', index=False)

However, this code does not work for my large file: the kernel resets partway through.

Is there any way to fix the problem? Does anyone know whether the task can be done with Dask instead of Pandas?

I am using Windows 10.

Recommended answer

Instead of loading all lines into memory at once, you can read the file line by line and process each line in turn:

Using Python 3.x:

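import sys

fs = ","
# Translation table that maps every tab character to the output separator.
table = str.maketrans('\t', fs)
fName = 'hrdata.tsv'

try:
  # Iterating over the file object reads one line at a time,
  # so memory use stays small regardless of the file size.
  with open(fName, 'r') as f:
    for line in f:
      print(line.translate(table), end="")

except IOError:
  # Report on stderr so the message does not end up in the redirected CSV output.
  print("Could not read file: " + fName, file=sys.stderr)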

Input (hrdata.tsv):

Name    Hire Date       Salary  Sick Days remaining
Graham Chapman  03/15/14        50000.00        10
John Cleese     06/01/15        65000.00        8
Eric Idle       05/12/14        45000.00        10
Terry Jones     11/01/13        70000.00        3
Terry Gilliam   08/12/14        48000.00        7
Michael Palin   05/23/13        66000.00        8

Output:

Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/14,48000.00,7
Michael Palin,05/23/13,66000.00,8

Command:

python tsv_csv_convertor.py > new_csv_file.csv
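
Replacing every tab with a comma assumes that no field already contains a comma or a double quote; if one does, the resulting CSV will have misaligned columns. As a minimal sketch of a safer variant, the standard-library csv module can stream the file row by row and quote such fields on output. The helper name tsv_to_csv and the UTF-8 default are assumptions for illustration, not part of the original answer, and the sketch also assumes the TSV itself does not use quoting.

```python
import csv

def tsv_to_csv(tsv_path, csv_path, encoding="utf-8"):
    """Stream a TSV file into a CSV file one row at a time (hypothetical helper)."""
    # newline="" lets the csv module control line endings itself,
    # which avoids extra blank lines between rows on Windows.
    with open(tsv_path, "r", newline="", encoding=encoding) as src, \
         open(csv_path, "w", newline="", encoding=encoding) as dst:
        # QUOTE_NONE treats quote characters in the input as ordinary text
        # (assumption: the TSV does not quote its fields).
        reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
        # The default writer quotes any field containing a comma or a quote.
        writer = csv.writer(dst)
        for row in reader:
            writer.writerow(row)
```

Calling tsv_to_csv('hrdata.tsv', 'new_csv_file.csv') writes the output file directly, so no shell redirection is needed. Only one row is held in memory at a time, so the approach should scale to multi-gigabyte inputs.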

Note:

If you are in a Unix environment, just run the following command:

tr '\t' ',' <input.tsv >output.csv
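
On Windows, where tr is not available by default, roughly the same byte-for-byte substitution can be sketched in Python by copying the file in large binary chunks. The function name tabs_to_commas and the 1 MiB chunk size are assumptions for illustration. Like tr, this assumes an ASCII-compatible encoding such as UTF-8 and that no field contains a comma.

```python
def tabs_to_commas(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path, replacing every tab byte with a comma byte."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            # Read a fixed-size block so memory use is bounded by chunk_size.
            chunk = src.read(chunk_size)
            if not chunk:
                break  # end of file
            # A single-byte replacement is safe across chunk boundaries.
            dst.write(chunk.replace(b"\t", b","))
```

For example, tabs_to_commas('input.tsv', 'output.csv') mirrors the tr command above. Working on bytes skips text decoding and per-line overhead, which may be noticeably faster than the line-by-line loop on a 12 GB file.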
