How to write CSV in Python efficiently?

Question

I am calculating TF-IDF for a large document with more than 80,000 words. I am trying to write the resulting sparse matrix to a CSV file, using code similar to the answer to "How to add a new column to a CSV file using Python?"

The output file is too large: it exceeds 700 MB for only about 30,000 words. So my question is, how can I write it efficiently? Thank you.

Recommended answer

CSV is CSV, and there is not much you can do about the format itself. If you really want to stick with CSV, you can simply gzip it; otherwise you can use a custom format that better fits your needs.
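As a minimal sketch of the gzip option, the standard library can write compressed CSV directly, so the uncompressed file never has to exist on disk. The helper names (`write_csv_gz`, `read_csv_gz`), the column names, and the sample rows are hypothetical and only illustrate the idea; how much space this saves depends on the data.

```python
import csv
import gzip
import os
import tempfile


def write_csv_gz(path, header, rows):
    """Write a header and rows to a gzip-compressed CSV file."""
    # "wt" opens the gzip stream in text mode; newline="" is what the csv module expects.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_csv_gz(path):
    """Read a gzip-compressed CSV file back into a list of rows (all values as strings)."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


# Hypothetical, mostly-zero TF-IDF rows (assumed data, not from the question).
header = ["doc_id", "apple", "banana", "cherry"]
rows = [["d1", 0.0, 0.42, 0.0], ["d2", 0.13, 0.0, 0.0]]

out_path = os.path.join(tempfile.mkdtemp(), "tfidf.csv.gz")
write_csv_gz(out_path, header, rows)
round_trip = read_csv_gz(out_path)
```

A matrix that is mostly zeros tends to compress well because the same `0.0` fields repeat on every row, but the file is still dense once decompressed.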

For example, you can use a dictionary and export it to JSON, or create a dedicated object that handles your data and pickle it.
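One way to read the dictionary suggestion is to keep only the non-zero weights, which is where most of the saving for a sparse matrix would come from. The sketch below assumes a nested layout `{doc_id: {term: weight}}`; the function name `to_sparse_dict` and the sample data are hypothetical.

```python
import json
import pickle


def to_sparse_dict(doc_ids, terms, dense_rows):
    """Keep only non-zero weights, as {doc_id: {term: weight}}."""
    return {
        doc_id: {term: weight for term, weight in zip(terms, row) if weight != 0}
        for doc_id, row in zip(doc_ids, dense_rows)
    }


# Hypothetical dense TF-IDF rows (assumed data, not from the question).
terms = ["apple", "banana", "cherry"]
doc_ids = ["d1", "d2"]
dense = [[0.0, 0.42, 0.0], [0.13, 0.0, 0.0]]

sparse = to_sparse_dict(doc_ids, terms, dense)

# Option 1: JSON, a text format that other tools can read.
json_text = json.dumps(sparse)
restored_from_json = json.loads(json_text)

# Option 2: pickle, a Python-only binary format.
blob = pickle.dumps(sparse, protocol=pickle.HIGHEST_PROTOCOL)
restored_from_pickle = pickle.loads(blob)
```

JSON is the safer choice when the file may be shared, since unpickling data from an untrusted source can execute arbitrary code.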

When I worked with TF-IDF, I used sqlite (via sqlalchemy) to store document information, with the TF data stored as a dictionary in JSON format. From that I created the IDF stats, and later did the rest of the TF-IDF computation using numpy.
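The workflow above could look roughly like the following sketch. It is not the answerer's actual code: to stay self-contained it uses the standard-library `sqlite3` module in place of sqlalchemy and plain Python in place of numpy. The table name `documents`, the column `tf_json`, the sample texts, and the IDF formula `log(N / df)` are all assumptions made for illustration.

```python
import json
import math
import sqlite3
from collections import Counter

# In-memory database for the sketch; a file path would be used in practice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, tf_json TEXT NOT NULL)")

# Hypothetical documents (assumed data).
docs = {
    "d1": "apple banana apple",
    "d2": "banana cherry",
}

# Store each document's raw term counts as a JSON dictionary.
for doc_id, text in docs.items():
    tf = Counter(text.split())
    conn.execute(
        "INSERT INTO documents (id, tf_json) VALUES (?, ?)",
        (doc_id, json.dumps(tf)),
    )
conn.commit()

# Load the TF dictionaries back from the database.
tf_by_doc = {
    doc_id: json.loads(tf_json)
    for doc_id, tf_json in conn.execute("SELECT id, tf_json FROM documents")
}

# Document frequency: in how many documents each term appears.
df = Counter()
for tf in tf_by_doc.values():
    df.update(tf.keys())

# Assumed IDF variant: log(N / df). Other variants add smoothing.
n_docs = len(tf_by_doc)
idf = {term: math.log(n_docs / count) for term, count in df.items()}

# TF-IDF per document, still sparse: only terms that occur are stored.
tfidf = {
    doc_id: {term: count * idf[term] for term, count in tf.items()}
    for doc_id, tf in tf_by_doc.items()
}
```

Because each document stores only the terms it contains, the size of the stored data grows with the number of non-zero entries rather than with documents times vocabulary.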
