How to open and process a CSV file stored in Google Cloud Storage using Python

Posted: 2018/5/3 18:49:20    Tags: python google-app-engine google-cloud-storage


Question

I am using the Google Cloud Storage Client Library.

I am trying to open and process a CSV file (that was already uploaded to a bucket) using code like:

import csv
import cloudstorage as gcs  # App Engine Google Cloud Storage client library

filename = '/<my_bucket/data.csv'
with gcs.open(filename, 'r') as gcs_file:
    csv_reader = csv.reader(gcs_file, delimiter=',', quotechar='"')

I get the error "argument 1 must be an iterator" for the first argument to csv.reader (i.e. gcs_file). Apparently gcs_file doesn't support the iterator .next method.

Any ideas on how to proceed? Do I need to wrap the gcs_file and create an iterator on it or is there an easier way?
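The error can be reproduced without Cloud Storage at all. The sketch below assumes Python 3, where csv.reader calls iter() on its first argument when the reader is created; `ReadOnlyFile` and `make_reader` are hypothetical names for a stand-in that, like gcs_file here, offers readline() but no __iter__.

```python
import csv


class ReadOnlyFile:
    """Hypothetical stand-in for gcs_file: has readline() but no __iter__."""

    def __init__(self, text):
        self._lines = text.splitlines(keepends=True)

    def readline(self):
        # Return the next line, or '' at end of file, like a real file object.
        return self._lines.pop(0) if self._lines else ''


def make_reader(obj):
    """Return a csv.reader over obj, or the TypeError raised if obj is not iterable."""
    try:
        return csv.reader(obj)
    except TypeError as exc:
        return exc


# The object can be read line by line, but csv.reader cannot iterate over it.
print(isinstance(make_reader(ReadOnlyFile('a,b\n1,2\n')), TypeError))  # → True
```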

Solution

I think it's better to write your own wrapper/iterator designed for csv.reader. If gcs_file were to support the iterator protocol, it would not be clear what next() should return to accommodate every possible consumer.

According to the csv.reader documentation, the function will:

Return a reader object which will iterate over lines in the given csvfile. csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called — file objects and list objects are both suitable. If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference.
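A small sketch of that contract, run under Python 3: each string returned by next() is parsed as one line, a quoted field may continue into the next string, and a single string holding two unquoted records is rejected. `parse_one_string` is a hypothetical helper used only for illustration.

```python
import csv

# A list of strings is a valid csvfile: each element is handed to the parser
# as one line, and a quoted field may continue onto the next element.
lines = ['a,b\n', '"multi\n', 'line",c\n']
rows = list(csv.reader(lines))
print(rows)  # → [['a', 'b'], ['multi\nline', 'c']]


def parse_one_string(text):
    """Feed text to csv.reader as a single next() value; return rows or the error."""
    try:
        return list(csv.reader([text]))
    except csv.Error as exc:
        return exc


# Two unquoted records packed into one string are rejected, because the
# parser treats each string it receives as a single line.
print(isinstance(parse_one_string('a,b\nc,d\n'), csv.Error))  # → True
```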

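csv.reader treats each string returned by next() as one line of input, so the wrapper should hand it complete lines rather than raw fixed-size chunks: a chunk can end in the middle of a record or contain several records. You can still read the underlying file a chunk at a time and split the chunks into lines, with a wrapper like this (not tested):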

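class CsvIterator(object):
    """Wrap gcs_file so that each next() call returns one complete line."""
    def __init__(self, gcs_file, chunk_size):
        self.gcs_file = gcs_file
        self.chunk_size = chunk_size
        self.buffer = ''
    def __iter__(self):
        return self
    def next(self):
        # Read chunks until the buffer holds a full line or the file is exhausted.
        while '\n' not in self.buffer:
            chunk = self.gcs_file.read(self.chunk_size)
            if not chunk:
                break
            self.buffer += chunk
        if not self.buffer:
            raise StopIteration()
        line, newline, self.buffer = self.buffer.partition('\n')
        return line + newline
    __next__ = next  # Python 3 name for the same method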

The key is to read a bounded chunk at a time, so that with a large file you neither load the whole object into memory nor hit a urlfetch timeout.
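The same chunk-then-split idea can also be written as a Python 3 generator instead of a class; this is a swapped-in variant, not the answer's original wrapper. It assumes only that the file object's read(size) returns '' at end of file; `iter_lines_in_chunks` and the 64 KiB default are hypothetical choices, and io.StringIO stands in for gcs_file.

```python
import csv
import io


def iter_lines_in_chunks(f, chunk_size=64 * 1024):
    """Yield complete lines from f while reading at most chunk_size chars per call.

    Assumes f.read(size) returns '' at end of file, as file objects do.
    Chunks are buffered and split on newlines, so csv.reader never receives
    a record that was cut in half at a chunk boundary.
    """
    buffer = ''
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Hand out every complete line; keep the unfinished tail in the buffer.
        *complete, buffer = buffer.split('\n')
        for line in complete:
            yield line + '\n'
    if buffer:
        # Final line that had no trailing newline.
        yield buffer


# chunk_size=5 forces records (including a quoted multi-line field) to
# straddle chunk boundaries.
data = io.StringIO('id,name\n1,alice\n2,"bob\nsmith"\n')
rows = list(csv.reader(iter_lines_in_chunks(data, chunk_size=5)))
print(rows)  # → [['id', 'name'], ['1', 'alice'], ['2', 'bob\nsmith']]
```

A generator keeps the buffering state in local variables, so no explicit next()/__next__ pair is needed.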

Or, even simpler, use the built-in iter() with a sentinel:

csv.reader(iter(gcs_file.readline, ''))
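The two-argument form iter(callable, sentinel) calls the callable repeatedly and stops when it returns the sentinel, which turns any readline() method into an iterator of lines. A minimal Python 3 sketch, using io.StringIO as a stand-in for gcs_file (only its readline() is used); `read_csv_rows` is a hypothetical helper name. With the client library from the question, the call would be read_csv_rows(gcs_file) inside the `with gcs.open(...)` block.

```python
import csv
import io


def read_csv_rows(f):
    """Parse CSV from any object whose readline() returns '' at end of file."""
    # iter(f.readline, '') keeps calling f.readline() until it returns ''.
    lines = iter(f.readline, '')
    return list(csv.reader(lines, delimiter=',', quotechar='"'))


# io.StringIO stands in for gcs_file here; only its readline() is used.
sample = io.StringIO('name,comment\nalice,"line one\nline two"\nbob,ok\n')
rows = read_csv_rows(sample)
print(rows)  # → [['name', 'comment'], ['alice', 'line one\nline two'], ['bob', 'ok']]
```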
