Unable to read CSV file uploaded to a Google Cloud Storage bucket

Problem description

Goal - Read a CSV file uploaded to a Google Cloud Storage bucket.

Environment - Jupyter notebook run over SSH on the master node. From Python in the notebook, I am trying to access a simple CSV file uploaded to a Google Cloud Storage bucket.

Approaches -

1st approach - Write a simple Python program

I wrote the following program:

import csv
f = open('gs://python_test_hm/train.csv', 'rb')
csv_f = csv.reader(f)
for row in csv_f:
    print row

Results - Error message "No such file or directory"

2nd approach - Tried to access the train.csv file using the gcloud package. The sample code is shown below; it is not my actual code. In my version of the code, the file on Google Cloud Storage was referred to as "gs:///Filename.csv".

Results - Error message "No such file or directory"

Load data from CSV

import csv
from gcloud import bigquery
from gcloud.bigquery import SchemaField
client = bigquery.Client()
dataset = client.dataset('dataset_name')
dataset.create()  # API request

SCHEMA = [
    SchemaField('full_name', 'STRING', mode='required'),
    SchemaField('age', 'INTEGER', mode='required'),
]
table = dataset.table('table_name', SCHEMA)
table.create()

with open('csv_file', 'rb') as readable:
    table.upload_from_file(
        readable, source_format='CSV', skip_leading_rows=1)

3rd approach - Fetch the file over HTTP with urllib

import csv
import urllib

url = 'https://storage.cloud.google.com/<bucket>/train.csv'


response = urllib.urlopen(url)
cr = csv.reader(response)
print cr

for row in cr:
    print row

Results - The above code doesn't raise any error, but instead of the CSV data it prints the HTML of a Google page, as shown below. I am interested in viewing the data in the train.csv file.

['<!DOCTYPE html>']
['<html lang="en">']
['  <head>']
['  <meta charset="utf-8">']
['  <meta content="width=300', ' initial-scale=1" name="viewport">']
['  <meta name="google-site-verification" content="LrdTUW9psUAMbh4Ia074-BPEVmcpBxF6Gwf0MSgQXZs">']
['  <title>Sign in - Google Accounts</title>']

Can someone shed some light on what could be wrong here and how I can achieve my goal? Your help is highly appreciated.

Thanks so much for your help!

Solution

I assume you are using a Jupyter notebook running on a machine in Google Cloud Platform (GCP)? If that's the case, you will already have the Google Cloud SDK installed on that machine (by default).

With this setup you have 2 easy options to work with Google Cloud Storage (GCS):

Writing to GCS:

from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket('python_test_hm')
blob = bucket.blob('train.csv')
blob.upload_from_string('this is test content!')
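
The upload call above takes any string, so it can carry real CSV data rather than test text. The following is a minimal sketch, assuming Python 3 and the google-cloud-storage package; the helper names `rows_to_csv_text` and `upload_rows_as_csv` are hypothetical and only illustrate the idea. The library import sits inside the upload function so the CSV helper can be tried without GCS access.

```python
import csv
import io


def rows_to_csv_text(rows):
    """Serialize an iterable of rows into CSV text (hypothetical helper)."""
    buffer = io.StringIO(newline="")
    # csv.writer ends each row with "\r\n" by default
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def upload_rows_as_csv(bucket_name, blob_name, rows):
    """Upload rows as a CSV object; needs google-cloud-storage and credentials."""
    from google.cloud import storage  # lazy import: only needed for the upload

    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(blob_name)
    blob.upload_from_string(rows_to_csv_text(rows), content_type="text/csv")
```

For example, `upload_rows_as_csv('python_test_hm', 'train.csv', [['full_name', 'age'], ['Ada', '36']])` would write a two-line CSV object, assuming the notebook's credentials allow writing to that bucket.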

Reading from GCS:

from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket('python_test_hm')
blob = storage.Blob('train.csv', bucket)
content = blob.download_as_string()
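
The downloaded `content` is raw bytes, so to reach the original goal it still has to be decoded and passed to the `csv` module. The sketch below assumes Python 3, a UTF-8 encoded file, and the google-cloud-storage package; `parse_csv_bytes` and `read_csv_from_gcs` are hypothetical helper names, not part of the library.

```python
import csv
import io


def parse_csv_bytes(content, encoding="utf-8"):
    """Decode downloaded bytes and return the CSV rows as lists of strings."""
    text = content.decode(encoding)  # assumption: the file is UTF-8 encoded
    # newline="" leaves line-ending handling to the csv module
    return list(csv.reader(io.StringIO(text, newline="")))


def read_csv_from_gcs(bucket_name, blob_name):
    """Download a blob and parse it as CSV; needs google-cloud-storage and credentials."""
    from google.cloud import storage  # lazy import: only needed for the download

    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(blob_name)
    return parse_csv_bytes(blob.download_as_string())
```

Calling `read_csv_from_gcs('python_test_hm', 'train.csv')` and looping over the result would then print the rows of train.csv, provided the machine's credentials can read the bucket. This loads the whole file into memory, which is fine for a simple CSV but may not suit very large objects.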
