Unable to read csv file uploaded on google cloud storage bucket
Problem description
Goal - Read a CSV file uploaded to a Google Cloud Storage bucket.
Environment - Jupyter notebook running on the master node, reached through an SSH instance. From Python in the notebook I am trying to access a simple CSV file uploaded to a Google Cloud Storage bucket.
Approaches -
1st approach - Write a simple Python program
I wrote the following program:
import csv
f = open('gs://python_test_hm/train.csv', 'rb')
csv_f = csv.reader(f)
for row in csv_f:
    print row
Results - Error message "No such file or directory"
2nd approach - Tried to access the train.csv file using the gcloud package. Sample code is shown below; it is not my actual code. In my version of the code, the file on Google Cloud Storage was referred to as "gs:///Filename.csv".
Results - Error message "No such file or directory"
Load data from CSV
import csv
from gcloud import bigquery
from gcloud.bigquery import SchemaField
client = bigquery.Client()
dataset = client.dataset('dataset_name')
dataset.create() # API request
SCHEMA = [
    SchemaField('full_name', 'STRING', mode='required'),
    SchemaField('age', 'INTEGER', mode='required'),
]
table = dataset.table('table_name', SCHEMA)
table.create()
with open('csv_file', 'rb') as readable:
    table.upload_from_file(
        readable, source_format='CSV', skip_leading_rows=1)
3rd Approach -
import csv
import urllib
url = 'https://storage.cloud.google.com/<bucket>/train.csv'
response = urllib.urlopen(url)
cr = csv.reader(response)
print cr
for row in cr:
    print row
Results - The code above runs without any error, but instead of the CSV data it prints the HTML of a Google sign-in page, as shown below. I am interested in viewing the data of the train csv file.
['<!DOCTYPE html>']
['<html lang="en">']
[' <head>']
[' <meta charset="utf-8">']
[' <meta content="width=300', ' initial-scale=1" name="viewport">']
[' <meta name="google-site-verification" content="LrdTUW9psUAMbh4Ia074-BPEVmcpBxF6Gwf0MSgQXZs">']
[' <title>Sign in - Google Accounts</title>']
Can someone shed some light on what could be wrong here and how I can achieve my goal? Your help is highly appreciated.
Thanks so much for your help!
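I assume you are using a Jupyter notebook running on a machine in Google Cloud Platform (GCP)? If that's the case, you will already have the Google Cloud SDK installed on that machine (by default). Note that Python's built-in open() only understands local file paths, which is why a gs:// path gives "No such file or directory".
With this setup you have 2 easy options to work with Google Cloud Storage (GCS):
Use the gcloud/gsutil commands in Jupyter (prefix each command with ! to run it from a notebook cell)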
Writing to GCS:
gsutil cp train.csv gs://python_test_hm/train.csv
Reading from GCS:
gsutil cp gs://python_test_hm/train.csv train.csv
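Once gsutil has copied the object to the local disk, the file can be opened like any other local CSV. The following is a minimal sketch, assuming Python 3 and a UTF-8 encoded file; the helper name read_csv_rows is hypothetical and not part of any Google library.

```python
import csv


def read_csv_rows(path, encoding="utf-8"):
    """Return every row of a local CSV file as a list of strings (hypothetical helper)."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))
```

In a notebook cell this could be called as read_csv_rows('train.csv') after the gsutil cp command above has finished.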
Use the google-cloud Python library
Writing to GCS:
from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket('python_test_hm')
blob = bucket.blob('train.csv')
blob.upload_from_string('this is test content!')
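To upload actual CSV rows rather than a test string, one possible approach is to build the CSV text in memory and hand it to upload_from_string. This is only a sketch: the helper names rows_to_csv_text and upload_rows are hypothetical, the "text/csv" content type is an assumption, and it presumes Python 3 with default credentials available on the GCP machine.

```python
import csv
import io


def rows_to_csv_text(rows):
    """Serialize an iterable of rows into CSV text (hypothetical helper)."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def upload_rows(bucket_name, blob_name, rows):
    """Upload rows to GCS as a CSV object (hypothetical helper)."""
    # Imported here so rows_to_csv_text works even without the library installed.
    from google.cloud import storage

    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(blob_name)
    blob.upload_from_string(rows_to_csv_text(rows), content_type="text/csv")
```

For example, upload_rows('python_test_hm', 'train.csv', [['full_name', 'age'], ['Ada', '36']]) would write a two-line CSV object to the bucket used above.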
Reading from GCS:
from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket('python_test_hm')
blob = storage.Blob('train.csv', bucket)
content = blob.download_as_string()
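download_as_string() returns the raw bytes of the object, so the content still has to be decoded and parsed before the rows can be inspected. A minimal sketch of that last step, assuming Python 3 and UTF-8 encoded data; parse_csv_bytes and read_gcs_csv are hypothetical helper names, not library functions.

```python
import csv
import io


def parse_csv_bytes(content, encoding="utf-8"):
    """Decode downloaded bytes and parse them into a list of CSV rows."""
    text = content.decode(encoding)
    # newline="" leaves line endings untouched so the csv module can handle them.
    return list(csv.reader(io.StringIO(text, newline="")))


def read_gcs_csv(bucket_name, blob_name):
    """Download a CSV object from GCS and return its rows (hypothetical helper)."""
    # Imported here so parse_csv_bytes works even without the library installed.
    from google.cloud import storage

    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(blob_name)
    return parse_csv_bytes(blob.download_as_string())
```

With the bucket from the question, read_gcs_csv('python_test_hm', 'train.csv') would return the rows of train.csv, which can then be printed in a loop.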