Read contents of .tar.gz file from website into a Python 3.x object


Question

I am new to Python. I can't figure out what I am doing wrong when trying to read the contents of a .tar.gz file into Python. The tarfile I would like to read is hosted at the following web address:

ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/b0/ac/Breast_Cancer_Res_2001_Nov_9_3(1)_61-65.tar.gz

More information about the file is available at this page (so you can trust the contents): http://www.pubmedcentral.nih.gov/utils/oa/oa.fcgi?id=PMC13901

The tarfile contains .pdf and .nxml copies of the journal article, along with a couple of image files.

If I paste the address into my browser, I can save the file to a location on my PC and open the tarfile without problems using the following commands (note: WinZip changes the file extension from .tar.gz to simply .tar when I save it):

import tarfile
thetarfile = "C:/Users/dfcm/Documents/Breast_Cancer_Res_2001_Nov_9_3(1)_61-65.tar"
tfile = tarfile.open(thetarfile)
tfile

However, if I try to access the file directly using similar commands:

thetarfile = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/b0/ac/Breast_Cancer_Res_2001_Nov_9_3(1)_61-65.tar.gz"
bbb = tarfile.open(thetarfile)

This results in the following error:

 Traceback (most recent call last):
 File "<pyshell#137>", line 1, in <module>
 bbb = tarfile.open(thetarfile)
 File "C:\Python30\lib\tarfile.py", line 1625, in open
 return func(name, "r", fileobj, **kwargs)
 File "C:\Python30\lib\tarfile.py", line 1687, in gzopen
 fileobj = bltn_open(name, mode + "b")
 File "C:\Python30\lib\io.py", line 278, in __new__
 return open(*args, **kwargs)
 File "C:\Python30\lib\io.py", line 222, in open
 closefd)
 File "C:\Python30\lib\io.py", line 615, in __init__
 _fileio._FileIO.__init__(self, name, mode, closefd)
 IOError: [Errno 22] Invalid argument: 'ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/b0/ac/Breast_Cancer_Res_2001_Nov_9_3(1)_61-65.tar'

Can anyone explain what I am doing wrong when trying to read the .tar.gz file directly from the web address? Thanks in advance. Chris

Recommended answer

Unfortunately, you cannot open a file from the network just by passing its URL. tarfile.open treats its first argument as a path on the local file system, which is why the call fails with an I/O error. You have to instruct the interpreter to make a network request and create an object representing the state of that request. This can be done using the urllib module.

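import tarfile
import urllib.request

# Address of the archive on the FTP server
url = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/b0/ac/Breast_Cancer_Res_2001_Nov_9_3(1)_61-65.tar.gz"
# urlopen returns a file-like object that streams the server's response
ftpstream = urllib.request.urlopen(url)
# "r|gz" reads the gzip-compressed tar data as a sequential stream
thetarfile = tarfile.open(fileobj=ftpstream, mode="r|gz")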

The ftpstream object is a file-like object that represents the connection to the FTP server, and the tarfile module can read from this stream. Since we do not pass a filename, we specify the compression in the mode parameter. The "|" in "r|gz" selects stream mode, which reads the data sequentially and so suits a network connection that cannot seek backwards.
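As a rough sketch of what can be done with the stream once it is open, the example below reads every regular file in the archive into a dictionary. The helper names read_tar_gz_members and fetch_tar_gz are made up for illustration and are not part of the original answer. In stream mode the members have to be read in the order they appear, which is why each one is read immediately inside the loop.

```python
import tarfile
import urllib.request


def read_tar_gz_members(fileobj):
    """Return {member name: bytes} for each regular file in a gzip-compressed tar stream."""
    contents = {}
    # "r|gz" opens the archive as a forward-only stream (no seeking required)
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        for member in archive:
            # Skip directories, links, and other non-file entries
            if member.isfile():
                extracted = archive.extractfile(member)
                # Read right away: a stream cannot go back to earlier members
                contents[member.name] = extracted.read()
    return contents


def fetch_tar_gz(url):
    """Hypothetical helper: open the URL and read the archive it serves."""
    with urllib.request.urlopen(url) as stream:
        return read_tar_gz_members(stream)
```

Calling fetch_tar_gz with the FTP address from the question should then give the .pdf, .nxml, and image files as bytes keyed by member name, assuming the server is reachable and the archive is small enough to hold in memory.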
