Python download large csv file from a url line by line for only 10 entries
Question
I have a large CSV file from a client, shared via a URL for download. I want to download it line by line (or in chunks of bytes) and limit it to only 10 entries.
I have the following code, which will download the file, but I want to download only the first 10 entries from the file; I don't want the full file.
#!/usr/bin/env python
import requests
from contextlib import closing
import csv
url = "https://example.com.au/catalog/food-catalog.csv"
with closing(requests.get(url, stream=True)) as r:
    f = (line.decode('utf-8') for line in r.iter_lines())
    reader = csv.reader(f, delimiter=',', quotechar='"')
    for row in reader:
        print(row)
I don't know much about contextlib, or how it works with with in Python.
Can anyone help me here? It would be really helpful. Thanks in advance.
Answer
The issue is not so much with contextlib as with generators. When your with block ends, the connection will be closed, fairly straightforwardly.
The part that actually does the download is for row in reader:, since reader is wrapped around f, which is a lazy generator. Each iteration of the loop will actually read a line from the stream, possibly with some internal buffering by Python.
The key then is to stop the loop after 10 lines. There are a couple of simple ways of doing that:
for count, row in enumerate(reader, start=1):
    print(row)
    if count == 10:
        break
or
from itertools import islice
...

for row in islice(reader, 0, 10):
    print(row)
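Putting it together with the streaming download from the question, a minimal sketch follows. The helper name first_rows is my own, not from the question; factoring the CSV parsing out of the network code makes the row limit easy to exercise on plain strings.

```python
import csv
from itertools import islice

def first_rows(lines, limit=10):
    """Parse CSV lines lazily and return only the first `limit` rows."""
    reader = csv.reader(lines, delimiter=',', quotechar='"')
    # islice stops pulling from the underlying iterator after `limit`
    # rows, so the rest of the stream is never read
    return list(islice(reader, limit))

# With a live stream it would be used like this (sketch, not run here):
#
#   with closing(requests.get(url, stream=True)) as r:
#       rows = first_rows(line.decode('utf-8') for line in r.iter_lines())
```

Because first_rows accepts any iterable of lines, it works the same on a decoded HTTP stream, an open file, or a list of strings.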