How to read a file that is continuously being updated by appending lines?
Question
In my terminal I am running:
curl --user dhelm:12345 https://stream.twitter.com/1.1/statuses/sample.json > raw-data.txt
curl's output is live-streaming Twitter data, which is being written to the file raw-data.txt.
In Python, I am reading the file, decoding each line as JSON, and appending the results to posts:

import json

posts = []
for line in open("/Users/me/raw-data.txt"):
    try:
        posts.append(json.loads(line))
    except:
        pass
Now, the issue is that I don't want my program to end when the Python script reaches the end of the file. Instead, I want it to continue reading as the curl running in my terminal appends more posts to the file raw-data.txt.
Answer
I think this is an XY problem. Because you couldn't think of a way to stream an HTTP request line by line from within Python, you decided to use curl to do a streaming download to a file, and then read that file from within Python. Because you did that, you have to deal with the possibility of running into EOF while the request is still going, just because you've caught up to curl. So you're making things harder on yourself for no reason.
While streaming downloads can be done with the stdlib, it's a bit painful; the requests library makes it a lot easier. So, let's use that:
import json

import requests

posts = []
url = 'https://stream.twitter.com/1.1/statuses/sample.json'
# stream=True keeps the connection open and lets us consume the
# response incrementally instead of downloading it all at once.
r = requests.get(url, auth=('dhelm', '12345'), stream=True)
for line in r.iter_lines():
    try:
        posts.append(json.loads(line))
    except ValueError:
        pass  # skip blank keep-alive lines and malformed records
That's the whole program.