Multiple threads writing to the same CSV in Python
Question
向执行者提交写入任务,而不是显式地使用这样的锁。I'm new to multi-threading in Python and am currently writing a script that appends to a csv file. If I was to have multiple threads submitted to an
concurrent.futures.ThreadPoolExecutor
that appends lines to a csv file. What could I do to guarantee thread safety if appending was the only file-related operation being done by these threads?Simplified version of my code:
```python
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    for count, ad_id in enumerate(advertisers):
        downloadFutures.append(executor.submit(downloadThread, arguments.....))
        time.sleep(random.randint(1, 3))
```
And my thread class being:
```python
def downloadThread(arguments......):
    # Some code.....
    writer.writerow(re.split(',', line.decode()))
```
Should I set up a separate single-threaded executor to handle writing, or is it worth worrying about if I am just appending?
EDIT: I should elaborate that when the write operations occur can vary greatly, with minutes between appends to the file. I am just concerned that this scenario has not occurred when testing my script and I would prefer to be covered for that.
Solution

I am not sure if csvwriter is thread-safe. The documentation doesn't specify, so to be safe, if multiple threads use the same object, you should protect the usage with a threading.Lock:
```python
# create the lock
import threading
csv_writer_lock = threading.Lock()

def downloadThread(arguments......):
    # pass csv_writer_lock somehow
    # Note: use csv_writer_lock on *any* access
    # Some code.....
    with csv_writer_lock:
        writer.writerow(re.split(',', line.decode()))
```
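A minimal, self-contained sketch of this lock-protected pattern (the worker function name, file name, and rows here are placeholder assumptions, not from the original question):

```python
import concurrent.futures
import csv
import threading

csv_writer_lock = threading.Lock()

def download_thread(writer, row):
    # ... download work would happen here ...
    # Hold the lock for *any* access to the shared writer.
    with csv_writer_lock:
        writer.writerow(row)

with open("results.csv", "a", newline="") as f:
    writer = csv.writer(f)
    rows = [["a", 1], ["b", 2], ["c", 3]]  # placeholder data
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        futures = [executor.submit(download_thread, writer, row) for row in rows]
        for future in futures:
            future.result()  # re-raise any exception from a worker
```

Because each writerow call happens entirely under the lock, rows can never interleave mid-line, though their order in the file depends on thread scheduling.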
That being said, it may indeed be more elegant for the
downloadThread
to submit write tasks to an executor, instead of explicitly using locks like this.
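A sketch of that alternative, assuming a dedicated single-threaded executor as the one writer (names and data are placeholders): every worker hands its row to the writer executor rather than touching the csv.writer itself, so no lock is needed.

```python
import concurrent.futures
import csv

# A single-threaded executor serializes all writes: only its one
# worker thread ever calls the csv.writer, so no lock is required.
write_executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)

def download_thread(writer, row):
    # ... download work would happen here ...
    # Hand the row off to the dedicated writer thread.
    write_executor.submit(writer.writerow, row)

with open("results2.csv", "a", newline="") as f:
    writer = csv.writer(f)
    rows = [["x", 1], ["y", 2]]  # placeholder data
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
        list(pool.map(lambda r: download_thread(writer, r), rows))
    # Drain pending writes before the file is closed.
    write_executor.shutdown(wait=True)
```

Note that write_executor.shutdown(wait=True) must run before the file closes, otherwise queued writerow calls could hit a closed file.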