Collect data using threads


Question

A class Collector spawns several threads that read from a serial port.
Collector.get_data() should return all the data they have read since the
last call. Can anyone tell me whether my implementation is correct?

class Collector(object):
    def __init__(self):
        self.data = []
        spawn_work_bees(callback=self.on_received)

    def on_received(self, a_piece_of_data):
        """This callback is executed in work bee threads!"""
        self.data.append(a_piece_of_data)

    def get_data(self):
        x = self.data
        self.data = []
        return x

I am not very sure about the get_data() method. Will it cause data loss
if a thread is appending data to self.data at the same time?

Is there a more pythonic/standard recipe to collect thread data?

--
Qiangning Hong

_______________________________________________
< Those who can, do; those who can't, simulate. >
-----------------------------------------------
\ ___-------___
\ _-~~ ~~-_
\ _-~ /~-_
/^\__/^\ /~ \ / \
/| O|| O| / \_______________/ \
| |___||__| / / \ \
| \ / / \ \
| (_______) /______/ \_________ \
| / / \ / \
\ \^\\ \ / \ /
\ || \______________/ _-_ //\__//
\ ||------_-~~-_ ------------- \ --/~ ~\ || __/
~-----||====/~ |==================| |/~~~~~
(_(__/ ./ / \_\ \.
(_(___/ \_____)_)

Recommended answer

Qiangning Hong wrote:
> A class Collector, it spawns several threads to read from serial port.
> Collector.get_data() will get all the data they have read since last
> call. Who can tell me whether my implementation correct?
[snip sample with a list]
> I am not very sure about the get_data() method. Will it cause data lose
> if there is a thread is appending data to self.data at the same time?




That will not work, and you will get data loss, as Jeremy points out.

Normally Python lists are safe, but your key problem (in this code) is
that you are rebinding self.data to a new list! If another thread calls
on_received() just after the line "x = self.data" executes, then the new
data will never be seen.

One option that would work safely** is to change get_data() to look like
this:

def get_data(self):
    count = len(self.data)
    result = self.data[:count]
    del self.data[:count]  # remove only the items copied into result
    return result

This does what yours was trying to do, but safely. Note that it doesn't
reassign self.data, but rather uses a single operation (del) to remove
all the "preserved" elements at once. It's possible that after the
first or second line a call to on_received() will add data, but it
simply won't be seen until the next call to get_data(), rather than
being lost.
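Peter's del-based get_data() still depends on CPython carrying out each list operation atomically, which is exactly what his footnote warns about. One portable way to keep a plain list (a common pattern, not something proposed in this thread) is to take a threading.Lock in both methods. With the lock held on both sides, even the question's "swap in a new list" approach becomes safe, because no append can run between reading self.data and rebinding it. A minimal sketch; spawn_work_bees() is not defined in the question, so the hypothetical run_demo() helper starts plain threading.Thread workers instead:

```python
import threading
import time


class Collector(object):
    """List-based collector where a Lock guards every access to self.data."""

    def __init__(self):
        self.data = []
        self._lock = threading.Lock()
        # The question's spawn_work_bees() is not defined anywhere, so in this
        # sketch the caller starts the worker threads instead.

    def on_received(self, a_piece_of_data):
        """Called from worker threads."""
        with self._lock:
            self.data.append(a_piece_of_data)

    def get_data(self):
        """Return everything received since the previous call."""
        with self._lock:
            # Swapping in a fresh list is safe here: no append can run
            # while the lock is held.
            result, self.data = self.data, []
        return result


def run_demo(n_threads=4, per_thread=1000):
    """Hypothetical harness: workers call on_received while we poll get_data."""
    collector = Collector()

    def worker(tid):
        for i in range(per_thread):
            collector.on_received((tid, i))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    collected = []
    while any(t.is_alive() for t in threads):
        collected.extend(collector.get_data())
        time.sleep(0.001)
    for t in threads:
        t.join()
    collected.extend(collector.get_data())  # pick up anything left over
    return collected
```

run_demo() should return all 4000 (thread, index) pairs exactly once. A race in the unlocked version, if it occurs at all on a given interpreter, would show up only intermittently, which is part of why such bugs are hard to spot by testing.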

** I'm showing you this to help you understand why your own approach was
wrong, not to give you code that you should use. The key problem with
even my approach is that it *assumes things about the implementation*.
Specifically, there are no guarantees in Python the Language (as opposed
to CPython, the implementation) about the thread-safety of working with
lists like this. In fact, in Jython (and possibly other Python
implementations) this would definitely have problems. Unless you are
certain your code will run only under CPython, and you're willing to put
comments in the code about potential thread safety issues, you should
probably just follow Jeremy's advice and use Queue. As a side benefit,
Queues are much easier to work with!

-Peter
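Peter and Jeremy both end up recommending a Queue. A minimal sketch of the Collector built on the standard library's queue.Queue (spelled Queue.Queue in Python 2): workers put() items, and get_data() drains whatever is already there with get_nowait() until queue.Empty is raised. As before, run_demo() is a hypothetical test harness standing in for the question's undefined spawn_work_bees():

```python
import queue  # this module was called "Queue" in Python 2
import threading
import time


class Collector(object):
    def __init__(self):
        # queue.Queue does its own internal locking.
        self._queue = queue.Queue()

    def on_received(self, a_piece_of_data):
        """Called from worker threads; Queue.put() is thread-safe."""
        self._queue.put(a_piece_of_data)

    def get_data(self):
        """Drain everything queued so far without blocking."""
        items = []
        while True:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                return items


def run_demo(n_threads=4, per_thread=1000):
    """Hypothetical harness: workers produce while the main thread polls."""
    collector = Collector()

    def worker(tid):
        for i in range(per_thread):
            collector.on_received((tid, i))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    collected = []
    while any(t.is_alive() for t in threads):
        collected.extend(collector.get_data())
        time.sleep(0.001)
    for t in threads:
        t.join()
    collected.extend(collector.get_data())  # pick up anything left over
    return collected
```

If the consumer is happy to wait for data rather than poll, a blocking self._queue.get() is the more usual pattern.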


Peter Hansen wrote:
> Qiangning Hong wrote:
> > A class Collector, it spawns several threads to read from serial port.
> > Collector.get_data() will get all the data they have read since last
> > call. Who can tell me whether my implementation correct?






> [snip sample with a list]

> > I am not very sure about the get_data() method. Will it cause data lose
> > if there is a thread is appending data to self.data at the same time?





> That will not work, and you will get data loss, as Jeremy points out.
>
> Normally Python lists are safe, but your key problem (in this code) is
> that you are rebinding self.data to a new list! If another thread calls
> on_received() just after the line "x = self.data" executes, then the new
> data will never be seen.







Can you explain why not? self.data is still bound to the same list as x. At least if the execution sequence is

    x = self.data
    self.data.append(a_piece_of_data)
    self.data = []

ISTM it should work.

I'm not arguing in favor of the original code, I'm just trying to understand your specific failure mode.

Thanks,
Kent
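To make the failure mode Kent asks about concrete: self.data.append(a_piece_of_data) is not one indivisible step. The worker first looks up self.data and only then calls append() on whatever list it got. The sketch below replays one such interleaving by hand in a single thread (simulate_lost_update() is a hypothetical name, and the worker's two steps are written out explicitly). Whether CPython will actually switch threads at that exact point depends on the interpreter version; on an implementation with no GIL, such as Jython (which Peter mentions) or a free-threaded CPython build, nothing rules it out.

```python
class Collector(object):
    """The question's Collector, minus the undefined spawn_work_bees() call."""

    def __init__(self):
        self.data = []

    def on_received(self, a_piece_of_data):
        self.data.append(a_piece_of_data)

    def get_data(self):
        x = self.data
        self.data = []
        return x


def simulate_lost_update():
    """Replay one bad interleaving by hand, in a single thread."""
    c = Collector()
    c.on_received("early")

    # Worker, step 1: on_received() evaluates ``self.data`` and gets the
    # current list object, then is suspended before .append() runs.
    workers_list = c.data

    # Consumer: get_data() runs to completion and the caller reads the batch.
    batch = c.get_data()
    processed = list(batch)

    # Worker, step 2: resumes and appends to the list it already holds,
    # i.e. the old list that was just handed to (and read by) the consumer.
    workers_list.append("late")

    # The next get_data() only sees the fresh list created by the swap.
    next_batch = c.get_data()
    return processed, batch, next_batch
```

simulate_lost_update() returns (['early'], ['early', 'late'], []). The late item is not destroyed; it sits in the old list, as Kent says. But a caller that has already finished reading that batch never sees it, and the next batch does not contain it either, so whether it is "lost" depends on when the caller looks at x.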


Previously, on Jun 14, Jeremy Jones said:

# Kent Johnson wrote:
#
# > Peter Hansen wrote:
# >
# > > Qiangning Hong wrote:
# > >
# > >
# > > > A class Collector, it spawns several threads to read from serial port.
# > > > Collector.get_data() will get all the data they have read since last
# > > > call. Who can tell me whether my implementation correct?
# > > >
# > > [snip sample with a list]
# > >
# > >
# > > > I am not very sure about the get_data() method. Will it cause data lose
# > > > if there is a thread is appending data to self.data at the same time?
# > > >
# > > That will not work, and you will get data loss, as Jeremy points out.
# > >
# > > Normally Python lists are safe, but your key problem (in this code) is
# > > that you are rebinding self.data to a new list! If another thread calls
# > > on_received() just after the line "x = self.data" executes, then the new
# > > data will never be seen.
# > >
# >
# > Can you explain why not? self.data is still bound to the same list as x. At
# > least if the execution sequence is x = self.data
# > self.data.append(a_piece_of_data)
# > self.data = []
# >
# > ISTM it should work.
# >
# > I'm not arguing in favor of the original code, I'm just trying to understand
# > your specific failure mode.
# >
# > Thanks,
# > Kent
# >
# Here's the original code:
#
# class Collector(object):
#     def __init__(self):
#         self.data = []
#         spawn_work_bees(callback=self.on_received)
#
#     def on_received(self, a_piece_of_data):
#         """This callback is executed in work bee threads!"""
#         self.data.append(a_piece_of_data)
#
#     def get_data(self):
#         x = self.data
#         self.data = []
#         return x
#
# The more I look at this, the more I'm not sure whether data loss will occur.
# For me, that's good enough reason to rewrite this code. I'd rather be clear
# and certain than clever any day.
# So, let's say you have a thread T1 which starts in ``get_data()`` and makes it
# as far as ``x = self.data``. Then another thread T2 comes along in
# ``on_received()`` and gets as far as ``self.data.append(a_piece_of_data)``.
# ``x`` in T1's ``get_data()`` (as you pointed out) is still pointing to the list
# that T2 just appended to and T1 will return that list. But what happens if
# you get multiple guys in ``get_data()`` and multiple guys in
# ``on_received()``? I can't prove it, but it seems like you're going to have
# an uncertain outcome. If you're just dealing with 2 threads, I can't see how
# that would be unsafe. Maybe someone could come up with a use case that would
# disprove that. But if you've got, say, 4 threads, 2 in each method....that's
# gonna get messy.
# And, honestly, I'm trying *really* hard to come up with a scenario that would
# lose data and I can't. Maybe someone like Peter or Aahz or some little 13
# year old in Topeka who's smarter than me can come up with something. But I do
# know this - the more I think about whether this is unsafe or not, the more
# my head hurts. If you have a piece of code that you have to spend that
# much time on trying to figure out if it is threadsafe or not, why would you
# leave it as is? Maybe the rest of you are more confident in your thinking and
# programming skills than I am, but I would quickly slap a Queue in there. If
# for nothing else than to rest from simulating in my head 1, 2, 3, 5, 10
# threads in the ``get_data()`` method while various threads are in the
# ``on_received()`` method. Aaaagghhh.....need....motrin......
#
#
# Jeremy Jones
#

I may be wrong here, but shouldn't you just use a stack, or in other
words, use the list as a stack and just pop the data off the top? I
believe there is a method pop() already supplied for you. Since
you wouldn't require a self.data = [], this should allow you to safely
remove the data you've already seen without accidentally removing data
that may have been added in the meantime.

---
James Tanis
jt****@pycoder.org
http://pycoder.org
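James's suggestion works because self.data is never rebound, so an append can no longer land on a list that get_data() has already handed out. With a plain list, though, pop() takes items from the end, so each batch comes back newest first, and the approach still relies on CPython making list.pop() atomic. A variant of the same "pop what is there" idea, swapped in here rather than taken from the thread, uses collections.deque, whose documentation states that appends and pops from either end are thread-safe; popleft() then returns items oldest first. A minimal sketch, again with a hypothetical run_demo() harness in place of spawn_work_bees():

```python
import threading
import time
from collections import deque


class Collector(object):
    def __init__(self):
        # deque.append() and deque.popleft() are documented as thread-safe,
        # and self.data is never rebound after this line.
        self.data = deque()

    def on_received(self, a_piece_of_data):
        """Called from worker threads; appends on the right."""
        self.data.append(a_piece_of_data)

    def get_data(self):
        """Pop from the left (oldest first) until the deque is empty."""
        items = []
        while True:
            try:
                items.append(self.data.popleft())
            except IndexError:  # popleft() on an empty deque
                return items


def run_demo(n_threads=4, per_thread=1000):
    """Hypothetical harness: workers produce while the main thread polls."""
    collector = Collector()

    def worker(tid):
        for i in range(per_thread):
            collector.on_received((tid, i))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    collected = []
    while any(t.is_alive() for t in threads):
        collected.extend(collector.get_data())
        time.sleep(0.001)
    for t in threads:
        t.join()
    collected.extend(collector.get_data())  # pick up anything left over
    return collected
```

Because items leave from the same end they entered last, each worker's items come out in the order that worker produced them.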

