Julia language - State in @async tasks :: Current-Directory


Question

I've noticed (read: caught a production bug) that different Tasks in Julia do not have their own working directory; the current directory is shared between them. I realise that at the OS level this is fairly obvious (a process has a single working directory).

My first question: is there any other obvious or less obvious global state I should watch out for (apart from, obviously, environment variables and any global variables)?

Second: should this be better documented, or avoided by the Task abstraction? A "Task" is an abstraction, so it could (theoretically) have its own semantics, such as switching back to its own working directory.

We've solved the production bug by removing every 'cd()' call from the code. The point is that cd() with the closure (do-block) form gave us the illusion that it was safe to use.

That is:

cd("some_dir") do
  # stuff
end

We've had this sort of code running in Mux endpoints.

My minimal reproduction of the issue is:

function runme(path)
    mkpath(path)
    abs_path = realpath(path)
    return t = @async begin
        cd(abs_path) do
            sleep(1)
            println(path,"::",(pwd()|>splitdir)[2])
        end
    end
end

runme("a")
runme("b")

Output (obviously):

a::b
b::b

(Summary) Though this is almost not a question, it should be searchable and documented, as it's a possible source of synchronisation bugs.

The difference from a plain global variable (regarding the state changed by 'cd()') is that a variable can be captured in a closure using a let statement, while the current directory cannot. This is not even specific to the programming language (it is an OS-process issue), but I think the syntax does give an illusion of locality (similar to Python 'with' blocks, or many other devices).
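The contrast drawn here can be sketched as follows. This is a minimal illustration, not code from the original post: `shared` and `runme_captured` are hypothetical names. Each task keeps its own `snapshot` captured with `let`, while reading `shared[]` later shows whatever value was written last.

```julia
shared = Ref(0)   # stands in for any process-wide state

function runme_captured(val)
    shared[] = val + 1
    # `let` binds a fresh `snapshot` for this call; the task's closure keeps it.
    return let snapshot = shared[]
        @async begin
            sleep(0.1)
            (val, snapshot, shared[])   # captured copy vs. live shared value
        end
    end
end

t1 = runme_captured(1)
t2 = runme_captured(3)
r1, r2 = fetch(t1), fetch(t2)
println(r1)   # (1, 2, 4)
println(r2)   # (3, 4, 4)
```

There is no equivalent way to take a per-task snapshot of the working directory, which is why the do-block form of cd() does not isolate tasks from each other.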

Thus the bottom line is that the 'cd' abstraction should not be used in any production utility, unless one day there is a way to set a handler for 'switching back' into a Task/block/closure (similar, in a way, to finally blocks).

Recommended answer

I'm not familiar with the internals or the particular implementation, so this is my personal educated guess, and I'm happy to be corrected by an actual Julia dev. I think it's not that Tasks share "the current directory" per se, but that they more generally share "state". Your example behaves the same way with a global variable instead:

# in testscript.jl
var = 0;

function runme(val)
    global var = val+1;
    return t = @async begin
      sleep(1)
      println(val,"::",var);
    end
end

runme(1) 
runme(3) 

# in the REPL session
julia> include("testscript.jl");
  1::4
  3::4

However, the sharing of (global) state is a feature, not a bug. This is in contrast to processes (which are how Julia achieves true parallelism): they do not share state, so all communication between workers has to go through sockets.

While one does need to be careful with this, it can also be very useful and necessary. Tasks (or coroutines) are not meant to provide parallelism or confinement in that regard. They are "a form of cooperative multitasking", i.e. a way to have multiple operations in progress on the same thread. This is not parallelism: the operations are run "one at a time with appropriate scheduling, as supervised by the CPU". For instance, "try/catch" blocks are (apparently) implemented using Tasks.

So, to answer your first question: yes, you need to be aware of the shared state. To the second question: no. To the extent that you're using Tasks in a way that somehow accesses global state (of which the current directory is one aspect), I'm not entirely sure each Task should have its own semantics in the way you describe. Instead, you need to design your tasks so that they take the sharing of state into account and act accordingly.
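One way to design a task around the shared working directory is to never depend on it. The sketch below reworks the question's reproduction under that assumption: `runme_nocd` and the file name `marker.txt` are hypothetical, and every path is built from the absolute directory instead of calling cd().

```julia
# Sketch of the question's `runme` without `cd`: every path is built from
# `abs_path`, so no task depends on the process-wide working directory.
function runme_nocd(path)
    mkpath(path)
    abs_path = realpath(path)
    return @async begin
        sleep(0.1)
        # "marker.txt" is a hypothetical file name used only for this demo.
        marker = joinpath(abs_path, "marker.txt")
        write(marker, "written for " * splitdir(abs_path)[2])
        (splitdir(abs_path)[2], read(marker, String))
    end
end

start_dir = pwd()
root = mktempdir()
ta = runme_nocd(joinpath(root, "a"))
tb = runme_nocd(joinpath(root, "b"))
ra, rb = fetch(ta), fetch(tb)
println(ra[1], "::", ra[2])   # a::written for a
println(rb[1], "::", rb[2])   # b::written for b
```

For external programs, `Cmd` accepts a `dir` keyword that sets the working directory for that command only, which may cover many of the cases where cd() would otherwise be tempting.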

As a further example for the second question, consider two separate tasks that "produce" outputs that need to be "consumed". If what gets consumed from either task is supposed to depend on a global state, then it is entirely possible that your tasks should, by design, react to that shared global state. Here's a trivial example of this:

d = 0;

function report()
  global d;
  for i in 1:4
    if iseven(d); produce("D is Even\n"); else; produce("D is Odd\n"); end
  end
end

task1 = Task( report );
task2 = Task( report );

for i in 1:4
  d = i;
  consume(task1) |> print;
  consume(task2) |> print;
end

D is Odd
D is Odd
D is Even
D is Even
D is Odd
D is Odd
D is Even
D is Even

PS. The latest Julia build informs me that "produce" and "consume" are being deprecated in favour of "channels", but presumably the point stands.
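For readers on a Julia version without produce/consume, the same idea might be sketched with Channels as below. This is an assumption-laden rework, not the answer's code: `reporter`, `start_reporter` and the request/reply pairing are hypothetical. A request channel is used because a task that simply calls put! in a loop may compute its next value before the consumer asks for it, so it could read `d` too early.

```julia
d = Ref(0)   # shared state read by both reporter tasks

# Hypothetical helper: answers one request at a time, reading `d` only when asked.
function reporter(requests::Channel{Nothing}, replies::Channel{String})
    for _ in requests   # loop ends when `requests` is closed
        put!(replies, iseven(d[]) ? "D is Even" : "D is Odd")
    end
end

# Start a reporter task; return a closure that asks it for one report,
# plus the request channel so the caller can close it when done.
function start_reporter()
    requests = Channel{Nothing}(1)
    replies = Channel{String}(1)
    @async reporter(requests, replies)
    ask() = (put!(requests, nothing); take!(replies))
    return ask, requests
end

ask1, req1 = start_reporter()
ask2, req2 = start_reporter()

results = String[]
for i in 1:4
    d[] = i
    push!(results, ask1())
    push!(results, ask2())
end
close(req1); close(req2)
foreach(println, results)
```

Under these assumptions the printed lines are the same Odd, Odd, Even, Even sequence as above, because each reporter reads `d` only after a request arrives.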
