Clojure or Scala for bioinformatics/biostatistics/medical research

Question

I am not a professional programmer (my area is medical research), but I am quite capable in C/C++ and various scripting languages. A while back I got intrigued by Lisp, but I never found the time to learn it seriously. After a brief exposure to R, I decided to invest more time in a functional programming language.

I would like the practicality of a JVM language and have therefore narrowed my choice to Clojure and Scala. From what I understand, both can use existing Java libraries, and since performance-critical code can be delegated to Java, both should be able to perform roughly equally well.

How do these languages compare in the application space I need them for? Are there any real-life bioinformatics projects using either?

Existing code would be a serious plus, as would good documentation and a fairly gentle learning curve. Also, how do the concurrency models of the two languages compare?

Does either one have significant advantages or disadvantages?

Solution

I can personally vouch for Clojure as a great tool for this kind of work. (I believe Scala would be great too; I just have less experience with it.)

My personal research is in the field of predictive modelling / machine learning and is very computationally intensive - so I think it has many parallels with bioinformatics or biostatistics.

My personal approach / setup includes:

  • Incanter used primarily as a data visualisation tool. Great for producing quick visualisations which are usually just 1-liners at the REPL. There are also lots of statistical and numerical processing tools which I believe use the Colt library under the hood. I'm not an expert in R but I understand that Incanter is roughly "R translated to Clojure/Lisp".

  • Exploiting quite a few Java libraries as needed. Some of these are my own, for example algorithms that I have written in Java in order to get the best possible fine-tuned performance out of the JVM. But you could equally easily use any of the other great Java libraries available, as calling Java from Clojure is very simple: (.methodName object param1 param2).

  • Quite a lot of higher order functions to automate my workflow. For example I have a higher order function that will run an optimisation algorithm of any kind in a loop for a specified amount of time and then produce an Incanter graph of the improvement on each iteration. Not rocket science, but really easy to code up in a few lines of Clojure.

  • Never really having to worry about performance. You can make Clojure go pretty fast if you want to (e.g. with type hints, primitive arithmetic support etc.) but normally it's irrelevant as you're going to spend 99%+ of your cycles in well-optimised library code anyway. Hence a bit of overhead in the "glue" code is negligible - I feel I gain much more in terms of personal productivity by having a dynamic, high-level, functional language to work in.

  • Major use of Clojure's concurrency features - this has to be one of Clojure's strongest features. I tend to use the STM to code concurrent processes with transactions that can't interfere with each other, then kick off long-running calculations in a future so that I can get on with other tasks and wait for notification of the result.

  • A slowly growing collection of macros to "extend the language" when needed. I actually use macros less than I thought I would (higher order functions are often a better choice). But when you need them they are invaluable - this is where you really appreciate the value of a homoiconic language. Since they effectively allow you to add new syntax to the language itself, they are very powerful when used correctly to build the DSL that you need.
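The Java-interop point can be made concrete with a minimal sketch that uses only JDK classes (String, Math, java.util.ArrayList, java.util.Collections). It shows the three call forms: instance methods, static methods and constructors. The namespace interop-demo and the helper sorted-copy are hypothetical names chosen for illustration.

```clojure
(ns interop-demo
  (:import (java.util ArrayList Collections)))

;; Instance method: (.methodName object param1 param2)
(.indexOf "ACGTTGCA" "TTG")      ; calls "ACGTTGCA".indexOf("TTG")  => 3
(.substring "ACGTTGCA" 2 5)      ; calls "ACGTTGCA".substring(2, 5) => "GTT"

;; Static method: (ClassName/methodName param1 ...)
(Math/sqrt 16.0)                 ; => 4.0

;; Constructor: (ClassName. param1 ...)
(defn sorted-copy
  "Hypothetical helper: copies xs into a java.util.ArrayList and sorts it
  in place with Collections/sort, returning the (mutable) Java list."
  [xs]
  (let [lst (ArrayList. ^java.util.Collection xs)]
    (Collections/sort lst)
    lst))
```

Because Clojure collections implement the standard java.util interfaces, a vector can be passed straight into the ArrayList constructor without any conversion code.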
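The higher-order workflow bullet can be sketched as follows. It assumes the optimiser can be written as a step function from a state to an improved state. The names run-for, objective and hill-climb-step are hypothetical, and the random-perturbation hill climber merely stands in for "an optimisation algorithm of any kind". Plotting is left as a comment because it assumes Incanter (its xy-plot and view functions) is on the classpath.

```clojure
(defn run-for
  "Repeatedly applies step-fn (state -> new state) until time-ms milliseconds
  of wall-clock time have elapsed, recording (score-fn state) after every
  iteration. Returns the vector of scores, starting with the initial score."
  [step-fn score-fn init-state time-ms]
  (let [deadline (+ (System/currentTimeMillis) time-ms)]
    (loop [state  init-state
           scores [(score-fn init-state)]]
      (if (< (System/currentTimeMillis) deadline)
        (let [next-state (step-fn state)]
          (recur next-state (conj scores (score-fn next-state))))
        scores))))

(defn objective
  "Example objective to minimise: (x - 3)^2."
  [x]
  (let [d (- x 3.0)] (* d d)))

(defn hill-climb-step
  "Example step-fn: perturbs x randomly and keeps the candidate only if it
  strictly improves the objective."
  [x]
  (let [candidate (+ x (- (rand) 0.5))]
    (if (< (objective candidate) (objective x)) candidate x)))

;; Run the optimiser for 50 ms and keep the score history.
(def history (run-for hill-climb-step objective 0.0 50))

;; Assuming Incanter is loaded, the improvement curve could then be plotted:
;; (incanter.core/view (incanter.charts/xy-plot (range (count history)) history))
```

Since run-for only knows about step-fn and score-fn, the same loop can drive any optimiser that fits the state-in, state-out shape.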
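The type hints and primitive arithmetic mentioned in the performance bullet look roughly like this. In this minimal sketch, sum-squares is a hypothetical function. The ^doubles hint lets aget compile to a direct array access instead of a reflective call. The ^double return hint and the 0.0 accumulator keep the loop on unboxed doubles.

```clojure
;; Ask the compiler to warn whenever it has to fall back to reflection.
(set! *warn-on-reflection* true)

(defn sum-squares
  "Sum of squares of a primitive double array, using areduce so the loop
  runs over unboxed doubles."
  ^double [^doubles xs]
  (areduce xs i acc 0.0
           (let [x (aget xs i)]
             (+ acc (* x x)))))

(sum-squares (double-array [1.0 2.0 3.0]))  ; => 14.0
```

In most glue code none of this is needed. It is worth adding only to the few hot loops that profiling actually points at.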
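The concurrency bullet pairs STM transactions with futures. A minimal sketch of that pattern follows. The refs completed and results, the functions record-result! and slow-job, and the "sum of squares" workload are all hypothetical stand-ins for real long-running calculations. A watch on the counter provides the "notification of the result".

```clojure
;; Shared state lives in refs; dosync makes both updates one atomic transaction.
(def completed (ref 0))
(def results   (ref {}))

;; Notification: the watch fn runs after each committed change to the counter.
(add-watch completed :notify
           (fn [_key _ref _old new-count]
             (println "Jobs finished so far:" new-count)))

(defn record-result!
  "Stores result under job-id and bumps the counter in a single transaction,
  so no thread can observe one change without the other."
  [job-id result]
  (dosync
    (alter results assoc job-id result)
    (alter completed inc)))

(defn slow-job
  "Stand-in for a long-running calculation: sum of squares of 0..n-1."
  [n]
  (reduce + (map #(* % %) (range n))))

;; Start the jobs in futures; doall forces them to launch immediately,
;; leaving the REPL free for other work.
(def jobs
  (doall
    (for [n [1000 2000 3000]]
      (future (record-result! n (slow-job n))))))

;; Later, deref blocks until each future has finished.
(run! deref jobs)
@completed  ; => 3
```

When this runs as a standalone script rather than at the REPL, calling (shutdown-agents) at the end lets the JVM exit without waiting for idle future threads to time out.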
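The macro bullet notes that a macro earns its keep when a function cannot do the job. One small example is a hypothetical timed macro, similar in spirit to the built-in clojure.core/time. Besides the elapsed time, it prints the unevaluated expression. A function could not do this, because it only ever receives the already-evaluated argument.

```clojure
(defmacro timed
  "Evaluates expr, prints the unevaluated form together with the elapsed
  wall-clock time in milliseconds, and returns the value of expr."
  [expr]
  `(let [start#      (System/nanoTime)
         result#     ~expr
         elapsed-ms# (/ (- (System/nanoTime) start#) 1e6)]
     (println '~expr "took" elapsed-ms# "ms")
     result#))

;; Prints something like: (reduce + (range 1000000)) took 12.3 ms
(timed (reduce + (range 1000000)))  ; => 499999500000
```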

In short - I don't think you can go wrong with Clojure as a researcher.

The one thing I probably wouldn't use it for (yet) is actually writing a new numerical library - this would probably be better done in Scala or pure Java as you would probably want to adopt a more imperative / OOP style.
