Software projects and development in a research environment


Question

What are useful strategies to adopt when you or the project does not have a clear idea of what the final (if any) product is going to be?

Let us take "research" to mean an exploration into an area where many things are not known or implemented and where a formal set of deliverables cannot be specified at the start of the project. This is common in STEM (science (physics, chemistry, biology, materials, etc.), technology, engineering, medicine) and in many areas of informatics and computer science. Software is created as an end in itself (e.g. a new algorithm), as a means of managing data (often experimental), or as a means of simulation (e.g. of materials, reactions, etc.). It is usually created by small groups or individuals (I omit large science such as telescopes and hadron colliders, where much emphasis is put on software engineering).

Research software is characterised by (at least):

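  • Unknown outcomes
  • Unknown timescales
  • Little formal project management
  • Limited budgets (in academia at least)
  • Unpredictability of third-party tools and libraries
  • Changes in the outside world during the project (e.g. new discoveries, which can be positive, by saving effort, or negative, by getting scooped)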

Projects can be anything from days ("see if this is a worthwhile direction to go") to years ("this is my PhD topic") or longer. Frequently the people are not hired as software people but find they need to write code to get the research done or get infected by writing software. There is generally little credit for good software engineering - the "product" is a conference or journal publication.

However, some of these projects turn out to be highly valuable. The most obvious area is genomics, where in the early days scientists showed that dynamic programming was a revolutionary tool for thinking about protein and nucleic acid structure; this is now a multi-billion industry (or more). The same is true of quantum mechanics codes that predict the properties of substances.

The downside is that much code gets thrown away and is difficult to build on. To try to overcome this we have built up libraries which are shared within the group and with the world as Open Source (but here again there is very little credit given). Many researchers reinvent the wheel ("head-down" programming, where colleagues are not consulted, and "hero" programming, where someone tries to do the whole lot themselves).

Too much formality at the start of a project often puts people off and innovation is lost (no-one will spend 2 months writing formal specs and unit tests). Too little and bad habits are developed and promulgated. Programming courses help but again it's difficult to get people doing them especially when you rely on their goodwill. Mentoring is extremely valuable but not always successful.

Are there online resources which can help to persuade people into good software habits?

I'm grateful to dmckee (below) for pointing out a similar discussion. It's all good stuff, and I particularly agree that version control is one of the most important things we can offer scientists (we offered this to our colleagues and got very good take-up). I also like the approach of the Software Carpentry course mentioned there.

Recommended answer

It's extremely difficult. The environment both you and Stefano Borini describe is very accurate. I think there are three key factors which propagate the situation.

  1. Short-term thinking
  2. Lack of formal training and experience
  3. Continuous turnover of grad students/postdocs for new development

Short-term thinking. There are a few reasons that short-term thinking is the norm, most of them already well explained by Stefano. As well as the awful pressure to publish and the lack of recognition for software creation, I would emphasise the number of short-term contracts. There is simply very little advantage for more junior academics (PhD students and postdocs) to spend any time planning long-term software strategies, since contracts are 2-3 years. In the case of longer-term projects e.g. those based around the simulation code of a permanent member of staff, I have seen some applications of basic software engineering, things like simple version control, standard test cases, etc. However even in these cases, project management is extremely primitive.
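
To make "standard test cases" concrete, here is a minimal sketch of what one might look like for a small simulation code. Everything in it is a hypothetical illustration rather than anything from the projects described above: the function names `simulate_oscillator` and `energy`, the choice of a unit-mass harmonic oscillator, and the tolerances are all assumptions. The idea is simply to check the code against a problem with a known analytic answer and against a conserved quantity.

```python
import math


def simulate_oscillator(x0, v0, dt, n_steps):
    """Integrate x'' = -x (unit mass, unit frequency) with a leapfrog scheme.

    Hypothetical stand-in for a real simulation routine.
    """
    x, v = x0, v0
    for _ in range(n_steps):
        v -= 0.5 * dt * x  # half kick, acceleration a = -x
        x += dt * v        # drift
        v -= 0.5 * dt * x  # half kick
    return x, v


def energy(x, v):
    """Total energy of the unit-mass, unit-frequency oscillator."""
    return 0.5 * (x * x + v * v)


def test_energy_is_conserved():
    # Leapfrog keeps the energy error bounded, so a long run should stay
    # close to the initial energy (tolerance chosen for this step size).
    x, v = simulate_oscillator(1.0, 0.0, dt=0.01, n_steps=10_000)
    assert abs(energy(x, v) - energy(1.0, 0.0)) < 1e-4


def test_matches_analytic_solution():
    # Starting from x = 1, v = 0 the exact solution is x = cos(t), v = -sin(t).
    dt, n_steps = 0.01, 1000
    t = dt * n_steps
    x, v = simulate_oscillator(1.0, 0.0, dt=dt, n_steps=n_steps)
    assert abs(x - math.cos(t)) < 1e-3
    assert abs(v + math.sin(t)) < 1e-3


if __name__ == "__main__":
    test_energy_is_conserved()
    test_matches_analytic_solution()
    print("all standard test cases passed")
```

Kept under version control next to the simulation code and rerun after every change, a handful of checks like these gives the next student a quick way to tell whether an edit broke something. The functions follow the `test_` naming convention, so a test runner such as pytest should also be able to collect them.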

Lack of formal training and experience. This is a serious handicap. In astronomy and astrophysics, programming is an essential tool, but understanding of the costs of development, particularly maintenance overheads, is extremely poor. Because scientists are normally smart people, there is a feeling that software engineering practices don't really apply to them, and that they can 'just make it work'. With more experience, most programmers realise that writing code that mostly works isn't the hard part; maintaining and extending it efficiently and safely is. Some scientific code is throwaway, and in these cases the quick and dirty approach is adequate. But all too often, the code will be used and reused for years to come, bringing consequent grief to all involved with it.

Continuous turnover of grad students/postdocs for new development. I think this is the key feature that allows the academic approach to software to continue to survive. If the code is horrendous and takes days to understand and debug, who pays that price? In general, it's not the original author (who has probably moved on). Nor is it the permanent member of staff, who is often only peripherally involved with new development. It is normally the graduate student who is implementing new algorithms, producing novel approaches, trying to extend the code in some way. Sometimes it will be a postdoc, hired specifically to work on adding some feature to an existing code, and contractually obliged to work on this area for some fraction of their time.

This model is hugely inefficient. I know a PhD student in astrophysics who spent over a year trying to implement a relatively basic piece of mathematics, only a few hundred lines of code, in an existing n-body code. Why did it take so long? Because she literally spent weeks trying to understand the existing, horribly written code, and how to add her calculation to it, and months more ineffectively debugging her problems due to the monolithic code structure, coupled with her own lack of experience. Note that there was almost no science involved in this process; just wasting time grappling with code. Who ultimately paid that price? Only her. She was the one who had to put more hours in to try and get enough results to make a PhD. Her supervisor will get another grad student after she's gone - and so the cycle continues.

The point I'm trying to make is that the problem with the software creation process in academia is endemic within the system itself, a function of the resources available and the type of work that is rewarded. The culture is deeply embedded throughout academia. I don't see any easy way of changing that culture through external resources or training. It's the system itself that needs to change, to reward people for writing substantial code, to place increased scrutiny on the correctness of results produced using scientific code, to recognise the importance of training and process in code, and to hold supervisors jointly responsible for wasting the time of the members of their research group.
