Declaring a new data type for DNA

Question

I work in biology, specifically with DNA, and there is often a problem with the size of the data that comes from sequencing a genome.

For those of you who don't have a background in biology, I'll give a quick overview of DNA sequencing. DNA consists of four letters: A, T, G, and C, the specific order of which determines what happens in the cell.

A major problem with DNA sequencing technology, however, is the size of the resulting data (for a whole genome, often much more than gigabytes).

I know that the size of an int in C varies from computer to computer, but it still has far more storage capacity than four choices require. Is there a way to define a type for a 'base' that only takes up 2 or 3 bits? I've searched for defining a structure, but I'm afraid this isn't what I'm looking for. Thanks.

Also, would this work better in other languages (maybe a higher-level one like Java)?

Recommended answer

Can't you just stuff two ATGC sets into one byte then? Like:

0 1 0 1 1 0 0 1
A T G C A T G C

So this one byte would represent TC, AC?
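
The layout in the diagram can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not a definitive implementation: the names MASK_A, pack_masks, base_to_code, set_base, and get_base are hypothetical, and the bit order (A as the highest bit of each 4-bit half) is read off the diagram. The second half of the sketch shows a denser alternative for the case the question asks about: when every position holds exactly one base, 2 bits are enough, so four bases fit in one byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scheme from the diagram: one 4-bit mask per position, bit order A T G C
   (A is the highest bit of the 4-bit half). Two positions fit in one byte. */
enum { MASK_A = 8, MASK_T = 4, MASK_G = 2, MASK_C = 1 };

/* Hypothetical helper: pack two 4-bit masks into a single byte. */
static uint8_t pack_masks(uint8_t first, uint8_t second)
{
    return (uint8_t)(((first & 0x0F) << 4) | (second & 0x0F));
}

/* Alternative (assumed encoding): exactly one base per position, 2 bits each.
   Returns -1 for any character that is not A, T, G, or C. */
static int base_to_code(char base)
{
    switch (base) {
    case 'A': return 0;
    case 'T': return 1;
    case 'G': return 2;
    case 'C': return 3;
    default:  return -1;
    }
}

/* Store base number i in a packed buffer holding four bases per byte. */
static void set_base(uint8_t *packed, size_t i, char base)
{
    int code = base_to_code(base);
    assert(code >= 0); /* only A, T, G, C are representable in 2 bits */
    unsigned shift = (unsigned)(i % 4) * 2;
    /* Clear the 2-bit slot, then write the new code into it. */
    packed[i / 4] = (uint8_t)((packed[i / 4] & ~(3u << shift))
                              | ((unsigned)code << shift));
}

/* Read base number i back out of the packed buffer. */
static char get_base(const uint8_t *packed, size_t i)
{
    static const char letters[4] = { 'A', 'T', 'G', 'C' };
    unsigned shift = (unsigned)(i % 4) * 2;
    return letters[(packed[i / 4] >> shift) & 3u];
}
```

With these definitions, pack_masks(MASK_T | MASK_C, MASK_A | MASK_C) yields the byte 0101 1001 from the diagram. The 4-bit mask form can record more than one base per position, while the 2-bit form is half the size but has no room for anything other than the four letters.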
