在MySQL中存储基因表达数据-需要连接表吗? [英] Storing gene expression data in MySQL -- junction tables needed?

查看:130
本文介绍了在MySQL中存储基因表达数据-需要连接表吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我要在MySQL中存储几个m x n个基因表达数据矩阵.

I have several m x n matrices of gene expression data that I want to store in MySQL.

m是大约30,000个基因(可唯一识别)
n是大约3,000个样本(通常是唯一可识别的)

m is approx 30,000 genes (uniquely identifiable)
n is approx 3,000 samples (mostly uniquely identifiable)

我不确定最好的存储方式是什么.我最初将矩阵直接读取到MySQL表中,但是从那以后,我被告知这不是一种很好的处理方法,因为列(样本)的数量是可变的.我无法转置矩阵并以这种方式存储它们,因为创建列时,MySQL所允许的基因比MySQL多.

I'm not sure what the best way is to store these data. I initially read the matrices directly into MySQL tables, but I have since been told that this is not a great way to do things, since the number of columns (samples) is a variable quantity. I cannot transpose the matrices and store them that way because there are more genes than MySQL allows for when creating columns.

从那以后,我被告知连接表"可能是实现此目的的更好方法.但是,在观看了有关这些内容的多个YouTube视频之后,我再也不明智了.我也搜索过Google,但似乎没有关于使用联结表在MySQL中存储基因表达数据的教程.那么,有人对如何最好地存储这些数据有任何建议吗?老实说,我希望有关这方面的文献很多,因此,如果您有有用的链接,也将不胜感激.

I've since been told that 'junction tables' might represent a better way to do this. After watching several YouTube videos on these, however, I'm none the wiser. I've also searched Google and there doesn't seem to be a tutorial on storing gene expression data in MySQL using junction tables. So, does anyone have any advice on how best to store these data? I honestly expected that there would be a huge literature on this, so if you have useful links that would also be much appreciated.

推荐答案

为此,您只需要几个表,我正在使用mysql语法:

You need just a few tables for this, I am using mysql syntax:

CREATE TABLE genes (
`gene_id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
`gene_name` varchar(99) not null
)ENGINE=InnoDB;

CREATE TABLE samples (
`sample_id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
`sample_name` varchar(99) not null
)ENGINE=InnoDB;

CREATE TABLE gene_sample (
`gene_id` INT NOT NULL,
`sample_id` INT NOT NULL,
FOREIGN KEY(`gene_id`) REFERENCES genes (`gene_id`),
FOREIGN KEY(`sample_id`) REFERENCES sample (`sample_id`),
)ENGINE=InnoDB;

对于样本中出现的每个基因,将gene_idsample_id对插入gene_sample表中.

For every gene that occurs in a sample, insert the pair of gene_id and sample_id into the gene_sample table.

在SELECT中使用两个JOIN表达式来重建完整数据:

Use two JOIN expressions in a SELECT to reconstruct the full data:

SELECT genes.*, samples.* 
FROM gene_sample 
LEFT JOIN genes USING (gene_id) 
LEFT JOIN samples USING (sample_id);

这篇关于在MySQL中存储基因表达数据-需要连接表吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆