Counting chars in sequences via SQL
Question
I have a database with a table of sequences. Each (amino acid) sequence in this table is made up of 20 different chars (A, V, ...). For instance, "MQSHAMQCASQALDLYD...".
I would like to count the number of occurrences of each char, so that I get something like "2xM, 3xQ, ...".
Furthermore, I would like to do this over all sequences in my DB, so that I get the overall number of occurrences of each char ("248xM, 71xW, ...").
How can I do this in PostgreSQL? At the moment I am doing it with Ruby, but I have 25,000 sequences with a length of about 400 chars each. This takes a while, and I hope it will be faster with SQL.
Recommended answer
This is how to count all the A's in a string: remove every character that is not an A, then take the length of what remains.
select length(regexp_replace('AAADDD', '[^A]', '', 'g'));
This is how to count all the A's in a table:
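-- "field" is the sequence column; "sequences" is a placeholder table name (TABLE is a reserved word in PostgreSQL)
select sum(length(regexp_replace(field, '[^A]', '', 'g'))) from sequences;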
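The queries above count one character at a time, so covering all 20 amino acids would take 20 passes. A possible alternative, sketched here under assumptions, is to split each sequence into single characters and group by character, which gives the overall count of every char in one query. The CTE name `sequences`, the column name `field`, and the two sample rows are hypothetical stand-ins for the real table; `LATERAL` needs PostgreSQL 9.3 or later.

```sql
-- Hypothetical sample data; replace this CTE with the real sequence table.
WITH sequences(field) AS (
    VALUES ('MQSHAMQCASQALDLYD'),
           ('AAADDD')
)
SELECT ch,
       count(*) AS occurrences
FROM sequences
-- string_to_array with a NULL delimiter splits the text into single characters;
-- unnest turns that array into one row per character.
CROSS JOIN LATERAL unnest(string_to_array(field, NULL)) AS ch
GROUP BY ch
ORDER BY occurrences DESC, ch;
```

To get the counts per sequence instead of overall, the same query could also select and group by whatever column identifies a sequence (for example a primary key, if the table has one).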