'StringCut' to the left or right of a defined position using Mathematica

Views: 127 Published: 2020/9/21 3:15:48 Tags: string wolfram-mathematica bioinformatics


Problem description

On reading this question, I thought the following problem would be simple using StringSplit.

Given the following string, I want to 'cut' it to the left of every "D" such that:

I get a list of fragments (with the sequence unchanged)

StringJoin@fragments gives back the original string (but it does not matter if I have to reorder the fragments to obtain this). That is, the sequence within each fragment is important, and I do not want to lose any characters.

(The example I am interested in is a protein sequence (string) where each character represents an amino acid in one-letter code. I want to obtain the theoretical list of ALL fragments obtained by treating it with an enzyme known to split before "D".)

str = "MTPDKPSQYDKIEAELQDICNDVLELLDSKGDYFRYLSEVASGDN"

The best I can come up with is to insert a space before each "D" using StringReplace and then use StringSplit. This seems quite awkward, to say the least.

frags1 = StringSplit@StringReplace[str, "D" -> " D"]

giving the output:

{"MTP", "DKPSQY", "DKIEAELQ", "DICN", "DVLELL", "DSKG", "DYFRYLSEVASG", "DN"}

or alternatively, using StringReplacePart:

frags1alt = 
 StringSplit@StringReplacePart[str, " D", StringPosition[str, "D"]]

Finally (and more realistically), if I want to split before "D" provided that the residue immediately preceding it is not "P" [i.e. P-D (Pro-Asp) bonds are not cleaved], I do it as follows:

StringSplit@StringReplace[str, (x_ /; x != "P") ~~ "D" -> x ~~ " D"]

Is there a more elegant way?

Speed is not necessarily an issue. I am unlikely to be dealing with strings of greater than, say, 500 characters. I am using Mma 7.
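For comparison, here is a minimal sketch of a zero-width split that avoids inserting a placeholder space. It assumes StringSplit accepts a RegularExpression delimiter that matches an empty position (the regex attempt in the update relies on the same behaviour). The names fragsAll and fragsNoPD are illustrative only.

```mathematica
(* Sketch only: assumes StringSplit accepts a zero-width RegularExpression delimiter *)
str = "MTPDKPSQYDKIEAELQDICNDVLELLDSKGDYFRYLSEVASGDN";

(* Split at the empty position just before every "D";
   (?=D) is a lookahead, so no character is consumed *)
fragsAll = StringSplit[str, RegularExpression["(?=D)"]];

(* Same split, but skip positions preceded by "P",
   so Pro-Asp bonds are left intact *)
fragsNoPD = StringSplit[str, RegularExpression["(?<!P)(?=D)"]];

(* Round-trip check: joining the fragments should reproduce the input *)
StringJoin[fragsAll] === str
StringJoin[fragsNoPD] === str
```

Because the delimiter has zero width, no characters are dropped and the fragments come back in their original order. The lookbehind variant also tests each "D" independently, so it should behave sensibly if two "D" residues are adjacent.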

Update

I have added the bioinformatics tag, and I thought it might be of interest to add an example from that field.

The following imports a protein sequence (bovine serum albumin, accession number 3336842) from the NCBI database using eutils and then generates a (theoretical) trypsin digest (http://en.wikipedia.org/wiki/Trypsin). I have assumed that the enzyme trypsin cleaves between residues A1-A2 when A1 is either "R" or "K", provided that A2 is not "R", "K" or "P". If anyone has any suggestions for improvements, please feel free to suggest modifications.

Using a modification of sakra's method (a carriage return after '?db=' possibly needs to be removed):

StringJoin /@ 
   Split[Characters[#], 
    And @@ Function[x, #1 != x] /@ {"R", "K"} || 
      Or @@ Function[xx, #2 == xx] /@ {"R", "K", "P"} &] & @
 StringJoin@
  Rest@Import[
    "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=\
protein&id=3336842&rettype=fasta&retmode=text", "Data"]
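The Split predicate above can be hard to read at a glance. The following is a minimal sketch of the same cleavage rule written with MemberQ and applied to a short made-up peptide, so it runs without the network import. The names seq and keepTogether are hypothetical, and the peptide is not taken from the albumin sequence.

```mathematica
(* Hypothetical short peptide, used here only to avoid the network import *)
seq = "MKWVTFRRDTHKPEA";

(* Split keeps two neighbours a, b in the same run while the test gives True.
   Keep them together unless a is "R" or "K" and b is none of "R", "K", "P". *)
keepTogether = Function[{a, b},
   ! MemberQ[{"R", "K"}, a] || MemberQ[{"R", "K", "P"}, b]];

(* Break the string into characters, group them, and rejoin each group *)
StringJoin /@ Split[Characters[seq], keepTogether]
```

Naming the two neighbours makes it easier to see that a cut happens only when the left residue is "R" or "K" and the right residue is not "R", "K" or "P".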

My possibly ham-fisted attempt at using the regex method (Sasha/WReach) to do the same thing:

StringSplit[#, RegularExpression["(?![PKR])(?<=[KR])"]] &@
 StringJoin@Rest@Import[...]

Output

{MK,WVTFISLLLLFSSAYSR,GVFRR,<<69>>,CCAADDK,EACFAVEGPK,LVVSTQTALA}
