How to rewrite this code as a one-liner (or in fewer lines on the command line) in Perl?
Question
I have code like this:
#!/usr/bin/perl
use strict;
use warnings;
my %proteins = qw/
UUU F UUC F UUA L UUG L UCU S UCC S UCA S UCG S UAU Y UAC Y UGU C UGC C UGG W
CUU L CUC L CUA L CUG L CCU P CCC P CCA P CCG P CAU H CAC H CAA Q CAG Q CGU R CGC R CGA R CGG R
AUU I AUC I AUA I AUG M ACU T ACC T ACA T ACG T AAU N AAC N AAA K AAG K AGU S AGC S AGA R AGG R
GUU V GUC V GUA V GUG V GCU A GCC A GCA A GCG A GAU D GAC D GAA E GAG E GGU G GGC G GGA G GGG G
/;
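open(my $input, '<', 'dna.txt') or die "Cannot open dna.txt: $!";
while (<$input>) {
    tr/acgt/ACGT/;    # uppercase the bases
    y/GCTA/CGAU/;     # complement the DNA strand into RNA
    # Translate each codon; stop codons are not in the table and are skipped
    foreach my $codon (/(...)/g) {
        if (defined $proteins{$codon}) {
            print $proteins{$codon};
        }
    }
}
close($input);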
This code is related to an answer to my other question: DNA to RNA and Getting Proteins with Perl
The program's output is:
SIMQNISGREAT
How can I rewrite this code in Perl so that it runs from the command line with less code (a single line, if possible)?
PS 1: dna.txt looks like this:
TCATAATACGTTTTGTATTCGCCAGCGCTTCGGTGT
PS 2: If it makes the code shorter, it is acceptable to move the %proteins variable into a file.
Recommended answer
Somebody (@kamaci) called my name in another thread. This is the best I can come up with while keeping the protein table on the command line:
perl -nE'say+map+substr("FYVDINLHL%VEMKLQL%VEIKLQFYVDINLHCSGASTRPWSGARTRP%SGARTRPCSGASTR",(s/GGG/GGC/i,vec($_,0,32)&101058048)%63,1),/.../g' dna.txt
(Shell quoting is shown; for Windows quoting, swap the ' and " characters.) This version marks invalid codons with %; you can probably fix that by adding =~y/%//d at an appropriate spot.
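To make the `y/%//d` suggestion concrete, here is a minimal sketch of the deletion step on its own. The helper name `strip_invalid` and the sample string are made up for illustration; the answer does not say where exactly the transliteration should go.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the '%' placeholders from an already translated string.
sub strip_invalid {
    my ($translated) = @_;
    (my $clean = $translated) =~ y/%//d;   # y/%//d deletes every '%' character
    return $clean;
}

print strip_invalid('SIM%QN%ISGREAT'), "\n";   # prints SIMQNISGREAT
```

Inside the one-liner itself, one option (not tested here) might be to wrap the `map` in `join("", ...)` and apply `=~y/%//dr` to the result; the `/r` flag needs Perl 5.14 or later.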
Hint: This picks out 6 bits from the raw ASCII encoding of an RNA triple, giving 64 codes between 0 and 101058048; to get a string index, I reduce the result modulo 63, but this creates one double mapping, which regrettably had to code for two different proteins. The s/GGG/GGC/i maps one of them to another codon that codes for the right protein.
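The index computation described in the hint can be unpacked into a longer, readable sketch. The sub names `codon_index` and `translate` are hypothetical, and the shift-and-or expression is my reconstruction of what `vec($_,0,32)` yields for a three-byte string, so treat this as an illustration rather than the author's exact method. As far as I can tell, the lookup string is indexed by the raw DNA letters with the DNA-to-RNA complement already folded in (AAA lands on index 0, which is F, the amino acid for UUU).

```perl
use strict;
use warnings;

# Lookup string copied from the one-liner (63 characters, indices 0..62).
my $TABLE = 'FYVDINLHL%VEMKLQL%VEIKLQFYVDINLHCSGASTRPWSGARTRP%SGARTRPCSGASTR';

# Hypothetical helper: index into $TABLE for one three-letter codon.
sub codon_index {
    my ($codon) = @_;
    $codon =~ s/GGG/GGC/i;    # GGG would otherwise collide with AAA at index 0
    my ($b1, $b2, $b3) = unpack 'C3', $codon;
    # Assumed equivalent of vec($codon, 0, 32): big-endian word, low byte zero.
    my $word = ($b1 << 24) | ($b2 << 16) | ($b3 << 8);
    # 0x06060600 == 101058048 keeps bits 1 and 2 of each letter:
    # A -> 0, C -> 2, T or U -> 4, G -> 6 (upper or lower case).
    return ($word & 0x06060600) % 63;
}

# Hypothetical helper: translate a whole line, one codon at a time.
sub translate {
    my ($dna) = @_;
    return join '', map { substr($TABLE, codon_index($_), 1) } $dna =~ /.../g;
}

print translate('TCATAATACGTTTTGTATTCGCCAGCGCTTCGGTGT'), "\n";   # prints SIMQNISGREAT
```

Because 2**6 leaves remainder 1 modulo 63, the three masked bytes collapse to `(b1 + 16*b2 + 4*b3) % 63`, which is why only the all-zero and all-six cases (AAA and GGG) end up sharing an index.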
Also note the parentheses before the % operator, which both isolate the , operator from the argument list of substr and fix the precedence of & vs %. If you ever use that in production code, you're a bad, bad person.
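A small sketch of why those parentheses matter, using a hand-built word for the codon TCA instead of calling `vec`. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $mask = 101058048;     # 0x06060600, the mask from the one-liner
my $word = 0x54434100;    # the bytes 'T', 'C', 'A' followed by a zero byte

my $grouped   = ($word & $mask) % 63;   # what the one-liner computes
my $ungrouped = $word & $mask % 63;     # % binds tighter: $word & ($mask % 63)

# In scalar context the comma operator yields its last operand,
# so the substitution runs first and only the second value is kept.
my $codon = 'GGG';
my $last  = ($codon =~ s/GGG/GGC/i, length $codon);

print "$grouped $ungrouped $last $codon\n";   # prints 36 0 3 GGC
```

Since 101058048 happens to be a multiple of 63, dropping the parentheses would make every index 0, and without the grouping the result of `s///` would be taken as the offset argument of `substr`.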