计算蛋白质和配体之间的距离 [英] Calculating distances between protein and ligand
问题描述
我正在尝试计算蛋白质原子 (ATOM
) 和配体原子 (HETATM
) 的每个坐标之间的距离.我有很多看起来像这样的文件:
I am trying to calculate distance between each coordinate of protein atom (ATOM
) and ligand atom (HETATM
). I have number of files that look like this:
ATOM 1592 HD13 LEU D 46 11.698 -10.914 2.183 1.00 0.00 H
ATOM 1593 HD21 LEU D 46 11.528 -8.800 5.301 1.00 0.00 H
ATOM 1594 HD22 LEU D 46 12.997 -9.452 4.535 1.00 0.00 H
ATOM 1595 HD23 LEU D 46 11.722 -8.718 3.534 1.00 0.00 H
HETATM 1597 N1 308 A 1 0.339 6.314 -9.091 1.00 0.00 N
HETATM 1598 C10 308 A 1 -0.195 5.226 -8.241 1.00 0.00 C
HETATM 1599 C7 308 A 1 -0.991 4.254 -9.133 1.00 0.00 C
HETATM 1600 C1 308 A 1 -1.468 3.053 -8.292 1.00 0.00 C
所以我试图计算 ATOM1
和所有其他 HETATM1
之间的距离,ATOM1
和所有其他 'HETATM2' 之间的距离等等.我已经用 perl 编写了一个脚本,但我无法弄清楚脚本有什么问题,它没有给我任何错误,只是不打印任何内容.
So I am trying to calculate distances between ATOM1
and all other HETATM1
, between ATOM1
and all other 'HETATM2' and so on. I have written a script in perl, but I cannot figure it out what is wrong with the script, it doesnt give me any error it just does not print anything.
我也不知道如何在脚本中添加它,如果可能的话,如果每个计算的结果大于 5
然后删除这两个包含在计算中的行.如果它是 <=
那么 5
然后保留它.
I am also not sure how to add it in the script and if it is possible, that if the result of each calculation is more then 5
then delete this both lines that were included into calculation. If it is <=
then 5
then keep it.
#!/usr/local/bin/perl
use strict;
use warnings;
open(IN, $ARGV[0]) or die "$!";
my (@refer, @points);
my $part = 0;
my $dist;
while (my $line = <IN>) {
chomp($line);
if ($line =~ /^HETATM/) {
$part++;
next;
}
my @array = (substr($line, 30, 8),substr($line,38,8),substr($line,46,8));
# print "@array\n";
if ($part == 0) {
push @refer, [ @array ];
} elsif ($part ==1){
push @points, [ @array ];
}
}
foreach my $ref(@refer) {
my ($x1, $y1, $z1) = @{$ref};
foreach my $atom(@points) {
my ($x, $y, $z) = @{$atom};
my $dist = sqrt( ($x-$x1)**2 + ($y-$y1)**2 + ($z-$z1)**2 );
print $dist;
}
}
推荐答案
好的,我不得不说 - 我会重写你的代码,以稍微不同的方式工作.
OK, I have to say - I'd rewrite your code, to work a bit differently.
像这样:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my %coordinates;
#use types to track different types. Unclear if you need to handle anything other than 'not ATOM' but this is in case you do.
my %types;
#read STDIN or files specified on command line - like how grep/sed do it.
while ( <> ) {
my ( $type, $id, undef, undef, undef, undef, $x, $y, $z ) = split; # splits on white space.
$coordinates{$type}{$id} = [$x, $y, $z];
$types{$type}++ if $type ne 'ATOM';
}
#print for debugging:
print Dumper \%coordinates;
print Dumper \%types;
#iterate each element of "ATOM"
foreach my $atom_id ( keys %{$coordinates{'ATOM'}} ) {
#iterate all the types (HETATM)
foreach my $type ( sort keys %types ) {
#iterate each id within the data structure.
foreach my $id ( sort keys %{$coordinates{$type}} ) {
my $dist = 0;
#take square of x - x1, y - y1, z - z1
#do it iteratively, using 'for' loop.
$dist += (($coordinates{$type}{$id}[$_] - $coordinates{'ATOM'}{$atom_id}[$_]) ** 2) for 0..2;
$dist = sqrt $dist;
print "$atom_id -> $type $id $dist\n";
}
这是:
- 使用
<>
在命令行上读取 STDIN 或命名文件,而不是手动打开ARGV[0]
来实现类似的结果.(但意味着你也可以通过它管道传输东西). - 首先将您的数据读入哈希.
- 然后迭代所有可能的配对,计算您的距离.
- 有条件地打印是否符合条件(您的所有结果似乎都符合?)
- Using
<>
to read STDIN or named files on command line instead of manually openingARGV[0]
which accomplishes a similar result. (but means you can pipe stuff through it too). - Reads your data into a hash first.
- Then iterates all the possible pairings, calculating your distance.
- Conditionally prints if they match the criteria (all your results seem to?)
结果如下:
1592 -> HETATM 1597 23.5145474334506
1592 -> HETATM 1598 22.5965224094328
1592 -> HETATM 1599 22.7844420822631
1592 -> HETATM 1600 21.8665559702483
1595 -> HETATM 1597 22.6919443415499
1595 -> HETATM 1598 21.7968036647578
1595 -> HETATM 1599 22.1437585337268
1595 -> HETATM 1600 21.2693868505888
1594 -> HETATM 1597 24.3815421169376
1594 -> HETATM 1598 23.509545380547
1594 -> HETATM 1599 23.8816415683679
1594 -> HETATM 1600 23.0248383056212
1593 -> HETATM 1597 23.6802952050856
1593 -> HETATM 1598 22.74957513889
1593 -> HETATM 1599 23.1402816102138
1593 -> HETATM 1600 22.2296935201545
现在您提到想要删除太远"的行 - 这有点复杂,因为您有一个复合条件(并且您将删除所有行).
Now you mention wanting to delete lines that are 'too far' - that's a bit complicated, because you've a compound criteria (and you'll delete all your lines).
问题是 - 在您测试文件中的每个配对之前,您不知道您的 ATOM
行是否有太多距离".
The problem is - you don't know if your ATOM
lines have too much "distance" until you've tested every single pairing in the file.
您或许可以通过以下方式做到这一点:
You could perhaps do this by:
#iterate each element of "ATOM"
foreach my $atom_id ( keys %{$coordinates{'ATOM'}} ) {
#iterate all the types (HETATM)
foreach my $type ( sort keys %types ) {
#iterate each id within the data structure.
foreach my $id ( sort keys %{$coordinates{$type}} ) {
my $dist = 0;
#take square of x - x1, y - y1, z - z1
#do it iteratively, using 'for' loop.
$dist += (($coordinates{$type}{$id}[$_] - $coordinates{'ATOM'}{$atom_id}[$_]) ** 2) for 0..2;
$dist = sqrt $dist;
print "### $atom_id -> $type $id $dist\n";
##note - this will print out multiple times if there's multiple pairings.
if ( $dist <= 5 ) {
print $lines{'ATOM'}{$atom_id};
print $lines{$type}{$id};
}
}
}
}
这将 - 对于每个配对比较打印 ATOM
和 HETATM
行,它们的距离为 <= 5.但如果存在多个配对.
Which will - for each pairing-comparison print both the ATOM
and HETATM
lines that had a distance of <= 5. But it will do so multiple times if multiple pairings exist.
但我认为您的核心错误在于对 $part
和 next
子句的处理不当.
But I think your core error is in mishandling the $part
and next
clauses.
- 你只会增加
$part
并且当你在0
初始化它时,你永远不会将它重置为零.因此,对于每个连续的HETATM
,它将是 1,2,3,4. - 您在增加
part
后使用next
这意味着您完全跳过if ( $part == 1
子句.
- You only ever increment
$part
and whilst you initialise it at0
, you never reset it to zero. So it'll be 1,2,3,4 for each successiveHETATM
. - You use
next
after incrementingpart
which means you skip theif ( $part == 1
clause entirely.
这篇关于计算蛋白质和配体之间的距离的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!