计算蛋白质和配体之间的距离 [英] Calculating distances between protein and ligand

查看：110 发布时间：2021/6/15 20:27:54 perl

本文介绍了计算蛋白质和配体之间的距离的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在尝试计算蛋白质原子 (ATOM) 和配体原子 (HETATM) 的每个坐标之间的距离.我有很多看起来像这样的文件:

I am trying to calculate distance between each coordinate of protein atom (ATOM) and ligand atom (HETATM). I have number of files that look like this:

ATOM   1592 HD13 LEU D  46      11.698 -10.914   2.183  1.00  0.00           H  
ATOM   1593 HD21 LEU D  46      11.528  -8.800   5.301  1.00  0.00           H  
ATOM   1594 HD22 LEU D  46      12.997  -9.452   4.535  1.00  0.00           H  
ATOM   1595 HD23 LEU D  46      11.722  -8.718   3.534  1.00  0.00           H  
HETATM 1597  N1  308 A   1       0.339   6.314  -9.091  1.00  0.00           N  
HETATM 1598  C10 308 A   1      -0.195   5.226  -8.241  1.00  0.00           C  
HETATM 1599  C7  308 A   1      -0.991   4.254  -9.133  1.00  0.00           C  
HETATM 1600  C1  308 A   1      -1.468   3.053  -8.292  1.00  0.00           C

所以我试图计算 ATOM1 和所有其他 HETATM1 之间的距离，ATOM1 和所有其他 'HETATM2' 之间的距离等等.我已经用 perl 编写了一个脚本，但我无法弄清楚脚本有什么问题，它没有给我任何错误，只是不打印任何内容.

So I am trying to calculate distances between ATOM1 and all other HETATM1, between ATOM1 and all other 'HETATM2' and so on. I have written a script in perl, but I cannot figure it out what is wrong with the script, it doesnt give me any error it just does not print anything.

我也不知道如何在脚本中添加它，如果可能的话，如果每个计算的结果大于 5 然后删除这两个包含在计算中的行.如果它是 <= 那么 5 然后保留它.

I am also not sure how to add it in the script and if it is possible, that if the result of each calculation is more then 5 then delete this both lines that were included into calculation. If it is <= then 5 then keep it.

#!/usr/local/bin/perl 

    use strict;
    use warnings;

    open(IN, $ARGV[0]) or die "$!"; 
    my (@refer, @points);
    my $part = 0;
    my $dist;
    while (my $line = <IN>) { 
        chomp($line);
        if ($line =~ /^HETATM/) {
            $part++;
            next;
        }
        my @array = (substr($line, 30, 8),substr($line,38,8),substr($line,46,8));
    #    print "@array\n";
        if ($part == 0) {
            push @refer, [ @array ]; 
        } elsif ($part ==1){
            push @points, [ @array ]; 
        }
    }

        foreach my $ref(@refer) {
        my ($x1, $y1, $z1) = @{$ref};
        foreach my $atom(@points) {
            my ($x, $y, $z) = @{$atom};
            my $dist = sqrt( ($x-$x1)**2 + ($y-$y1)**2 + ($z-$z1)**2 );
        print $dist;

        }

    }

推荐答案

好的，我不得不说 - 我会重写你的代码，以稍微不同的方式工作.

OK, I have to say - I'd rewrite your code, to work a bit differently.

像这样:

#!/usr/bin/env perl
use strict;
use warnings;

use Data::Dumper;

my %coordinates; 
#use types to track different types. Unclear if you need to handle anything other than 'not ATOM' but this is in case you do. 

my %types; 

#read STDIN or files specified on command line - like how grep/sed do it. 
while ( <> ) {
   my ( $type, $id, undef, undef, undef, undef, $x, $y, $z ) = split; # splits on white space. 
   $coordinates{$type}{$id} = [$x, $y, $z];
   $types{$type}++ if $type ne 'ATOM'; 
}

#print for debugging:
print Dumper \%coordinates;
print Dumper \%types;

#iterate each element of "ATOM"
foreach my $atom_id ( keys %{$coordinates{'ATOM'}} ) { 
   #iterate all the types (HETATM)
   foreach my $type ( sort keys %types ) { 
      #iterate each id within the data structure. 
      foreach my $id ( sort keys %{$coordinates{$type}} ) { 

         my $dist = 0;
         #take square of x - x1, y - y1, z - z1
         #do it iteratively, using 'for' loop.
         $dist += (($coordinates{$type}{$id}[$_] - $coordinates{'ATOM'}{$atom_id}[$_]) ** 2) for 0..2; 
         $dist = sqrt $dist; 

         print "$atom_id -> $type $id $dist\n";
      }

这是:

使用 <> 在命令行上读取 STDIN 或命名文件，而不是手动打开 ARGV[0] 来实现类似的结果.(但意味着你也可以通过它管道传输东西).
首先将您的数据读入哈希.
然后迭代所有可能的配对，计算您的距离.
有条件地打印是否符合条件(您的所有结果似乎都符合?)

Using <> to read STDIN or named files on command line instead of manually opening ARGV[0] which accomplishes a similar result. (but means you can pipe stuff through it too).
Reads your data into a hash first.
Then iterates all the possible pairings, calculating your distance.
Conditionally prints if they match the criteria (all your results seem to?)

结果如下:

1592 -> HETATM 1597 23.5145474334506
1592 -> HETATM 1598 22.5965224094328
1592 -> HETATM 1599 22.7844420822631
1592 -> HETATM 1600 21.8665559702483
1595 -> HETATM 1597 22.6919443415499
1595 -> HETATM 1598 21.7968036647578
1595 -> HETATM 1599 22.1437585337268
1595 -> HETATM 1600 21.2693868505888
1594 -> HETATM 1597 24.3815421169376
1594 -> HETATM 1598 23.509545380547
1594 -> HETATM 1599 23.8816415683679
1594 -> HETATM 1600 23.0248383056212
1593 -> HETATM 1597 23.6802952050856
1593 -> HETATM 1598 22.74957513889
1593 -> HETATM 1599 23.1402816102138
1593 -> HETATM 1600 22.2296935201545

现在您提到想要删除太远"的行 - 这有点复杂，因为您有一个复合条件(并且您将删除所有行).

Now you mention wanting to delete lines that are 'too far' - that's a bit complicated, because you've a compound criteria (and you'll delete all your lines).

问题是 - 在您测试文件中的每个配对之前，您不知道您的 ATOM 行是否有太多距离".

The problem is - you don't know if your ATOM lines have too much "distance" until you've tested every single pairing in the file.

您或许可以通过以下方式做到这一点:

You could perhaps do this by:

#iterate each element of "ATOM"
foreach my $atom_id ( keys %{$coordinates{'ATOM'}} ) { 
   #iterate all the types (HETATM)
   foreach my $type ( sort keys %types ) { 
      #iterate each id within the data structure. 
      foreach my $id ( sort keys %{$coordinates{$type}} ) { 

         my $dist = 0;
         #take square of x - x1, y - y1, z - z1
         #do it iteratively, using 'for' loop.
         $dist += (($coordinates{$type}{$id}[$_] - $coordinates{'ATOM'}{$atom_id}[$_]) ** 2) for 0..2; 
         $dist = sqrt $dist; 

         print "### $atom_id -> $type $id $dist\n";

         ##note - this will print out multiple times if there's multiple pairings. 
         if ( $dist <= 5 ) {
            print $lines{'ATOM'}{$atom_id};
            print $lines{$type}{$id};
         }
      }
   }
}

这将 - 对于每个配对比较打印 ATOM 和 HETATM 行，它们的距离为 <= 5.但如果存在多个配对.

Which will - for each pairing-comparison print both the ATOM and HETATM lines that had a distance of <= 5. But it will do so multiple times if multiple pairings exist.

但我认为您的核心错误在于对 $part 和 next 子句的处理不当.

But I think your core error is in mishandling the $part and next clauses.

你只会增加 $part 并且当你在 0 初始化它时，你永远不会将它重置为零.因此，对于每个连续的 HETATM，它将是 1,2,3,4.
您在增加 part 后使用 next 这意味着您完全跳过 if ( $part == 1 子句.

You only ever increment $part and whilst you initialise it at 0, you never reset it to zero. So it'll be 1,2,3,4 for each successive HETATM.
You use next after incrementing part which means you skip the if ( $part == 1 clause entirely.

这篇关于计算蛋白质和配体之间的距离的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

计算蛋白质和配体之间的距离 [英] Calculating distances between protein and ligand

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

计算蛋白质和配体之间的距离 [英] Calculating distances between protein and ligand

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭