计算蛋白质和配体之间的距离 [英] Calculating distances between protein and ligand

查看:110
本文介绍了计算蛋白质和配体之间的距离的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试计算蛋白质原子 (ATOM) 和配体原子 (HETATM) 的每个坐标之间的距离.我有很多看起来像这样的文件:

I am trying to calculate distance between each coordinate of protein atom (ATOM) and ligand atom (HETATM). I have number of files that look like this:

ATOM   1592 HD13 LEU D  46      11.698 -10.914   2.183  1.00  0.00           H  
ATOM   1593 HD21 LEU D  46      11.528  -8.800   5.301  1.00  0.00           H  
ATOM   1594 HD22 LEU D  46      12.997  -9.452   4.535  1.00  0.00           H  
ATOM   1595 HD23 LEU D  46      11.722  -8.718   3.534  1.00  0.00           H  
HETATM 1597  N1  308 A   1       0.339   6.314  -9.091  1.00  0.00           N  
HETATM 1598  C10 308 A   1      -0.195   5.226  -8.241  1.00  0.00           C  
HETATM 1599  C7  308 A   1      -0.991   4.254  -9.133  1.00  0.00           C  
HETATM 1600  C1  308 A   1      -1.468   3.053  -8.292  1.00  0.00           C 

所以我试图计算 ATOM1 和所有其他 HETATM1 之间的距离,ATOM1 和所有其他 'HETATM2' 之间的距离等等.我已经用 perl 编写了一个脚本,但我无法弄清楚脚本有什么问题,它没有给我任何错误,只是不打印任何内容.

So I am trying to calculate distances between ATOM1 and all other HETATM1, between ATOM1 and all other 'HETATM2' and so on. I have written a script in perl, but I cannot figure it out what is wrong with the script, it doesnt give me any error it just does not print anything.

我也不知道如何在脚本中添加它,如果可能的话,如果每个计算的结果大于 5 然后删除这两个包含在计算中的行.如果它是 <= 那么 5 然后保留它.

I am also not sure how to add it in the script and if it is possible, that if the result of each calculation is more then 5 then delete this both lines that were included into calculation. If it is <= then 5 then keep it.

#!/usr/local/bin/perl 

    use strict;
    use warnings;

    open(IN, $ARGV[0]) or die "$!"; 
    my (@refer, @points);
    my $part = 0;
    my $dist;
    while (my $line = <IN>) { 
        chomp($line);
        if ($line =~ /^HETATM/) {
            $part++;
            next;
        }
        my @array = (substr($line, 30, 8),substr($line,38,8),substr($line,46,8));
    #    print "@array\n";
        if ($part == 0) {
            push @refer, [ @array ]; 
        } elsif ($part ==1){
            push @points, [ @array ]; 
        }
    }

        foreach my $ref(@refer) {
        my ($x1, $y1, $z1) = @{$ref};
        foreach my $atom(@points) {
            my ($x, $y, $z) = @{$atom};
            my $dist = sqrt( ($x-$x1)**2 + ($y-$y1)**2 + ($z-$z1)**2 );
        print $dist;

        }

    }

推荐答案

好的,我不得不说 - 我会重写你的代码,以稍微不同的方式工作.

OK, I have to say - I'd rewrite your code, to work a bit differently.

像这样:

#!/usr/bin/env perl
use strict;
use warnings;

use Data::Dumper;

my %coordinates; 
#use types to track different types. Unclear if you need to handle anything other than 'not ATOM' but this is in case you do. 

my %types; 

#read STDIN or files specified on command line - like how grep/sed do it. 
while ( <> ) {
   my ( $type, $id, undef, undef, undef, undef, $x, $y, $z ) = split; # splits on white space. 
   $coordinates{$type}{$id} = [$x, $y, $z];
   $types{$type}++ if $type ne 'ATOM'; 
}

#print for debugging:
print Dumper \%coordinates;
print Dumper \%types;

#iterate each element of "ATOM"
foreach my $atom_id ( keys %{$coordinates{'ATOM'}} ) { 
   #iterate all the types (HETATM)
   foreach my $type ( sort keys %types ) { 
      #iterate each id within the data structure. 
      foreach my $id ( sort keys %{$coordinates{$type}} ) { 

         my $dist = 0;
         #take square of x - x1, y - y1, z - z1
         #do it iteratively, using 'for' loop.
         $dist += (($coordinates{$type}{$id}[$_] - $coordinates{'ATOM'}{$atom_id}[$_]) ** 2) for 0..2; 
         $dist = sqrt $dist; 

         print "$atom_id -> $type $id $dist\n";
      }

这是:

  • 使用 <> 在命令行上读取 STDIN 或命名文件,而不是手动打开 ARGV[0] 来实现类似的结果.(但意味着你也可以通过它管道传输东西).
  • 首先将您的数据读入哈希.
  • 然后迭代所有可能的配对,计算您的距离.
  • 有条件地打印是否符合条件(您的所有结果似乎都符合?)
  • Using <> to read STDIN or named files on command line instead of manually opening ARGV[0] which accomplishes a similar result. (but means you can pipe stuff through it too).
  • Reads your data into a hash first.
  • Then iterates all the possible pairings, calculating your distance.
  • Conditionally prints if they match the criteria (all your results seem to?)

结果如下:

1592 -> HETATM 1597 23.5145474334506
1592 -> HETATM 1598 22.5965224094328
1592 -> HETATM 1599 22.7844420822631
1592 -> HETATM 1600 21.8665559702483
1595 -> HETATM 1597 22.6919443415499
1595 -> HETATM 1598 21.7968036647578
1595 -> HETATM 1599 22.1437585337268
1595 -> HETATM 1600 21.2693868505888
1594 -> HETATM 1597 24.3815421169376
1594 -> HETATM 1598 23.509545380547
1594 -> HETATM 1599 23.8816415683679
1594 -> HETATM 1600 23.0248383056212
1593 -> HETATM 1597 23.6802952050856
1593 -> HETATM 1598 22.74957513889
1593 -> HETATM 1599 23.1402816102138
1593 -> HETATM 1600 22.2296935201545

现在您提到想要删除太远"的行 - 这有点复杂,因为您有一个复合条件(并且您将删除所有行).

Now you mention wanting to delete lines that are 'too far' - that's a bit complicated, because you've a compound criteria (and you'll delete all your lines).

问题是 - 在您测试文件中的每个配对之前,您不知道您的 ATOM 行是否有太多距离".

The problem is - you don't know if your ATOM lines have too much "distance" until you've tested every single pairing in the file.

您或许可以通过以下方式做到这一点:

You could perhaps do this by:

#iterate each element of "ATOM"
foreach my $atom_id ( keys %{$coordinates{'ATOM'}} ) { 
   #iterate all the types (HETATM)
   foreach my $type ( sort keys %types ) { 
      #iterate each id within the data structure. 
      foreach my $id ( sort keys %{$coordinates{$type}} ) { 

         my $dist = 0;
         #take square of x - x1, y - y1, z - z1
         #do it iteratively, using 'for' loop.
         $dist += (($coordinates{$type}{$id}[$_] - $coordinates{'ATOM'}{$atom_id}[$_]) ** 2) for 0..2; 
         $dist = sqrt $dist; 

         print "### $atom_id -> $type $id $dist\n";

         ##note - this will print out multiple times if there's multiple pairings. 
         if ( $dist <= 5 ) {
            print $lines{'ATOM'}{$atom_id};
            print $lines{$type}{$id};
         }
      }
   }
}

这将 - 对于每个配对比较打印 ATOMHETATM 行,它们的距离为 <= 5.但如果存在多个配对.

Which will - for each pairing-comparison print both the ATOM and HETATM lines that had a distance of <= 5. But it will do so multiple times if multiple pairings exist.

但我认为您的核心错误在于对 $partnext 子句的处理不当.

But I think your core error is in mishandling the $part and next clauses.

  • 你只会增加 $part 并且当你在 0 初始化它时,你永远不会将它重置为零.因此,对于每个连续的 HETATM,它将是 1,2,3,4.
  • 您在增加 part 后使用 next 这意味着您完全跳过 if ( $part == 1 子句.
  • You only ever increment $part and whilst you initialise it at 0, you never reset it to zero. So it'll be 1,2,3,4 for each successive HETATM.
  • You use next after incrementing part which means you skip the if ( $part == 1 clause entirely.

这篇关于计算蛋白质和配体之间的距离的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆