Trying to use HTML::TableExtract in Perl to extract table from HTML file, but failing

Views: 53 | Published: 2021/6/15 20:57:05 | Tags: perl, html-tableextract

Problem description

I am trying to extract information for each G protein-coupled receptor from tables from a site such as the following:

http://www.iuphar-db.org/DATABASE/ObjectDisplayForward?objectId=1&familyId=1

More specifically, I want to pull information from the columns (Ligand, Sp., Action, Affinity, Units). Currently, my extraction produces only empty output files, so the module does not seem to recognize the table I am specifying. Here is the code I have written so far; it is designed to go through each HTML file that holds the information for one G protein-coupled receptor.

use warnings;
use strict;
use HTML::TableExtract;

my @names = `ls /home/wallakin/LINDA/ligands/iuphar/data/html`;

foreach (@names)
{
#Delete empty lines in HTML
open (IN, "</home/wallakin/LINDA/ligands/iuphar/data/html/$_") or die "Can't open html";
my @htmllines = <IN>;
close IN;
for (@htmllines)
{
    s/^\s*$// or s/^\s*//;
}
open (OUT, ">/home/wallakin/LINDA/ligands/iuphar/data/html2/$_");
print OUT @htmllines;
close OUT;

#Extract data from HTML tables based on column headers
my $te = HTML::TableExtract->new ( 
                    headers => [ qw(Ligand Sp. Action Affinity Units) ],
                    depth => 1,
                    count => 1


                    );


$te->parse_file("/home/wallakin/LINDA/ligands/iuphar/data/html2/$_");

my $output = $_;
$output =~ s/\.html/\.txt/g;
open (RESET, ">/home/wallakin/LINDA/ligands/iuphar/data/ligands/$output");
close RESET;
open (DATA, ">>/home/wallakin/LINDA/ligands/iuphar/data/ligands/$output");
binmode (DATA, ":utf8");
binmode (STDOUT, ":utf8");  


foreach my $ts ($te->tables)
{
    print "Table (", join(',', $ts->coords), "):\n";


    foreach my $row ($te->rows)
    {

        foreach ( grep {defined} @$row)
        {
            $_ =~ s/\n/\ /g;
            $_ =~ s/\r//g;  
            #$_ =~ s/\s+/ /g;
        }

        #Each column's data separated by tabs
        print DATA join ("\t", grep {defined} @$row),"\n";
    }
}
close DATA;
}
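
One detail of the listing above that may be worth checking: names returned by the backtick `ls` call each keep their trailing newline, so every path built from `$_` ends in "\n". Two-argument `open` ignores trailing whitespace in the file name, but code that opens the path literally (for example a three-argument `open`) would look for a file whose name ends in a newline. Whether this is what produces the empty output is an assumption, not something the post confirms. A minimal sketch of reading the directory without that newline, using a hypothetical helper named `list_html_files`:

```perl
use strict;
use warnings;

# Return the sorted names of the regular files ending in ".html" inside $dir.
# readdir yields bare names, so no trailing newline has to be chomped.
# (list_html_files is a hypothetical helper, not part of the original script.)
sub list_html_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't open directory $dir: $!";
    my @names = sort grep { /\.html\z/ && -f "$dir/$_" } readdir($dh);
    closedir($dh);
    return @names;
}

# Possible usage with the directory from the question:
# my @names = list_html_files('/home/wallakin/LINDA/ligands/iuphar/data/html');
```

If the backtick call is kept instead, `chomp(my @names = ...)` applied to its result would strip the newlines in the same way.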

I wrote an earlier program (which worked, thankfully) that fetches the HTML file for each G protein-coupled receptor, and I have been passing those files into this program. I'm not sure whether I used the right headers, depth, or count.
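
To narrow down whether the headers, depth, or count are at fault, one option is to drop `depth` and `count` and let the headers alone select the table. The sketch below does that on a small, made-up HTML string (the markup and the cell values are hypothetical and only stand in for one saved receptor page), and it iterates over `$ts->rows`, the rows of the table currently being visited. It is not a confirmed fix for the real pages, whose nesting is not shown in the post.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample markup; the real pages may nest the table differently.
my $html = <<'HTML';
<html><body>
<table>
  <tr><td>Ligand</td><td>Sp.</td><td>Action</td><td>Affinity</td><td>Units</td></tr>
  <tr><td>ligand A</td><td>Hs</td><td>Agonist</td><td>7.5</td><td>pKi</td></tr>
</table>
</body></html>
HTML

# Collect the data rows of every table whose header row matches all five
# column names. No depth or count is given, so position does not matter.
sub extract_rows {
    my ($markup) = @_;
    my $te = HTML::TableExtract->new(
        headers => [ 'Ligand', 'Sp.', 'Action', 'Affinity', 'Units' ],
    );
    $te->parse($markup);
    $te->eof;

    my @rows;
    for my $ts ($te->tables) {
        for my $row ($ts->rows) {
            # Replace undefined cells with empty strings so columns stay aligned.
            push @rows, [ map { defined $_ ? $_ : '' } @$row ];
        }
    }
    return @rows;
}

# Print each extracted row with tab-separated columns.
print join("\t", @$_), "\n" for extract_rows($html);
```

If this prints rows for a saved page while the original settings do not, the `depth` and `count` values are the likely mismatch; printing `$ts->coords` for each matched table shows the depth and count the module actually assigned.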

I apologize if this post sounds stupid in any way, but I am new to bioinformatics and programming, in general. Thanks for any help!
