How can I extract a special kind of table from a website in Perl?


Problem description

I am trying to fetch all the tables from the page http://finance.yahoo.com/etf/lists/?bypass=true&mod_id=mediaquotesetf&tab=tab1&scol=imkt&stype=desc&rcnt=50&page=1 using the Perl module HTML::TableExtract, but I can't get the table I want. Instead I get only the first two tables, which are useless to me.

Here is my code:

#!/usr/bin/perl
#!perl -w
use DBI;
use strict;
use WWW::Mechanize;
use HTML::TableExtract;

my $mech = WWW::Mechanize->new();
my $url  = 'http://finance.yahoo.com/etf/lists/?bypass=true&mod_id=mediaquotesetf&tab=tab1&scol=imkt&stype=desc&rcnt=50&page=1';
$mech->get($url);
chomp(my $script = $mech->content);
my $table = HTML::TableExtract->new();
$table->parse($script);

foreach my $ts ($table->tables) {
    print "Table (", join(',', $ts->coords), "):\n";

    foreach my $row ($ts->rows) {
        print join(',', @$row), "\n";
    }
}

Output:

Table (0,0):
  ,Search FinanceSearch Web
Table (0,1):

                              Quotes you view appear here for quick access.

As you can see, I only get the first two tables instead of all of them.

Solution

The third table is generated dynamically using JavaScript. WWW::Mechanize doesn't support JavaScript, so you will need to use WWW::Mechanize::Firefox instead.

Note that this requires you to install the Firefox web browser and its mozrepl plugin, as well as the MozRepl Perl module.

use strict;
use warnings 'all';
use feature 'say';
use open qw/ :std :encoding(UTF-8) /;

use WWW::Mechanize::Firefox;
use HTML::TableExtract;

use constant URL => 'http://finance.yahoo.com/etf/lists/?bypass=true&mod_id=mediaquotesetf&tab=tab1&scol=imkt&stype=desc&rcnt=50&page=1';

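# Drive a real Firefox instance so the page's JavaScript runs
my $mech = WWW::Mechanize::Firefox->new;
$mech->autoclose_tab(0);    # leave the browser tab open when the script ends
$mech->get(URL);

# depth and count are zero-based: this selects the third top-level table
my $te = HTML::TableExtract->new(depth => 0, count => 2);
$te->parse( $mech->content );

for my $row ( $te->rows ) {
    local $" = ',';         # join interpolated array elements with commas
    print "@$row\n";
}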

Output:

ETF Name,Ticker,Category,Fund Family,Intraday Return,3-MO Return,YTD Return,1-YR Return,3-YR Return,5-YR Return
UBS ETRACS ISE Exclusively Hmbldrs ETN,HOMX,Consumer Cyclical,UBS Group AG,+13.43%,-6.74%,-6.74%,-19.54%,0.0%,0.0%
VelocityShares 3x Long Natural Gas ETN,UGAZ,Trading-Leveraged Commodities,Credit Suisse AG,+9.33%,-60.15%,-60.15%,-91.36%,-81.58%,0.0%
ProShares Ultra Bloomberg Natural Gas,BOIL,Trading-Leveraged Commodities,ProShares,+5.96%,-41.13%,-41.13%,-76.12%,-62.1%,0.0%
Direxion Daily Brazil Bull 3X ETF,BRZU,Trading-Leveraged Equity,Direxion Funds,+4.24%,+66.64%,+66.64%,-61.7%,0.0%,0.0%
DB Commodity Double Long ETN,DYY,Trading-Leveraged Commodities,Deutsche Bank AG,+4.16%,-25.87%,-25.87%,-41.98%,-34.91%,-32.8%
Deutsche X-trackers MSCI EMktsHiDvYdHgEq,HDEE,Diversified Emerging Mkts,Deutsche Asset Management,+3.73%,-2.15%,-2.15%,0.0%,0.0%,0.0%
DB Agriculture Double Long ETN,DAG,Trading-Leveraged Commodities,Deutsche Bank AG,+3.57%,+3.12%,+3.12%,-19.14%,-29.52%,-25.49%
United States Natural Gas,UNG,Commodities Energy,United States Commodity Funds LLC,+3.15%,-23.18%,-23.18%,-49.7%,-32.73%,-32.09%
Direxion Daily Jr Gld Mnrs Bear 3X ETF,JDST,Trading-Inverse Equity,Direxion Funds,+3.03%,-80.83%,-80.83%,-88.33%,0.0%,0.0%
Direxion Daily S&P Biotech Bull 3X ETF,LABU,Trading-Leveraged Equity,Direxion Funds,+2.97%,-67.51%,-67.51%,0.0%,0.0%,0.0%
VelocityShares 3x Inverse Silver ETN,DSLV,Trading-Inverse Commodities,Credit Suisse AG,+2.88%,-36.11%,-36.11%,-15.03%,+13.64%,0.0%
ProShares Ultra MSCI Brazil Capped,UBR,Trading-Leveraged Equity,ProShares,+2.85%,+51.79%,+51.79%,-37.19%,-42.04%,-38.48%
Direxion Daily India Bull 3X ETF,INDL,Trading-Leveraged Equity,Direxion Funds,+2.71%,-9.82%,-9.82%,-48.84%,-13.11%,-22.69%
Direxion Daily Real Estate Bull 3X ETF,DRN,Trading-Leveraged Equity,Direxion Funds,+2.66%,+15.65%,+15.65%,-0.71%,+21.8%,+22.37%
iShares US Telecommunications,IYZ,Communications,iShares,+2.63%,+7.43%,+7.43%,+3.79%,+10.74%,+7.88%
ProShares Ultra Semiconductors,USD,Trading-Leveraged Equity,ProShares,+2.58%,-4.09%,-4.09%,-6.65%,+31.6%,+15.12%
Direxion Daily Pharmctcl&Medcl Bl 2X ETF,PILL,Trading-Leveraged Equity,Direxion Funds,+2.57%,-27.81%,-27.81%,0.0%,0.0%,0.0%
IQ Hedge Event-Driven Tracker ETF,QED,Market Neutral,IndexIQ,+2.54%,+1.51%,+1.51%,-1.64%,0.0%,0.0%
Direxion Daily Regional Bnks Bull 3X ETF,DPST,Trading-Leveraged Equity,Direxion Funds,+2.51%,-20.73%,-20.73%,0.0%,0.0%,0.0%
VelocityShares 3x Inverse Gold ETN,DGLD,Trading-Inverse Commodities,Credit Suisse AG,+2.44%,-40.19%,-40.19%,-24.19%,+6.6%,0.0%
Direxion Daily South Korea Bull 3X ETF,KORU,Trading-Leveraged Equity,Direxion Funds,+2.43%,+12.48%,+12.48%,-30.37%,0.0%,0.0%
VelocityShares Daily Inverse VIX ST ETN,XIV,Volatility,Credit Suisse AG,+2.43%,+0.31%,+0.31%,-25.29%,+3.55%,+13.26%
ProShares Ultra S&P Regional Banking,KRU,Trading-Leveraged Equity,ProShares,+2.42%,-22.89%,-22.89%,-17.41%,+10.48%,+9.61%
Global X FTSE Andean 40 ETF,AND,Latin America Stock,Global X Funds,+2.38%,+12.8%,+12.8%,-15.28%,-19.56%,-11.0%
AccuShares Spot CBOE® VIX® ETC Down,VXDN,Volatility,AccuShares™,+2.32%,-12.72%,-12.72%,0.0%,0.0%,0.0%
ProShares Short S&P Regional Banking,KRS,Trading-Inverse Equity,ProShares,+2.3%,+7.46%,+7.46%,-0.5%,-12.4%,-14.52%
United States 12 Month Natural Gas,UNL,Commodities Energy,United States Commodity Funds LLC,+2.3%,-8.84%,-8.84%,-29.88%,-22.73%,-23.91%
ProShares Short VIX Short-Term Futures,SVXY,Volatility,ProShares,+2.28%,+0.16%,+0.16%,-25.73%,+3.53%,0.0%
SPDR® S&P Transportation ETF,XTN,Industrials,SPDR State Street Global Advisors,+2.24%,+7.3%,+7.3%,-12.73%,+12.63%,+12.34%
iShares MSCI UAE Capped,UAE,Miscellaneous Region,iShares,+2.22%,+5.31%,+5.31%,-4.98%,0.0%,0.0%
VelocityShares 3x Inverse Crude Oil ETN,DWTI,Trading-Inverse Commodities,Credit Suisse AG,+2.21%,-20.19%,-20.19%,+19.62%,+56.72%,0.0%
ProShares Ultra High Yield,UJB,Trading-Leveraged Debt,ProShares,+2.19%,+10.09%,+10.09%,-6.81%,+1.36%,0.0%
iPath® Bloomberg Livestock SubTR ETN,COW,Commodities Agriculture,Barclays Funds,+2.19%,+0.94%,+0.94%,-10.57%,-3.1%,-5.92%
iPath® Bloomberg Natural Gas SubTR ETN,GAZ,Commodities Energy,Barclays Funds,+2.19%,-31.94%,-31.94%,-59.17%,-44.91%,-49.91%
Direxion Daily Small Cap Bull 3X ETF,TNA,Trading-Leveraged Equity,Direxion Funds,+2.16%,-8.7%,-8.7%,-35.41%,+10.14%,+6.17%
ProShares Ultra Utilities,UPW,Trading-Leveraged Equity,ProShares,+2.16%,+30.83%,+30.83%,+28.99%,+22.85%,+24.31%
SPDR® Wells Fargo Preferred Stock ETF,PSK,Preferred Stock,SPDR State Street Global Advisors,+2.14%,+1.77%,+1.77%,+5.83%,+5.83%,+6.08%
ProShares UltraShort Silver,ZSL,Trading-Inverse Commodities,ProShares,+2.14%,-23.44%,-23.44%,-1.99%,+21.56%,-2.97%
ProShares UltraPro Russell2000,URTY,Trading-Leveraged Equity,ProShares,+2.14%,-8.53%,-8.53%,-34.81%,+10.76%,+7.06%
Direxion Daily Emrg Mkts Bull 3X ETF,EDC,Trading-Leveraged Equity,Direxion Funds,+2.11%,+13.64%,+13.64%,-44.79%,-25.84%,-28.1%
DB Agriculture Short ETN,ADZ,Trading-Inverse Commodities,Deutsche Bank AG,+2.1%,-11.59%,-11.59%,-5.04%,+11.94%,+7.62%
DB 3x Long 25+ Year Treasury Bond ETN,LBND,Trading-Leveraged Debt,Deutsche Bank AG,+2.08%,+22.67%,+22.67%,+2.15%,+11.31%,+24.21%
PureFunds ISE Cyber Security™ ETF,HACK,Technology,Pure Funds,+2.02%,-7.45%,-7.45%,-14.3%,0.0%,0.0%
ProShares UltraPro MidCap400,UMDD,Trading-Leveraged Equity,ProShares,+1.95%,+6.04%,+6.04%,-19.63%,+20.32%,+16.2%
ProShares Ultra Telecommunications,LTL,Trading-Leveraged Equity,ProShares,+1.79%,+13.74%,+13.74%,+0.87%,+18.18%,+11.67%
Direxion Daily Hmbldrs&Supls Bull 3X ETF,NAIL,Trading-Leveraged Equity,Direxion Funds,+1.79%,-10.32%,-10.32%,0.0%,0.0%,0.0%
Teucrium Wheat ETF,WEAT,Commodities Agriculture,Teucrium,+1.78%,-1.54%,-1.54%,-17.67%,-21.21%,0.0%
Vanguard Telecommunication Services ETF,VOX,Communications,Vanguard,+1.76%,+11.11%,+11.11%,+11.82%,+11.64%,+9.99%
Direxion Daily Financial Bull 3X ETF,FAS,Trading-Leveraged Equity,Direxion Funds,+1.75%,-14.79%,-14.79%,-18.93%,+21.63%,+14.48%
US Global Jets ETF,JETS,Miscellaneous Sector,U.S. Global Investors,+1.75%,+1.77%,+1.77%,0.0%,0.0%,0.0%
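
Before switching to a browser-driven client, it can help to confirm that the missing table really is absent from the static HTML and is not just being skipped by the parser. The following is a minimal sketch using only core Perl. The helper names `count_table_tags` and `mentions_header`, the `etf-list` element id, and the sample markup are all hypothetical stand-ins for whatever `$mech->content` returns. Counting tags with a regex is only a rough diagnostic, not a substitute for HTML::TableExtract.

```perl
use strict;
use warnings;

# Hypothetical helper: count <table> start tags in raw markup.
# This is a rough regex-based check, not a real HTML parser.
sub count_table_tags {
    my ($html) = @_;
    my $count = () = $html =~ /<table\b/gi;
    return $count;
}

# Hypothetical helper: number of the given header strings found in the markup.
sub mentions_header {
    my ($html, @headers) = @_;
    my $hits = grep { index($html, $_) >= 0 } @headers;
    return $hits;
}

# Stand-in for $mech->content (assumed shape): two static tables, plus an
# empty placeholder that a script would fill in once the page runs in a browser.
my $static_html = <<'HTML';
<html><body>
<table><tr><td>Search Finance</td></tr></table>
<table><tr><td>Quotes you view appear here for quick access.</td></tr></table>
<div id="etf-list"></div>
<script>/* rows are inserted here at run time */</script>
</body></html>
HTML

printf "tables in static HTML: %d\n", count_table_tags($static_html);
printf "header 'Ticker' present: %s\n",
    mentions_header($static_html, 'Ticker') ? 'yes' : 'no';
```

If the count matches what the parser reports and an expected column header such as Ticker never appears in the fetched content, the table is most likely being built client-side, which is the situation described in the answer.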
