Trip Advisor Scraping 'moreLink'

Question

I've been building a web scraper with BS4 and have gotten stuck. I am using Trip Advisor as a test case for other data I will be going after, but I am not able to isolate the tag that contains the entire review. Here is an example:

https://www.tripadvisor.com/Restaurant_Review-g56010-d470148-Reviews-Chez_Nous-Humble_Texas.html

Notice that in the first review there is a 'More' link below "the wine list is...". I can easily isolate the partial reviews, but I have not found a way to get BS4 to pull the full reviews after a simulated 'More' click. What tool(s) do I need for this? Do I need to use Selenium instead?

The original element looks like this:

<span class="partnerRvw">
<span class="taLnk hvrIE6 tr475091998 moreLink ulBlueLinks" onclick="  ta.util.cookie.setPIDCookie(4444); ta.call('ta.servlet.Reviews.expandReviews', {type: 'dummy'}, ta.id('review_475091998'), 'review_475091998', '1', 4444);
  ">
More&nbsp; </span>
<span class="ui_icon caret-down"></span>
</span>
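
In the snippet above, the review's numeric ID appears twice: as the "tr475091998" class token and in the 'review_475091998' argument passed to ta.servlet.Reviews.expandReviews. The same number shows up in the expanded block below as "UR475091998", so it could be used to match a collapsed snippet to its full text. A minimal sketch that collects these IDs from static HTML with Python's standard-library parser, assuming the "tr<digits>" class convention seen in this one sample holds for other reviews (MoreLinkParser is a hypothetical helper name, and the sample markup is shortened):

```python
import re
from html.parser import HTMLParser


class MoreLinkParser(HTMLParser):
    """Collect review IDs from <span class="... moreLink ..."> elements.

    Assumption (based only on the sample markup): each 'More' span carries
    a class token of the form tr<review id>, e.g. "tr475091998".
    """

    def __init__(self):
        super().__init__()
        self.review_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if "moreLink" not in classes:
            return
        for token in classes:
            match = re.fullmatch(r"tr(\d+)", token)
            if match:
                self.review_ids.append(match.group(1))


sample = """
<span class="partnerRvw">
<span class="taLnk hvrIE6 tr475091998 moreLink ulBlueLinks">More&nbsp; </span>
<span class="ui_icon caret-down"></span>
</span>
"""

parser = MoreLinkParser()
parser.feed(sample)
print(parser.review_ids)  # ['475091998']
```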

After clicking the More link, the HTML contains a new, dynamically added div (class "review dyn_full_review ...") with the information I need (see below):

<div class="review dyn_full_review inlineReviewUpdate provider0 first newFlag" style="display: block;">
<a name="UR475091998" class=""></a>
<div id="UR475091998" class="extended provider0 first newFlag">
<div class="col1of2">
<div class="member_info">
<div id="UID_6875524F623CC948F4F9CA95BB4A9567-SRC_475091998" class="memberOverlayLink" onmouseover="requireCallIfReady('members/memberOverlay', 'initMemberOverlay', event, this, this.id, 'Reviews', 'user_name_photo');" data-anchorwidth="90">
<div class="avatar profile_6875524F623CC948F4F9CA95BB4A9567 ">
<a onclick="">

<img src="https://media-cdn.tripadvisor.com/media/photo-l/0d/97/43/bf/joannecarpenter.jpg" class="avatar potentialFacebookAvatar avatarGUID:6875524F623CC948F4F9CA95BB4A9567" width="74" height="74">
</a>
</div>
<div class="username mo">
<span class="expand_inline scrname mbrName_6875524F623CC948F4F9CA95BB4A9567" onclick="ta.trackEventOnPage('Reviews', 'show_reviewer_info_window', 'user_name_name_click')">joannecarpenter</span>
</div>
</div>
<div class="location">
Humble, Texas
</div>
</div>
<div class="memberBadging g10n">
<div id="UID_6875524F623CC948F4F9CA95BB4A9567-CONT" class="no_cpu" onclick="ta.util.cookie.setPIDCookie('15984'); requireCallIfReady('members/memberOverlay', 'initMemberOverlay', event, this, this.id, 'Reviews', 'review_count');" data-anchorwidth="90">
<div class="levelBadge badge lvl_02">
Level <span><img src="https://static.tacdn.com/img2/badges/20px/lvl_02.png" alt="" class="icon" width="20" height="20/"></span> Contributor </div>
<div class="reviewerBadge badge">
<img src="https://static.tacdn.com/img2/badges/20px/rev_03.png" alt="" class="icon" width="20" height="20">
<span class="badgeText">6 reviews</span> </div>
<div class="contributionReviewBadge badge">
<img src="https://static.tacdn.com/img2/badges/20px/Foodie.png" alt="" class="icon" width="20" height="20">
<span class="badgeText">6 restaurant reviews</span>
</div>
</div>
</div>
</div>
<div class="col2of2">
<div class="innerBubble">
<div class="quote"><a href="/ShowUserReviews-g56010-d470148-r475091998-Chez_Nous-Humble_Texas.html#CHECK_RATES_CONT" onclick="ta.setEvtCookie('Reviews','title','',0,this.href); setPID();" id="r475091998">"<span class="noQuotes">Dinner</span>"</a></div>
<div class="rating reviewItemInline">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s50" width="70" src="https://static.tacdn.com/img2/x.gif" alt="5 of 5 bubbles">
</span>
<span class="ratingDate relativeDate" title="April 12, 2017">Reviewed 3 days ago
<span class="new redesigned">NEW</span> </span>
<a class="viaMobile" href="/apps" target="_blank" onclick="ta.util.cookie.setPIDCookie(24687)">
<span class="ui_icon mobile-phone"></span>
via mobile
</a>
</div>
<div class="entry">
<p>
Our favorite restaurant in Houston. Definitely the best and friendliest service! The food is not only served with a flair, it is absolutely delicious. My favorite is the Lamb. It is the best! Also the duck moose, fois gras, the crispy salad and the French onion soup are all spectacular! This is a must try restaurant! The wine list is fantastic. Just ask Daniel for suggestions. He not only knows his wines; he loves what he does! We Love this place!
</p>
</div>
<div class="rating-list">
<div class="recommend">
<span class="recommend-titleInline noRatings">Visited April 2017</span>
</div>
</div>
<div class="expanded lessLink">
<span class="taLnk collapse ulBlueLinks no_cpu ">
Less&nbsp;
</span>
<span class="textArrow_more ui_icon caret-up"></span>
</div>
<div id="helpfulq475091998_expanded" class="helpful redesigned white_btn_container ">
<span class="isHelpful">Helpful?</span> <div class="tgt_helpfulq475091998 rnd_white_thank_btn" onclick="ta.call('ta.servlet.Reviews.helpfulVoteHandlerOb', event, this, 'LeJIVqd4EVIpECri1GII2t6mbqgqguuuxizSxiniaqgeVtIJpEJCIQQoqnQQeVsSVuqHyo3KUKqHMdkKUdvqHxfqHfGVzCQQoqnQQZiptqH5paHcVQQoqnQQrVxEJtxiGIac6XoXmqoTpcdkoKAUAAv0tEn1dkoKAUAAv0zH1o3KUK0pSM13vkooXdqn3XmffAdvqndqnAfbAo77dbAo3k0npEEeJIV1K0EJIVqiJcpV1U0Ii9VC1rZlU3XozxbZZxE2crHN2TDUJiqnkiuzsVEOxdkXqi7TxXpUgyR2xXvOfROwaqILkrzz9MvzCxMva7xEkq8xXNq8ymxbAq8AzzrhhzCxbx2vdNvEn2fnwEfq8alzCeqi53ZrgnMrHhshTtowGpNSmq89IwiVb7crUJxdevaCnJEqI33qiE5JGErJExXKx5ooItGCy5wnCTx2VA7RvxEsO3'); ta.trackEventOnPage('HELPFUL_VOTE_TEST', 'helpfulvotegiven_v2');">
<img src="https://static.tacdn.com/img2/icons/icon_thumb_white.png" class="helpful_thumbs_up white">
<img src="https://static.tacdn.com/img2/icons/icon_thumb_green.png" class="helpful_thumbs_up green">
<span class="helpful_text">Thank joannecarpenter</span> </div>
</div>
<div class="tooltips vertically_centered">
<div class="reportProblem">
<span id="ReportIAP_475091998" class="problem collapsed taLnk" onclick="ta.trackEventOnPage('Report_IAP', 'Report_Button_Clicked', 'member'); ta.call('ta.servlet.Reviews.iapFlyout', event, this, '475091998')" onmouseover="if (!this.getAttribute('data-first')) {ta.trackEventOnPage('Reviews', 'report_problem', 'hover_over_flag'); this.setAttribute('data-first', 1)} uiOverlay(event, this)" data-tooltip="" data-position="above" data-content="Problem with this review?">
<img src="https://static.tacdn.com/img2/icons/gray_flag.png" width="13" height="14" alt="">
<span class="reportTxt">Report</span> </span>
</div>
</div>
<div class="userLinks">
<div class="sameGeoActivity">
<a href="/members-citypage/joannecarpenter/g56010" target="_blank" onclick="ta.setEvtCookie('Reviews','more_reviews_by_user','',0,this.href); ta.util.cookie.setPIDCookie(19160)">
See all 5 reviews by joannecarpenter for Humble </a>
</div>
<div class="askQuestion">
<span class="taLnk ulBlueLinks" onclick="ta.trackEventOnPage('answers_review','ask_user_intercept_click' ); ta.load('ta-answers', (function() {require('answers/misc').askReviewerIntercept(this, '470148', 'joannecarpenter', '6875524F623CC948F4F9CA95BB4A9567', 'en', '475091998','Chez Nous', 39151)}).bind(this), true);">Ask joannecarpenter about Chez Nous</span>
</div>
</div>
<div class="note">
This review is the subjective opinion of a TripAdvisor member and not of TripAdvisor LLC. </div>
<div class="duplicateReviewsInline">
<div class="previous">joannecarpenter has 1 more review of Chez Nous</div> <ul class="dupReviews">
<li class="dupReviewItem">
<div class="reviewTitle">
<a href="/ShowUserReviews-g56010-d470148-r453237869-Chez_Nous-Humble_Texas.html#REVIEWS">"Joanne Carpenter"</a>
</div>
<div class="rating">
<span class="rate sprite-rating_ss rating_ss"> <img class="sprite-rating_ss_fill rating_ss_fill ss50" width="50" src="https://static.tacdn.com/img2/x.gif" alt="5 of 5 bubbles">
</span>
<span class="date">Reviewed January 18, 2017</span>
</div>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="large">

</div>
<div class="ad iab_inlineBanner">
<div id="gpt-ad-468x60" class="adInner gptAd"></div>
</div>
</div>

Is there a way for BS4 to handle this for me?

Solution

Here's a simple example to get you started:

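from selenium import webdriver
from selenium.webdriver.common.by import By

# PhantomJS is no longer supported by current Selenium releases; headless Chrome is a common replacement
options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)

url = "https://www.tripadvisor.com/Restaurant_Review-g56010-d470148-Reviews-Chez_Nous-Humble_Texas.html"
driver.get(url)

# The method is find_element (there is no get_element_by_class_name); "moreLink" is
# more specific than "taLnk", which the page also uses for the Less/Report/Ask links
elem = driver.find_element(By.CLASS_NAME, "moreLink")
elem.click()
...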

You could find more info about the methods here: http://selenium-python.readthedocs.io/
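
BS4 only parses HTML; it does not run the page's JavaScript, so it cannot trigger the 'More' click by itself. After Selenium has clicked the link, though, driver.page_source holds the expanded markup, and that string can be handed to any HTML parser (with BS4, soup.select("div.dyn_full_review div.entry p") would target the same elements). The sketch below does the extraction with Python's standard-library HTMLParser so it runs without extra dependencies. The class names dyn_full_review and entry are taken from the 2017 markup in the question and may well have changed since; FullReviewParser is a hypothetical name.

```python
from html.parser import HTMLParser


class FullReviewParser(HTMLParser):
    """Collect the text of <div class="entry"> elements that sit inside a
    <div class="... dyn_full_review ..."> block (class names assumed from
    the 2017 TripAdvisor markup shown in the question)."""

    def __init__(self):
        super().__init__()
        self.reviews = []    # one whitespace-normalised string per full review
        self._stack = []     # (tag, role) for every currently open element
        self._buffer = None  # text chunks of the entry being read, else None

    def _inside(self, role):
        return any(r == role for _, r in self._stack)

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        role = None
        if tag == "div" and "dyn_full_review" in classes:
            role = "review"
        elif tag == "div" and "entry" in classes and self._inside("review"):
            role = "entry"
            self._buffer = []
        self._stack.append((tag, role))

    def handle_endtag(self, tag):
        # Ignore stray closing tags that have no matching open element.
        if not any(open_tag == tag for open_tag, _ in self._stack):
            return
        # Pop back to the matching open tag; this also discards void
        # elements such as <img> that never receive a closing tag.
        while self._stack:
            open_tag, role = self._stack.pop()
            if role == "entry":
                self.reviews.append(" ".join("".join(self._buffer).split()))
                self._buffer = None
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


# Shortened stand-in for driver.page_source after the 'More' click
sample = """
<div class="review dyn_full_review inlineReviewUpdate provider0 first newFlag" style="display: block;">
<div class="col2of2"><div class="innerBubble">
<div class="rating reviewItemInline"><img class="sprite-rating_s_fill" alt="5 of 5 bubbles"></div>
<div class="entry">
<p>
Our favorite restaurant in Houston. The wine list is fantastic.
</p>
</div>
</div></div>
</div>
<div class="entry"><p>Not inside a full review, so ignored.</p></div>
"""

parser = FullReviewParser()
parser.feed(sample)
print(parser.reviews)
# ['Our favorite restaurant in Houston. The wine list is fantastic.']
```

In a real run, the sample string would be replaced by driver.page_source captured after the click has finished loading the expanded review.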
