Decoding class names on Facebook through Selenium


Question

I noticed that Facebook has some weird class names that look computer generated. What I don't know is whether these classes stay constant over time or change at some interval. Maybe someone with experience can answer. All I can see is that when I exit Chrome and open it again the names are still the same, so at least they don't change with every browser session.

So I'd guess the best way to go about scraping Facebook would be to anchor on some elements of the user interface and assume the structure is always the same. For example, to get the address from the About section, something like this:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 takes the chromedriver path through a Service object.
driver = webdriver.Chrome(service=Service("C:/chromedriver.exe"))

driver.get("https://www.facebook.com/pg/Burma-Superstar-620442791345784/about/?ref=page_internal")
# wait some time
address_elements = driver.find_elements(By.XPATH, "//span[text()='FIND US']/../following-sibling::div//button[text()='Get Directions']/../../preceding-sibling::div[1]/div/span")
for item in address_elements:
    print(item.text)

Recommended answer

You are pretty much correct. Facebook is built with ReactJS, which is evident from the presence of the following keywords and tags within the HTML DOM:

  • {"react_render":true,"reflow":true}
  • <!-- react-mount-point-unstable -->
  • ["React-prod"]
  • ["ReactDOM-prod"]
  • ReactComposerTaggerType:{r:["t5r69"],be:1}
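
As a rough way to check for these markers yourself, the sketch below scans a page's HTML for the strings listed above. The name find_react_markers and the sample HTML are hypothetical and only for illustration; the marker strings themselves come from the list.

```python
# Marker strings taken from the list above.
REACT_MARKERS = (
    '"react_render":true',
    "react-mount-point-unstable",
    "React-prod",
    "ReactDOM-prod",
    "ReactComposerTaggerType",
)


def find_react_markers(page_source):
    """Return the markers that occur in page_source, in list order."""
    return [marker for marker in REACT_MARKERS if marker in page_source]


# Made-up sample HTML standing in for a real page.
sample_html = (
    "<html><body><!-- react-mount-point-unstable -->"
    '<script>require(["React-prod"]);</script></body></html>'
)
found = find_react_markers(sample_html)
print(found)
```

With a live Selenium session you could pass driver.page_source in place of sample_html.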

So the dynamically generated class names are bound to change from time to time.

The solution is to use the static attributes of the page, such as its visible text, to construct a dynamic Locator Strategy.
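
To illustrate why static text makes a sturdier anchor than a generated class name, here is a small self-contained sketch using only the standard library. The two markup snippets, their class names, and the address are invented for the example and are not real Facebook markup; the helper names are hypothetical as well.

```python
import xml.etree.ElementTree as ET

# Two made-up renders of the same block; only the generated class names differ.
RENDER_A = "<div><span class='_50f4'>FIND US</span><span class='_2iem'>123 Example St</span></div>"
RENDER_B = "<div><span class='x1a2b'>FIND US</span><span class='y3c4d'>123 Example St</span></div>"


def text_by_class(markup, class_name):
    """Look up a span by its class attribute (breaks when the class changes)."""
    hit = ET.fromstring(markup).find(f".//span[@class='{class_name}']")
    return hit.text if hit is not None else None


def text_after_label(markup, label):
    """Return the text of the span that follows the span whose text equals label."""
    spans = list(ET.fromstring(markup).iter("span"))  # document order
    for index, span in enumerate(spans[:-1]):
        if (span.text or "").strip() == label:
            return spans[index + 1].text
    return None


for render in (RENDER_A, RENDER_B):
    print(text_by_class(render, "_2iem"), "|", text_after_label(render, "FIND US"))
```

The class-based lookup only works on the first render, while the label-based lookup works on both, which is the property the answer relies on.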

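To retrieve the first line of the address just below the text FIND US, you need to use WebDriverWait in conjunction with the expected_conditions method visibility_of_element_located(). Assuming WebDriverWait, expected_conditions (as EC), and By have been imported from selenium.webdriver.support.ui, selenium.webdriver.support, and selenium.webdriver.common.by respectively, you can use the following solution: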

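# .text returns the element's visible text; without it, the WebElement object itself is printed.
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[normalize-space()='FIND US']//following::span[2]"))).text)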
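
If you need the same lookup for more than one label, the locator can be built from the label text. The following is only a sketch under assumptions: the helper names xpath_literal, xpath_after_label, and text_after_label are hypothetical, and the defaults (span, position 2, 20 seconds) simply mirror the one-liner above. Selenium is imported inside the function, so the XPath helpers can be tried without a browser.

```python
def xpath_literal(text):
    """Quote text as an XPath 1.0 string literal (XPath 1.0 has no escape characters)."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote kinds occur: join single-quoted pieces with concat().
    pieces = ", \"'\", ".join(f"'{part}'" for part in text.split("'"))
    return f"concat({pieces})"


def xpath_after_label(label, tag="span", position=2):
    """Build an XPath for the position-th tag element after the span showing label."""
    return f"//span[normalize-space()={xpath_literal(label)}]//following::{tag}[{position}]"


def text_after_label(driver, label, timeout=20):
    """Wait until the element after label is visible and return its text."""
    # Imported here so the helpers above work without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, xpath_after_label(label))
    element = WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located(locator)
    )
    return element.text


print(xpath_after_label("FIND US"))
```

With a running driver, text_after_label(driver, "FIND US") would perform the same wait as the one-liner above; if the element never becomes visible, WebDriverWait raises a TimeoutException.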


References

You can find some relevant discussions in:

  • Logging Facebook using selenium
  • Why Selenium driver fail to recognize ID element of Facebook login page?

Note: Scraping Facebook violates section 3.2.3 of their Terms of Service, and you are liable to be questioned and may even land in Facebook Jail. Use the Facebook Graph API instead.
