Decoding class names on Facebook through Selenium

Question

I noticed that Facebook has some odd class names that look computer generated. What I don't know is whether these classes stay constant over time or change at some interval. Maybe someone with experience can answer. The only thing I can see is that the names are still the same after I exit Chrome and reopen it, so at least they don't change with every browser session.

So I'd guess the best way to scrape Facebook is to anchor on elements of the user interface and assume the structure stays the same. For example, to get the address from the About section, something like this:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 takes the ChromeDriver path through a Service object
driver = webdriver.Chrome(service=Service("C:/chromedriver.exe"))

driver.get("https://www.facebook.com/pg/Burma-Superstar-620442791345784/about/?ref=page_internal")
# wait some time
address_elements = driver.find_elements(By.XPATH, "//span[text()='FIND US']/../following-sibling::div//button[text()='Get Directions']/../../preceding-sibling::div[1]/div/span")
for item in address_elements:
    print(item.text)

Recommended answer

You are largely correct. Facebook is built with ReactJS, which is evident from the following keywords and tags in the HTML DOM:

  • {"react_render":true,"reflow":true}
  • ["React-prod"]
  • ["ReactDOM-prod"]
  • ReactComposerTaggerType:{r:["t5r69"],be:1}

So the dynamically generated class names are likely to change from time to time.

The solution would be to use the static attributes to construct a dynamic Locator Strategy.
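
To illustrate why a locator anchored on static content holds up better than one anchored on a generated class, here is a minimal sketch that uses only Python's standard library rather than Selenium. The markup, the class names (`_50f7`, `_2iem`, `x1a2b`, `y9z8`) and the address are all made up for illustration and are not taken from Facebook's actual pages.

```python
import xml.etree.ElementTree as ET

# Two hypothetical snapshots of the same page fragment: identical structure and
# visible text, but different machine-generated class names.
SNAPSHOT_A = """
<div class="_4bl9">
  <span class="_50f7">FIND US</span>
  <span class="_2iem">123 Example St</span>
</div>
""".strip()
SNAPSHOT_B = SNAPSHOT_A.replace("_50f7", "x1a2b").replace("_2iem", "y9z8")


def address_by_class(markup):
    """Brittle lookup: depends on a generated class name."""
    root = ET.fromstring(markup)
    node = root.find(".//span[@class='_2iem']")
    return node.text if node is not None else None


def address_by_label(markup):
    """Sturdier lookup: find the visible label, then take the next sibling."""
    root = ET.fromstring(markup)
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children[:-1]):
            if child.tag == "span" and (child.text or "").strip() == "FIND US":
                return children[i + 1].text
    return None


if __name__ == "__main__":
    for name, markup in (("A", SNAPSHOT_A), ("B", SNAPSHOT_B)):
        print(name, address_by_class(markup), address_by_label(markup))
```

In this sketch the class-based lookup stops matching as soon as the class names are regenerated, while the label-based lookup still finds the address. The label-based lookup still assumes that the visible text and the relative position of the elements do not change.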

To retrieve the first line of the address just below the text FIND US, use WebDriverWait together with the expected_conditions method visibility_of_element_located(). For example:

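# .text returns the element's visible text; without it, print() would show the WebElement object itself
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[normalize-space()='FIND US']//following::span[2]"))).text)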
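
For completeness, the same wait can be wrapped in a small helper that also shows the imports the one-liner relies on. This is a sketch, not a definitive implementation: the function names `xpath_after_label` and `read_text_after_label` are hypothetical, and the code assumes a Selenium 4 style driver has already been created elsewhere.

```python
def xpath_after_label(label, offset=2):
    """Build an XPath for the offset-th <span> that follows the span showing `label`.

    Assumes `label` contains no single quotes, since it is embedded in a
    single-quoted XPath string literal.
    """
    return f"//span[normalize-space()='{label}']//following::span[{offset}]"


def read_text_after_label(driver, label, timeout=20):
    """Wait until the element after `label` is visible, then return its text."""
    # Selenium is imported here so that the XPath helper above can be used
    # (and tested) on machines where Selenium is not installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, xpath_after_label(label))
    element = WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located(locator)
    )
    return element.text
```

A call such as `read_text_after_label(driver, "FIND US")` would then stand in for the one-liner above. If the element does not become visible within `timeout` seconds, `WebDriverWait.until` raises a `TimeoutException`.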


Note: Scraping Facebook violates section 3.2.3 of their Terms of Service, and you are liable to be questioned and may even end up in "Facebook Jail". Use the Facebook Graph API instead.
