Decoding class names on Facebook through Selenium
Problem description
I noticed that Facebook has some weird class names that look computer-generated. What I don't know is whether these classes are at least constant over time, or whether they change at some interval. Maybe someone with experience can answer. The only thing I can see is that when I exit Chrome and open it again, they are still the same, so at least they don't change with every browser session.
So I'd guess the best way to go about scraping Facebook would be to anchor on some elements of the user interface and assume the structure is always the same. For example, to get the address from the About section, something like this:
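from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service("C:/chromedriver.exe"))
driver.get("https://www.facebook.com/pg/Burma-Superstar-620442791345784/about/?ref=page_internal")
# wait some time
# find_elements_by_xpath was removed in Selenium 4; use find_elements(By.XPATH, ...)
address_elements = driver.find_elements(By.XPATH, "//span[text()='FIND US']/../following-sibling::div//button[text()='Get Directions']/../../preceding-sibling::div[1]/div/span")
for item in address_elements:
    print(item.text)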
Recommended answer
You are largely right. Facebook is built with ReactJS, which is evident from the presence of the following keywords and tags within the HTML DOM:
- {"react_render":true,"reflow":true}
- <!-- react-mount-point-unstable -->
- ["React-prod"]
- ["ReactDOM-prod"]
- ReactComposerTaggerType:{r:["t5r69"],be:1}
So the dynamically generated class names are bound to change after some interval of time.
The solution would be to use the static attributes to construct a dynamic Locator Strategy.
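As a rough illustration of what "static attributes" can mean here, the sketch below builds an XPath from visible label text instead of generated class names. The helper names `xpath_literal` and `span_after_label` are hypothetical, and the assumption that the wanted value sits in a `<span>` following the label's `<span>` comes from the page layout discussed in this thread, not from anything Facebook guarantees.

```python
def xpath_literal(text: str) -> str:
    """Quote text as an XPath 1.0 string literal.

    XPath 1.0 has no escape character, so a string containing both
    quote kinds has to be assembled with concat().
    """
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote kinds present: split on ' and rejoin the pieces with "'".
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def span_after_label(label: str, position: int = 1) -> str:
    """XPath for the Nth <span> that follows a <span> whose text is `label`.

    Hypothetical helper: it relies only on visible text and tag names,
    never on generated class names.
    """
    return (
        f"//span[normalize-space()={xpath_literal(label)}]"
        f"/following::span[{position}]"
    )


print(span_after_label("FIND US", position=2))
# prints: //span[normalize-space()='FIND US']/following::span[2]
```

The resulting string can be handed to Selenium as `driver.find_element(By.XPATH, span_after_label("FIND US", 2))`. A locator like this still breaks if the label text or the layout changes, but it does not depend on the generated class names.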
To retrieve the first line of the address just below the text FIND US, you need to use WebDriverWait in conjunction with expected_conditions set to visibility_of_element_located(). With WebDriverWait imported from selenium.webdriver.support.ui, expected_conditions imported as EC from selenium.webdriver.support, and By imported from selenium.webdriver.common.by, you can use the following solution:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[normalize-space()='FIND US']//following::span[2]"))).text)
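To make it clearer what the explicit wait does: WebDriverWait calls the condition repeatedly (every 0.5 seconds by default) until it returns a truthy value or the timeout expires. The sketch below imitates that loop in plain Python, without Selenium. It is a stand-in for illustration, not Selenium's implementation. `wait_until`, `find_address` and the placeholder return value are all hypothetical, and `LookupError` stands in for Selenium's "element not found" exception.

```python
import time


def wait_until(condition, timeout=20.0, poll=0.5):
    """Call condition() until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except LookupError:
            # Stand-in for "element not found yet": keep polling.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


attempts = {"count": 0}


def find_address():
    # Hypothetical lookup that only succeeds on the third call,
    # imitating content that appears once the page has rendered.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("not rendered yet")
    return "first address line"  # placeholder, not real page data


address = wait_until(find_address, timeout=2.0, poll=0.01)
print(address)
```

This is why an explicit wait is preferable to a fixed sleep: it returns as soon as the element is available and only fails after the full timeout.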
References
You can find some relevant discussions in:
- Logging Facebook using selenium
- Why Selenium driver fail to recognize ID element of Facebook login page?
Note: Scraping Facebook violates section 3.2.3 of their Terms of Service, and you are liable to be questioned and may even land in Facebook Jail. Use the Facebook Graph API instead.