How to extract the text in the textarea frame of the DeepL page?

Question

Opening https://www.deepl.com/translator#en/fr/Hello%2C%20how%20are%20you%20today%3F in a browser, we see the translation displayed on the page.

But the translated text, "Bonjour, comment allez-vous aujourd'hui?", doesn't appear anywhere in the page's source, and the textarea's markup looks like this:

<textarea class="lmt__textarea lmt__target_textarea lmt__textarea_base_style" 
data-gramm_editor="false" tabindex="110" dl-test="translator-target-input" 
lang="fr-FR" style="height: 300px;"></textarea>

No matter how I read the text or the source through BeautifulSoup, the translation in that textarea can't be extracted.

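import requests
from bs4 import BeautifulSoup

response = requests.get('https://www.deepl.com/translator#en/fr/Hello%2C%20how%20are%20you%20today%3F')
# Name the parser explicitly so BeautifulSoup does not have to guess one.
bsoup = BeautifulSoup(response.content.decode('utf8'), 'html.parser')

# The target textarea found this way contains no translated text.
bsoup.find_all('textarea')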

How can I extract the translation from the https://www.deepl.com/translator page?

Recommended answer

To extract the text from a textarea field, use .get_attribute('value').
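
As background for why a real browser is needed here: the sentence to translate sits in the URL fragment (the part after #). HTTP clients such as requests do not send the fragment to the server, so the downloaded HTML cannot contain the translation; the page's JavaScript reads the fragment and fills the textarea afterwards. Below is a minimal sketch using only the standard library. The source/target/text layout of the fragment is an assumption read off the URL in the question, and build_translator_url is a hypothetical helper, not part of any DeepL API.

```python
from urllib.parse import quote, unquote, urlsplit

url = 'https://www.deepl.com/translator#en/fr/Hello%2C%20how%20are%20you%20today%3F'

parts = urlsplit(url)
# Only the path (and query, if any) is requested from the host;
# the fragment stays on the client side.
request_target = parts.path + ('?' + parts.query if parts.query else '')

# Assumed fragment layout: <source lang>/<target lang>/<percent-encoded text>.
source_lang, target_lang, encoded_text = parts.fragment.split('/', 2)
text = unquote(encoded_text)


def build_translator_url(source_lang, target_lang, text):
    """Hypothetical helper: build a translator URL in the same assumed layout."""
    encoded = quote(text, safe="")
    return f'https://www.deepl.com/translator#{source_lang}/{target_lang}/{encoded}'


print(request_target)
print(source_lang, target_lang, text)
```

The same helper can produce the URL passed to driver.get() for other sentences, as long as the assumed fragment layout holds.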

Here I show how Selenium waits for the element, using WebDriverWait with the visibility_of_element_located expected condition.

But an element being visible doesn't guarantee that its text has already been filled in (which is the case here), so poll in a loop until text != ''.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

# You may need to pass the browser executable path here.
driver = webdriver.Chrome()
driver.get('https://www.deepl.com/translator#en/fr/Hello%2C%20how%20are%20you%20today%3F')

try:
    # Wait up to 20 seconds for the target textarea to become visible.
    element = WebDriverWait(driver, 20).until(
        EC.visibility_of_element_located(
            (By.CSS_SELECTOR, 'div.lmt__side_container--target textarea')))

    # The textarea can be visible before the translation arrives,
    # so poll until its value is no longer empty.
    while element.get_attribute('value') == '':
        time.sleep(0.5)

    # Give the page a moment to finish writing the translation, then read it.
    time.sleep(1)
    text_target = element.get_attribute('value')
    print(text_target)
finally:
    # Close the browser even if the wait times out.
    driver.quit()
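
The polling loop above has no upper bound, so it would spin forever if the translation never arrived. One possible variant is a small helper that polls with a deadline. This is only a sketch: wait_for_nonempty and its parameters are hypothetical names, and the fake reader below stands in for element.get_attribute('value') so the example runs without a browser.

```python
import time


def wait_for_nonempty(read_value, timeout=20.0, poll=0.5):
    """Call read_value() until it returns a non-empty string or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        value = read_value()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f'value still empty after {timeout} seconds')
        time.sleep(poll)


# Stand-in for the textarea: empty on the first two reads, filled on the third.
fake_reads = iter(['', '', "Bonjour, comment allez-vous aujourd'hui?"])
result = wait_for_nonempty(lambda: next(fake_reads), timeout=5, poll=0.01)
print(result)
```

With Selenium, read_value would be lambda: element.get_attribute('value'). WebDriverWait(driver, 20).until(lambda d: element.get_attribute('value')) should behave similarly, since until() keeps calling the function it is given until the result is truthy or the timeout is reached.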

Hope this helps.
