LXML unable to retrieve webpage with error "failed to load HTTP resource"


Problem description


I tried opening the link below in a browser and it works, but it does not work in the code. The URL is a combination of a news site's base address and an article path read from another file, url.txt. The same code works perfectly with an ordinary website (www.google.com).

import sys
import MySQLdb
from mechanize import Browser
from bs4 import BeautifulSoup, SoupStrainer
from nltk import word_tokenize
from nltk.tokenize import *
import urllib2
import nltk, re, pprint
import mechanize #html form filling
import lxml.html

with open("url.txt","r") as f:
    first_line = f.readline()
#print first_line
url = "http://channelnewsasia.com/&s" + (first_line)
t = lxml.html.parse(url)
print t.find(".//title").text

This is the error I get: "failed to load HTTP resource".

And this is the content of url.txt:

/news/asiapacific/australia-to-send-armed/1284790.html

Answer

This is because of the &s part of the URL, which is definitely not needed:

url = "http://channelnewsasia.com" + first_line

Also, URL parts are better joined using urljoin():
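As a minimal sketch of why urljoin() beats plain string concatenation here (shown with the Python 3 import; the snippets in this question are Python 2, where the import is from urlparse instead):

```python
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

BASE_URL = "http://channelnewsasia.com"
path = "/news/asiapacific/australia-to-send-armed/1284790.html"

# urljoin resolves the path against the base, handling the "/" between
# them correctly whether or not either side already has one.
print(urljoin(BASE_URL, path))
# -> http://channelnewsasia.com/news/asiapacific/australia-to-send-armed/1284790.html

# Naive concatenation with the stray "&s" produces a broken URL,
# which is exactly what caused the "failed to load HTTP resource" error:
print(BASE_URL + "/&s" + path)
# -> http://channelnewsasia.com/&s/news/asiapacific/australia-to-send-armed/1284790.html
```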

from urlparse import urljoin  # Python 2; on Python 3 use urllib.parse
import lxml.html

BASE_URL = "http://channelnewsasia.com"

with open("url.txt") as f:
    first_line = f.readline().strip()  # strip the trailing newline from the path

url = urljoin(BASE_URL, first_line)
t = lxml.html.parse(url)
print t.find(".//title").text

This prints:

Australia to send armed personnel to MH17 site - Channel NewsAsia
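The title lookup itself can be exercised without a network call by parsing an HTML string directly with lxml.html.fromstring; the page content below is a made-up stand-in for the real article HTML:

```python
import lxml.html

# Hypothetical page content standing in for the fetched article.
html = ("<html><head><title>Australia to send armed personnel to MH17 site"
        " - Channel NewsAsia</title></head><body></body></html>")

doc = lxml.html.fromstring(html)
# The same ElementPath expression used above finds the <title> element.
print(doc.find(".//title").text)
# -> Australia to send armed personnel to MH17 site - Channel NewsAsia
```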

