Reconstructing absolute URLs from relative URLs on a page

Views: 95 | Published: 2018/6/19 21:17:37 | Tags: python html url-parsing

Problem description

Given the absolute URL of a page and a relative link found within that page, is there a way to a) definitively reconstruct or b) make a best-effort reconstruction of the link's absolute URL?

In my case, I'm reading an HTML file from a given URL using Beautiful Soup, pulling out the src of every img tag, and trying to build a list of absolute URLs for the page's images.

My Python function so far looks like:

def get_image_url(page_url, image_src):
    # Python 3 import; under Python 2 this was `from urlparse import urlparse`.
    from urllib.parse import urlparse

    # parsed = urlparse('http://user:pass@NetLoc:80/path;parameters?query=argument#fragment')
    parsed = urlparse(page_url)
    url_base = parsed.netloc
    url_path = parsed.path

    src = image_src
    if src.find('http') == 0:
        # It's an absolute URL, do nothing.
        pass
    elif src.find('/') == 0:
        # If it's a root URL, append it to the page's scheme and host:
        src = parsed.scheme + '://' + url_base + src
    else:
        # If it's a relative URL, ?
        pass

    return src

NOTE: I don't need a Python answer, just the logic required.
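
For the logic itself, here is a minimal sketch (not the asker's code). It assumes the standard library's urllib.parse.urljoin, which applies the usual reference-resolution rules: an absolute src is kept as-is, a src starting with // takes the page's scheme, a src starting with / is attached to the page's scheme and host, and anything else is resolved against the directory part of the page's path, with . and .. segments collapsed. The function name resolve_image_url, the base_href parameter, and the example.com URLs are hypothetical and only serve the illustration. One caveat for the "definitive" case: if the page contains a <base href> tag, relative links resolve against that value rather than the page URL, so the sketch accepts it as an optional argument.

```python
from urllib.parse import urljoin


def resolve_image_url(page_url, image_src, base_href=None):
    """Resolve image_src against the page it was found on (illustrative sketch).

    base_href is the href value of a <base> tag, if the page has one.
    """
    # A <base> tag, when present, replaces the page URL as the reference
    # point. It may itself be relative, so resolve it against the page first.
    base = urljoin(page_url, base_href) if base_href else page_url
    # urljoin covers all four cases: absolute, //host/..., /root-relative,
    # and path-relative (including . and .. segments).
    return urljoin(base, image_src.strip())


if __name__ == "__main__":
    page = "http://example.com/a/b/page.html"  # hypothetical page URL
    print(resolve_image_url(page, "img/pic.png"))   # http://example.com/a/b/img/pic.png
    print(resolve_image_url(page, "../pic.png"))    # http://example.com/a/pic.png
    print(resolve_image_url(page, "/img/pic.png"))  # http://example.com/img/pic.png
```

Without a <base> tag and with a correct page URL (after any redirects), this resolution matches what a browser does. If either of those is unknown, the result is only a best effort.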
