Extracting HTML content from a GitLab URL

Question

I'm trying to get the HTML content from a GitLab URL.
But I am stuck at the GitLab sign-in page: I get the HTML content of the sign-in page even after providing a username and password.

Code:

    from bs4 import BeautifulSoup
    import requests

    username = "username"
    password = "password"
    url = "HTTP://gitlab.com/saikumar/webhooktslint"
    # Get the page content from the site
    result = requests.get(url, auth=(username, password)).content
    soup = BeautifulSoup(result, 'lxml')
    for link in soup:
        print(link)

Output:

   Getting HTML content of sign_in page.

Expected output:

   Need to get the HTML content of the URL specified.

Recommended answer

I don't see a webhooktslint repo on your gitlab.com/saikumar page, so it is likely a private repository.

Looking at the python-gitlab CLI usage, make sure your ~/.python-gitlab.cfg user configuration file is set up properly, with a GitLab private token in it; you won't have to deal with credentials then.
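
As a rough sketch of what that configuration file might contain, the snippet below builds one with Python's standard configparser module. The section name "gitlab", the placeholder token, and the helper name build_python_gitlab_config are assumptions made for illustration; url, private_token and api_version are the keys python-gitlab reads from a server section.

```python
import configparser
import io


def build_python_gitlab_config(token, url="https://gitlab.com", section="gitlab"):
    """Return the text of a minimal python-gitlab config file (hypothetical helper)."""
    # interpolation=None keeps a '%' in the token from being treated specially.
    config = configparser.ConfigParser(interpolation=None)
    # [global] names the server section that is used by default.
    config["global"] = {"default": section, "ssl_verify": "true", "timeout": "5"}
    # The server section holds the GitLab URL and the private token.
    config[section] = {"url": url, "private_token": token, "api_version": "4"}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder token; the printed text is what would go in ~/.python-gitlab.cfg.
    print(build_python_gitlab_config("YOUR_PRIVATE_TOKEN"))
```

Generating the text rather than writing the file directly keeps the sketch from overwriting an existing configuration.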

The gitlab command (from python-gitlab) will make the curl-style HTTP requests for you, including fetching the raw data of a file.

But that same private token can also help authenticate you when you do a GET on a private repo from your own code, as in the question (if you are after the actual HTML page content).

The main point: to access a private repo, use a PAT (Personal Access Token) rather than your actual account password.
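
One way to apply this from Python is to skip the HTML page and ask the GitLab REST API for a file directly, sending the PAT in a PRIVATE-TOKEN header. The sketch below assumes API v4 on gitlab.com; the file path README.md, the branch master, the placeholder token, and both helper names are illustrative assumptions, not details from the question.

```python
from urllib.parse import quote


def build_raw_file_request(project_path, file_path, ref, token,
                           base_url="https://gitlab.com"):
    """Return (url, params, headers) for GitLab's raw-file API endpoint."""
    # The API accepts the URL-encoded "namespace/project" path as the project ID.
    project_id = quote(project_path, safe="")
    encoded_file = quote(file_path, safe="")
    url = f"{base_url}/api/v4/projects/{project_id}/repository/files/{encoded_file}/raw"
    return url, {"ref": ref}, {"PRIVATE-TOKEN": token}


def fetch_raw_file(project_path, file_path, ref, token):
    """Download one file from a (possibly private) repository."""
    import requests  # imported here so the URL helper works without requests

    url, params, headers = build_raw_file_request(project_path, file_path, ref, token)
    response = requests.get(url, params=params, headers=headers, timeout=10)
    response.raise_for_status()  # raises if the request is rejected (HTTP 4xx/5xx)
    return response.text


if __name__ == "__main__":
    url, params, headers = build_raw_file_request(
        "saikumar/webhooktslint", "README.md", "master", "YOUR_PERSONAL_ACCESS_TOKEN")
    print(url)
```

Unlike the basic-auth attempt in the question, a rejected token here produces an HTTP error status rather than a redirect to the sign-in page, so failures are easier to spot.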
