How to download and write a file from GitHub using Requests


Question


Let's say there's a file that lives in a GitHub repository:

https://github.com/someguy/brilliant/blob/master/somefile.txt

I'm trying to use Requests to fetch this file and write its contents to disk in the current working directory, where it can be used later. Right now, I'm using the following code:

import requests
from os import getcwd

url = "https://github.com/someguy/brilliant/blob/master/somefile.txt"
directory = getcwd()
filename = directory + 'somefile.txt'
r = requests.get(url)

f = open(filename,'w')
f.write(r.content)

Undoubtedly ugly, and more importantly, not working. Instead of the expected text, I get:

<!DOCTYPE html>
<!--

Hello future GitHubber! I bet you're here to remove those nasty inline styles,
DRY up these templates and make 'em nice and re-usable, right?

Please, don't. https://github.com/styleguide/templates/2.0

-->
<html>
  <head>
    <meta http-equiv="Content-type" content="text/html; charset=utf-8">
    <title>Page not found &middot; GitHub</title>
    <style type="text/css" media="screen">
      body {
        background: #f1f1f1;
        font-family: "HelveticaNeue", Helvetica, Arial, sans-serif;
        text-rendering: optimizeLegibility;
        margin: 0; }

      .container { margin: 50px auto 40px auto; width: 600px; text-align: center; }

      a { color: #4183c4; text-decoration: none; }
      a:visited { color: #4183c4 }
      a:hover { text-decoration: none; }

      h1 { letter-spacing: -1px; line-height: 60px; font-size: 60px; font-weight: 100; margin: 0px; text-shadow: 0 1px 0 #fff; }
      p { color: rgba(0, 0, 0, 0.5); margin: 20px 0 40px; }

      ul { list-style: none; margin: 25px 0; padding: 0; }
      li { display: table-cell; font-weight: bold; width: 1%; }
      #error-suggestions { font-size: 14px; }
      #next-steps { margin: 25px 0 50px 0;}
      #next-steps li { display: block; width: 100%; text-align: center; padding: 5px 0; font-weight: normal; color: rgba(0, 0, 0, 0.5); }
      #next-steps a { font-weight: bold; }
      .divider { border-top: 1px solid #d5d5d5; border-bottom: 1px solid #fafafa;}

      #parallax_wrapper {
        position: relative;
        z-index: 0;
      }
      #parallax_field {
        overflow: hidden;
        position: absolute;
        left: 0;
        top: 0;
        height: 370px;
        width: 100%;
      }

etc etc.

This is content from GitHub, but not the content of the file. What am I doing wrong?

Solution

The content of the file in question is included in the returned data. You are getting the full GitHub view of that file, not just the contents.

If you want to download just the file, you need to use the Raw link at the top of the page, which will be (for your example):

https://raw.github.com/someguy/brilliant/master/somefile.txt

Note the change in domain name, and that the blob/ part of the path is gone.
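The URL change described above can be sketched as a small helper. This is only an illustration under stated assumptions, not an official GitHub API: the name `blob_to_raw` is hypothetical, and the default host `raw.githubusercontent.com` is an assumption about where GitHub currently serves raw files (pass `raw_host="raw.github.com"` to reproduce the URL form shown in this answer).

```python
from urllib.parse import urlsplit, urlunsplit


def blob_to_raw(url, raw_host="raw.githubusercontent.com"):
    """Convert a github.com 'blob' page URL into a raw-file URL.

    Hypothetical helper: raw_host is an assumption, not taken from the answer.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: <owner>/<repo>/blob/<branch>/<path to file>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    # Drop the 'blob' segment and keep everything else in order.
    owner, repo, _blob, *rest = segments
    raw_path = "/" + "/".join([owner, repo, *rest])
    return urlunsplit((parts.scheme, raw_host, raw_path, "", ""))


print(blob_to_raw("https://github.com/someguy/brilliant/blob/master/somefile.txt"))
```

The helper only rewrites the string; it does not check that the file exists, so a bad path would still fail when the URL is requested.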

To demonstrate this with the requests GitHub repository itself:

>>> import requests
>>> r = requests.get('https://github.com/kennethreitz/requests/blob/master/README.rst')
>>> 'Requests:' in r.text
True
>>> r.headers['Content-Type']
'text/html; charset=utf-8'
>>> r = requests.get('https://raw.github.com/kennethreitz/requests/master/README.rst')
>>> 'Requests:' in r.text
True
>>> r.headers['Content-Type']
'text/plain; charset=utf-8'
>>> print(r.text)
Requests: HTTP for Humans
=========================


.. image:: https://travis-ci.org/kennethreitz/requests.png?branch=master
[... etc. ...]
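Putting this together with the code from the question, a download-and-write step might look like the sketch below. It is one possible approach under stated assumptions, not the answerer's own code. The function name `save_raw_file`, its parameters, and the 10-second timeout are all hypothetical. Compared with the question's code, it joins the directory and file name with `os.path.join` (plain `directory + 'somefile.txt'` omits the path separator), opens the file in binary mode because `r.content` is bytes, and uses a `with` block so the file is closed.

```python
import os


def save_raw_file(raw_url, filename, directory=None, session=None, timeout=10):
    """Download raw_url and write its bytes to directory/filename.

    Hypothetical helper. `session` may be a requests.Session or any object
    with a compatible .get(); by default the requests module itself is used.
    Returns the path that was written.
    """
    if session is None:
        # Third-party dependency, imported lazily so that a stub session
        # can be supplied without requests being installed.
        import requests
        session = requests
    response = session.get(raw_url, timeout=timeout)
    # Fail loudly on 404 and similar instead of saving an error page.
    response.raise_for_status()
    path = os.path.join(directory or os.getcwd(), filename)
    # response.content is bytes, so write in binary mode.
    with open(path, "wb") as f:
        f.write(response.content)
    return path
```

For the example in the question, the call would be `save_raw_file("https://raw.github.com/someguy/brilliant/master/somefile.txt", "somefile.txt")`. Calling `raise_for_status()` matters here because an error response would otherwise be written to disk as if it were the file.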
