Facing issue with "wget" in Python

Views: 20 | Published: 2021/9/24 20:13:07 | Tags: python, wget

Problem description

I am very new to Python. I am facing an issue with "wget" as well as with "urllib.urlretrieve(str(myurl),tail)".

When I run the script it downloads the files, but the filenames end with "?".

My complete code:

import os
import wget
import urllib
import subprocess
with open('/var/log/na/na.access.log') as infile, open('/tmp/reddy_log.txt', 'w') as outfile:
    results = set()
    for line in infile:
        if ' 200 ' in line:
            tokens = line.split()
            results.add(tokens[6]) # 7th token
    for result in sorted(results):
        print >>outfile, result
with open('/tmp/reddy_log.txt') as infile:
    results = set()
    for line in infile:
        head, tail = os.path.split(line)
        print tail
        myurl = "http://data.xyz.com" + str(line)
        print myurl
        wget.download(str(myurl))
        #  urllib.urlretrieve(str(myurl),tail)

Output:

# python last.py
0011400026_recap.xml

http://data.na.com/feeds/mobile/android/v2.0/video/games/high/0011400026_recap.xml

latest_1.xml

http://data.na.com/feeds/mobile/iphone/article/league/news/latest_1.xml

currenttime.js

Listing the files:

# ls
0011400026_recap.xml?                   currenttime.js?  latest_1.xml?      today.xml?

Solution

A possible explanation of the behaviour you are seeing is that you do not sanitize your input line:

with open ('/tmp/reddy_log.txt') as infile:
     ...
     for line in infile:
         ...
         myurl = "http://data.xyz.com" + str(line)
         wget.download(str(myurl))

When you iterate over a file object (for line in infile:), each string you get ends with a newline character ('\n'). If you do not remove the newline before using line, it is still there in whatever you build from line: here, both the URL and the name of the downloaded file.
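
A minimal sketch of that point, assuming Python 3 and using io.StringIO as a stand-in for the opened file (the three one-letter lines are made up for the demonstration):

```python
import io

# io.StringIO stands in for open('/tmp/reddy_log.txt'); contents are made up.
infile = io.StringIO("a\nb\nc\n")

# Iterating over a file-like object yields each line WITH its trailing '\n'.
lines = [line for line in infile]
print(lines)

# Anything built from an unstripped line inherits the newline.
urls = ["http://data.xyz.com/" + line for line in lines]
print(repr(urls[0]))
```

Printing with repr() makes the trailing '\n' visible, which a plain print would only show as an extra blank line (as in the output posted in the question).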

As an illustration of this concept, have a look at the transcript of a test I ran:

08:28 $ cat > a_file
a
b
c
08:29 $ cat > test.py
data = open('a_file')
for line in data:
    new_file = open(line, 'w')
    new_file.close() 
08:31 $ ls
a_file  test.py
08:31 $ python test.py
08:31 $ ls
a?  a_file  b?  c?  test.py
08:31 $ ls -b
a\n  a_file  b\n  c\n  test.py
08:31 $

As you can see, I read lines from a file and create some files using line as the filename and, guess what, the filenames as listed by ls have a ? at the end. But we can do better, as explained in the fine manual page of ls:

  -b, --escape
         print C-style escapes for nongraphic characters

and, as you can see in the output of ls -b, the filenames are not terminated by a question mark (it's just a placeholder used by default by the ls program) but are terminated by a newline character.
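
With that diagnosis, one possible fix is to strip each line before building the URL and the local filename. The following is only a sketch under stated assumptions: Python 3, a hypothetical helper named build_targets, sample paths invented for the demonstration, and io.StringIO standing in for the real file. The actual download call is left out so the snippet runs without network access.

```python
import io
import posixpath

BASE_URL = "http://data.xyz.com"  # host as written in the question's code


def build_targets(lines, base_url=BASE_URL):
    """Yield (url, filename) pairs from an iterable of URL-path lines.

    Hypothetical helper: the name and signature are not from the original post.
    """
    for line in lines:
        path = line.strip()      # removes the trailing '\n' and stray spaces
        if not path:             # skip blank lines
            continue
        # URL paths always use '/', so posixpath.basename gives the file part.
        yield base_url + path, posixpath.basename(path)


# Stand-in for open('/tmp/reddy_log.txt'); the paths are illustrative only.
sample = io.StringIO(
    "/feeds/mobile/iphone/article/league/news/latest_1.xml\n"
    "\n"
    "/feeds/currenttime.js\n"
)

targets = list(build_targets(sample))
for url, filename in targets:
    print(repr(url), repr(filename))
```

Each clean url can then be handed to wget.download(url), or to urllib.request.urlretrieve(url, filename) in Python 3 (urllib.urlretrieve in the Python 2 code of the question), and the saved files no longer carry a trailing newline in their names.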

While I'm at it, I have to say that you should avoid using a temporary file to store the intermediate results of your computation.

A nice feature of Python is the presence of generator expressions. If you want, you can write your code as follows:

import wget

# you matched on a '200' on the whole line, I assume that what
# you really want is to match a specific column, the 'error_column'
# that I symbolically load from an external resource
from my_constants import error_column, payload_column

# here it is a sequence of generator expressions, each one relying
# on the previous one

# 1. the lines in the file, stripped from the white space
#    on the right (the newline is considered white space)
#    === not strictly necessary, just convenient because
#    === below we want to test for non-empty lines
lines = (line.rstrip() for line in open('whatever.csv'))

# 2. the lines are converted to a list of 'tokens' 
all_tokens = (line.split() for line in lines if line)

# 3. for each 'tokens' in the 'all_tokens' generator expression, we
#    check for the code '200' and possibly generate a new target
targets = (tokens[payload_column] for tokens in all_tokens if tokens[error_column]=='200')

# eventually, use the 'targets' generator to proceed with the downloads
for target in targets: wget.download(target)

Don't be fooled by the amount of comments; without comments my code is just

import wget
from my_constants import error_column, payload_column

lines = (line.rstrip() for line in open('whatever.csv'))
all_tokens = (line.split() for line in lines if line)
targets = (tokens[payload_column] for tokens in all_tokens if tokens[error_column]=='200')

for target in targets: wget.download(target)
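
For readers who want to try the generator pipeline without a real log or the my_constants module, here is a self-contained sketch. It assumes Python 3 and an Apache-style access log in which the request path is token 6 (as in the question) and the status code is token 8; both indices, the function name successful_paths, and the sample log lines are assumptions made for illustration. It also keeps the de-duplication and sorting that the question did with a set.

```python
import io

# Hypothetical column indices, assuming an Apache-style access log line.
STATUS_COLUMN = 8    # assumed position of the HTTP status code
PAYLOAD_COLUMN = 6   # same index as tokens[6] in the question


def successful_paths(log_lines):
    """Return the sorted, de-duplicated paths of requests answered with 200."""
    lines = (line.rstrip() for line in log_lines)          # drop newlines
    all_tokens = (line.split() for line in lines if line)  # skip empty lines
    return sorted({
        tokens[PAYLOAD_COLUMN]
        for tokens in all_tokens
        # guard against short or malformed lines before indexing
        if len(tokens) > STATUS_COLUMN and tokens[STATUS_COLUMN] == '200'
    })


# Made-up log lines standing in for the real access log.
sample_log = io.StringIO(
    '10.0.0.1 - - [24/Sep/2021:20:13:07 +0000] "GET /feeds/latest_1.xml HTTP/1.1" 200 512\n'
    '10.0.0.2 - - [24/Sep/2021:20:13:08 +0000] "GET /feeds/missing.xml HTTP/1.1" 404 0\n'
    '\n'
    '10.0.0.3 - - [24/Sep/2021:20:13:09 +0000] "GET /feeds/latest_1.xml HTTP/1.1" 200 512\n'
)

paths = successful_paths(sample_log)
print(paths)
```

Because every path has been stripped before it is used, prefixing it with the host and passing the result to wget.download would produce files without the trailing newline, and no temporary file is needed.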
