评论我的第一个脚本? [英] Comments on my first script?
问题描述
我热衷于学习python,并且非常注重做事。
" pythonic"因此,所以在几个小时内将以下脚本放在一起
作为编程python的第一次尝试。
我想要社区''关于我做了什么的想法/评论;
我可以改进,不要我应该避免,等等。我不是因为结果数据而感到困扰 - 目前它满足了我的
需求。但欢迎任何评论!
#!/ usr / bin / env python
##打开包含域列表的文件(每行1个) ,
##请求并解析它的whois记录并推送到csv
##文件。
import子流程
导入重新
src = open(''./ domains.txt'')
dest = open(''./ whois.csv'',''w'');
sep =" |"
headers = [ 域名,注册人,注册人
地址",注册商,注册人类型,注册日期,续订等>
Date",Last Updated,Name Servers]
dest.write(sep.join(headers)+" \ n")
def trim(txt):
x = []
for txt.split(" \ n" ):
如果line.strip()=="":
继续
if line.strip()。sta rtswith(''WHOIS'):
继续
如果line.strip()。startswith(''>>>''):
继续
如果line.strip()。startswith(''%''):
继续
如果是行。 startswith( - ):
return''''。join(x)
x.append(" " + line)
返回" \ n" .join(x)
x = []
isok = re.compile(" ^ \s?([^:] +):")。匹配
for line in txt .split(" \ n"):
match = isok(line)
如果不匹配:
继续
x.append(行)
返回" \ n" .join(x);
def clean_co_uk(rec):
rec = rec.replace(''公司编号:'',''公司编号 - '')
rec = rec.replace(" \ n \ n"," \ n")
rec = rec.replace(" \ n","")
rec = rec.replace( ":",":\ n")
rec = re.sub("([^(] [a-zA-Z''] + \??[ a-zA-Z] *:\ n)"," \ n\ g< 0>",rec)
rec = rec.replace(":\ n",":")
rec = re.sub(" ^ [] + \ n","",rec)
返回n rec
def clean_net(rec):
rec = rec.replace(" \ n\ n"," \ n" )
rec = rec.replace(" \ n","")
rec = rec.replace(":",": \ n")
rec = re.sub("([a-zA-Z''] + \??[a-zA-Z] *:\ n)" ;," \ n \ g< 0>",rec)
rec = rec.replace(":\ n",":")
返回rec
def clean_info(rec):
x = []
for rec.split中的行( " \ n"):
x.append(re.sub(" ^([^:] +):"," \g< 0",line))
返回" \ n" .join(x)
def记录(域名,记录):
details = [''','''''''''''''''''''''''''''''''''''''''''''
for k,v in record.items():
try:
details [0] = domain.lower()
result = {
" registrant":lambda:1,
" registrant name":lambda:1,
" registrant type" ;:lambda:4,
" registrant'的地址" ;: lambda:2,
" registrant address1":lambda:2,
注册商:lambda:3,
" sponsoring registrar":lambda:3,
"注册于&:; lambda:5,
" registered":lambda:5,
" domain registeration date":lambda:5,
" renewal date":lambda: 6,
最后更新:lambda:7,
域名最后更新日期:lambda:7,
"名称服务器":lambda:8,
" name server" ;:lambda:8,
" nameservers":lambda:8,
更新日期:lambda:7,
创建日期:lambda:5,
"到期日期" ;:lambda:6,>
domai n到期日&:lambda:6,
" administrative contact":lambda:2
} [k.lower()]()
如果v!='''':
详情[结果] = v
除了:
继续
dest.write(sep.join(details)+" \ n")
##循环通过域名
for src中的域名:
domain = domain.strip()
if domain =='''':
继续
rec = subprocess.Popen([" whois",domain],
stdout = subprocess.PIPE).communicate()[0 ]
如果rec.startswith(没有whois服务器)==真:
继续
如果rec.startswith(此TLD没有whois服务器)==真:
继续
rec = trim(rec)
如果domain.endswith(" .net"):
rec = clean_net(rec)
if domain.endswith (" .com"):
rec = clean_net(rec)
如果domain.endswith(" .tv"):
rec = clean_net(rec)
if domain.endswith(" .co.uk"):
rec = clean_co_uk(rec)
if domain.endswith(" .info" ):
rec = clean_info(rec)
rec = clean(rec)
details = {}
尝试:
for rec.split(" \ n"):
bits = line.split ('':'')
a = bits.pop(0)
b = bits.pop(0)
details [a.strip ()] = b.strip()。replace(" \t",",")
除了:
继续
记录(域名,详情)
##清理
src.close()
dest。 close()
" Phillip B Oldham" < ph ************ @ gmail.com写信息
新闻:7e ****************** **************** @ 26g2000h sk.googlegroups.com ...
我想要社区'关于我做了什么的想法/评论;
我可以改进,不要我应该避免,等等。我不是因为结果数据而感到困扰 - 目前它满足了我的
需求。但欢迎任何评论!
我不是专家,但这里有一些想法。我希望他们能帮忙。
#!/ usr / bin / env python
##打开一个包含域列表的文件(1每行),
##请求并解析它的whois记录并推送到csv
##文件。
您可能希望将doc字符串作为一种方法来提供更长的
文档,就像你的程序所做的一样。
dest = open(''。/ whois.csv'',''w'');
分号!!!! :)
def trim(txt):
x = []
for line in txt.split( " \ n"):
如果line.strip()=="":
继续
if line.strip ().startswith(''WHOIS''):
继续
如果line.strip()。startswith(''>>>''):
继续
如果line.strip()。startswith(''%''):
继续
if line.startswith(" - "):
return''''。join(x)
这就是全部适当缩进?你可以做的一件事就是将每一行放在
一行上,因为它们非常简单:
如果是line.strip()。startswith(' 'WHOIS''):继续
虽然我仍然喜欢适当的缩进。但是你有很多这样的东西呢
这样可以节省大量的空间。
另外,只是我的个人喜好,我喜欢与我用于字符串的
引号的类型一致。在这里,你在
不同的行上混合使用单引号和双引号。
return" \ n" .join(x);
分号!!!! :) :)
details = ['''','''','''','''','''','' '',''','''','''''
我现在没有Python可用,但我认为你可以这样做
代替:
details = [''''] * 9
除外:
继续
非特定除非条款通常不是首选,因为它们会抓住
一切,甚至是你可能不想捕捉的东西。
if domain =='''':
继续
您可以说:
如果不是域名
而不是等价测试。但这个if语句是做什么的?
如果rec.startswith(没有whois服务器)== True:
continue <如果rec.startswith(此TLD没有whois服务器),则
; = = True:
继续
如上所述,你不需要== True。这里。
if domain.endswith(" .net"):
rec = clean_net(rec)
如果domain.endswith(&。com):
rec = clean_net(rec)
if domain.endswith(" ; .tv"):
rec = clean_net(rec)
if domain.endswith(" .co.uk"):
如果domain.endswith(&。info):
rec = clean_info $ b
嗯,我的第一个想法就是做这样的事情,所有这些如果测试:
扩展名为[< list所有扩展名为字符串,这里>]:
rec = clean_net(扩展名)
但为了实现这一点,您可能需要概括clean_net函数,以便
它适用于所有人,而不是必须调用不同的功能
,具体取决于扩展名。
无论如何,我希望其中一些有用!
< blockquote>" John Salerno" < jo ****** @ NOSPAMgmail.com写在留言中
news:48 ********************** @ news .astraweb.com ...
> if domain.endswith(" .net"):
rec = clean_net(rec)
如果domain.endswith(" .com"):
rec = clean_net(rec)
如果domain.endswith(") .tv"):
rec = clean_net(rec)
如果domain.endswith(" .co.uk"):
rec = clean_co_uk(rec)
如果domain.endswith(" .info"):
rec = clean_info(rec)
嗯,我的第一个想法是如果
测试那么做这样的事情:
扩展名为[<将所有扩展名列为字符串在这里>]:
rec = clean_net(扩展名)
哎呀,我猜你还需要if测试!
用于扩展[<将所有扩展名列为字符串此处>]:
如果dom ain.endswith(扩展名):
rec = clean_net(扩展名)
不确定这是否理想。
6月12日下午4:27 *,Phillip B Oldham< phillip.old ... @ gmail.comwrote:
我''我热衷于学习python,并且非常注重做事。
" pythonic"因此,所以在几个小时内将以下脚本放在一起
作为编程python的第一次尝试。
我想要社区''关于我做了什么的想法/评论;
我可以改进,不要我应该避免,等等。我不是因为结果数据而感到困扰 - 目前它满足了我的
需求。但欢迎任何评论!
#!/ usr / bin / env python
##打开包含域列表的文件(每行1个) ,
##请求并解析它的whois记录并推送到csv
##文件。
import子流程
导入重新
src = open(''./ domains.txt'')
dest = open(''./ whois.csv'',''w'');
sep =" |"
headers = [ 域名,注册人,注册人
地址",注册商,注册人类型,注册日期,续订等>
Date",Last Updated,Name Servers]
dest.write(sep.join(headers)+" \ n")
def trim(txt):
* * * * x = []
* * * * for txt in line .split(" \ n"):
* * * * * * * * if line.strip()=="":
* * * * * * * * * * * *继续
* * * * * * * *如果line.strip()。startswith(''WHOIS''):
* * * * * * * * * * * *继续
* * * * * * * *如果line.strip()。startswith(''>>>''):
* * * * * * * * * * * *继续
* * * * * * * *如果line.strip()。startswith(''%''):<如果line.startswith(& - ,"):
* * * * * * * * * * * *返回''''。join(x)
* * * * * * * * x.append(" ; " + line)
* * * * return" \ n" .join(x)
def clean(txt):
* * * * x = []
* * * * isok = re.compile(" ^ \s?([^:] +):")。匹配
* * * *表示txt.split中的行(&\ n"):
* * * * * * * * match = isok(line)
* * * * * * * *如果不匹配:
* * * * * * * * * * * *继续
* * * * * * * * x.append(行)
* * * *返回" \ n" .join(x);
def clean_co_uk(rec):
* * * * rec = rec.replace(''公司编号:'',''公司编号 - '')
* * * * rec = rec.replace(" \ n\ n"," \ n")
* * * * rec = rec.replace(" \ n", "")
* * * * rec = rec.replace(":",":\ n")
* * * * rec = re.sub("([^(] [a-zA-Z''] + \ s?[a-zA-Z] *:\ n)"," \ n\ g< ; 0>",rec)
* * * * rec = rec.replace(":\ n",":")
* * * * rec = re.sub(" ^ [] + \ n","",rec)
* * * * return rec
def clean_net(rec):
* * * * rec = rec.replace(" \ n\ n"," \ n")
* * * * rec = rec.replace (" \ n","")
* * * * rec = rec.replace(":",":\ n")
* * * * rec = re.sub("([a-zA-Z''] + \??[a-zA-Z] *:\ n)"," \ n \g< 0>",rec)
* * * * rec = rec.replace(":\ n",":")
* * * *返回rec
def clean_info(rec):
* * * * x = []
* * * *表示rec.split中的行(" \ n"):
* * * * * * * * x.append(re.sub(" ^([^: ] +):"," \g< 0",line))
* * * * return" \ n" .join(x)
>
def记录(dom ain,record):
* * * * details = ['''','''','''','''','''','''', '''','''',''''
* * * *代表k,v代表record.items():
* * * * * * * *尝试:
* * * * * * * * * * * *详情[0] = domain.lower()
* * * * * * * * * * * *结果= {
* * * * * * * * * * * * * * * *" registrant":lambda:1,
* * * * * * * * * * * * * * * *"注册人名称::lambda:1,
* * * * * * * * * * * * * * * * 注册人类型:lambda:4,
* * * * * * * * * * * * * * * *"注册人'的地址":lambda:2,>
* * * * * * * * * * * * * * * *" registrant address1":lambda:2,
* * * * * * * * * * * * * * * *注册商:lambda:3,
* * * * * * * * * * * * * * * *赞助注册商:lambda:3,
* * * * * * * * * * * * * * * *注册o n":lambda:5,
* * * * * * * * * * * * * * * *"注册":lambda:5,
* * * * * * * * * * * * * * * *域名注册日期:lambda:5,
* * * * * * * * * * * * * * * *" ;续约日期:lambda:6,
* * * * * * * * * * * * * * * *最后更新:lambda:7,
* * * * * * * * * * * * * * * *域名最后更新日期:lambda:7,
* * * * * * * * * * * * * * * *名称服务器:lambda:8,
* * * * * * * * * * * * * * * *" name server" ;:lambda:8,>
* * * * * * * * * * * * * * * *" nameservers" ;:lambda:8,
* * * * * * * * * * * * * * * *更新日期:lambda:7,
* * * * * * * * * * * * * * * *创建日期:lambda:5,
* * * * * * * * * * * * * * * *到期日期:lambda:6,
* * * * * * * * * * * * * * * *域名到期日期和现状t;:lambda:6,
* * * * * * * * * * * * * * * *" administrative contact":lambda:2
* * * * * * * * * * * *} [k.lower()]()
* * * * * * * * * * * *如果v!='''':
* * * * * * * * * * * * * * * *详情[结果] = v
* * * * * * * *除外:
* * * * * * * * * * * *继续
* * * * dest.write(sep.join(details)+" \ n" ;)
##循环通过域名
for src中的域名:
* * * * domain = domain.strip()
* * * *如果domain =='''':
* * * * * * * * continue
* * * * rec = subprocess.Popen([" whois",domain],
stdout = subprocess.PIPE).communicate()[0] <如果rec.startswith(No whois server)== True:
* * * * * * * />
* * * *如果rec.startswith(此TLD没有whois服务器)== True:
* * * * * * * *继续
* * * * rec = trim(rec)
* * * * if domain.endswith(" .net" ):
* * * * * * * * rec = clean_net(rec)
* * * * if domain.endswith(" .com" ):
* * * * * * * * rec = clean_net(rec)
* * * * if domain.endswith(" .tv" ):
* * * * * * * * rec = clean_net(rec)
* * * * if domain.endswith(&。co。英国):
* * * * * * * * rec = clean_co_uk(rec)
* * * * if domain.endswith("。 info"):
* * * * * * * * rec = clean_info(rec)
* * * * rec = clean(rec)
* * * * details = {}
* * * *试试:
* * * * * * * *表示rec.split中的行(\ n):
* * * * * * * * * * * * bits = line.split('':'')
* * * * * * * * * * * * a = bits.pop(0)
* * * * * * * * * * * * b =位.pop(0)
* * * * * * * * * * * * d etails [a.strip()] = b.strip()。replace(" \t",",")
* * * *除外:
* * * * * * * *继续
* * * *记录(域名,详情)
##清理
src.close()
dest.close()
离开工作之前,我只需要做几件事。
#!/ usr / bin / env python
"""打开包含域列表的文件(每行1个),
请求并解析它的whois记录并推送到csv
文件。
""" #而不是使用docstrings而不是多行注释。
def trim(txt):
x = []
for line in txt.splitlines():#字符串有内置函数
如果不是line.strip()或line.startswith(''WHOIS'')\
或line.startswith(''>>>'')或line.startswith(''%''):
继续#你可以在一个if语句中完成它们
if line.startswith('' - ''):return''''。join(x)
x.append(''''+ line)
返回''\ n''。在src中为域名加入(x)
:
如果不是domain.strip():继续#没有任何内容的行是假的
rec = subprocess.Popen([" whois",domain.strip()],
stdout = subprocess。 PIPE).communicate()[0]
如果rec.startswith(''没有whois服务器'')\
或rec.startswith(''此TLD有没有whois服务器''):
继续#Startswith将返回真/假所以它足够
rec = trim(rec)
如果domain.endswith(''。net''):
rec = clean_net(rec)
elif domain.endswith(''。com''):
#而不是使用if / elif语句,除非你以某种方式思考你
将匹配多个。
....
for rec.splitlines():
试试:
a,b = line.split('':'')[:2]
详情[a.strip()] = b.strip()。replace(''\ t'','','')
除了IndexError之外的
:#No match
continue
希望这是一个开始。
I''m keen on learning python, with a heavy lean on doing things the
"pythonic" way, so threw the following script together in a few hours
as a first-attempt in programming python.
I''d like the community''s thoughts/comments on what I''ve done;
improvements I can make, "don''ts" I should be avoiding, etc. I''m not
so much bothered about the resulting data - for the moment it meets my
needs. But any comment is welcome!
#!/usr/bin/env python
## Open a file containing a list of domains (1 per line),
## request and parse it''s whois record and push to a csv
## file.
import subprocess
import re
src = open(''./domains.txt'')
dest = open(''./whois.csv'', ''w'');
sep = "|"
headers = ["Domain","Registrant","Registrant''s
Address","Registrar","Registrant Type","Date Registered","Renewal
Date","Last Updated","Name Servers"]
dest.write(sep.join(headers)+"\n")
def trim( txt ):
x = []
for line in txt.split("\n"):
if line.strip() == "":
continue
if line.strip().startswith(''WHOIS''):
continue
if line.strip().startswith(''>>>''):
continue
if line.strip().startswith(''%''):
continue
if line.startswith("--"):
return ''''.join(x)
x.append(" "+line)
return "\n".join(x)
def clean( txt ):
x = []
isok = re.compile("^\s?([^:]+): ").match
for line in txt.split("\n"):
match = isok(line)
if not match:
continue
x.append(line)
return "\n".join(x);
def clean_co_uk( rec ):
rec = rec.replace(''Company number:'', ''Company number -'')
rec = rec.replace("\n\n", "\n")
rec = rec.replace("\n", "")
rec = rec.replace(": ", ":\n")
rec = re.sub("([^(][a-zA-Z'']+\s?[a-zA-Z]*:\n)", "\n\g<0>", rec)
rec = rec.replace(":\n", ": ")
rec = re.sub("^[ ]+\n", "", rec)
return rec
def clean_net( rec ):
rec = rec.replace("\n\n", "\n")
rec = rec.replace("\n", "")
rec = rec.replace(": ", ":\n")
rec = re.sub("([a-zA-Z'']+\s?[a-zA-Z]*:\n)", "\n\g<0>", rec)
rec = rec.replace(":\n", ": ")
return rec
def clean_info( rec ):
x = []
for line in rec.split("\n"):
x.append(re.sub("^([^:]+):", "\g<0", line))
return "\n".join(x)
def record(domain, record):
details = ['''','''','''','''','''','''','''','''','''']
for k, v in record.items():
try:
details[0] = domain.lower()
result = {
"registrant": lambda: 1,
"registrant name": lambda: 1,
"registrant type": lambda: 4,
"registrant''s address": lambda: 2,
"registrant address1": lambda: 2,
"registrar": lambda: 3,
"sponsoring registrar": lambda: 3,
"registered on": lambda: 5,
"registered": lambda: 5,
"domain registeration date": lambda: 5,
"renewal date": lambda: 6,
"last updated": lambda: 7,
"domain last updated date": lambda: 7,
"name servers": lambda: 8,
"name server": lambda: 8,
"nameservers": lambda: 8,
"updated date": lambda: 7,
"creation date": lambda: 5,
"expiration date": lambda: 6,
"domain expiration date": lambda: 6,
"administrative contact": lambda: 2
}[k.lower()]()
if v != '''':
details[result] = v
except:
continue
dest.write(sep.join(details)+"\n")
## Loop through domains
for domain in src:
domain = domain.strip()
if domain == '''':
continue
rec = subprocess.Popen(["whois",domain],
stdout=subprocess.PIPE).communicate()[0]
if rec.startswith("No whois server") == True:
continue
if rec.startswith("This TLD has no whois server") == True:
continue
rec = trim(rec)
if domain.endswith(".net"):
rec = clean_net(rec)
if domain.endswith(".com"):
rec = clean_net(rec)
if domain.endswith(".tv"):
rec = clean_net(rec)
if domain.endswith(".co.uk"):
rec = clean_co_uk(rec)
if domain.endswith(".info"):
rec = clean_info(rec)
rec = clean(rec)
details = {}
try:
for line in rec.split("\n"):
bits = line.split('': '')
a = bits.pop(0)
b = bits.pop(0)
details[a.strip()] = b.strip().replace("\t", ", ")
except:
continue
record(domain, details)
## Cleanup
src.close()
dest.close()
"Phillip B Oldham" <ph************@gmail.comwrote in message
news:7e**********************************@26g2000h sk.googlegroups.com...I''d like the community''s thoughts/comments on what I''ve done;
improvements I can make, "don''ts" I should be avoiding, etc. I''m not
so much bothered about the resulting data - for the moment it meets my
needs. But any comment is welcome!I''m not expert, but here are a few thoughts. I hope they help.
#!/usr/bin/env python
## Open a file containing a list of domains (1 per line),
## request and parse it''s whois record and push to a csv
## file.You might want to look into doc strings as a method of providing longer
documentation like this about what your program does.
dest = open(''./whois.csv'', ''w'');Semicolon!!!! :)
def trim( txt ):
x = []
for line in txt.split("\n"):
if line.strip() == "":
continue
if line.strip().startswith(''WHOIS''):
continue
if line.strip().startswith(''>>>''):
continue
if line.strip().startswith(''%''):
continue
if line.startswith("--"):
return ''''.join(x)Is all this properly indented? One thing you can do is put each of these on
one line, since they are fairly simple:
if line.strip().startswith(''WHOIS''): continue
although I still like proper indentation. But you have a lot of them so it
might save a good amount of space to do it this way.
Also, just my personal preference, I like to be consistent with the type of
quotes I use for strings. Here, you mix both single and double quotes on
different lines.
return "\n".join(x);Semicolon!!!! :) :)
details = ['''','''','''','''','''','''','''','''','''']I don''t have Python available to me right now, but I think you can do this
instead:
details = [''''] * 9
except:
continueNon-specific except clauses usually aren''t preferred since they catch
everything, even something you might not want to catch.
if domain == '''':
continueYou can say:
if not domain
instead of that equivalence test. But what does this if statement do?
if rec.startswith("No whois server") == True:
continue
if rec.startswith("This TLD has no whois server") == True:
continueLike above, you don''t need "== True" here.
if domain.endswith(".net"):
rec = clean_net(rec)
if domain.endswith(".com"):
rec = clean_net(rec)
if domain.endswith(".tv"):
rec = clean_net(rec)
if domain.endswith(".co.uk"):
rec = clean_co_uk(rec)
if domain.endswith(".info"):
rec = clean_info(rec)Hmm, my first thought is to do something like this with all these if tests:
for extension in [<list all the extensions as strings here>]:
rec = clean_net(extension)
But for that to work, you may need to generalize the clean_net function so
it works for all of them, instead of having to call different functions
depending on the extension.
Anyway, I hope some of that helps!
"John Salerno" <jo******@NOSPAMgmail.comwrote in message
news:48**********************@news.astraweb.com...>if domain.endswith(".net"):
rec = clean_net(rec)
if domain.endswith(".com"):
rec = clean_net(rec)
if domain.endswith(".tv"):
rec = clean_net(rec)
if domain.endswith(".co.uk"):
rec = clean_co_uk(rec)
if domain.endswith(".info"):
rec = clean_info(rec)
Hmm, my first thought is to do something like this with all these if
tests:
for extension in [<list all the extensions as strings here>]:
rec = clean_net(extension)Whoops, you''d still need an if test in there I suppose!
for extension in [<list all the extensions as strings here>]:
if domain.endswith(extension):
rec = clean_net(extension)
Not sure if this is ideal.
On Jun 12, 4:27*pm, Phillip B Oldham <phillip.old...@gmail.comwrote:I''m keen on learning python, with a heavy lean on doing things the
"pythonic" way, so threw the following script together in a few hours
as a first-attempt in programming python.
I''d like the community''s thoughts/comments on what I''ve done;
improvements I can make, "don''ts" I should be avoiding, etc. I''m not
so much bothered about the resulting data - for the moment it meets my
needs. But any comment is welcome!
#!/usr/bin/env python
## Open a file containing a list of domains (1 per line),
## request and parse it''s whois record and push to a csv
## file.
import subprocess
import re
src = open(''./domains.txt'')
dest = open(''./whois.csv'', ''w'');
sep = "|"
headers = ["Domain","Registrant","Registrant''s
Address","Registrar","Registrant Type","Date Registered","Renewal
Date","Last Updated","Name Servers"]
dest.write(sep.join(headers)+"\n")
def trim( txt ):
* * * * x = []
* * * * for line in txt.split("\n"):
* * * * * * * * if line.strip() == "":
* * * * * * * * * * * * continue
* * * * * * * * if line.strip().startswith(''WHOIS''):
* * * * * * * * * * * * continue
* * * * * * * * if line.strip().startswith(''>>>''):
* * * * * * * * * * * * continue
* * * * * * * * if line.strip().startswith(''%''):
* * * * * * * * * * * * continue
* * * * * * * * if line.startswith("--"):
* * * * * * * * * * * * return ''''.join(x)
* * * * * * * * x.append(" "+line)
* * * * return "\n".join(x)
def clean( txt ):
* * * * x = []
* * * * isok = re.compile("^\s?([^:]+): ").match
* * * * for line in txt.split("\n"):
* * * * * * * * match = isok(line)
* * * * * * * * if not match:
* * * * * * * * * * * * continue
* * * * * * * * x.append(line)
* * * * return "\n".join(x);
def clean_co_uk( rec ):
* * * * rec = rec.replace(''Company number:'', ''Company number -'')
* * * * rec = rec.replace("\n\n", "\n")
* * * * rec = rec.replace("\n", "")
* * * * rec = rec.replace(": ", ":\n")
* * * * rec = re.sub("([^(][a-zA-Z'']+\s?[a-zA-Z]*:\n)", "\n\g<0>", rec)
* * * * rec = rec.replace(":\n", ": ")
* * * * rec = re.sub("^[ ]+\n", "", rec)
* * * * return rec
def clean_net( rec ):
* * * * rec = rec.replace("\n\n", "\n")
* * * * rec = rec.replace("\n", "")
* * * * rec = rec.replace(": ", ":\n")
* * * * rec = re.sub("([a-zA-Z'']+\s?[a-zA-Z]*:\n)", "\n\g<0>", rec)
* * * * rec = rec.replace(":\n", ": ")
* * * * return rec
def clean_info( rec ):
* * * * x = []
* * * * for line in rec.split("\n"):
* * * * * * * * x.append(re.sub("^([^:]+):", "\g<0", line))
* * * * return "\n".join(x)
def record(domain, record):
* * * * details = ['''','''','''','''','''','''','''','''','''']
* * * * for k, v in record.items():
* * * * * * * * try:
* * * * * * * * * * * * details[0] = domain.lower()
* * * * * * * * * * * * result = {
* * * * * * * * * * * * * * * * "registrant": lambda: 1,
* * * * * * * * * * * * * * * * "registrant name": lambda: 1,
* * * * * * * * * * * * * * * * "registrant type": lambda: 4,
* * * * * * * * * * * * * * * * "registrant''s address": lambda: 2,
* * * * * * * * * * * * * * * * "registrant address1": lambda: 2,
* * * * * * * * * * * * * * * * "registrar": lambda: 3,
* * * * * * * * * * * * * * * * "sponsoring registrar": lambda: 3,
* * * * * * * * * * * * * * * * "registered on": lambda: 5,
* * * * * * * * * * * * * * * * "registered": lambda: 5,
* * * * * * * * * * * * * * * * "domain registeration date": lambda: 5,
* * * * * * * * * * * * * * * * "renewal date": lambda: 6,
* * * * * * * * * * * * * * * * "last updated": lambda: 7,
* * * * * * * * * * * * * * * * "domain last updated date": lambda: 7,
* * * * * * * * * * * * * * * * "name servers": lambda: 8,
* * * * * * * * * * * * * * * * "name server": lambda: 8,
* * * * * * * * * * * * * * * * "nameservers": lambda: 8,
* * * * * * * * * * * * * * * * "updated date": lambda: 7,
* * * * * * * * * * * * * * * * "creation date": lambda: 5,
* * * * * * * * * * * * * * * * "expiration date": lambda: 6,
* * * * * * * * * * * * * * * * "domain expiration date": lambda: 6,
* * * * * * * * * * * * * * * * "administrative contact": lambda: 2
* * * * * * * * * * * * }[k.lower()]()
* * * * * * * * * * * * if v != '''':
* * * * * * * * * * * * * * * * details[result] = v
* * * * * * * * except:
* * * * * * * * * * * * continue
* * * * dest.write(sep.join(details)+"\n")
## Loop through domains
for domain in src:
* * * * domain = domain.strip()
* * * * if domain == '''':
* * * * * * * * continue
* * * * rec = subprocess.Popen(["whois",domain],
stdout=subprocess.PIPE).communicate()[0]
* * * * if rec.startswith("No whois server") == True:
* * * * * * * * continue
* * * * if rec.startswith("This TLD has no whois server") == True:
* * * * * * * * continue
* * * * rec = trim(rec)
* * * * if domain.endswith(".net"):
* * * * * * * * rec = clean_net(rec)
* * * * if domain.endswith(".com"):
* * * * * * * * rec = clean_net(rec)
* * * * if domain.endswith(".tv"):
* * * * * * * * rec = clean_net(rec)
* * * * if domain.endswith(".co.uk"):
* * * * * * * * rec = clean_co_uk(rec)
* * * * if domain.endswith(".info"):
* * * * * * * * rec = clean_info(rec)
* * * * rec = clean(rec)
* * * * details = {}
* * * * try:
* * * * * * * * for line in rec.split("\n"):
* * * * * * * * * * * * bits = line.split('': '')
* * * * * * * * * * * * a = bits.pop(0)
* * * * * * * * * * * * b = bits.pop(0)
* * * * * * * * * * * * details[a.strip()] = b.strip().replace("\t", ", ")
* * * * except:
* * * * * * * * continue
* * * * record(domain, details)
## Cleanup
src.close()
dest.close()Just a few quick things before I leave work.
#!/usr/bin/env python
"""Open a file containing a list of domains (1 per line),
request and parse it''s whois record and push to a csv
file.
""" # Rather use docstrings than multiline commenting like that.
def trim(txt):
x = []
for line in txt.splitlines(): # Strings have a built in function
if not line.strip() or line.startswith(''WHOIS'') \
or line.startswith(''>>>'') or line.startswith(''%''):
continue # you can do them in one if statement
if line.startswith(''--''): return ''''.join(x)
x.append('' ''+line)
return ''\n''.join(x)
for domain in src:
if not domain.strip(): continue # A line with nothing is False
rec = subprocess.Popen(["whois",domain.strip()],
stdout=subprocess.PIPE).communicate()[0]
if rec.startswith(''No whois server'') \
or rec.startswith(''This TLD has no whois server''):
continue # Startswith will return True/False so it is enough
rec = trim(rec)
if domain.endswith(''.net''):
rec = clean_net(rec)
elif domain.endswith(''.com''):
# Rather use if/elif statements unless somehow you think you
will match more than one.
....
for line in rec.splitlines():
try:
a, b = line.split('': '')[:2]
details[a.strip()] = b.strip().replace(''\t'', '', '')
except IndexError: # No matches
continue
Hope that''s a start.
这篇关于评论我的第一个脚本?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!