评论我的第一个脚本? [英] Comments on my first script?

查看:64
本文介绍了评论我的第一个脚本?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我热衷于学习python,并且非常注重做事。

" pythonic"因此,所以在几个小时内将以下脚本放在一起

作为编程python的第一次尝试。


我想要社区''关于我做了什么的想法/评论;

我可以改进,不要我应该避免,等等。我不是因为结果数据而感到困扰 - 目前它满足了我的

需求。但欢迎任何评论!


#!/ usr / bin / env python

##打开包含域列表的文件(每行1个) ,

##请求并解析它的whois记录并推送到csv

##文件。


import子流程

导入重新


src = open(''./ domains.txt'')


dest = open(''./ whois.csv'',''w'');


sep =" |"

headers = [ 域名,注册人,注册人

地址",注册商,注册人类型,注册日期,续订等>
Date",Last Updated,Name Servers]


dest.write(sep.join(headers)+" \ n")


def trim(txt):

x = []

for txt.split(" \ n" ):

如果line.strip()=="":

继续

if line.strip()。sta rtswith(''WHOIS'):

继续

如果line.strip()。startswith(''>>>''):

继续

如果line.strip()。startswith(''%''):

继续

如果是行。 startswith( - ):

return''''。join(x)

x.append(" " + line)

返回" \ n" .join(x)


x = []

isok = re.compile(" ^ \s?([^:] +):")。匹配

for line in txt .split(" \ n"):

match = isok(line)

如果不匹配:

继续

x.append(行)

返回" \ n" .join(x);


def clean_co_uk(rec):

rec = rec.replace(''公司编号:'',''公司编号 - '')

rec = rec.replace(" \ n \ n"," \ n")

rec = rec.replace(" \ n","")

rec = rec.replace( ":",":\ n")

rec = re.sub("([^(] [a-zA-Z''] + \??[ a-zA-Z] *:\ n)"," \ n\ g< 0>",rec)

rec = rec.replace(":\ n",":")

rec = re.sub(" ^ [] + \ n","",rec)

返回n rec


def clean_net(rec):

rec = rec.replace(" \ n\ n"," \ n" )

rec = rec.replace(" \ n","")

rec = rec.replace(":",": \ n")

rec = re.sub("([a-zA-Z''] + \??[a-zA-Z] *:\ n)" ;," \ n \ g< 0>",rec)

rec = rec.replace(":\ n",":")

返回rec


def clean_info(rec):

x = []

for rec.split中的行( " \ n"):

x.append(re.sub(" ^([^:] +):"," \g< 0",line))

返回" \ n" .join(x)


def记录(域名,记录):

details = [''','''''''''''''''''''''''''''''''''''''''''''
for k,v in record.items():

try:

details [0] = domain.lower()

result = {

" registrant":lambda:1,

" registrant name":lambda:1,

" registrant type" ;:lambda:4,

" registrant'的地址" ;: lambda:2,

" registrant address1":lambda:2,

注册商:lambda:3,

" sponsoring registrar":lambda:3,

"注册于&:; lambda:5,

" registered":lambda:5,

" domain registeration date":lambda:5,

" renewal date":lambda: 6,

最后更新:lambda:7,

域名最后更新日期:lambda:7,

"名称服务器":lambda:8,

" name server" ;:lambda:8,

" nameservers":lambda:8,

更新日期:lambda:7,

创建日期:lambda:5,

"到期日期" ;:lambda:6,
domai n到期日&:lambda:6,

" administrative contact":lambda:2

} [k.lower()]()

如果v!='''':

详情[结果] = v

除了:

继续


dest.write(sep.join(details)+" \ n")


##循环通过域名

for src中的域名:


domain = domain.strip()


if domain =='''':

继续


rec = subprocess.Popen([" whois",domain],

stdout = subprocess.PIPE).communicate()[0 ]


如果rec.startswith(没有whois服务器)==真:

继续


如果rec.startswith(此TLD没有whois服务器)==真:

继续


rec = trim(rec)


如果domain.endswith(" .net"):

rec = clean_net(rec)


if domain.endswith (" .com"):

rec = clean_net(rec)


如果domain.endswith(" .tv"):

rec = clean_net(rec)


if domain.endswith(" .co.uk"):

rec = clean_co_uk(rec)


if domain.endswith(" .info" ):

rec = clean_info(rec)


rec = clean(rec)


details = {}


尝试:

for rec.split(" \ n"):

bits = line.split ('':'')

a = bits.pop(0)

b = bits.pop(0)

details [a.strip ()] = b.strip()。replace(" \t",",")

除了:

继续


记录(域名,详情)


##清理

src.close()

dest。 close()

解决方案

" Phillip B Oldham" < ph ************ @ gmail.com写信息

新闻:7e ****************** **************** @ 26g2000h sk.googlegroups.com ...


我想要社区'关于我做了什么的想法/评论;

我可以改进,不要我应该避免,等等。我不是因为结果数据而感到困扰 - 目前它满足了我的

需求。但欢迎任何评论!



我不是专家,但这里有一些想法。我希望他们能帮忙。


#!/ usr / bin / env python

##打开一个包含域列表的文件(1每行),

##请求并解析它的whois记录并推送到csv

##文件。



您可能希望将doc字符串作为一种方法来提供更长的

文档,就像你的程序所做的一样。


dest = open(''。/ whois.csv'',''w'');



分号!!!! :)


def trim(txt):

x = []

for line in txt.split( " \ n"):

如果line.strip()=="":

继续

if line.strip ().startswith(''WHOIS''):

继续

如果line.strip()。startswith(''>>>''):

继续

如果line.strip()。startswith(''%''):

继续

if line.startswith(" - "):

return''''。join(x)



这就是全部适当缩进?你可以做的一件事就是将每一行放在

一行上,因为它们非常简单:


如果是line.strip()。startswith(' 'WHOIS''):继续


虽然我仍然喜欢适当的缩进。但是你有很多这样的东西呢

这样可以节省大量的空间。


另外,只是我的个人喜好,我喜欢与我用于字符串的

引号的类型一致。在这里,你在

不同的行上混合使用单引号和双引号。


return" \ n" .join(x);



分号!!!! :) :)


details = ['''','''','''','''','''','' '',''','''','''''



我现在没有Python可用,但我认为你可以这样做

代替:


details = [''''] * 9


除外:

继续



非特定除非条款通常不是首选,因为它们会抓住

一切,甚至是你可能不想捕捉的东西。


if domain =='''':

继续



您可以说:


如果不是域名


而不是等价测试。但这个if语句是做什么的?


如果rec.startswith(没有whois服务器)== True:

continue <如果rec.startswith(此TLD没有whois服务器),则
; = = True:

继续



如上所述,你不需要== True。这里。


if domain.endswith(" .net"):

rec = clean_net(rec)


如果domain.endswith(&。com):

rec = clean_net(rec)


if domain.endswith(" ; .tv"):

rec = clean_net(rec)


if domain.endswith(" .co.uk"):
如果domain.endswith(&。info):
rec = clean_info $ b



嗯,我的第一个想法就是做这样的事情,所有这些如果测试:


扩展名为[< list所有扩展名为字符串,这里>]:

rec = clean_net(扩展名)


但为了实现这一点,您可能需要概括clean_net函数,以便

它适用于所有人,而不是必须调用不同的功能

,具体取决于扩展名。


无论如何,我希望其中一些有用!


< blockquote>" John Salerno" < jo ****** @ NOSPAMgmail.com写在留言中

news:48 ********************** @ news .astraweb.com ...


> if domain.endswith(" .net"):
rec = clean_net(rec)

如果domain.endswith(" .com"):
rec = clean_net(rec)

如果domain.endswith(") .tv"):
rec = clean_net(rec)

如果domain.endswith(" .co.uk"):
rec = clean_co_uk(rec)

如果domain.endswith(" .info"):
rec = clean_info(rec)



嗯,我的第一个想法是如果

测试那么做这样的事情:


扩展名为[<将所有扩展名列为字符串在这里>]:

rec = clean_net(扩展名)



哎呀,我猜你还需要if测试!


用于扩展[<将所有扩展名列为字符串此处>]:

如果dom ain.endswith(扩展名):

rec = clean_net(扩展名)


不确定这是否理想。


6月12日下午4:27 *,Phillip B Oldham< phillip.old ... @ gmail.comwrote:


我''我热衷于学习python,并且非常注重做事。

" pythonic"因此,所以在几个小时内将以下脚本放在一起

作为编程python的第一次尝试。


我想要社区''关于我做了什么的想法/评论;

我可以改进,不要我应该避免,等等。我不是因为结果数据而感到困扰 - 目前它满足了我的

需求。但欢迎任何评论!


#!/ usr / bin / env python

##打开包含域列表的文件(每行1个) ,

##请求并解析它的whois记录并推送到csv

##文件。


import子流程

导入重新


src = open(''./ domains.txt'')


dest = open(''./ whois.csv'',''w'');


sep =" |"

headers = [ 域名,注册人,注册人

地址",注册商,注册人类型,注册日期,续订等>
Date",Last Updated,Name Servers]


dest.write(sep.join(headers)+" \ n")


def trim(txt):

* * * * x = []

* * * * for txt in line .split(" \ n"):

* * * * * * * * if line.strip()=="":

* * * * * * * * * * * *继续

* * * * * * * *如果line.strip()。startswith(''WHOIS''):

* * * * * * * * * * * *继续

* * * * * * * *如果line.strip()。startswith(''>>>''):

* * * * * * * * * * * *继续

* * * * * * * *如果line.strip()。startswith(''%''):<如果line.startswith(& - ,"):

* * * * * * * * * * * *返回''''。join(x)

* * * * * * * * x.append(" ; " + line)

* * * * return" \ n" .join(x)


def clean(txt):

* * * * x = []

* * * * isok = re.compile(" ^ \s?([^:] +):")。匹配

* * * *表示txt.split中的行(&\ n"):

* * * * * * * * match = isok(line)

* * * * * * * *如果不匹配:

* * * * * * * * * * * *继续

* * * * * * * * x.append(行)

* * * *返回" \ n" .join(x);


def clean_co_uk(rec):

* * * * rec = rec.replace(''公司编号:'',''公司编号 - '')

* * * * rec = rec.replace(" \ n\ n"," \ n")

* * * * rec = rec.replace(" \ n", "")

* * * * rec = rec.replace(":",":\ n")

* * * * rec = re.sub("([^(] [a-zA-Z''] + \ s?[a-zA-Z] *:\ n)"," \ n\ g< ; 0>",rec)

* * * * rec = rec.replace(":\ n",":")

* * * * rec = re.sub(" ^ [] + \ n","",rec)

* * * * return rec


def clean_net(rec):

* * * * rec = rec.replace(" \ n\ n"," \ n")

* * * * rec = rec.replace (" \ n","")

* * * * rec = rec.replace(":",":\ n")

* * * * rec = re.sub("([a-zA-Z''] + \??[a-zA-Z] *:\ n)"," \ n \g< 0>",rec)

* * * * rec = rec.replace(":\ n",":")

* * * *返回rec


def clean_info(rec):

* * * * x = []

* * * *表示rec.split中的行(" \ n"):

* * * * * * * * x.append(re.sub(" ^([^: ] +):"," \g< 0",line))

* * * * return" \ n" .join(x)

>
def记录(dom ain,record):

* * * * details = ['''','''','''','''','''','''', '''','''',''''

* * * *代表k,v代表record.items():

* * * * * * * *尝试:

* * * * * * * * * * * *详情[0] = domain.lower()

* * * * * * * * * * * *结果= {

* * * * * * * * * * * * * * * *" registrant":lambda:1,

* * * * * * * * * * * * * * * *"注册人名称::lambda:1,

* * * * * * * * * * * * * * * * 注册人类型:lambda:4,

* * * * * * * * * * * * * * * *"注册人'的地址":lambda:2,
* * * * * * * * * * * * * * * *" registrant address1":lambda:2,

* * * * * * * * * * * * * * * *注册商:lambda:3,

* * * * * * * * * * * * * * * *赞助注册商:lambda:3,

* * * * * * * * * * * * * * * *注册o n":lambda:5,

* * * * * * * * * * * * * * * *"注册":lambda:5,

* * * * * * * * * * * * * * * *域名注册日期:lambda:5,

* * * * * * * * * * * * * * * *" ;续约日期:lambda:6,

* * * * * * * * * * * * * * * *最后更新:lambda:7,

* * * * * * * * * * * * * * * *域名最后更新日期:lambda:7,

* * * * * * * * * * * * * * * *名称服务器:lambda:8,

* * * * * * * * * * * * * * * *" name server" ;:lambda:8,
* * * * * * * * * * * * * * * *" nameservers" ;:lambda:8,

* * * * * * * * * * * * * * * *更新日期:lambda:7,

* * * * * * * * * * * * * * * *创建日期:lambda:5,

* * * * * * * * * * * * * * * *到期日期:lambda:6,

* * * * * * * * * * * * * * * *域名到期日期和现状t;:lambda:6,

* * * * * * * * * * * * * * * *" administrative contact":lambda:2

* * * * * * * * * * * *} [k.lower()]()

* * * * * * * * * * * *如果v!='''':

* * * * * * * * * * * * * * * *详情[结果] = v

* * * * * * * *除外:

* * * * * * * * * * * *继续


* * * * dest.write(sep.join(details)+" \ n" ;)


##循环通过域名

for src中的域名:


* * * * domain = domain.strip()


* * * *如果domain =='''':

* * * * * * * * continue


* * * * rec = subprocess.Popen([" whois",domain],

stdout = subprocess.PIPE).communicate()[0] <如果rec.startswith(No whois server)== True:

* * * * * * * />

* * * *如果rec.startswith(此TLD没有whois服务器)== True:

* * * * * * * *继续


* * * * rec = trim(rec)


* * * * if domain.endswith(" .net" ):

* * * * * * * * rec = clean_net(rec)


* * * * if domain.endswith(" .com" ):

* * * * * * * * rec = clean_net(rec)


* * * * if domain.endswith(" .tv" ):

* * * * * * * * rec = clean_net(rec)


* * * * if domain.endswith(&。co。英国):

* * * * * * * * rec = clean_co_uk(rec)


* * * * if domain.endswith("。 info"):

* * * * * * * * rec = clean_info(rec)


* * * * rec = clean(rec)


* * * * details = {}


* * * *试试:

* * * * * * * *表示rec.split中的行(\ n):

* * * * * * * * * * * * bits = line.split('':'')

* * * * * * * * * * * * a = bits.pop(0)

* * * * * * * * * * * * b =位.pop(0)

* * * * * * * * * * * * d etails [a.strip()] = b.strip()。replace(" \t",",")

* * * *除外:

* * * * * * * *继续


* * * *记录(域名,详情)


##清理

src.close()

dest.close()



离开工作之前,我只需要做几件事。


#!/ usr / bin / env python

"""打开包含域列表的文件(每行1个),

请求并解析它的whois记录并推送到csv

文件。

""" #而不是使用docstrings而不是多行注释。


def trim(txt):

x = []

for line in txt.splitlines():#字符串有内置函数

如果不是line.strip()或line.startswith(''WHOIS'')\

或line.startswith(''>>>'')或line.startswith(''%''):

继续#你可以在一个if语句中完成它们

if line.startswith('' - ''):return''''。join(x)

x.append(''''+ line)

返回''\ n''。在src中为域名加入(x)




如果不是domain.strip():继续#没有任何内容的行是假的

rec = subprocess.Popen([" whois",domain.strip()],

stdout = subprocess。 PIPE).communicate()[0]

如果rec.startswith(''没有whois服务器'')\

或rec.startswith(''此TLD有没有whois服务器''):

继续#Startswith将返回真/假所以它足够


rec = trim(rec)

如果domain.endswith(''。net''):

rec = clean_net(rec)

elif domain.endswith(''。com''):

#而不是使用if / elif语句,除非你以某种方式思考你

将匹配多个。

....


for rec.splitlines():

试试:

a,b = line.split('':'')[:2]

详情[a.strip()] = b.strip()。replace(''\ t'','','')
除了IndexError之外的
:#No match

continue


希望这是一个开始。


I''m keen on learning python, with a heavy lean on doing things the
"pythonic" way, so threw the following script together in a few hours
as a first-attempt in programming python.

I''d like the community''s thoughts/comments on what I''ve done;
improvements I can make, "don''ts" I should be avoiding, etc. I''m not
so much bothered about the resulting data - for the moment it meets my
needs. But any comment is welcome!

#!/usr/bin/env python
## Open a file containing a list of domains (1 per line),
## request and parse it''s whois record and push to a csv
## file.

import subprocess
import re

src = open(''./domains.txt'')

dest = open(''./whois.csv'', ''w'');

sep = "|"
headers = ["Domain","Registrant","Registrant''s
Address","Registrar","Registrant Type","Date Registered","Renewal
Date","Last Updated","Name Servers"]

dest.write(sep.join(headers)+"\n")

def trim( txt ):
x = []
for line in txt.split("\n"):
if line.strip() == "":
continue
if line.strip().startswith(''WHOIS''):
continue
if line.strip().startswith(''>>>''):
continue
if line.strip().startswith(''%''):
continue
if line.startswith("--"):
return ''''.join(x)
x.append(" "+line)
return "\n".join(x)

def clean( txt ):
x = []
isok = re.compile("^\s?([^:]+): ").match
for line in txt.split("\n"):
match = isok(line)
if not match:
continue
x.append(line)
return "\n".join(x);

def clean_co_uk( rec ):
rec = rec.replace(''Company number:'', ''Company number -'')
rec = rec.replace("\n\n", "\n")
rec = rec.replace("\n", "")
rec = rec.replace(": ", ":\n")
rec = re.sub("([^(][a-zA-Z'']+\s?[a-zA-Z]*:\n)", "\n\g<0>", rec)
rec = rec.replace(":\n", ": ")
rec = re.sub("^[ ]+\n", "", rec)
return rec

def clean_net( rec ):
rec = rec.replace("\n\n", "\n")
rec = rec.replace("\n", "")
rec = rec.replace(": ", ":\n")
rec = re.sub("([a-zA-Z'']+\s?[a-zA-Z]*:\n)", "\n\g<0>", rec)
rec = rec.replace(":\n", ": ")
return rec

def clean_info( rec ):
x = []
for line in rec.split("\n"):
x.append(re.sub("^([^:]+):", "\g<0", line))
return "\n".join(x)

def record(domain, record):
details = ['''','''','''','''','''','''','''','''','''']
for k, v in record.items():
try:
details[0] = domain.lower()
result = {
"registrant": lambda: 1,
"registrant name": lambda: 1,
"registrant type": lambda: 4,
"registrant''s address": lambda: 2,
"registrant address1": lambda: 2,
"registrar": lambda: 3,
"sponsoring registrar": lambda: 3,
"registered on": lambda: 5,
"registered": lambda: 5,
"domain registeration date": lambda: 5,
"renewal date": lambda: 6,
"last updated": lambda: 7,
"domain last updated date": lambda: 7,
"name servers": lambda: 8,
"name server": lambda: 8,
"nameservers": lambda: 8,
"updated date": lambda: 7,
"creation date": lambda: 5,
"expiration date": lambda: 6,
"domain expiration date": lambda: 6,
"administrative contact": lambda: 2
}[k.lower()]()
if v != '''':
details[result] = v
except:
continue

dest.write(sep.join(details)+"\n")

## Loop through domains
for domain in src:

domain = domain.strip()

if domain == '''':
continue

rec = subprocess.Popen(["whois",domain],
stdout=subprocess.PIPE).communicate()[0]

if rec.startswith("No whois server") == True:
continue

if rec.startswith("This TLD has no whois server") == True:
continue

rec = trim(rec)

if domain.endswith(".net"):
rec = clean_net(rec)

if domain.endswith(".com"):
rec = clean_net(rec)

if domain.endswith(".tv"):
rec = clean_net(rec)

if domain.endswith(".co.uk"):
rec = clean_co_uk(rec)

if domain.endswith(".info"):
rec = clean_info(rec)

rec = clean(rec)

details = {}

try:
for line in rec.split("\n"):
bits = line.split('': '')
a = bits.pop(0)
b = bits.pop(0)
details[a.strip()] = b.strip().replace("\t", ", ")
except:
continue

record(domain, details)

## Cleanup
src.close()
dest.close()

解决方案

"Phillip B Oldham" <ph************@gmail.comwrote in message
news:7e**********************************@26g2000h sk.googlegroups.com...

I''d like the community''s thoughts/comments on what I''ve done;
improvements I can make, "don''ts" I should be avoiding, etc. I''m not
so much bothered about the resulting data - for the moment it meets my
needs. But any comment is welcome!

I''m not expert, but here are a few thoughts. I hope they help.

#!/usr/bin/env python
## Open a file containing a list of domains (1 per line),
## request and parse it''s whois record and push to a csv
## file.

You might want to look into doc strings as a method of providing longer
documentation like this about what your program does.

dest = open(''./whois.csv'', ''w'');

Semicolon!!!! :)

def trim( txt ):
x = []
for line in txt.split("\n"):
if line.strip() == "":
continue
if line.strip().startswith(''WHOIS''):
continue
if line.strip().startswith(''>>>''):
continue
if line.strip().startswith(''%''):
continue
if line.startswith("--"):
return ''''.join(x)

Is all this properly indented? One thing you can do is put each of these on
one line, since they are fairly simple:

if line.strip().startswith(''WHOIS''): continue

although I still like proper indentation. But you have a lot of them so it
might save a good amount of space to do it this way.

Also, just my personal preference, I like to be consistent with the type of
quotes I use for strings. Here, you mix both single and double quotes on
different lines.

return "\n".join(x);

Semicolon!!!! :) :)

details = ['''','''','''','''','''','''','''','''','''']

I don''t have Python available to me right now, but I think you can do this
instead:

details = [''''] * 9

except:
continue

Non-specific except clauses usually aren''t preferred since they catch
everything, even something you might not want to catch.

if domain == '''':
continue

You can say:

if not domain

instead of that equivalence test. But what does this if statement do?

if rec.startswith("No whois server") == True:
continue

if rec.startswith("This TLD has no whois server") == True:
continue

Like above, you don''t need "== True" here.

if domain.endswith(".net"):
rec = clean_net(rec)

if domain.endswith(".com"):
rec = clean_net(rec)

if domain.endswith(".tv"):
rec = clean_net(rec)

if domain.endswith(".co.uk"):
rec = clean_co_uk(rec)

if domain.endswith(".info"):
rec = clean_info(rec)

Hmm, my first thought is to do something like this with all these if tests:

for extension in [<list all the extensions as strings here>]:
rec = clean_net(extension)

But for that to work, you may need to generalize the clean_net function so
it works for all of them, instead of having to call different functions
depending on the extension.

Anyway, I hope some of that helps!


"John Salerno" <jo******@NOSPAMgmail.comwrote in message
news:48**********************@news.astraweb.com...

>if domain.endswith(".net"):
rec = clean_net(rec)

if domain.endswith(".com"):
rec = clean_net(rec)

if domain.endswith(".tv"):
rec = clean_net(rec)

if domain.endswith(".co.uk"):
rec = clean_co_uk(rec)

if domain.endswith(".info"):
rec = clean_info(rec)


Hmm, my first thought is to do something like this with all these if
tests:

for extension in [<list all the extensions as strings here>]:
rec = clean_net(extension)

Whoops, you''d still need an if test in there I suppose!

for extension in [<list all the extensions as strings here>]:
if domain.endswith(extension):
rec = clean_net(extension)

Not sure if this is ideal.


On Jun 12, 4:27*pm, Phillip B Oldham <phillip.old...@gmail.comwrote:

I''m keen on learning python, with a heavy lean on doing things the
"pythonic" way, so threw the following script together in a few hours
as a first-attempt in programming python.

I''d like the community''s thoughts/comments on what I''ve done;
improvements I can make, "don''ts" I should be avoiding, etc. I''m not
so much bothered about the resulting data - for the moment it meets my
needs. But any comment is welcome!

#!/usr/bin/env python
## Open a file containing a list of domains (1 per line),
## request and parse it''s whois record and push to a csv
## file.

import subprocess
import re

src = open(''./domains.txt'')

dest = open(''./whois.csv'', ''w'');

sep = "|"
headers = ["Domain","Registrant","Registrant''s
Address","Registrar","Registrant Type","Date Registered","Renewal
Date","Last Updated","Name Servers"]

dest.write(sep.join(headers)+"\n")

def trim( txt ):
* * * * x = []
* * * * for line in txt.split("\n"):
* * * * * * * * if line.strip() == "":
* * * * * * * * * * * * continue
* * * * * * * * if line.strip().startswith(''WHOIS''):
* * * * * * * * * * * * continue
* * * * * * * * if line.strip().startswith(''>>>''):
* * * * * * * * * * * * continue
* * * * * * * * if line.strip().startswith(''%''):
* * * * * * * * * * * * continue
* * * * * * * * if line.startswith("--"):
* * * * * * * * * * * * return ''''.join(x)
* * * * * * * * x.append(" "+line)
* * * * return "\n".join(x)

def clean( txt ):
* * * * x = []
* * * * isok = re.compile("^\s?([^:]+): ").match
* * * * for line in txt.split("\n"):
* * * * * * * * match = isok(line)
* * * * * * * * if not match:
* * * * * * * * * * * * continue
* * * * * * * * x.append(line)
* * * * return "\n".join(x);

def clean_co_uk( rec ):
* * * * rec = rec.replace(''Company number:'', ''Company number -'')
* * * * rec = rec.replace("\n\n", "\n")
* * * * rec = rec.replace("\n", "")
* * * * rec = rec.replace(": ", ":\n")
* * * * rec = re.sub("([^(][a-zA-Z'']+\s?[a-zA-Z]*:\n)", "\n\g<0>", rec)
* * * * rec = rec.replace(":\n", ": ")
* * * * rec = re.sub("^[ ]+\n", "", rec)
* * * * return rec

def clean_net( rec ):
* * * * rec = rec.replace("\n\n", "\n")
* * * * rec = rec.replace("\n", "")
* * * * rec = rec.replace(": ", ":\n")
* * * * rec = re.sub("([a-zA-Z'']+\s?[a-zA-Z]*:\n)", "\n\g<0>", rec)
* * * * rec = rec.replace(":\n", ": ")
* * * * return rec

def clean_info( rec ):
* * * * x = []
* * * * for line in rec.split("\n"):
* * * * * * * * x.append(re.sub("^([^:]+):", "\g<0", line))
* * * * return "\n".join(x)

def record(domain, record):
* * * * details = ['''','''','''','''','''','''','''','''','''']
* * * * for k, v in record.items():
* * * * * * * * try:
* * * * * * * * * * * * details[0] = domain.lower()
* * * * * * * * * * * * result = {
* * * * * * * * * * * * * * * * "registrant": lambda: 1,
* * * * * * * * * * * * * * * * "registrant name": lambda: 1,
* * * * * * * * * * * * * * * * "registrant type": lambda: 4,
* * * * * * * * * * * * * * * * "registrant''s address": lambda: 2,
* * * * * * * * * * * * * * * * "registrant address1": lambda: 2,
* * * * * * * * * * * * * * * * "registrar": lambda: 3,
* * * * * * * * * * * * * * * * "sponsoring registrar": lambda: 3,
* * * * * * * * * * * * * * * * "registered on": lambda: 5,
* * * * * * * * * * * * * * * * "registered": lambda: 5,
* * * * * * * * * * * * * * * * "domain registeration date": lambda: 5,
* * * * * * * * * * * * * * * * "renewal date": lambda: 6,
* * * * * * * * * * * * * * * * "last updated": lambda: 7,
* * * * * * * * * * * * * * * * "domain last updated date": lambda: 7,
* * * * * * * * * * * * * * * * "name servers": lambda: 8,
* * * * * * * * * * * * * * * * "name server": lambda: 8,
* * * * * * * * * * * * * * * * "nameservers": lambda: 8,
* * * * * * * * * * * * * * * * "updated date": lambda: 7,
* * * * * * * * * * * * * * * * "creation date": lambda: 5,
* * * * * * * * * * * * * * * * "expiration date": lambda: 6,
* * * * * * * * * * * * * * * * "domain expiration date": lambda: 6,
* * * * * * * * * * * * * * * * "administrative contact": lambda: 2
* * * * * * * * * * * * }[k.lower()]()
* * * * * * * * * * * * if v != '''':
* * * * * * * * * * * * * * * * details[result] = v
* * * * * * * * except:
* * * * * * * * * * * * continue

* * * * dest.write(sep.join(details)+"\n")

## Loop through domains
for domain in src:

* * * * domain = domain.strip()

* * * * if domain == '''':
* * * * * * * * continue

* * * * rec = subprocess.Popen(["whois",domain],
stdout=subprocess.PIPE).communicate()[0]

* * * * if rec.startswith("No whois server") == True:
* * * * * * * * continue

* * * * if rec.startswith("This TLD has no whois server") == True:
* * * * * * * * continue

* * * * rec = trim(rec)

* * * * if domain.endswith(".net"):
* * * * * * * * rec = clean_net(rec)

* * * * if domain.endswith(".com"):
* * * * * * * * rec = clean_net(rec)

* * * * if domain.endswith(".tv"):
* * * * * * * * rec = clean_net(rec)

* * * * if domain.endswith(".co.uk"):
* * * * * * * * rec = clean_co_uk(rec)

* * * * if domain.endswith(".info"):
* * * * * * * * rec = clean_info(rec)

* * * * rec = clean(rec)

* * * * details = {}

* * * * try:
* * * * * * * * for line in rec.split("\n"):
* * * * * * * * * * * * bits = line.split('': '')
* * * * * * * * * * * * a = bits.pop(0)
* * * * * * * * * * * * b = bits.pop(0)
* * * * * * * * * * * * details[a.strip()] = b.strip().replace("\t", ", ")
* * * * except:
* * * * * * * * continue

* * * * record(domain, details)

## Cleanup
src.close()
dest.close()

Just a few quick things before I leave work.

#!/usr/bin/env python
"""Open a file containing a list of domains (1 per line),
request and parse it''s whois record and push to a csv
file.
""" # Rather use docstrings than multiline commenting like that.

def trim(txt):
x = []
for line in txt.splitlines(): # Strings have a built in function
if not line.strip() or line.startswith(''WHOIS'') \
or line.startswith(''>>>'') or line.startswith(''%''):
continue # you can do them in one if statement
if line.startswith(''--''): return ''''.join(x)
x.append('' ''+line)
return ''\n''.join(x)

for domain in src:
if not domain.strip(): continue # A line with nothing is False

rec = subprocess.Popen(["whois",domain.strip()],
stdout=subprocess.PIPE).communicate()[0]
if rec.startswith(''No whois server'') \
or rec.startswith(''This TLD has no whois server''):
continue # Startswith will return True/False so it is enough

rec = trim(rec)
if domain.endswith(''.net''):
rec = clean_net(rec)
elif domain.endswith(''.com''):
# Rather use if/elif statements unless somehow you think you
will match more than one.
....

for line in rec.splitlines():
try:
a, b = line.split('': '')[:2]
details[a.strip()] = b.strip().replace(''\t'', '', '')
except IndexError: # No matches
continue

Hope that''s a start.


这篇关于评论我的第一个脚本?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆