Comments on my first script?

Published: 2019/6/5 16:45:37 | Tags: python


Problem description

I'm keen on learning Python, with a heavy lean on doing things the
"pythonic" way, so I threw the following script together in a few hours
as a first attempt at programming Python.

I'd like the community's thoughts/comments on what I've done:
improvements I can make, "don'ts" I should be avoiding, etc. I'm not
so much bothered about the resulting data; for the moment it meets my
needs. But any comment is welcome!

#!/usr/bin/env python
## Open a file containing a list of domains (1 per line),
## request and parse its whois record and push to a csv
## file.

import subprocess
import re

src = open('./domains.txt')

dest = open('./whois.csv', 'w');

sep = "|"
headers = ["Domain","Registrant","Registrant's Address","Registrar",
           "Registrant Type","Date Registered","Renewal Date",
           "Last Updated","Name Servers"]

dest.write(sep.join(headers)+"\n")

def trim( txt ):
    x = []
    for line in txt.split("\n"):
        if line.strip() == "":
            continue
        if line.strip().startswith('WHOIS'):
            continue
        if line.strip().startswith('>>>'):
            continue
        if line.strip().startswith('%'):
            continue
        if line.startswith("--"):
            return ''.join(x)
        x.append(" "+line)
    return "\n".join(x)

def clean( txt ):
    x = []
    isok = re.compile("^\s?([^:]+): ").match
    for line in txt.split("\n"):
        match = isok(line)
        if not match:
            continue
        x.append(line)
    return "\n".join(x);

def clean_co_uk( rec ):
    rec = rec.replace('Company number:', 'Company number -')
    rec = rec.replace("\n\n", "\n")
    rec = rec.replace("\n", "")
    rec = rec.replace(": ", ":\n")
    rec = re.sub("([^(][a-zA-Z']+\s?[a-zA-Z]*:\n)", "\n\g<0>", rec)
    rec = rec.replace(":\n", ": ")
    rec = re.sub("^[ ]+\n", "", rec)
    return rec

def clean_net( rec ):
    rec = rec.replace("\n\n", "\n")
    rec = rec.replace("\n", "")
    rec = rec.replace(": ", ":\n")
    rec = re.sub("([a-zA-Z']+\s?[a-zA-Z]*:\n)", "\n\g<0>", rec)
    rec = rec.replace(":\n", ": ")
    return rec

def clean_info( rec ):
    x = []
    for line in rec.split("\n"):
        x.append(re.sub("^([^:]+):", "\g<0", line))
    return "\n".join(x)

def record(domain, record):
    details = ['','','','','','','','','']
    for k, v in record.items():
        try:
            details[0] = domain.lower()
            result = {
                "registrant": lambda: 1,
                "registrant name": lambda: 1,
                "registrant type": lambda: 4,
                "registrant's address": lambda: 2,
                "registrant address1": lambda: 2,
                "registrar": lambda: 3,
                "sponsoring registrar": lambda: 3,
                "registered on": lambda: 5,
                "registered": lambda: 5,
                "domain registeration date": lambda: 5,
                "renewal date": lambda: 6,
                "last updated": lambda: 7,
                "domain last updated date": lambda: 7,
                "name servers": lambda: 8,
                "name server": lambda: 8,
                "nameservers": lambda: 8,
                "updated date": lambda: 7,
                "creation date": lambda: 5,
                "expiration date": lambda: 6,
                "domain expiration date": lambda: 6,
                "administrative contact": lambda: 2
            }[k.lower()]()
            if v != '':
                details[result] = v
        except:
            continue

    dest.write(sep.join(details)+"\n")

## Loop through domains
for domain in src:

    domain = domain.strip()

    if domain == '':
        continue

    rec = subprocess.Popen(["whois",domain],
                           stdout=subprocess.PIPE).communicate()[0]

    if rec.startswith("No whois server") == True:
        continue

    if rec.startswith("This TLD has no whois server") == True:
        continue

    rec = trim(rec)

    if domain.endswith(".net"):
        rec = clean_net(rec)

    if domain.endswith(".com"):
        rec = clean_net(rec)

    if domain.endswith(".tv"):
        rec = clean_net(rec)

    if domain.endswith(".co.uk"):
        rec = clean_co_uk(rec)

    if domain.endswith(".info"):
        rec = clean_info(rec)

    rec = clean(rec)

    details = {}

    try:
        for line in rec.split("\n"):
            bits = line.split(': ')
            a = bits.pop(0)
            b = bits.pop(0)
            details[a.strip()] = b.strip().replace("\t", ", ")
    except:
        continue

    record(domain, details)

## Cleanup
src.close()
dest.close()

Solution

"Phillip B Oldham" <ph************@gmail.com> wrote in message
news:7e**********************************@26g2000hsk.googlegroups.com...
> I'd like the community's thoughts/comments on what I've done;
> improvements I can make, "don'ts" I should be avoiding, etc. I'm not
> so much bothered about the resulting data - for the moment it meets my
> needs. But any comment is welcome!

I'm not an expert, but here are a few thoughts. I hope they help.

> #!/usr/bin/env python
> ## Open a file containing a list of domains (1 per line),
> ## request and parse its whois record and push to a csv
> ## file.

You might want to look into docstrings as a way of providing longer
documentation like this about what your program does.
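Here is a minimal sketch of the docstring suggestion, written for Python 3 (the thread's code is Python 2). The first string literal in a module or function body becomes its `__doc__` attribute, which `help()` displays. The `trim` below is a simplified, hypothetical stand-in that only drops blank lines. It is not the poster's function.

```python
"""Open a file containing a list of domains (1 per line), request and
parse each whois record, and push the results to a csv file."""


def trim(txt):
    """Drop blank lines from raw whois output (illustrative stub)."""
    # Keep only lines that contain something other than whitespace.
    return "\n".join(line for line in txt.splitlines() if line.strip())


# The first string literal in the function body is stored as its __doc__.
print(trim.__doc__)
print(trim("a\n\nb"))
```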

> dest = open('./whois.csv', 'w');
Semicolon!!!! :)

> def trim( txt ):
>     x = []
>     for line in txt.split("\n"):
>         if line.strip() == "":
>             continue
>         if line.strip().startswith('WHOIS'):
>             continue
>         if line.strip().startswith('>>>'):
>             continue
>         if line.strip().startswith('%'):
>             continue
>         if line.startswith("--"):
>             return ''.join(x)

Is all this properly indented? One thing you can do is put each of these on
one line, since they are fairly simple:

if line.strip().startswith('WHOIS'): continue

although I still like proper indentation. But you have a lot of them, so it
might save a good amount of space to do it this way.
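You can also collapse the chain of `continue` tests another way. Strip each line once, then pass `str.startswith` a tuple of prefixes. It accepts a tuple and returns True if any prefix matches. The sketch below is a Python 3 version of `trim` built on that idea. It assumes the same prefixes and the same `"--"` cut-off as the original, including the original's `''.join` at the cut-off. `SKIP_PREFIXES` is a hypothetical name.

```python
SKIP_PREFIXES = ("WHOIS", ">>>", "%")  # banner/comment prefixes from the original trim()


def trim(txt):
    """Keep the meaningful lines of raw whois output, each prefixed with a space."""
    kept = []
    for line in txt.splitlines():
        stripped = line.strip()
        # One combined test: blank lines, or lines starting with any skip prefix.
        if not stripped or stripped.startswith(SKIP_PREFIXES):
            continue
        if line.startswith("--"):
            # Same cut-off as the original, which joins with '' at this point.
            return "".join(kept)
        kept.append(" " + line)
    return "\n".join(kept)


raw = "WHOIS banner\n% comment\n\nDomain: example.com\n>>> last update\nRegistrar: X\n"
print(trim(raw))
```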

Also, just my personal preference, I like to be consistent with the type of
quotes I use for strings. Here, you mix both single and double quotes on
different lines.

> return "\n".join(x);
Semicolon!!!! :) :)

> details = ['','','','','','','','','']

I don't have Python available to me right now, but I think you can do this
instead:

details = [''] * 9
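The guess above is right: multiplying a list repeats its elements. Here is a quick check, with one caveat worth knowing. The repetition copies references to the same objects. That is harmless for immutable strings, but surprising for mutable items such as lists.

```python
details = [''] * 9  # nine empty-string slots, one per CSV column
print(len(details), details)

# Strings are immutable, so assigning to one slot leaves the others alone.
details[0] = "example.com"
print(details[:2])

# Caveat: with a mutable item, every slot refers to the same object.
rows = [[]] * 3
rows[0].append("x")
print(rows)  # all three entries show ['x']
```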

> except:
>     continue

Non-specific except clauses usually aren't preferred, since they catch
everything, even something you might not want to catch.
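In `record()`, the bare `except:` appears to exist mainly to skip field names that are not in the lookup table, which raises a `KeyError`. The sketch below keeps the same mapping as a plain dict of column indexes. The `lambda: 1` wrappers are not needed just to store a number. It then uses `dict.get` instead of try/except. The field list is shortened, and `FIELD_COLUMNS` and `build_row` are hypothetical names.

```python
# Column index for each lower-cased whois field name (shortened from the original table).
FIELD_COLUMNS = {
    "registrant": 1,
    "registrant's address": 2,
    "registrar": 3,
    "registrant type": 4,
    "creation date": 5,
    "expiration date": 6,
    "updated date": 7,
    "name servers": 8,
}


def build_row(domain, record):
    """Return the 9 CSV columns for one domain; unknown fields are skipped."""
    details = [''] * 9
    details[0] = domain.lower()
    for key, value in record.items():
        column = FIELD_COLUMNS.get(key.lower())  # None for unknown fields, no exception
        if column is not None and value:
            details[column] = value
    return details


print(build_row("Example.COM", {"Registrar": "Some Registrar", "Unknown Field": "ignored"}))
```

If you prefer try/except, write `except KeyError:`. It names the one failure you expect, so an unrelated mistake in the same block still raises.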

> if domain == '':
>     continue

You can say:

if not domain:

instead of that equivalence test. But what does this if statement do?
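In the script, the `if domain == '':` test skips blank lines in `domains.txt`. An empty string is false in a boolean context, so `if not domain:` does the same job. The sketch below strips each line and yields only the non-empty ones. `iter_domains` is a hypothetical helper, and `io.StringIO` stands in for the opened file.

```python
import io


def iter_domains(lines):
    """Yield stripped, non-empty lines from an iterable of text lines."""
    for line in lines:
        domain = line.strip()
        if not domain:  # '' is false, so blank or whitespace-only lines are skipped
            continue
        yield domain


src = io.StringIO("example.com\n\n   \nexample.net\n")
print(list(iter_domains(src)))
```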

> if rec.startswith("No whois server") == True:
>     continue
>
> if rec.startswith("This TLD has no whois server") == True:
>     continue

Like above, you don't need "== True" here.
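`str.startswith` already returns a `bool`, so comparing the result with `True` adds nothing. It also accepts a tuple, so the two "no whois server" checks fit in one test. Here is a sketch with hypothetical names. It operates on text. In Python 3, `communicate()` returns bytes unless the pipe is opened in text mode (for example `text=True` on Python 3.7+), so `rec` would need decoding before checks like these.

```python
# Prefixes the original script checks for; one startswith call can test both.
NO_SERVER_PREFIXES = ("No whois server", "This TLD has no whois server")


def has_whois_server(rec):
    """Return False when the whois output says no server exists for the TLD."""
    # startswith already returns True or False, so no "== True" is needed.
    return not rec.startswith(NO_SERVER_PREFIXES)


print("abc".startswith("a") is True)  # True: startswith returns a real bool
print(has_whois_server("This TLD has no whois server"))
print(has_whois_server("Domain Name: EXAMPLE.COM"))
```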

> if domain.endswith(".net"):
>     rec = clean_net(rec)
>
> if domain.endswith(".com"):
>     rec = clean_net(rec)
>
> if domain.endswith(".tv"):
>     rec = clean_net(rec)
>
> if domain.endswith(".co.uk"):
>     rec = clean_co_uk(rec)
>
> if domain.endswith(".info"):
>     rec = clean_info(rec)

Hmm, my first thought is to do something like this with all these if tests:

for extension in [<list all the extensions as strings here>]:
    rec = clean_net(extension)

But for that to work, you may need to generalize the clean_net function so
it works for all of them, instead of having to call different functions
depending on the extension.
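An alternative to generalizing `clean_net` is to keep the separate cleaners and drive them from a table of (suffix, function) pairs. Adding a TLD then means adding one entry. This sketch rests on two assumptions. First, the three cleaner functions are placeholders for the script's `clean_net`, `clean_co_uk` and `clean_info`. Second, the first matching suffix wins. The original runs every matching `if`, but its suffixes never overlap, so the outcome is the same. `CLEANERS` and `clean_for_domain` are hypothetical names.

```python
def clean_net(rec):
    return "net:" + rec  # placeholder for the script's clean_net


def clean_co_uk(rec):
    return "co_uk:" + rec  # placeholder for the script's clean_co_uk


def clean_info(rec):
    return "info:" + rec  # placeholder for the script's clean_info


# Suffix -> cleaner; the first suffix that matches is used.
CLEANERS = [
    (".net", clean_net),
    (".com", clean_net),
    (".tv", clean_net),
    (".co.uk", clean_co_uk),
    (".info", clean_info),
]


def clean_for_domain(domain, rec):
    """Apply the cleaner registered for the domain's suffix, or return rec unchanged."""
    for suffix, cleaner in CLEANERS:
        if domain.endswith(suffix):
            return cleaner(rec)
    return rec


print(clean_for_domain("example.co.uk", "raw record"))
```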

Anyway, I hope some of that helps!

"John Salerno" <jo******@NOSPAMgmail.com> wrote in message
news:48**********************@news.astraweb.com...

>> if domain.endswith(".net"):
>>     rec = clean_net(rec)
>>
>> if domain.endswith(".com"):
>>     rec = clean_net(rec)
>>
>> if domain.endswith(".tv"):
>>     rec = clean_net(rec)
>>
>> if domain.endswith(".co.uk"):
>>     rec = clean_co_uk(rec)
>>
>> if domain.endswith(".info"):
>>     rec = clean_info(rec)
>
> Hmm, my first thought is to do something like this with all these if
> tests:
>
> for extension in [<list all the extensions as strings here>]:
>     rec = clean_net(extension)

Whoops, you'd still need an if test in there I suppose!

for extension in [<list all the extensions as strings here>]:
    if domain.endswith(extension):
        rec = clean_net(rec)

Not sure if this is ideal.
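`str.endswith` accepts a tuple of suffixes, so a loop with an inner `if` can often become a single call. One example is routing all the extensions that share `clean_net`. In this minimal sketch, `NET_STYLE` and `pick_cleaner` are hypothetical names. The function returns the cleaner's name as a string only to keep the example self-contained.

```python
NET_STYLE = (".net", ".com", ".tv")  # extensions handled by clean_net in the original


def pick_cleaner(domain):
    """Name the cleaner the original if-chain would apply to this domain."""
    if domain.endswith(NET_STYLE):  # True if any suffix in the tuple matches
        return "clean_net"
    if domain.endswith(".co.uk"):
        return "clean_co_uk"
    if domain.endswith(".info"):
        return "clean_info"
    return None


for name in ("example.tv", "example.co.uk", "example.org"):
    print(name, pick_cleaner(name))
```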

On Jun 12, 4:27 pm, Phillip B Oldham <phillip.old...@gmail.com> wrote:
> [original post and script quoted in full]
Just a few quick things before I leave work.

#!/usr/bin/env python
"""Open a file containing a list of domains (1 per line),
request and parse its whois record and push to a csv
file.
""" # Rather use docstrings than multiline commenting like that.

def trim(txt):
    x = []
    for line in txt.splitlines(): # Strings have a built-in function
        if not line.strip() or line.startswith('WHOIS') \
                or line.startswith('>>>') or line.startswith('%'):
            continue # you can do them in one if statement
        if line.startswith('--'): return ''.join(x)
        x.append(' '+line)
    return '\n'.join(x)
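The switch above from `txt.split("\n")` to `txt.splitlines()` changes behavior in two small ways. First, `splitlines()` drops the empty string that `split("\n")` produces after a trailing newline. Second, it also treats `\r\n` as a line break, so no stray `\r` is left at the end of a line if the whois output ever uses Windows line endings. A quick comparison:

```python
text = "Domain: example.com\r\nRegistrar: X\r\n"

by_split = text.split("\n")
by_splitlines = text.splitlines()

print(by_split)       # ['Domain: example.com\r', 'Registrar: X\r', '']
print(by_splitlines)  # ['Domain: example.com', 'Registrar: X']
```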

for domain in src:
    domain = domain.strip()
    if not domain: continue # A line with nothing is False

    rec = subprocess.Popen(["whois", domain],
                           stdout=subprocess.PIPE).communicate()[0]
    if rec.startswith('No whois server') \
            or rec.startswith('This TLD has no whois server'):
        continue # Startswith will return True/False so it is enough

    rec = trim(rec)
    if domain.endswith('.net'):
        rec = clean_net(rec)
    elif domain.endswith('.com'):
        # Rather use if/elif statements unless somehow you think you
        # will match more than one.
        ....

    for line in rec.splitlines():
        try:
            a, b = line.split(': ')[:2]
            details[a.strip()] = b.strip().replace('\t', ', ')
        except ValueError: # No ': ' on this line, so there is nothing to unpack into b
            continue

Hope that's a start.
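On the final loop: unpacking `line.split(': ')[:2]` into two names raises `ValueError` when a line has no `': '` separator. `str.partition` avoids the exception altogether. It always returns a 3-tuple whose middle element is empty when the separator is missing. Unlike `split(': ')[:2]`, it keeps everything after the first separator as the value. Here is a sketch of the loop as a hypothetical `parse_fields` function.

```python
def parse_fields(rec):
    """Map 'Key: value' lines to a dict, skipping lines without ': '."""
    details = {}
    for line in rec.splitlines():
        key, sep, value = line.partition(": ")
        if not sep:  # separator missing: partition returned (line, '', '')
            continue
        # Tabs inside a value become ", ", as in the original script.
        details[key.strip()] = value.strip().replace("\t", ", ")
    return details


rec = " Domain Name: EXAMPLE.COM\nno separator here\n Name Servers: ns1.example.com\tns2.example.com"
print(parse_fields(rec))
```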

