CS50 DNA works for small.csv but not for large.csv


Problem description

I am having problems with CS50 pset6 DNA. My program produces the right counts and the correct answer with small.csv, but not with large.csv. I have been stepping through it with debug50 for over a week and cannot find the problem. I assume it is somewhere in the loop that scans the sample for the STRs, but I cannot see what goes wrong when I walk through it.

If you are unfamiliar with the CS50 DNA problem set: the program is supposed to scan a DNA sequence (argv[2]) and compare it against a CSV file (argv[1]) listing people's DNA STR counts, to work out which person (if any) the sequence belongs to.
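For readers new to the problem, the comparison step described above can be sketched roughly as follows. This is only an illustration: `find_match` is a hypothetical helper, and the two-person database is invented data inlined with `io.StringIO` so the sketch runs without the course files.

```python
import csv
import io


def find_match(database_rows, counts):
    """Return the name whose STR counts all equal `counts`, or None."""
    for row in database_rows:
        # CSV values are strings, so convert before comparing.
        if all(int(row[str_name]) == counts[str_name] for str_name in counts):
            return row["name"]
    return None


# Hypothetical database contents (not the real small.csv or large.csv).
csv_text = "name,AGATC,AATG\nAlice,2,8\nBob,4,1\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Every header column except "name" is an STR to count in the sequence.
str_names = [column for column in rows[0] if column != "name"]

print(find_match(rows, {"AGATC": 4, "AATG": 1}))  # Bob
```

Taking the STR names from the CSV header, rather than from a fixed dictionary, keeps the same code working for databases with different columns.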

Note: my code fails on this case, if it helps: python dna.py databases/large.csv sequences/5.txt

from sys import argv
from csv import reader


#ensures correct number of arguments
if (len(argv) != 3):
    print("usage: python dna.py data sample")


#dict for storage
peps = {}
#storage for strands we look for.
types = []

#opens csv table
with open(argv[1],'r') as file:
    data = reader(file)
    line = 0
    number = 0
    for l in data:
        if line == 0:
            for col in l:
                if col[2].islower() and col != 'name':
                    break
                if col == 'name':
                    continue
                else:
                    types.append(col)
            line += 1
        else:
            row_mark = 0
            for col in l:
                if row_mark == 0:
                    peps[col] = []
                    row_mark += 1
                else:
                    peps[l[0]].append(col)



#convert sample to string
samples = ""

with open(argv[2], 'r') as sample:
    for c in sample:
        samples = samples + c




#DNA STR GROUPS
dna = { "AGATC" : 0,
        "AATG" : 0,
        "TATC" : 0,
        "TTTTTTCT" : 0,
        "TCTAG" : 0,
        "GATA" : 0,
        "GAAA" : 0,
        "TCTG" : 0 }

#go through all the strs in dna
for keys in dna:
    #the longest run of the sequence
    longest = 0
    #the current run of the sequence
    run = 0
    size = len(keys)
    #look through sample for longest
    i = 0
    while i < len(samples):
        hold = samples[i:(i + size)]
        if hold == keys:
            run += 1
            #ensure the code does not go outside len of samples
            if ((i + size) < len(samples)):
                i = i + size
            continue
        if run > longest:
            longest = run
            run = 0
        i += 1
    dna[keys] = longest

#see who it is
positive = True
person = ''
for key in peps:
    positive = True
    for entry in types:
        x = types.index(entry)
        test = dna.get(entry)
        can = int(peps.get(key)[x])
        if (test != can):
            positive = False
    if positive == True:
        person = key
        break
if person != '':
    print(person)
else:
    print("No match")


Recommended answer

The problem is in this while loop. Look at the code carefully:

while i < len(samples):
    hold = samples[i:(i + size)]
    if hold == keys:
        run += 1
        #ensure the code does not go outside len of samples
        if ((i + size) < len(samples)):
            i = i + size
        continue
    if run > longest:
        longest = run
        run = 0
    i += 1

The logic for ending a run is incomplete. You are supposed to find the longest run of consecutive repeats of each STR. While the STR repeats back to back, keep counting. Once it stops repeating, compare the count with the longest run so far and then reset the count. Your loop resets run only when it has just set a new maximum, so a shorter run that comes later is never cleared and its count carries over into the next run.
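To make the failure concrete, the sketch below wraps the question's loop, otherwise unchanged, in a hypothetical function `buggy_longest_run` and feeds it an invented sequence. The sequence is an assumption chosen to trigger the problem: a run of three, then a single copy, then another run of three.

```python
def buggy_longest_run(samples, keys):
    """The question's loop, unchanged apart from being wrapped in a function."""
    longest = 0
    run = 0
    size = len(keys)
    i = 0
    while i < len(samples):
        hold = samples[i:(i + size)]
        if hold == keys:
            run += 1
            if (i + size) < len(samples):
                i = i + size
            continue
        if run > longest:
            longest = run
            run = 0  # only reached when a new maximum was just recorded
        i += 1
    return longest


# Hypothetical sequence: runs of 3, 1 and 3 copies of AATG, separated by "C".
sample = "AATG" * 3 + "C" + "AATG" + "C" + "AATG" * 3 + "C"

print(buggy_longest_run(sample, "AATG"))  # prints 4, but the longest real run is 3
```

The single copy in the middle leaves run at 1, so the final run of three is counted as four. A short sequence may never contain this pattern, which would be consistent with the small test case passing while the large one fails.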

Solution

Add an else branch after the if hold == keys: check, and reset run to zero in it whether or not a new maximum was recorded. This would be the right fix:

while i < len(samples):
    hold = samples[i:(i + size)]
    if hold == keys:
        run += 1
        #ensure the code does not go outside len of samples
        if ((i + size) < len(samples)):
            i = i + size
        continue
    else: #only once the sequence no longer matches, check this.
        if run > longest:
            longest = run
            run = 0
        else: #if the run is not longer than longest, still reset run to zero.
            run = 0
    i += 1
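As an alternative, the same idea can be written as a small standalone function. This is a minimal sketch, not the course's reference solution: `longest_run` and its parameter names are hypothetical. It restarts the count at every position, so no counter can leak from one run into the next, and it also handles a run that ends exactly at the end of the string.

```python
def longest_run(sequence, motif):
    """Return the largest number of back-to-back copies of motif in sequence."""
    if not motif:
        return 0  # an empty motif would otherwise match forever
    size = len(motif)
    longest = 0
    for start in range(len(sequence)):
        run = 0
        # Count consecutive copies of motif beginning at this position.
        while sequence.startswith(motif, start + run * size):
            run += 1
        longest = max(longest, run)
    return longest


# Hypothetical sequence: runs of 3, 1 and 3 copies of AATG, separated by "C".
sample = "AATG" * 3 + "C" + "AATG" + "C" + "AATG" * 3 + "C"
print(longest_run(sample, "AATG"))  # 3
```

Checking every start position costs some repeated work compared with jumping ahead by the STR length, but it stays correct for STRs that overlap with themselves, which a skip-ahead loop can miss.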
