Looping through a big file containing a set of files
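Problem description
I have a program that takes two input files, one in matrix form and one in sequence form. My problem is that I now have to give it a matrix file containing many matrices and a sequence file containing many sequences, and calculate the same log score as I did for one matrix file and one sequence file.
How it should work exactly: for every sequence, it should calculate the log values for all the weight matrices, then go to the second sequence and calculate all the log values using the matrices.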
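The order of evaluation described above can be sketched as a nested loop: sequences on the outside, matrices on the inside. This is only a rough outline. The name `score_all` and the `score` argument are hypothetical; `score` stands in for whatever function already computes the log value for one sequence and one matrix.

```python
def score_all(sequences, matrices, score):
    """Score every sequence against every matrix, one sequence at a time.

    sequences: list of (name, sequence_string) pairs
    matrices:  list of (name, matrix) pairs
    score:     placeholder for the existing one-sequence/one-matrix function
    """
    results = []
    for seq_name, seq in sequences:            # outer loop: one sequence at a time
        for mat_name, matrix in matrices:      # inner loop: every matrix for that sequence
            results.append((seq_name, mat_name, score(seq, matrix)))
    return results
```

Because the scoring function is passed in, the loop itself does not need to know how the log value is calculated.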
My matrix file is huge and contains many matrices. Part of it is here.
//
NA Abd-B
PO A C G T
01 10.19 0.00 10.65 6.24
02 5.79 0.67 10.50 10.11
03 4.50 0.00 0.00 22.57
04 0.00 0.00 0.00 27.08
05 0.00 0.00 0.00 27.08
06 0.00 0.00 0.00 27.08
07 27.08 0.00 0.00 0.00
08 0.00 2.83 0.00 24.25
09 0.00 0.00 24.45 2.62
10 19.33 0.00 4.34 3.41
11 0.31 12.28 3.39 11.09
//
//
NA Adf1
PO A C G T
01 0.71 0.08 26.02 1.55
02 3.03 23.00 1.24 1.09
03 0.26 10.50 3.29 14.31
04 0.00 0.06 28.23 0.07
05 0.12 27.27 0.06 0.91
06 1.44 20.36 0.37 6.19
07 5.35 0.28 21.49 1.24
08 7.81 16.10 3.81 0.63
09 0.51 17.77 0.45 9.63
10 0.00 0.14 28.21 0.00
11 0.00 25.69 0.20 2.46
12 0.48 9.98 0.07 17.82
13 1.27 0.00 27.01 0.07
14 15.59 7.98 2.92 1.87
15 4.28 22.37 0.00 1.70
16 0.18 0.77 22.70 4.70
//
//
NA Aef1
PO A C G T
01 0.00 0.06 12.49 0.00
02 3.80 0.17 0.00 8.57
03 0.87 0.06 0.00 11.62
04 0.06 9.76 2.32 0.41
05 9.82 0.00 2.73 0.00
06 9.76 0.00 0.00 2.78
07 3.80 0.31 0.00 8.43
08 0.00 0.00 0.00 12.54
09 0.00 6.53 5.85 0.17
10 0.00 12.38 0.17 0.00
11 2.73 1.02 8.80 0.00
12 5.85 0.00 6.70 0.00
13 1.02 5.96 0.00 5.57
14 0.00 5.16 4.66 2.73
15 1.03 7.55 3.97 0.00
16 4.82 5.00 2.73 0.00
//
//
NA Antp
PO A C G T
01 5.52 14.49 27.56 0.49
02 8.17 14.02 11.42 14.47
03 18.18 27.29 1.31 1.29
04 40.26 5.66 1.83 0.32
05 19.05 12.67 0.43 15.91
06 9.94 0.07 0.20 37.86
07 26.63 15.17 0.00 6.27
08 47.45 0.06 0.00 0.56
09 0.81 0.48 0.00 46.79
10 26.46 19.05 1.81 0.75
11 48.07 0.00 0.00 0.00
12 30.51 0.00 0.00 17.56
13 43.45 0.00 0.00 4.62
14 30.06 5.98 0.00 12.03
15 0.38 0.64 0.00 47.05
16 22.14 0.29 7.15 18.49
//
//
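A matrix file laid out like the sample above could be split into individual matrices along these lines. This is a minimal sketch, assuming every record is wrapped in `//` lines, has an `NA` name line and a `PO` header, and lists its columns in the order A, C, G, T; the name `parse_matrices` is made up for illustration.

```python
def parse_matrices(lines):
    """Return a list of (name, rows); each row is [A, C, G, T] as floats."""
    matrices = []
    name, rows = None, []
    for line in lines:
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        if fields[0] == "//":
            if name is not None:          # a closing // ends the current matrix
                matrices.append((name, rows))
            name, rows = None, []
        elif fields[0] == "NA":
            name = fields[1]              # matrix name, e.g. Abd-B
        elif fields[0] == "PO":
            continue                      # column header, assumed to be A C G T
        else:
            # position number followed by four weights
            rows.append([float(x) for x in fields[1:5]])
    if name is not None:                  # file ended without a closing //
        matrices.append((name, rows))
    return matrices
```

Since the function only needs an iterable of lines, an open file object can be passed to it directly, so the whole file never has to be held in memory as one string.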
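The sequence file is here (this is also only part of my file). The actual sequence starts from "CC"; the line before it is just a heading, which we omit. This file contains two sequences.
>CG9571_O-E|Drosophila melanogaster|CG9571|FBgn0031086|X:19926374..199271 33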
CCAGTCCACCGGCCGCCGATCTATTTATACGAGAGGAAGAGGCTGAACTC GAGGATTACCCGTGTATCCTGGGACGCG
GATTAGCGATCCATTCCCCTTTTAATCGCCGCGCAAACAGATTCATGAAA GCCTTCGGATTCATTCATTGATCCACAT
CTACGGGAACGGGAGTCGCAAACGTTTTCGGATTAGCGCTGGACTAGCGG TTTCTAAATTGGATTATTTCTACCTGAC
CCTGGAGCCATCGTCCTCGTCCTCC
>Cp36_DRR|Drosophila melanogaster|Cp36|FBgn0000359|X:8323349..8324136
AGTCGACCAGCACGAGATCTCACCTACCTTCTTTATAAGCGGGGTCTCTA GAAGCTAAATCCATGTCCACGTCAAACC
AAAGACTTGCGGTCTCCAGACCATTGAGTTCTATAAATGGGACTGAGCCA CACCATACACCACACACCACACATACAC
ACACGCCAACACATTACACACAACACGAACTACACAAACACTGAGATTAA GGAAATTATTAAAAAAAATAATAAAATT
AATACAAAAAAAATATATATATATA
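This is my code, which works (it prints the log value for one sequence and one matrix).
Start with a list of input files (hopefully generated automatically when there are many). For example:
My sequence file:
>CG9571_O-E|Drosophila melanogaster|CG9571|FBgn0031086|X:19926374..199271 33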
CCAGTCCACCGGCCGCCGATCTATTTATACGAGAGGAAGAGGCTGAACTC GAGGATTACCCGTGTATCCTGGGACGCG
GATTAGCGATCCATTCCCCTTTTAATCGCCGCGCAAACAGATTCATGAAA GCCTTCGGATTCATTCATTGATCCACAT
CTACGGGAACGGGAGTCGCAAACGTTTTCGGATTAGCGCTGGACTAGCGG TTTCTAAATTGGATTATTTCTACCTGAC
CCTGGAGCCATCGTCCTCGTCCTCCGTCCCTTAGCGCCTCCTGCATGGAT GTCGTTTTTGGGTTTCATACCTTTTCAC
ACTGGAAAAATACGGAATTTGTTGTAAGCCCTTTCAAGACGAATGGGATT TAGCTTCGGATGTCAACGTCACCATAAT
CATATTAGGAATATTTCTACTCAATTGCAATATTGGTACTTTTCTGACTG TAAACGCGATGATAATTACAAATATGCC
TAATTTGCTGTCTTTATAATCAAATGGAGTTCTTTATATTTCCAAAATAT TGAAATTCCGATTCCCTAGAAAATAATA
CGTTTTTCTGTTATTAATAAAAAACCAATAGGAAAGTTCTCAAAAATTAC TCTGTTGTATTTGATCATTTCTTTTCCG
GTATAATCTTTTATTTTAAGCATTCCCATGTGAATAAATTTCAGACTAAT GTATTAATAAGATGTCGTGTTTTTCCAC
TTACAAATTTCTCATACAGCTGGATATATACTACGAGTACTATACACATG CTCTGGG
>Cp36_DRR|Drosophila melanogaster|Cp36|FBgn0000359|X:8323349..8324136
AGTCGACCAGCACGAGATCTCACCTACCTTCTTTATAAGCGGGGTCTCTA GAAGCTAAATCCATGTCCACGTCAAACC
AAAGACTTGCGGTCTCCAGACCATTGAGTTCTATAAATGGGACTGAGCCA CACCATACACCACACACCACACATACAC
ACACGCCAACACATTACACACAACACGAACTACACAAACACTGAGATTAA GGAAATTATTAAAAAAAATAATAAAATT
AATACAAAAAAAATATATATATATACAAAAATTTGTTGTGTTTGAATTGA ATTAAGAGCTTATCAAGAAAAAAATTTC
AGTGACTCATAATACACTACTCTACAAGTTTAAATTGAATCAACAATTTA ACTTTCATTGCTCAGGTTTTTAGTAACA
ATGTTTATATAAGTTTAGGTATAACAAATGATTTAAATATAAGATACTGT ATTTCACATTGAGACGAAACAATCCACC
GAAAATCATAAAATATAAGAATGTTGCATTTTATTTTTAAAAATAAAGAT GCCTTTTAAGAGGAATAACTTAAATGTC
TTTAATACCTTTGAATTTAATTATATGGCTAATAAACACAAACTTAAAGC TTAAAACTGCATCGAATTGAATGCGGTT
ATAAATGTACTTATATATCTAATATAATCTGCTAATATGGTTTACATGGT ATATCTTTCTCGGAAATTTTTACAAAAA
TTATCTATTCATATATCTCGAGCGTAAGATATTTATCAGTTTATAGATAA CATCTTTAAATTTGGGTGATTAAAAAAA
AACATTG
>Cp36_PRR|Drosophila melanogaster|Cp36|FBgn0000359|X:8324430..8324513
TCTAGAGATCTGGGCACGATGGCGAGACAAAGATGCGGCGCAAAATCGGA AATGGAGATGGATCACGTAGCCGGCCAT
GGCGG
>Him_distal|Drosophila melanogaster|Him|FBgn0030900|X:18039896..18043470
GGTTTTCTGCGATGGCTTCCGCGCCAGCTGAAGTATCTGATTTGCTGCCT TGTTTTTGTTGATATTTCTGCGAAGGGA
CTTGTGCTTTTCAAATGGCCTTTTTTTGGGATTACGGCAAGGGCGCGTTT CCCACGCTCGATCCCCACTTACCATTGG
TGCACGCGATTGCGGCAAGCTGCTGAGGCAAGCTATTAAACGCCACACTG GGCCGGGGGGCGGTACCGGTGGGCGTGG
CAGGGGAGTCGACACATGTTGTGTGCCAGAGAACTTTGCTCCGATCCCCA GATCATCAAATAGTTGTCGCTGTCTGCT
CGTGCGCAAATTGCAATACTTTGCATACCCTTACTGCAGGGTATCTGAGC TTGGACTTTAAATAAGGGGGTATAACAT
AGCTTATACTCTCTATCTCTGTTATAAAGTCAATTTTCCTTAGATCTTTA GTACAGTGGGTAGTTAAGGAGACATAAC
TTCCAAAAAAAAAAACTATAAAATTGCAATAATTTATGCAAAATATGTAT TTTATTGAATGGGATGAATAATTTACCT
TATACGACTGTAAAACATTTCTAACGATTAAATGCACTTCTAAAAGTTTT CCCACAAGTAGGTGAGCTATTATGCTAA
GCGTTCCATGACTTGGAATCTAAGATCTTGTTTTGATCTTCGCTGATCTT TGAGAACTCGGGGATTACTTACACATTT
CTGGGCAGGCACAAGTGGGCCGAGGCAGTGTAGATTCATCACGTTTTCAC TCAACACACGCAGCTCATTAACAGCCCC
GCTGACAACTTGTCAGGACTTCCCCCTCGTGAATCCCCCTGCTACGCAAC CCCCATTCCCCGCCCATTCCAACACTTC
CCGCCGGGAGCGTGGGAAATTATGCGTGTTGGTGGGACGTCGGGCGGTGA AAATTGGCGCGCTCTTCGGGGGGCCACA
CCGCGTGGCATTGACAACTCTTCCACATTTCGCGCCCAACGATGCGTTGG CATCAGTGGGTCACAGGGATTACGGCTG
GCTGGGATTCCAGAGCCAGATCTTTTTCAGCCAAAACTTTCAGCTTTCGA AGACCTCAAGCGATAGGAGAGTGTCGGA
AGTCCAGAAATAGACGCGTAGCACATAAATTATGGATCGTATCGAGTATC GATTAGCCCGGGACAAGCGAAGCGATAG
GGAGACATATTTTTATTACCCTCTCGGGGACCTGCACTTGTTGGCTTCGC TTCTATGAAAGATCCCTCTACCATATCA
CGTATGTGGGCTCCCCCAATCGAACCGAGTTGTGGGAAATGTTTTCCCAG GCCAACAGCTAATTGTCACTCCAAGGGT
TGTCCCCGCAGCCCAGACGACAGATAAGCGGGCAAGTGAAGCCCAGCGAT CTGAGTCAAGTGAAGGGCTTCAATTTCT
TTCCCGAGTGGAACTGGGATATCGAAATTACATTTGTAACAGACGTTTTA GTCCGCAATCCTCAGCTAATGGGACTTA
CGAACATATATTCATCTGAAATTCAAGAACATGCGCACTTAAAGAGCAGG GAAGTCGCACACGCGCAAGTCAGGCGCT
CAAAAAGGGATCTTCGGAGGTACAGTGGGCAAAAGACTGTAAATAAATAA TATAAATAAAATAATATTTAGCTCTATG
TGTTTATATAATCTACAAAGTAGTTAACAAAAAATATAAAATGGATATAA AAATACATCTTATATATCCCTATAATAA
GAAATAAATAATAATTTTAGTAAATTAATTTTGTTACACAAAGTACCTGT ATTATTACCTCTTTTTTGTTGGTTGGTT
CTTTTTTGATGTGGCCCCACTGTGCTCTCTTATCAGTGCGACAATCAGGC ATTGCCTTTCCCCATCGGGGGATTCTAA
TTCCGTGGACGATGGGCCGAAACGCCTATAAAGTCGCTCATTAAAAATGT TTAATTATGGCCCATCTTGCATCTTGCA
CCGATGTGGATGGGGTTTGTCGGCAATGATTTACATTATAAAAATGCCCG TTATCTGAGCATTTTGTACGCTCCACTC
CCTCTTCCCCCCTCCAAAAAAAAAAAAAACAGATATGTATATTCCCCGAG ATATTCCCAAGCGGCCAAAAATAGACGC
AAATTGTAACGCACTTGAAGTGCACTCTGAAACATCTTGAAGTCCAAATA AAATAGCAGAGAGACCCACAATAATATA
CGTTGATATACACATGTATATATGTATGTATGTACATAAAGGGCCAGGAG CAGGAACGTTAGGCATGCGGTGGTACGA
GCACCGTGGTGCGAGCGAGAGCGCTGTGCTGCCTGAGGGAGAGGTAGCGA GTGGGTTGCATTGCGCACACAGAACATG
TGAATGCAGAGTTCAAGTGCATGCCGTGACACAGACACGCACACACACAC ACGCACACACAGATGAGTAGCCGCTGCA
AAGTGTTTTTTCCCAGGCGCTATTTATAATATGCATCCCGTCGCCGATCC GATCCGATCCAATCCAATCCGATTGGAT
CCCATCTTGCGGCACTACGATTATGACGCTCGACACGATGATGCATTCGC AGAGTTTCCCGATCGCAGAGTACCCTGT
ACTCGAGTAGTTTTTAGATGCAGTATTATTAAGTAGAAAATTGTAACCGT ATAATATTCCATTATATTAAATATTTTT
ATAGCACTAAAGAAATAAAAGCCCATTTTATAATTTATATTACAAAAATA CTTAACCATAGAAACTTATGATATGATA
CCAATATTTAAGTTCCAAAAAATGTAGAACATTTTTAAGTATATACTCGA AAATATTAATTTTCAAAATTGATATTCA
AGAGATATTATAAAAAGATCCCCATTCTAAATATCTAACATCATGCCATG CTTTCTAATGAGTATAGTATACCCCTGC
TACCCTGTCAATCCGCAAAACAGGCGCCGAAACATGCGGTTTCTCGCAGC AGACTGCCACGGGAAAAATTCGGTTCGA
GATTTGGGAATGGATGTATGACGGAGCAGAAGGAGCAGGACCCGGATTTC GGATTTCGGAATGGATATGGAAATGAAG
ATGGAAATGGGACTTTGACTGCGCGACGGCCACATGCGCCGCTGGCGATG CCGCTGGATGTTGCATGTGGCAGCGGTC
GGTGCAGCAGCGAAAGTGTTGCAGCTGTATGAGAGGGTCTATTTTTGGGG CGATTGTGCGGCGCTGGTGCTGCCACAT
GTGTTCTGTGTTGGGCTGCTAAAAGGCATTGTAATGAGAGCAGAAAATAG AATTGACTCCACTTGAGCAATGTCCCAT
AAAGCGGGAGTTTCGAGTTTGGCGCGCAATGTGCCGCACCAGCAAACGAA CAAAAGAAAAAAAAAAAAAAAAAACACA
GCCAGTAACACATGGGCCCACGAGTTATGTTTTATTTTTAATCCCACAAA GAGTCGATCTCCAAAACAAACCCGCAGA
GAGCACATATAAAGAGACTCGGTGGACGAGTGGTTCGAAACAGTCTTCCG CCGCAGCTCGACGCGCTCGCATATCGGG
AATATATAGATCGGAGATATCGCAGGACCCACAGCAGAGCAGAGCCGCAG AGCCACCAACCTCG
>Him_proximal|Drosophila melanogaster|Him|FBgn0030900|X:18041232..18043470
GCCCAGACGACAGATAAGCGGGCAAGTGAAGCCCAGCGATCTGAGTCAAG TGAAGGGCTTCAATTTCTTTCCCGAGTG
GAACTGGGATATCGAAATTACATTTGTAACAGACGTTTTAGTCCGCAATC CTCAGCTAATGGGACTTACGAACATATA
TTCATCTGAAATTCAAGAACATGCGCACTTAAAGAGCAGGGAAGTCGCAC ACGCGCAAGTCAGGCGCTCAAAAAGGGA
TCTTCGGAGGTACAGTGGGCAAAAGACTGTAAATAAATAATATAAATAAA ATAATATTTAGCTCTATGTGTTTATATA
ATCTACAAAGTAGTTAACAAAAAATATAAAATGGATATAAAAATACATCT TATATATCCCTATAATAAGAAATAAATA
ATAATTTTAGTAAATTAATTTTGTTACACAAAGTACCTGTATTATTACCT CTTTTTTGTTGGTTGGTTCTTTTTTGAT
GTGGCCCCACTGTGCTCTCTTATCAGTGCGACAATCAGGCATTGCCTTTC CCCATCGGGGGATTCTAATTCCGTGGAC
GATGGGCCGAAACGCCTATAAAGTCGCTCATTAAAAATGTTTAATTATGG CCCATCTTGCATCTTGCACCGATGTGGA
TGGGGTTTGTCGGCAATGATTTACATTATAAAAATGCCCGTTATCTGAGC ATTTTGTACGCTCCACTCCCTCTTCCCC
CCTCCAAAAAAAAAAAAAACAGATATGTATATTCCCCGAGATATTCCCAA GCGGCCAAAAATAGACGCAAATTGTAAC
GCACTTGAAGTGCACTCTGAAACATCTTGAAGTCCAAATAAAATAGCAGA GAGACCCACAATAATATACGTTGATATA
CACATGTATATATGTATGTATGTACATAAAGGGCCAGGAGCAGGAACGTT AGGCATGCGGTGGTACGAGCACCGTGGT
GCGAGCGAGAGCGCTGTGCTGCCTGAGGGAGAGGTAGCGAGTGGGTTGCA TTGCGCACACAGAACATGTGAATGCAGA
GTTCAAGTGCATGCCGTGACACAGACACGCACACACACACACGCACACAC AGATGAGTAGCCGCTGCAAAGTGTTTTT
TCCCAGGCGCTATTTATAATATGCATCCCGTCGCCGATCCGATCCGATCC AATCCAATCCGATTGGATCCCATCTTGC
GGCACTACGATTATGACGCTCGACACGATGATGCATTCGCAGAGTTTCCC GATCGCAGAGTACCCTGTACTCGAGTAG
TTTTTAGATGCAGTATTATTAAGTAGAAAATTGTAACCGTATAATATTCC ATTATATTAAATATTTTTATAGCACTAA
AGAAATAAAAGCCCATTTTATAATTTATATTACAAAAATACTTAACCATA GAAACTTATGATATGATACCAATATTTA
AGTTCCAAAAAATGTAGAACATTTTTAAGTATATACTCGAAAATATTAAT TTTCAAAATTGATATTCAAGAGATATTA
TAAAAAGATCCCCATTCTAAATATCTAACATCATGCCATGCTTTCTAATG AGTATAGTATACCCCTGCTACCCTGTCA
ATCCGCAAAACAGGCGCCGAAACATGCGGTTTCTCGCAGCAGACTGCCAC GGGAAAAATTCGGTTCGAGATTTGGGAA
TGGATGTATGACGGAGCAGAAGGAGCAGGACCCGGATTTCGGATTTCGGA ATGGATATGGAAATGAAGATGGAAATGG
GACTTTGACTGCGCGACGGCCACATGCGCCGCTGGCGATGCCGCTGGATG TTGCATGTGGCAGCGGTCGGTGCAGCAG
CGAAAGTGTTGCAGCTGTATGAGAGGGTCTATTTTTGGGGCGATTGTGCG GCGCTGGTGCTGCCACATGTGTTCTGTG
TTGGGCTGCTAAAAGGCATTGTAATGAGAGCAGAAAATAGAATTGACTCC ACTTGAGCAATGTCCCATAAAGCGGGAG
TTTCGAGTTTGGCGCGCAATGTGCCGCACCAGCAAACGAACAAAAGAAAA AAAAAAAAAAAAAACACAGCCAGTAACA
CATGGGCCCACGAGTTATGTTTTATTTTTAATCCCACAAAGAGTCGATCT CCAAAACAAACCCGCAGAGAGCACATAT
AAAGAGACTCGGTGGACGAGTGGTTCGAAACAGTCTTCCGCCGCAGCTCG ACGCGCTCGCATATCGGGAATATATAGA
TCGGAGATATCGCAGGACCCACAGCAGAGCAGAGCCGCAGAGCCACCAAC CTCG
>Obp18a_prom|Drosophila melanogaster|Obp18a|FBgn0030985|X:18969778..189727 46
ATGGCGAAAATCTGTTTCCCAACTAACAATGAGCGCATCATCACAGCTCT ATATATATAACCCATCGATTTGCTAATT
CAGCTCAAAAGTAGACAGGAGATTTTAATTAAATAATTGGATGCTACTTT ACATTCGCCACACACCAACAAATAAAGT
CTATAATTGAAATTTTAAGCGCAGTTCCCGATTATGAGCTACACGTATGT CGTATGCGCAATATCTGCATTACAATTG
CCAATAGTAAATTACCAACTTGGTTTTCTTCATATTTATTAAGATAGAAA ACATACAATTTTTGGCTTTTACACTCCA
AGCATCTCTGAAGTTTAAACAAAAAACATATGTGTAGCCTATCTACTGTA TTGGACTTTATTCGTATATTTTATATGG
TTCATTAATATAGGTATAAATACAAATTATATTCACGCTTTGCGATTTGC AGCGAATATCACATCTTATACACGATGT
AAAAAAAAAAAAAATATTTCGTCATGTTTTTAGGTTGGCCGCAGGCAGTG CTCACTGTACCGCCACAATGTTTATCGT
TTTGCATTTTTTTTTTCTTTGTTTTCTTGCGGTTTCCCCTAATTATCTTT AGTATAAACTTAGTCTACTGTCTTTTTT
GGTAAGTATTTTCGTGATGGGCTCGTCTATGCGAATTCCCATTTCCAATG AATAAATAAAGTAATTAGAACATTAAAA
TTAGCAATAAAACACGTACATTTAAAGCTGACAACAAAAAAAAAAAGTAT TCTTATGTTAAACTGTAGTATGTGCCTA
TGCAATATTAAGAACAATTAAATAAAATAGCATATTAACTTATGGCAGCA CTTTGTTGCTATGTTTATGTTTATGTTT
ATGCACGCAGTTAGGCCAGGGCGGATGTAACATGATCACCCACTCGAAGG CAAAAAGTATAAGTGCATGGTCAGCATT
CACACGCCGACCAAATACATATTACATACGTACATACATATCTCGCTCTC CCGATAAGCCTAGATATATAAGATATAC
ATAAGAACGCCGCTCCGCTGCTGGCGTACCCGGCAGCGCAGCTACGCGGA TTAGCCTAAGTCCAAATATATTAAAAAC
TGTAAAATCAGAGAGACTCTGTAGACGTTGAGCTGACAGAACCATTTCTG CCTACTCTAAAATCAAAAGAAGAAATTG
AATAAATATATGTCAGCCCGACGGCTGCCTTCAACTTAAAACGGACTTGT GTTCTGAATTGGAGTTCATCATTACATG
GCGACCGTGACAGTCGTCCAACGCTGGACGAATTGACCAAAGCTGGTGAA AACAAAGGAACAAAGGAACACTGGACTG
GAAGAAGACTGGACTAATTAAATGGAACTGCAAAAACCAAGGAAAAATCT GAGTGAGTAGAGTTCTATTGAGTATGGG
CAAACACCGTGGCGGTTTGAAAACTAAGCTGAATAAACGTATAGCCCACG TAAGGTGGCTAATATACGGTCAGCAAAC
GCCACCGGTTTGGTCGAAAGCTCTAAAGCTACATGCAGAGCTAGACCACT TGTTGCAATATCAGCAAGAATTAAAGAC
CCATAAGCTCGAGAAAACTCACTCAGATAATATTAAAAATATACCCACAA TTAATGAAGTTCCAAAATACCAGGCATG
TCCAGCACCAGCACCAGCATTAACAAAACCAAAGAAGTCCTGCCCCCCTG GCTGCGAAGGAATCTGGAGTCCCCACTG
CCTGGGGACTTGTGAGCGACCATCGACGTCTTCAGCGGCGAAGAAATAGA CAGCAGCGAGGGAGTGTCAGCGTGCCAC
CCCCGGCGACGCCCAGCTGACACCTGATGAGCATCATCAACAGCAGAATA TAATAATAAATATATATAAATATAAAGT
AAATATAAAATATATATAGATAAGAAAAATTGTAAGAAATATTGTAAAAC GGAGCATATACTATTATGCCCTGTTAAC
CCAATATGGCCCGTGAAGCCATAGCTAGAATCAGGCAGGCAACAATGTAA AATACAATTTTTTTTTACTCTTGCGAAC
ATTGAAAGATTTTATAAATAGATAATTCCAAACATAAATGTCTATAGAGA CAAATGAAATAAGTAAAACTGAAAATAA
AAGTATATACAAAGGAAATTTTCTATTCTATTCTCCAAAATATAAAATTA GTATACCCAAAATGGGTCTAATAGACAC
TAAAACTGTGGACTCTACAGCCAATGTAATAAATAAAGTAGAAGTCCAAA ATGCAGACTTGTTCTGGATAACCATAAT
ACTAATTGTAATTGCATTAATTATGGTATCCAATGCATTAATAAAAATAT ACAAACTGCATAACAAGTGTCTTAAGAA
ACGATACCGTAGCACTGCTAACGGTATAGATAATATTTAAGGAAGATCTT TAATAAAGTCAATTATGAATGAAAATAT
GAGAAAAATTATATGAAAAAAAAAAAATAATAAATAAAAAAAAAAATATA AAACGTAATATTGAATTTATCTACGTTA
AAAAAAAAAATATATACAAATGAATAAATTTGAAGTTATGAGTATACCAC AGCATGGACTGGGAAAAGCTTGTTGATC
AGATAAAAGATCAAAATGAAAATTTCAGAAAATCCTATAAGTGCTTAACG CAAAACAGATCAACACAAGCTGTAACAA
TCAATAGGAATGCCCAAGTCTTGGTAAATAGTTATAATGAAATCAGAGAG TTGATCCAACAAAATAGAAAGAATTTGG
AACGCAAACAGTGTGCTAAGGCTTTGAACCTACTGGTGACATTAAGAGAA AAATTAATATTTATAAAAAATAAATTCA
GTCTCCAGATAGAAATTCCAACCATAGTAAACACCCCACTAAGAATAAAT TTGAATGAAGACAGCACTAACTCTGACG
AGGAAGATAGGACTATAGTCAAGGAAGACATTAAAGAGGAAGATCTTCAC GATCTAACTATACCAGCAAAATTAATGC
TGAA
>Obp19a_prom|Drosophila melanogaster|Obp19a|FBgn0031109|X:20223943..202264 46
CCACCTGCGAAATGGGTCATAGTATATGTATTTGTAAAAAATGTATGTAA AAAAATGTTAAATTAATAATTTTGAATT
TCAATTTGGAGCTGAAAATAATATTTTGTGTCCATCAACAGCTCCAAAGC GATGGTTCATTTTATCTTGTGTGCGTTC
AATAGAATCACTCTTACGTTAGCGCGTCCATTGATGGTTGTCCCATTGAA GTACTTCTTAAAGCCGTCGGCCATTGCT
ACTGGACTGGATCTGGAGATCTGGAGATCTGGATTTGGGGTCGGGTCCGG GTGAGAGCTGAGTGTGTTCTGCCTATAG
CTCCGAGCGAGAACCTAATGACAAGCAGCGAAGTGCAAAGCTCGGCCAAC TAGATTACAAAGTCGATTCATTGGCAGG
ATTCGATTTTTATTGACTCAACGAGGTGGTACATGAGTTTGGTCCCCAAG CCTTTAACTGTGGCATCGAGGACCGGAA
AGGGGGTGCTGATTATAAATAGTTATGGATTGCTGACGGGTCGAATGGGT CGGAGCGGTGGGGAGCCATGACTTCAAT
GATTTGGCAGCATCGGCGCCCTAGCCATGGAGCATGGCCTGCTGGCAGCC CTTGCAGTAGAGCTTGGTCTCGCGCCGC
TTCGTGTTGCGGCGGTGCATCTTGACCAGGACGTAGACGAGTCCCAACGA GGCCCAGGTGGCCTTGGCTACCTGTGGG
TTTCGGTGGCGTATTTGGGCGCATCTTGTGTACTGCCGTGTACTGAATCA CTTACATTGGCGCGACCACGCATGGTCT
GGCTGTTGAAGGCTTCGTTGAAGTTGAAATGATCGGACATCTTTGGATCG TTGTTGACCGGATTGGCGTGGCTTTTAA
CAAAAGATTAAAATTTGGATTCGATATTCGACCTGTATTTTAGACCGGGA TTCGGATTGTGACTTTTAAACGTTCGAA
ATGAAAGGAATGTTACTGACAGTCGTCAAAGCCGACTCGGGTTTCCCAAC TAGAGAGAATGCTGAAGTCTAGTACCGA
CTAATGGGATACCCATTAATTACTGCTTAAATACTGTGATGAAAATTGAG ATATGCAAGAGGCAAATCGAAAGTTTTG
GACATTTTCATATTGTACCTTTAACCAACTTCAGAATTCATTGAGCTAAA TACCATTTACAATTTTATGAAATTTTTA
AGCATGTTACAGCTATAACTATTTTTAAACCAGTTACTAGATTCGTTGAA AATTGTATGTCACACAGAACTTCTTGCC
ATCCTGGTCGGAATTAGGATCACTAGCCAAGCCGATATGGCTATGTCTGT CCGTATGAAAGTCTTGGAATCTGATATT
AACATCGCATATCGATCGACCATTATATATCTAATATATCCTCTACAAAT GTATTTTATCACCTAGCTAGCATGTAAA
CATTCTGGCCTATTTAGCTGTACGCTTCAGTTATGCTAATGCAAACATAA GCCTTTTGTGATATTATAATTTACATTT
ATTATTTATTGCAGTTAGCTTTATCAGCGATTTGGGCTCATGCCACACGC AATACTACTTATTTCAACGTCATCAGTT
GTACTAAATGCACAAATGAAATACATTTCGCCAAATAAATGCCAACTTGC AACTAATTTGAATGCTAATCAAACCGAA
CTACTCATTTGCATACAAGGTAATAGGTGGTTAAAGTGAGTGTAATGGAC TTACTTAAGGGGTTACAAGGCTTATATT
TAAAATGCCTGCCTTGTAATTAAATTTTTAAATATATTGGAAAAAAATGG CCACTTGTTATGTGAGTCTCCAGAAAAA
AAACAAAAAAACAGCAACCATCTGGTATGCAAAATATCTGGTGGTAGCAA AATATCTGGTGGTATCTGGTGGACTATC
AAAATATAAAAACTTTTTTTTCCAGATAGTATATCTTAAAATCAGCATCT TGAAGGAGTATATGTAAATAGCAAACTA
TTTGTAAAAATAGATTTTATTTTATAATTTTTTAAGATATATACCAAACA TTATTACCGATTGTGATTATCTTTACAT
TGTTTGACCTCAAAACGGAAAACTGGATGCGCGGTATCCATGCGACCCTA ACTCTGGAACCGATTTTGGAACCGCCCC
GTTAGATCTCAGATTGAAACCTTATTTGCATTCGCATGATCGCTGATGAA CACTGGGGAAATGCGGCCCAGCAATGGG
ATTGTCAACGCATCTCGGCCAGAATCGCGCCTCGCATGCCACCTCGCACG GTGACCACATACCTGTGTACACTGTCAA
TTAACGTGGCAAGATTATAGCCCGGCCAGAAAGTAATCCGCCCCAGGAAC ACCACCCACCGCCCGCCCATTTGGATAT
GGAAATGGGCAGTGGGGGCGGCGATTGGCGCTAACCCATAATTCCCACAC CCACTTAGCGGTTCGATCGAACCAATAT
GAAGTCATTTGCATGTCGGGGGCCGTGTATAAAAGGAGTCGCCGATGGGT CTGGAGTCTGGAATCCGCCAAATCGTCT
CGGAAAT
>Obp19b_prom|Drosophila melanogaster|Obp19b|FBgn0031110|X:20224439..202274 40
ATTGCTGACGGGTCGAATGGGTCGGAGCGGTGGGGAGCCATGACTTCAAT GATTTGGCAGCATCGGCGCCCTAGCCAT
GGAGCATGGCCTGCTGGCAGCCCTTGCAGTAGAGCTTGGTCTCGCGCCGC TTCGTGTTGCGGCGGTGCATCTTGACCA
GGACGTAGACGAGTCCCAACGAGGCCCAGGTGGCCTTGGCTACCTGTGGG TTTCGGTGGCGTATTTGGGCGCATCTTG
TGTACTGCCGTGTACTGAATCACTTACATTGGCGCGACCACGCATGGTCT GGCTGTTGAAGGCTTCGTTGAAGTTGAA
ATGATCGGACATCTTTGGATCGTTGTTGACCGGATTGGCGTGGCTTTTAA CAAAAGATTAAAATTTGGATTCGATATT
CGACCTGTATTTTAGACCGGGATTCGGATTGTGACTTTTAAACGTTCGAA ATGAAAGGAATGTTACTGACAGTCGTCA
AAGCCGACTCGGGTTTCCCAACTAGAGAGAATGCTGAAGTCTAGTACCGA CTAATGGGATACCCATTAATTACTGCTT
AAATACTGTGATGAAAATTGAGATATGCAAGAGGCAAATCGAAAGTTTTG GACATTTTCATATTGTACCTTTAACCAA
CTTCAGAATTCATTGAGCTAAATACCATTTACAATTTTATGAAATTTTTA AGCATGTTACAGCTATAACTATTTTTAA
ACCAGTTACTAGATTCGTTGAAAATTGTATGTCACACAGAACTTCTTGCC ATCCTGGTCGGAATTAGGATCACTAGCC
AAGCCGATATGGCTATGTCTGTCCGTATGAAAGTCTTGGAATCTGATATT AACATCGCATATCGATCGACCATTATAT
ATCTAATATATCCTCTACAAATGTATTTTATCACCTAGCTAGCATGTAAA CATTCTGGCCTATTTAGCTGTACGCTTC
AGTTATGCTAATGCAAACATAAGCCTTTTGTGATATTATAATTTACATTT ATTATTTATTGCAGTTAGCTTTATCAGC
GATTTGGGCTCATGCCACACGCAATACTACTTATTTCAACGTCATCAGTT GTACTAAATGCACAAATGAAATACATTT
CGCCAAATAAATGCCAACTTGCAACTAATTTGAATGCTAATCAAACCGAA CTACTCATTTGCATACAAGGTAATAGGT
GGTTAAAGTGAGTGTAATGGACTTACTTAAGGGGTTACAAGGCTTATATT TAAAATGCCTGCCTTGTAATTAAATTTT
TAAATATATTGGAAAAAAATGGCCACTTGTTATGTGAGTCTCCAGAAAAA AAACAAAAAAACAGCAACCATCTGGTAT
GCAAAATATCTGGTGGTAGCAAAATATCTGGTGGTATCTGGTGGACTATC AAAATATAAAAACTTTTTTTTCCAGATA
GTATATCTTAAAATCAGCATCTTGAAGGAGTATATGTAAATAGCAAACTA TTTGTAAAAATAGATTTTATTTTATAAT
TTTTTAAGATATATACCAAACATTATTACCGATTGTGATTATCTTTACAT TGTTTGACCTCAAAACGGAAAACTGGAT
GCGCGGTATCCATGCGACCCTAACTCTGGAACCGATTTTGGAACCGCCCC GTTAGATCTCAGATTGAAACCTTATTTG
CATTCGCATGATCGCTGATGAACACTGGGGAAATGCGGCCCAGCAATGGG ATTGTCAACGCATCTCGGCCAGAATCGC
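GCCTCGCATGCCACCTCGCACGGTGACCACATACCTGTGTACACTGTCAA TTAACGTGGCAAGATTATAGCCCGGCCA
How can I make a list of the individual sequences? Here a new sequence starts with the ">" symbol, so I want to put them in separate lists, and I cannot specify the number of sequences in the whole file. What should I do?
My sequence file:
>CG9571_O-E|Drosophila melanogaster|CG9571|FBgn0031086|X:19926374..199271 33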
CCAGTCCACCGGCCGCCGATCTATTTATACGAGAGGAAGAGGCTGAACTC GAGGATTACCCGTGTATCCTGGGACGCG
GATTAGCGATCCATTCCCCTTTTAATCGCCGCGCAAACAGATTCATGAAA GCCTTCGGATTCATTCATTGATCCACAT
.........................
GCGCGGTATCCATGCGACCCTAACTCTGGAACCGATTTTGGAACCGCCCC GTTAGATCTCAGATTGAAACCTTATTTG
CATTCGCATGATCGCTGATGAACACTGGGGAAATGCGGCCCAGCAATGGG ATTGTCAACGCATCTCGGCCAGAATCGC
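GCCTCGCATGCCACCTCGCACGGTGACCACATACCTGTGTACACTGTCAA TTAACGTGGCAAGATTATAGCCCGGCCA
How can I make a list of the individual sequences? Here a new sequence starts with the ">" symbol, so I want to put them in separate lists, and I cannot specify the number of sequences in the whole file. What should I do?
Python is very well suited to this sort of thing: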
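One way to split the sequence file into individual sequences without knowing in advance how many there are is to start a new record every time a line begins with ">". The following is only a sketch, not the code from the original reply; the name `read_fasta` is hypothetical, and it assumes the spaces inside the sequence lines are not meaningful and can be dropped.

```python
def read_fasta(lines):
    """Return a list of (header, sequence) pairs, one per '>' record."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue                          # ignore blank lines
        if line.startswith(">"):
            if header is not None:            # finish the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # drop the spaces that appear inside the sequence lines
            chunks.append(line.replace(" ", ""))
    if header is not None:                    # do not forget the last record
        records.append((header, "".join(chunks)))
    return records
```

The result grows by one entry per header line, so the number of sequences never has to be specified. Any text before the first ">" line is ignored.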
hey!
I have a program that takes two input files, one in matrix form and one in sequence form. My problem is that I now have to give it a matrix file containing many matrices and a sequence file containing many sequences, and calculate the same log score as I did for one matrix file and one sequence file.
How it should work exactly: for every sequence, it should calculate the log values for all the weight matrices, then go to the second sequence and calculate all the log values using the matrices.
My matrix file is huge and contains many matrices. Part of it is here.
//
NA Abd-B
PO A C G T
01 10.19 0.00 10.65 6.24
02 5.79 0.67 10.50 10.11
03 4.50 0.00 0.00 22.57
04 0.00 0.00 0.00 27.08
05 0.00 0.00 0.00 27.08
06 0.00 0.00 0.00 27.08
07 27.08 0.00 0.00 0.00
08 0.00 2.83 0.00 24.25
09 0.00 0.00 24.45 2.62
10 19.33 0.00 4.34 3.41
11 0.31 12.28 3.39 11.09
//
//
NA Adf1
PO A C G T
01 0.71 0.08 26.02 1.55
02 3.03 23.00 1.24 1.09
03 0.26 10.50 3.29 14.31
04 0.00 0.06 28.23 0.07
05 0.12 27.27 0.06 0.91
06 1.44 20.36 0.37 6.19
07 5.35 0.28 21.49 1.24
08 7.81 16.10 3.81 0.63
09 0.51 17.77 0.45 9.63
10 0.00 0.14 28.21 0.00
11 0.00 25.69 0.20 2.46
12 0.48 9.98 0.07 17.82
13 1.27 0.00 27.01 0.07
14 15.59 7.98 2.92 1.87
15 4.28 22.37 0.00 1.70
16 0.18 0.77 22.70 4.70
//
//
NA Aef1
PO A C G T
01 0.00 0.06 12.49 0.00
02 3.80 0.17 0.00 8.57
03 0.87 0.06 0.00 11.62
04 0.06 9.76 2.32 0.41
05 9.82 0.00 2.73 0.00
06 9.76 0.00 0.00 2.78
07 3.80 0.31 0.00 8.43
08 0.00 0.00 0.00 12.54
09 0.00 6.53 5.85 0.17
10 0.00 12.38 0.17 0.00
11 2.73 1.02 8.80 0.00
12 5.85 0.00 6.70 0.00
13 1.02 5.96 0.00 5.57
14 0.00 5.16 4.66 2.73
15 1.03 7.55 3.97 0.00
16 4.82 5.00 2.73 0.00
//
//
NA Antp
PO A C G T
01 5.52 14.49 27.56 0.49
02 8.17 14.02 11.42 14.47
03 18.18 27.29 1.31 1.29
04 40.26 5.66 1.83 0.32
05 19.05 12.67 0.43 15.91
06 9.94 0.07 0.20 37.86
07 26.63 15.17 0.00 6.27
08 47.45 0.06 0.00 0.56
09 0.81 0.48 0.00 46.79
10 26.46 19.05 1.81 0.75
11 48.07 0.00 0.00 0.00
12 30.51 0.00 0.00 17.56
13 43.45 0.00 0.00 4.62
14 30.06 5.98 0.00 12.03
15 0.38 0.64 0.00 47.05
16 22.14 0.29 7.15 18.49
//
//
The sequence file is here (this is also only part of my file). The actual sequence starts from "CC"; the line before it is just a heading, which we omit. This file contains two sequences.
>CG9571_O-E|Drosophila melanogaster|CG9571|FBgn0031086|X:19926374..199271 33
CCAGTCCACCGGCCGCCGATCTATTTATACGAGAGGAAGAGGCTGAACTC GAGGATTACCCGTGTATCCTGGGACGCG
GATTAGCGATCCATTCCCCTTTTAATCGCCGCGCAAACAGATTCATGAAA GCCTTCGGATTCATTCATTGATCCACAT
CTACGGGAACGGGAGTCGCAAACGTTTTCGGATTAGCGCTGGACTAGCGG TTTCTAAATTGGATTATTTCTACCTGAC
CCTGGAGCCATCGTCCTCGTCCTCC
>Cp36_DRR|Drosophila melanogaster|Cp36|FBgn0000359|X:8323349..8324136
AGTCGACCAGCACGAGATCTCACCTACCTTCTTTATAAGCGGGGTCTCTA GAAGCTAAATCCATGTCCACGTCAAACC
AAAGACTTGCGGTCTCCAGACCATTGAGTTCTATAAATGGGACTGAGCCA CACCATACACCACACACCACACATACAC
ACACGCCAACACATTACACACAACACGAACTACACAAACACTGAGATTAA GGAAATTATTAAAAAAAATAATAAAATT
AATACAAAAAAAATATATATATATA
This is my code, which works (it prints the log value for one sequence and one matrix).
Solution
Start with a list of input files (hopefully generated automatically when there are many). Like:
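The code from the original reply is not preserved here. As a rough illustration of generating the list of input files automatically, the standard `glob` module can collect every file matching a pattern. The function name `list_input_files` and the `*.txt` pattern are assumptions made for the example.

```python
import glob
import os


def list_input_files(directory, pattern="*.txt"):
    """Return the sorted paths in `directory` whose names match `pattern`."""
    return sorted(glob.glob(os.path.join(directory, pattern)))
```

Sorting keeps the processing order stable from run to run, since `glob` does not guarantee any particular order.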
My sequence file:
>CG9571_O-E|Drosophila melanogaster|CG9571|FBgn0031086|X:19926374..199271 33
CCAGTCCACCGGCCGCCGATCTATTTATACGAGAGGAAGAGGCTGAACTC GAGGATTACCCGTGTATCCTGGGACGCG
GATTAGCGATCCATTCCCCTTTTAATCGCCGCGCAAACAGATTCATGAAA GCCTTCGGATTCATTCATTGATCCACAT
CTACGGGAACGGGAGTCGCAAACGTTTTCGGATTAGCGCTGGACTAGCGG TTTCTAAATTGGATTATTTCTACCTGAC
CCTGGAGCCATCGTCCTCGTCCTCCGTCCCTTAGCGCCTCCTGCATGGAT GTCGTTTTTGGGTTTCATACCTTTTCAC
ACTGGAAAAATACGGAATTTGTTGTAAGCCCTTTCAAGACGAATGGGATT TAGCTTCGGATGTCAACGTCACCATAAT
CATATTAGGAATATTTCTACTCAATTGCAATATTGGTACTTTTCTGACTG TAAACGCGATGATAATTACAAATATGCC
TAATTTGCTGTCTTTATAATCAAATGGAGTTCTTTATATTTCCAAAATAT TGAAATTCCGATTCCCTAGAAAATAATA
CGTTTTTCTGTTATTAATAAAAAACCAATAGGAAAGTTCTCAAAAATTAC TCTGTTGTATTTGATCATTTCTTTTCCG
GTATAATCTTTTATTTTAAGCATTCCCATGTGAATAAATTTCAGACTAAT GTATTAATAAGATGTCGTGTTTTTCCAC
TTACAAATTTCTCATACAGCTGGATATATACTACGAGTACTATACACATG CTCTGGG
>Cp36_DRR|Drosophila melanogaster|Cp36|FBgn0000359|X:8323349..8324136
AGTCGACCAGCACGAGATCTCACCTACCTTCTTTATAAGCGGGGTCTCTA GAAGCTAAATCCATGTCCACGTCAAACC
AAAGACTTGCGGTCTCCAGACCATTGAGTTCTATAAATGGGACTGAGCCA CACCATACACCACACACCACACATACAC
ACACGCCAACACATTACACACAACACGAACTACACAAACACTGAGATTAA GGAAATTATTAAAAAAAATAATAAAATT
AATACAAAAAAAATATATATATATACAAAAATTTGTTGTGTTTGAATTGA ATTAAGAGCTTATCAAGAAAAAAATTTC
AGTGACTCATAATACACTACTCTACAAGTTTAAATTGAATCAACAATTTA ACTTTCATTGCTCAGGTTTTTAGTAACA
ATGTTTATATAAGTTTAGGTATAACAAATGATTTAAATATAAGATACTGT ATTTCACATTGAGACGAAACAATCCACC
GAAAATCATAAAATATAAGAATGTTGCATTTTATTTTTAAAAATAAAGAT GCCTTTTAAGAGGAATAACTTAAATGTC
TTTAATACCTTTGAATTTAATTATATGGCTAATAAACACAAACTTAAAGC TTAAAACTGCATCGAATTGAATGCGGTT
ATAAATGTACTTATATATCTAATATAATCTGCTAATATGGTTTACATGGT ATATCTTTCTCGGAAATTTTTACAAAAA
TTATCTATTCATATATCTCGAGCGTAAGATATTTATCAGTTTATAGATAA CATCTTTAAATTTGGGTGATTAAAAAAA
AACATTG
>Cp36_PRR|Drosophila melanogaster|Cp36|FBgn0000359|X:8324430..8324513
TCTAGAGATCTGGGCACGATGGCGAGACAAAGATGCGGCGCAAAATCGGA AATGGAGATGGATCACGTAGCCGGCCAT
GGCGG
>Him_distal|Drosophila melanogaster|Him|FBgn0030900|X:18039896..18043470
GGTTTTCTGCGATGGCTTCCGCGCCAGCTGAAGTATCTGATTTGCTGCCT TGTTTTTGTTGATATTTCTGCGAAGGGA
CTTGTGCTTTTCAAATGGCCTTTTTTTGGGATTACGGCAAGGGCGCGTTT CCCACGCTCGATCCCCACTTACCATTGG
TGCACGCGATTGCGGCAAGCTGCTGAGGCAAGCTATTAAACGCCACACTG GGCCGGGGGGCGGTACCGGTGGGCGTGG
CAGGGGAGTCGACACATGTTGTGTGCCAGAGAACTTTGCTCCGATCCCCA GATCATCAAATAGTTGTCGCTGTCTGCT
CGTGCGCAAATTGCAATACTTTGCATACCCTTACTGCAGGGTATCTGAGC TTGGACTTTAAATAAGGGGGTATAACAT
AGCTTATACTCTCTATCTCTGTTATAAAGTCAATTTTCCTTAGATCTTTA GTACAGTGGGTAGTTAAGGAGACATAAC
TTCCAAAAAAAAAAACTATAAAATTGCAATAATTTATGCAAAATATGTAT TTTATTGAATGGGATGAATAATTTACCT
TATACGACTGTAAAACATTTCTAACGATTAAATGCACTTCTAAAAGTTTT CCCACAAGTAGGTGAGCTATTATGCTAA
GCGTTCCATGACTTGGAATCTAAGATCTTGTTTTGATCTTCGCTGATCTT TGAGAACTCGGGGATTACTTACACATTT
CTGGGCAGGCACAAGTGGGCCGAGGCAGTGTAGATTCATCACGTTTTCAC TCAACACACGCAGCTCATTAACAGCCCC
GCTGACAACTTGTCAGGACTTCCCCCTCGTGAATCCCCCTGCTACGCAAC CCCCATTCCCCGCCCATTCCAACACTTC
CCGCCGGGAGCGTGGGAAATTATGCGTGTTGGTGGGACGTCGGGCGGTGA AAATTGGCGCGCTCTTCGGGGGGCCACA
CCGCGTGGCATTGACAACTCTTCCACATTTCGCGCCCAACGATGCGTTGG CATCAGTGGGTCACAGGGATTACGGCTG
GCTGGGATTCCAGAGCCAGATCTTTTTCAGCCAAAACTTTCAGCTTTCGA AGACCTCAAGCGATAGGAGAGTGTCGGA
AGTCCAGAAATAGACGCGTAGCACATAAATTATGGATCGTATCGAGTATC GATTAGCCCGGGACAAGCGAAGCGATAG
GGAGACATATTTTTATTACCCTCTCGGGGACCTGCACTTGTTGGCTTCGC TTCTATGAAAGATCCCTCTACCATATCA
CGTATGTGGGCTCCCCCAATCGAACCGAGTTGTGGGAAATGTTTTCCCAG GCCAACAGCTAATTGTCACTCCAAGGGT
TGTCCCCGCAGCCCAGACGACAGATAAGCGGGCAAGTGAAGCCCAGCGAT CTGAGTCAAGTGAAGGGCTTCAATTTCT
TTCCCGAGTGGAACTGGGATATCGAAATTACATTTGTAACAGACGTTTTA GTCCGCAATCCTCAGCTAATGGGACTTA
CGAACATATATTCATCTGAAATTCAAGAACATGCGCACTTAAAGAGCAGG GAAGTCGCACACGCGCAAGTCAGGCGCT
CAAAAAGGGATCTTCGGAGGTACAGTGGGCAAAAGACTGTAAATAAATAA TATAAATAAAATAATATTTAGCTCTATG
TGTTTATATAATCTACAAAGTAGTTAACAAAAAATATAAAATGGATATAA AAATACATCTTATATATCCCTATAATAA
GAAATAAATAATAATTTTAGTAAATTAATTTTGTTACACAAAGTACCTGT ATTATTACCTCTTTTTTGTTGGTTGGTT
CTTTTTTGATGTGGCCCCACTGTGCTCTCTTATCAGTGCGACAATCAGGC ATTGCCTTTCCCCATCGGGGGATTCTAA
TTCCGTGGACGATGGGCCGAAACGCCTATAAAGTCGCTCATTAAAAATGT TTAATTATGGCCCATCTTGCATCTTGCA
CCGATGTGGATGGGGTTTGTCGGCAATGATTTACATTATAAAAATGCCCG TTATCTGAGCATTTTGTACGCTCCACTC
CCTCTTCCCCCCTCCAAAAAAAAAAAAAACAGATATGTATATTCCCCGAG ATATTCCCAAGCGGCCAAAAATAGACGC
AAATTGTAACGCACTTGAAGTGCACTCTGAAACATCTTGAAGTCCAAATA AAATAGCAGAGAGACCCACAATAATATA
CGTTGATATACACATGTATATATGTATGTATGTACATAAAGGGCCAGGAG CAGGAACGTTAGGCATGCGGTGGTACGA
GCACCGTGGTGCGAGCGAGAGCGCTGTGCTGCCTGAGGGAGAGGTAGCGA GTGGGTTGCATTGCGCACACAGAACATG
TGAATGCAGAGTTCAAGTGCATGCCGTGACACAGACACGCACACACACAC ACGCACACACAGATGAGTAGCCGCTGCA
AAGTGTTTTTTCCCAGGCGCTATTTATAATATGCATCCCGTCGCCGATCC GATCCGATCCAATCCAATCCGATTGGAT
CCCATCTTGCGGCACTACGATTATGACGCTCGACACGATGATGCATTCGC AGAGTTTCCCGATCGCAGAGTACCCTGT
ACTCGAGTAGTTTTTAGATGCAGTATTATTAAGTAGAAAATTGTAACCGT ATAATATTCCATTATATTAAATATTTTT
ATAGCACTAAAGAAATAAAAGCCCATTTTATAATTTATATTACAAAAATA CTTAACCATAGAAACTTATGATATGATA
CCAATATTTAAGTTCCAAAAAATGTAGAACATTTTTAAGTATATACTCGA AAATATTAATTTTCAAAATTGATATTCA
AGAGATATTATAAAAAGATCCCCATTCTAAATATCTAACATCATGCCATG CTTTCTAATGAGTATAGTATACCCCTGC
TACCCTGTCAATCCGCAAAACAGGCGCCGAAACATGCGGTTTCTCGCAGC AGACTGCCACGGGAAAAATTCGGTTCGA
GATTTGGGAATGGATGTATGACGGAGCAGAAGGAGCAGGACCCGGATTTC GGATTTCGGAATGGATATGGAAATGAAG
ATGGAAATGGGACTTTGACTGCGCGACGGCCACATGCGCCGCTGGCGATG CCGCTGGATGTTGCATGTGGCAGCGGTC
GGTGCAGCAGCGAAAGTGTTGCAGCTGTATGAGAGGGTCTATTTTTGGGG CGATTGTGCGGCGCTGGTGCTGCCACAT
GTGTTCTGTGTTGGGCTGCTAAAAGGCATTGTAATGAGAGCAGAAAATAG AATTGACTCCACTTGAGCAATGTCCCAT
AAAGCGGGAGTTTCGAGTTTGGCGCGCAATGTGCCGCACCAGCAAACGAA CAAAAGAAAAAAAAAAAAAAAAAACACA
GCCAGTAACACATGGGCCCACGAGTTATGTTTTATTTTTAATCCCACAAA GAGTCGATCTCCAAAACAAACCCGCAGA
GAGCACATATAAAGAGACTCGGTGGACGAGTGGTTCGAAACAGTCTTCCG CCGCAGCTCGACGCGCTCGCATATCGGG
AATATATAGATCGGAGATATCGCAGGACCCACAGCAGAGCAGAGCCGCAG AGCCACCAACCTCG
>Him_proximal|Drosophila melanogaster|Him|FBgn0030900|X:18041232..18043470
GCCCAGACGACAGATAAGCGGGCAAGTGAAGCCCAGCGATCTGAGTCAAG TGAAGGGCTTCAATTTCTTTCCCGAGTG
GAACTGGGATATCGAAATTACATTTGTAACAGACGTTTTAGTCCGCAATC CTCAGCTAATGGGACTTACGAACATATA
TTCATCTGAAATTCAAGAACATGCGCACTTAAAGAGCAGGGAAGTCGCAC ACGCGCAAGTCAGGCGCTCAAAAAGGGA
TCTTCGGAGGTACAGTGGGCAAAAGACTGTAAATAAATAATATAAATAAA ATAATATTTAGCTCTATGTGTTTATATA
ATCTACAAAGTAGTTAACAAAAAATATAAAATGGATATAAAAATACATCT TATATATCCCTATAATAAGAAATAAATA
ATAATTTTAGTAAATTAATTTTGTTACACAAAGTACCTGTATTATTACCT CTTTTTTGTTGGTTGGTTCTTTTTTGAT
GTGGCCCCACTGTGCTCTCTTATCAGTGCGACAATCAGGCATTGCCTTTC CCCATCGGGGGATTCTAATTCCGTGGAC
GATGGGCCGAAACGCCTATAAAGTCGCTCATTAAAAATGTTTAATTATGG CCCATCTTGCATCTTGCACCGATGTGGA
TGGGGTTTGTCGGCAATGATTTACATTATAAAAATGCCCGTTATCTGAGC ATTTTGTACGCTCCACTCCCTCTTCCCC
CCTCCAAAAAAAAAAAAAACAGATATGTATATTCCCCGAGATATTCCCAA GCGGCCAAAAATAGACGCAAATTGTAAC
GCACTTGAAGTGCACTCTGAAACATCTTGAAGTCCAAATAAAATAGCAGA GAGACCCACAATAATATACGTTGATATA
CACATGTATATATGTATGTATGTACATAAAGGGCCAGGAGCAGGAACGTT AGGCATGCGGTGGTACGAGCACCGTGGT
GCGAGCGAGAGCGCTGTGCTGCCTGAGGGAGAGGTAGCGAGTGGGTTGCA TTGCGCACACAGAACATGTGAATGCAGA
GTTCAAGTGCATGCCGTGACACAGACACGCACACACACACACGCACACAC AGATGAGTAGCCGCTGCAAAGTGTTTTT
TCCCAGGCGCTATTTATAATATGCATCCCGTCGCCGATCCGATCCGATCC AATCCAATCCGATTGGATCCCATCTTGC
GGCACTACGATTATGACGCTCGACACGATGATGCATTCGCAGAGTTTCCC GATCGCAGAGTACCCTGTACTCGAGTAG
TTTTTAGATGCAGTATTATTAAGTAGAAAATTGTAACCGTATAATATTCC ATTATATTAAATATTTTTATAGCACTAA
AGAAATAAAAGCCCATTTTATAATTTATATTACAAAAATACTTAACCATA GAAACTTATGATATGATACCAATATTTA
AGTTCCAAAAAATGTAGAACATTTTTAAGTATATACTCGAAAATATTAAT TTTCAAAATTGATATTCAAGAGATATTA
TAAAAAGATCCCCATTCTAAATATCTAACATCATGCCATGCTTTCTAATG AGTATAGTATACCCCTGCTACCCTGTCA
ATCCGCAAAACAGGCGCCGAAACATGCGGTTTCTCGCAGCAGACTGCCAC GGGAAAAATTCGGTTCGAGATTTGGGAA
TGGATGTATGACGGAGCAGAAGGAGCAGGACCCGGATTTCGGATTTCGGA ATGGATATGGAAATGAAGATGGAAATGG
GACTTTGACTGCGCGACGGCCACATGCGCCGCTGGCGATGCCGCTGGATG TTGCATGTGGCAGCGGTCGGTGCAGCAG
CGAAAGTGTTGCAGCTGTATGAGAGGGTCTATTTTTGGGGCGATTGTGCG GCGCTGGTGCTGCCACATGTGTTCTGTG
TTGGGCTGCTAAAAGGCATTGTAATGAGAGCAGAAAATAGAATTGACTCC ACTTGAGCAATGTCCCATAAAGCGGGAG
TTTCGAGTTTGGCGCGCAATGTGCCGCACCAGCAAACGAACAAAAGAAAA AAAAAAAAAAAAAACACAGCCAGTAACA
CATGGGCCCACGAGTTATGTTTTATTTTTAATCCCACAAAGAGTCGATCT CCAAAACAAACCCGCAGAGAGCACATAT
AAAGAGACTCGGTGGACGAGTGGTTCGAAACAGTCTTCCGCCGCAGCTCG ACGCGCTCGCATATCGGGAATATATAGA
TCGGAGATATCGCAGGACCCACAGCAGAGCAGAGCCGCAGAGCCACCAAC CTCG
>Obp18a_prom|Drosophila melanogaster|Obp18a|FBgn0030985|X:18969778..189727 46
ATGGCGAAAATCTGTTTCCCAACTAACAATGAGCGCATCATCACAGCTCT ATATATATAACCCATCGATTTGCTAATT
CAGCTCAAAAGTAGACAGGAGATTTTAATTAAATAATTGGATGCTACTTT ACATTCGCCACACACCAACAAATAAAGT
CTATAATTGAAATTTTAAGCGCAGTTCCCGATTATGAGCTACACGTATGT CGTATGCGCAATATCTGCATTACAATTG
CCAATAGTAAATTACCAACTTGGTTTTCTTCATATTTATTAAGATAGAAA ACATACAATTTTTGGCTTTTACACTCCA
AGCATCTCTGAAGTTTAAACAAAAAACATATGTGTAGCCTATCTACTGTA TTGGACTTTATTCGTATATTTTATATGG
TTCATTAATATAGGTATAAATACAAATTATATTCACGCTTTGCGATTTGC AGCGAATATCACATCTTATACACGATGT
AAAAAAAAAAAAAATATTTCGTCATGTTTTTAGGTTGGCCGCAGGCAGTG CTCACTGTACCGCCACAATGTTTATCGT
TTTGCATTTTTTTTTTCTTTGTTTTCTTGCGGTTTCCCCTAATTATCTTT AGTATAAACTTAGTCTACTGTCTTTTTT
GGTAAGTATTTTCGTGATGGGCTCGTCTATGCGAATTCCCATTTCCAATG AATAAATAAAGTAATTAGAACATTAAAA
TTAGCAATAAAACACGTACATTTAAAGCTGACAACAAAAAAAAAAAGTAT TCTTATGTTAAACTGTAGTATGTGCCTA
TGCAATATTAAGAACAATTAAATAAAATAGCATATTAACTTATGGCAGCA CTTTGTTGCTATGTTTATGTTTATGTTT
ATGCACGCAGTTAGGCCAGGGCGGATGTAACATGATCACCCACTCGAAGG CAAAAAGTATAAGTGCATGGTCAGCATT
CACACGCCGACCAAATACATATTACATACGTACATACATATCTCGCTCTC CCGATAAGCCTAGATATATAAGATATAC
ATAAGAACGCCGCTCCGCTGCTGGCGTACCCGGCAGCGCAGCTACGCGGA TTAGCCTAAGTCCAAATATATTAAAAAC
TGTAAAATCAGAGAGACTCTGTAGACGTTGAGCTGACAGAACCATTTCTG CCTACTCTAAAATCAAAAGAAGAAATTG
AATAAATATATGTCAGCCCGACGGCTGCCTTCAACTTAAAACGGACTTGT GTTCTGAATTGGAGTTCATCATTACATG
GCGACCGTGACAGTCGTCCAACGCTGGACGAATTGACCAAAGCTGGTGAA AACAAAGGAACAAAGGAACACTGGACTG
GAAGAAGACTGGACTAATTAAATGGAACTGCAAAAACCAAGGAAAAATCT GAGTGAGTAGAGTTCTATTGAGTATGGG
CAAACACCGTGGCGGTTTGAAAACTAAGCTGAATAAACGTATAGCCCACG TAAGGTGGCTAATATACGGTCAGCAAAC
GCCACCGGTTTGGTCGAAAGCTCTAAAGCTACATGCAGAGCTAGACCACT TGTTGCAATATCAGCAAGAATTAAAGAC
CCATAAGCTCGAGAAAACTCACTCAGATAATATTAAAAATATACCCACAA TTAATGAAGTTCCAAAATACCAGGCATG
TCCAGCACCAGCACCAGCATTAACAAAACCAAAGAAGTCCTGCCCCCCTG GCTGCGAAGGAATCTGGAGTCCCCACTG
CCTGGGGACTTGTGAGCGACCATCGACGTCTTCAGCGGCGAAGAAATAGA CAGCAGCGAGGGAGTGTCAGCGTGCCAC
CCCCGGCGACGCCCAGCTGACACCTGATGAGCATCATCAACAGCAGAATA TAATAATAAATATATATAAATATAAAGT
AAATATAAAATATATATAGATAAGAAAAATTGTAAGAAATATTGTAAAAC GGAGCATATACTATTATGCCCTGTTAAC
CCAATATGGCCCGTGAAGCCATAGCTAGAATCAGGCAGGCAACAATGTAA AATACAATTTTTTTTTACTCTTGCGAAC
ATTGAAAGATTTTATAAATAGATAATTCCAAACATAAATGTCTATAGAGA CAAATGAAATAAGTAAAACTGAAAATAA
AAGTATATACAAAGGAAATTTTCTATTCTATTCTCCAAAATATAAAATTA GTATACCCAAAATGGGTCTAATAGACAC
TAAAACTGTGGACTCTACAGCCAATGTAATAAATAAAGTAGAAGTCCAAA ATGCAGACTTGTTCTGGATAACCATAAT
ACTAATTGTAATTGCATTAATTATGGTATCCAATGCATTAATAAAAATAT ACAAACTGCATAACAAGTGTCTTAAGAA
ACGATACCGTAGCACTGCTAACGGTATAGATAATATTTAAGGAAGATCTT TAATAAAGTCAATTATGAATGAAAATAT
GAGAAAAATTATATGAAAAAAAAAAAATAATAAATAAAAAAAAAAATATA AAACGTAATATTGAATTTATCTACGTTA
AAAAAAAAAATATATACAAATGAATAAATTTGAAGTTATGAGTATACCAC AGCATGGACTGGGAAAAGCTTGTTGATC
AGATAAAAGATCAAAATGAAAATTTCAGAAAATCCTATAAGTGCTTAACG CAAAACAGATCAACACAAGCTGTAACAA
TCAATAGGAATGCCCAAGTCTTGGTAAATAGTTATAATGAAATCAGAGAG TTGATCCAACAAAATAGAAAGAATTTGG
AACGCAAACAGTGTGCTAAGGCTTTGAACCTACTGGTGACATTAAGAGAA AAATTAATATTTATAAAAAATAAATTCA
GTCTCCAGATAGAAATTCCAACCATAGTAAACACCCCACTAAGAATAAAT TTGAATGAAGACAGCACTAACTCTGACG
AGGAAGATAGGACTATAGTCAAGGAAGACATTAAAGAGGAAGATCTTCAC GATCTAACTATACCAGCAAAATTAATGC
TGAA
>Obp19a_prom|Drosophila melanogaster|Obp19a|FBgn0031109|X:20223943..202264 46
CCACCTGCGAAATGGGTCATAGTATATGTATTTGTAAAAAATGTATGTAA AAAAATGTTAAATTAATAATTTTGAATT
TCAATTTGGAGCTGAAAATAATATTTTGTGTCCATCAACAGCTCCAAAGC GATGGTTCATTTTATCTTGTGTGCGTTC
AATAGAATCACTCTTACGTTAGCGCGTCCATTGATGGTTGTCCCATTGAA GTACTTCTTAAAGCCGTCGGCCATTGCT
ACTGGACTGGATCTGGAGATCTGGAGATCTGGATTTGGGGTCGGGTCCGG GTGAGAGCTGAGTGTGTTCTGCCTATAG
CTCCGAGCGAGAACCTAATGACAAGCAGCGAAGTGCAAAGCTCGGCCAAC TAGATTACAAAGTCGATTCATTGGCAGG
ATTCGATTTTTATTGACTCAACGAGGTGGTACATGAGTTTGGTCCCCAAG CCTTTAACTGTGGCATCGAGGACCGGAA
AGGGGGTGCTGATTATAAATAGTTATGGATTGCTGACGGGTCGAATGGGT CGGAGCGGTGGGGAGCCATGACTTCAAT
GATTTGGCAGCATCGGCGCCCTAGCCATGGAGCATGGCCTGCTGGCAGCC CTTGCAGTAGAGCTTGGTCTCGCGCCGC
TTCGTGTTGCGGCGGTGCATCTTGACCAGGACGTAGACGAGTCCCAACGA GGCCCAGGTGGCCTTGGCTACCTGTGGG
TTTCGGTGGCGTATTTGGGCGCATCTTGTGTACTGCCGTGTACTGAATCA CTTACATTGGCGCGACCACGCATGGTCT
GGCTGTTGAAGGCTTCGTTGAAGTTGAAATGATCGGACATCTTTGGATCG TTGTTGACCGGATTGGCGTGGCTTTTAA
CAAAAGATTAAAATTTGGATTCGATATTCGACCTGTATTTTAGACCGGGA TTCGGATTGTGACTTTTAAACGTTCGAA
ATGAAAGGAATGTTACTGACAGTCGTCAAAGCCGACTCGGGTTTCCCAAC TAGAGAGAATGCTGAAGTCTAGTACCGA
CTAATGGGATACCCATTAATTACTGCTTAAATACTGTGATGAAAATTGAG ATATGCAAGAGGCAAATCGAAAGTTTTG
GACATTTTCATATTGTACCTTTAACCAACTTCAGAATTCATTGAGCTAAA TACCATTTACAATTTTATGAAATTTTTA
AGCATGTTACAGCTATAACTATTTTTAAACCAGTTACTAGATTCGTTGAA AATTGTATGTCACACAGAACTTCTTGCC
ATCCTGGTCGGAATTAGGATCACTAGCCAAGCCGATATGGCTATGTCTGT CCGTATGAAAGTCTTGGAATCTGATATT
AACATCGCATATCGATCGACCATTATATATCTAATATATCCTCTACAAAT GTATTTTATCACCTAGCTAGCATGTAAA
CATTCTGGCCTATTTAGCTGTACGCTTCAGTTATGCTAATGCAAACATAA GCCTTTTGTGATATTATAATTTACATTT
ATTATTTATTGCAGTTAGCTTTATCAGCGATTTGGGCTCATGCCACACGC AATACTACTTATTTCAACGTCATCAGTT
GTACTAAATGCACAAATGAAATACATTTCGCCAAATAAATGCCAACTTGC AACTAATTTGAATGCTAATCAAACCGAA
CTACTCATTTGCATACAAGGTAATAGGTGGTTAAAGTGAGTGTAATGGAC TTACTTAAGGGGTTACAAGGCTTATATT
TAAAATGCCTGCCTTGTAATTAAATTTTTAAATATATTGGAAAAAAATGG CCACTTGTTATGTGAGTCTCCAGAAAAA
AAACAAAAAAACAGCAACCATCTGGTATGCAAAATATCTGGTGGTAGCAA AATATCTGGTGGTATCTGGTGGACTATC
AAAATATAAAAACTTTTTTTTCCAGATAGTATATCTTAAAATCAGCATCT TGAAGGAGTATATGTAAATAGCAAACTA
TTTGTAAAAATAGATTTTATTTTATAATTTTTTAAGATATATACCAAACA TTATTACCGATTGTGATTATCTTTACAT
TGTTTGACCTCAAAACGGAAAACTGGATGCGCGGTATCCATGCGACCCTA ACTCTGGAACCGATTTTGGAACCGCCCC
GTTAGATCTCAGATTGAAACCTTATTTGCATTCGCATGATCGCTGATGAA CACTGGGGAAATGCGGCCCAGCAATGGG
ATTGTCAACGCATCTCGGCCAGAATCGCGCCTCGCATGCCACCTCGCACG GTGACCACATACCTGTGTACACTGTCAA
TTAACGTGGCAAGATTATAGCCCGGCCAGAAAGTAATCCGCCCCAGGAAC ACCACCCACCGCCCGCCCATTTGGATAT
GGAAATGGGCAGTGGGGGCGGCGATTGGCGCTAACCCATAATTCCCACAC CCACTTAGCGGTTCGATCGAACCAATAT
GAAGTCATTTGCATGTCGGGGGCCGTGTATAAAAGGAGTCGCCGATGGGT CTGGAGTCTGGAATCCGCCAAATCGTCT
CGGAAAT
>Obp19b_prom|Drosophila melanogaster|Obp19b|FBgn0031110|X:20224439..202274 40
ATTGCTGACGGGTCGAATGGGTCGGAGCGGTGGGGAGCCATGACTTCAAT GATTTGGCAGCATCGGCGCCCTAGCCAT
GGAGCATGGCCTGCTGGCAGCCCTTGCAGTAGAGCTTGGTCTCGCGCCGC TTCGTGTTGCGGCGGTGCATCTTGACCA
GGACGTAGACGAGTCCCAACGAGGCCCAGGTGGCCTTGGCTACCTGTGGG TTTCGGTGGCGTATTTGGGCGCATCTTG
TGTACTGCCGTGTACTGAATCACTTACATTGGCGCGACCACGCATGGTCT GGCTGTTGAAGGCTTCGTTGAAGTTGAA
ATGATCGGACATCTTTGGATCGTTGTTGACCGGATTGGCGTGGCTTTTAA CAAAAGATTAAAATTTGGATTCGATATT
CGACCTGTATTTTAGACCGGGATTCGGATTGTGACTTTTAAACGTTCGAA ATGAAAGGAATGTTACTGACAGTCGTCA
AAGCCGACTCGGGTTTCCCAACTAGAGAGAATGCTGAAGTCTAGTACCGA CTAATGGGATACCCATTAATTACTGCTT
AAATACTGTGATGAAAATTGAGATATGCAAGAGGCAAATCGAAAGTTTTG GACATTTTCATATTGTACCTTTAACCAA
CTTCAGAATTCATTGAGCTAAATACCATTTACAATTTTATGAAATTTTTA AGCATGTTACAGCTATAACTATTTTTAA
ACCAGTTACTAGATTCGTTGAAAATTGTATGTCACACAGAACTTCTTGCC ATCCTGGTCGGAATTAGGATCACTAGCC
AAGCCGATATGGCTATGTCTGTCCGTATGAAAGTCTTGGAATCTGATATT AACATCGCATATCGATCGACCATTATAT
ATCTAATATATCCTCTACAAATGTATTTTATCACCTAGCTAGCATGTAAA CATTCTGGCCTATTTAGCTGTACGCTTC
AGTTATGCTAATGCAAACATAAGCCTTTTGTGATATTATAATTTACATTT ATTATTTATTGCAGTTAGCTTTATCAGC
GATTTGGGCTCATGCCACACGCAATACTACTTATTTCAACGTCATCAGTT GTACTAAATGCACAAATGAAATACATTT
CGCCAAATAAATGCCAACTTGCAACTAATTTGAATGCTAATCAAACCGAA CTACTCATTTGCATACAAGGTAATAGGT
GGTTAAAGTGAGTGTAATGGACTTACTTAAGGGGTTACAAGGCTTATATT TAAAATGCCTGCCTTGTAATTAAATTTT
TAAATATATTGGAAAAAAATGGCCACTTGTTATGTGAGTCTCCAGAAAAA AAACAAAAAAACAGCAACCATCTGGTAT
GCAAAATATCTGGTGGTAGCAAAATATCTGGTGGTATCTGGTGGACTATC AAAATATAAAAACTTTTTTTTCCAGATA
GTATATCTTAAAATCAGCATCTTGAAGGAGTATATGTAAATAGCAAACTA TTTGTAAAAATAGATTTTATTTTATAAT
TTTTTAAGATATATACCAAACATTATTACCGATTGTGATTATCTTTACAT TGTTTGACCTCAAAACGGAAAACTGGAT
GCGCGGTATCCATGCGACCCTAACTCTGGAACCGATTTTGGAACCGCCCC GTTAGATCTCAGATTGAAACCTTATTTG
CATTCGCATGATCGCTGATGAACACTGGGGAAATGCGGCCCAGCAATGGG ATTGTCAACGCATCTCGGCCAGAATCGC
GCCTCGCATGCCACCTCGCACGGTGACCACATACCTGTGTACACTGTCAA TTAACGTGGCAAGATTATAGCCCGGCCA
How can I make a list of the individual sequences? Each new sequence starts with a ">" symbol, so I want to put each one in its own list. I can't specify the number of sequences in the file in advance. How can I do this?
My sequence file:
>CG9571_O-E|Drosophila melanogaster|CG9571|FBgn0031086|X:19926374..199271 33
CCAGTCCACCGGCCGCCGATCTATTTATACGAGAGGAAGAGGCTGAACTC GAGGATTACCCGTGTATCCTGGGACGCG
GATTAGCGATCCATTCCCCTTTTAATCGCCGCGCAAACAGATTCATGAAA GCCTTCGGATTCATTCATTGATCCACAT
.........................
GCGCGGTATCCATGCGACCCTAACTCTGGAACCGATTTTGGAACCGCCCC GTTAGATCTCAGATTGAAACCTTATTTG
CATTCGCATGATCGCTGATGAACACTGGGGAAATGCGGCCCAGCAATGGG ATTGTCAACGCATCTCGGCCAGAATCGC
GCCTCGCATGCCACCTCGCACGGTGACCACATACCTGTGTACACTGTCAA TTAACGTGGCAAGATTATAGCCCGGCCA
Python is great for this kind of stuff:
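The code from the original reply did not survive on this page, so what follows is only a minimal sketch of one way to do it, not the original answer. It assumes that every record begins with a line starting with ">", that sequence lines may contain stray spaces or blank lines that should be dropped, and that anything before the first ">" is a header to be skipped. The function name `read_fasta` and the sample data are made up for illustration.

```python
def read_fasta(lines):
    """Group FASTA-style lines into a list of (header, sequence) pairs.

    `lines` can be any iterable of strings, for example an open file.
    The number of sequences does not need to be known in advance.
    """
    records = []
    header = None   # header of the record currently being read
    chunks = []     # sequence pieces collected for that record

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new record starts: store the previous one, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence line: remove any internal whitespace before storing.
            chunks.append("".join(line.split()))
        # Lines before the first ">" are skipped (assumed to be a title block).

    # Store the last record once the input is exhausted.
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Tiny made-up sample, only to show the shape of the result.
sample = [
    ">seq1|demo",
    "ACGT ACGT",
    "TTGA",
    ">seq2|demo",
    "GGCC",
]
records = read_fasta(sample)
for name, seq in records:
    print(name, len(seq))
```

To read a real file, pass the open file object instead of a list, for example `with open("sequences.txt") as handle: records = read_fasta(handle)`, where `sequences.txt` is a placeholder name. Each sequence can then be scored against every matrix in an outer loop over `records`.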