Looping through a big file containing a set of files


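Problem description

Hey!

I have a program that takes two input files: one in matrix form and one in sequence form. My problem is that I now have to give it a matrix file containing many matrices and a sequence file containing many sequences, and calculate the same log score as I did for one matrix file and one sequence file.

Here is how it should work: for every sequence, it should calculate the log values for all the weight matrices, then move on to the second sequence and calculate all the log values using the matrices.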
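The loop order described here (each sequence against every matrix, then the next sequence) can be sketched as follows. The scoring function is an assumption, not the asker's code: it takes the best log-odds window score with a small pseudocount and a uniform 0.25 background, and the names `log_score` and `score_all` are hypothetical.

```python
import math

BASES = "ACGT"  # column order of the matrix rows


def log_score(seq, matrix, pseudocount=0.01, background=0.25):
    """Best log-odds score of `matrix` over all windows of `seq` (assumed scheme)."""
    width = len(matrix)
    best = None
    for start in range(len(seq) - width + 1):
        total = 0.0
        for offset, row in enumerate(matrix):
            base = seq[start + offset]
            if base not in BASES:
                break  # skip windows containing unknown characters
            freq = (row[BASES.index(base)] + pseudocount) / (sum(row) + 4 * pseudocount)
            total += math.log(freq / background)
        else:
            # only reached when the whole window was scored
            if best is None or total > best:
                best = total
    return best


def score_all(sequences, matrices):
    """For each sequence in turn, score it against every matrix."""
    results = {}
    for seq_name, seq in sequences.items():
        for mat_name, matrix in matrices.items():
            results[(seq_name, mat_name)] = log_score(seq, matrix)
    return results


# Tiny made-up inputs: a two-column matrix that prefers "AT".
matrices = {"toy": [[10.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 10.0]]}
sequences = {"s1": "CCATGG", "s2": "GGGG"}
scores = score_all(sequences, matrices)
```

Keeping the sequence loop on the outside matches the requested order; swapping the two `for` lines in `score_all` would instead finish one matrix before moving to the next.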

My matrix file is huge and contains many matrices. Part of it is here.


//

NA Abd-B

PO A C G T

01 10.19 0.00 10.65 6.24

02 5.79 0.67 10.50 10.11

03 4.50 0.00 0.00 22.57

04 0.00 0.00 0.00 27.08

05 0.00 0.00 0.00 27.08

06 0.00 0.00 0.00 27.08

07 27.08 0.00 0.00 0.00

08 0.00 2.83 0.00 24.25
09 0.00 0.00 24.45 2.62

10 19.33 0.00 4.34 3.41

11 0.31 12.28 3.39 11.09

//

//

NA Adf1

PO A C G T

01 0.71 0.08 26.02 1.55

02 3.03 23.00 1.24 1.09

03 0.26 10.50 3.29 14.31

04 0.00 0.06 28.23 0.07

05 0.12 27.27 0.06 0.91

06 1.44 20.36 0.37 6.19

07 5.35 0.28 21.49 1.24

08 7.81 16.10 3.81 0.63

09 0.51 17.77 0.45 9.63

10 0.00 0.14 28.21 0.00

11 0.00 25.69 0.20 2.46

12 0.48 9.98 0.07 17.82

13 1.27 0.00 27.01 0.07

14 15.59 7.98 2.92 1.87

15 4.28 22.37 0.00 1.70

16 0.18 0.77 22.70 4.70

//

//

NA Aef1

PO A C G T

01 0.00 0.06 12.49 0.00

02 3.80 0.17 0.00 8.57

03 0.87 0.06 0.00 11.62

04 0.06 9.76 2.32 0.41

05 9.82 0.00 2.73 0.00

06 9.76 0.00 0.00 2.78

07 3.80 0.31 0.00 8.43

08 0.00 0.00 0.00 12.54

09 0.00 6.53 5.85 0.17

10 0.00 12.38 0.17 0.00

11 2.73 1.02 8.80 0.00

12 5.85 0.00 6.70 0.00

13 1.02 5.96 0.00 5.57

14 0.00 5.16 4.66 2.73

15 1.03 7.55 3.97 0.00

16 4.82 5.00 2.73 0.00

//

//

NA Antp

PO A C G T

01 5.52 14.49 27.56 0.49

02 8.17 14.02 11.42 14.47

03 18.18 27.29 1.31 1.29

04 40.26 5.66 1.83 0.32

05 19.05 12.67 0.43 15.91

06 9.94 0.07 0.20 37.86

07 26.63 15.17 0.00 6.27

08 47.45 0.06 0.00 0.56

09 0.81 0.48 0.00 46.79

10 26.46 19.05 1.81 0.75

11 48.07 0.00 0.00 0.00

12 30.51 0.00 0.00 17.56

13 43.45 0.00 0.00 4.62

14 30.06 5.98 0.00 12.03

15 0.38 0.64 0.00 47.05

16 22.14 0.29 7.15 18.49

//

//


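The sequence file is here (this is also only a part of my file). The actual sequence starts at "CC"; the line before it is just a header, which we omit. This file contains two sequences.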

>CG9571_O-E|Drosophila melanogaster|CG9571|FBgn0031086|X:19926374..199271 33

CCAGTCCACCGGCCGCCGATCTATTTATACGAGAGGAAGAGGCTGAACTC GAGGATTACCCGTGTATCCTGGGACGCG

GATTAGCGATCCATTCCCCTTTTAATCGCCGCGCAAACAGATTCATGAAA GCCTTCGGATTCATTCATTGATCCACAT

CTACGGGAACGGGAGTCGCAAACGTTTTCGGATTAGCGCTGGACTAGCGG TTTCTAAATTGGATTATTTCTACCTGAC

CCTGGAGCCATCGTCCTCGTCCTCC

>Cp36_DRR|Drosophila melanogaster|Cp36|FBgn0000359|X:8323349..8324136

AGTCGACCAGCACGAGATCTCACCTACCTTCTTTATAAGCGGGGTCTCTA GAAGCTAAATCCATGTCCACGTCAAACC

AAAGACTTGCGGTCTCCAGACCATTGAGTTCTATAAATGGGACTGAGCCA CACCATACACCACACACCACACATACAC

ACACGCCAACACATTACACACAACACGAACTACACAAACACTGAGATTAA GGAAATTATTAAAAAAAATAATAAAATT

AATACAAAAAAAATATATATATATA

This is my code, which works (it prints the log value for one sequence and one matrix):


Solution

Start with a list of input files (hopefully generated automatically when there are many). For example:



My sequence file:
>CG9571_O-E|Drosophila melanogaster|CG9571|FBgn0031086|X:19926374..199271 33
CCAGTCCACCGGCCGCCGATCTATTTATACGAGAGGAAGAGGCTGAACTC GAGGATTACCCGTGTATCCTGGGACGCG

GATTAGCGATCCATTCCCCTTTTAATCGCCGCGCAAACAGATTCATGAAA GCCTTCGGATTCATTCATTGATCCACAT

CTACGGGAACGGGAGTCGCAAACGTTTTCGGATTAGCGCTGGACTAGCGG TTTCTAAATTGGATTATTTCTACCTGAC

CCTGGAGCCATCGTCCTCGTCCTCCGTCCCTTAGCGCCTCCTGCATGGAT GTCGTTTTTGGGTTTCATACCTTTTCAC

ACTGGAAAAATACGGAATTTGTTGTAAGCCCTTTCAAGACGAATGGGATT TAGCTTCGGATGTCAACGTCACCATAAT

CATATTAGGAATATTTCTACTCAATTGCAATATTGGTACTTTTCTGACTG TAAACGCGATGATAATTACAAATATGCC

TAATTTGCTGTCTTTATAATCAAATGGAGTTCTTTATATTTCCAAAATAT TGAAATTCCGATTCCCTAGAAAATAATA

CGTTTTTCTGTTATTAATAAAAAACCAATAGGAAAGTTCTCAAAAATTAC TCTGTTGTATTTGATCATTTCTTTTCCG

GTATAATCTTTTATTTTAAGCATTCCCATGTGAATAAATTTCAGACTAAT GTATTAATAAGATGTCGTGTTTTTCCAC

TTACAAATTTCTCATACAGCTGGATATATACTACGAGTACTATACACATG CTCTGGG

>Cp36_DRR|Drosophila melanogaster|Cp36|FBgn0000359|X:8323349..8324136

AGTCGACCAGCACGAGATCTCACCTACCTTCTTTATAAGCGGGGTCTCTA GAAGCTAAATCCATGTCCACGTCAAACC

AAAGACTTGCGGTCTCCAGACCATTGAGTTCTATAAATGGGACTGAGCCA CACCATACACCACACACCACACATACAC

ACACGCCAACACATTACACACAACACGAACTACACAAACACTGAGATTAA GGAAATTATTAAAAAAAATAATAAAATT

AATACAAAAAAAATATATATATATACAAAAATTTGTTGTGTTTGAATTGA ATTAAGAGCTTATCAAGAAAAAAATTTC

AGTGACTCATAATACACTACTCTACAAGTTTAAATTGAATCAACAATTTA ACTTTCATTGCTCAGGTTTTTAGTAACA

ATGTTTATATAAGTTTAGGTATAACAAATGATTTAAATATAAGATACTGT ATTTCACATTGAGACGAAACAATCCACC

GAAAATCATAAAATATAAGAATGTTGCATTTTATTTTTAAAAATAAAGAT GCCTTTTAAGAGGAATAACTTAAATGTC

TTTAATACCTTTGAATTTAATTATATGGCTAATAAACACAAACTTAAAGC TTAAAACTGCATCGAATTGAATGCGGTT

ATAAATGTACTTATATATCTAATATAATCTGCTAATATGGTTTACATGGT ATATCTTTCTCGGAAATTTTTACAAAAA

TTATCTATTCATATATCTCGAGCGTAAGATATTTATCAGTTTATAGATAA CATCTTTAAATTTGGGTGATTAAAAAAA

AACATTG

>Cp36_PRR|Drosophila melanogaster|Cp36|FBgn0000359|X:8324430..8324513

TCTAGAGATCTGGGCACGATGGCGAGACAAAGATGCGGCGCAAAATCGGA AATGGAGATGGATCACGTAGCCGGCCAT

GGCGG

>Him_distal|Drosophila melanogaster|Him|FBgn0030900|X:18039896..18043470

GGTTTTCTGCGATGGCTTCCGCGCCAGCTGAAGTATCTGATTTGCTGCCT TGTTTTTGTTGATATTTCTGCGAAGGGA

CTTGTGCTTTTCAAATGGCCTTTTTTTGGGATTACGGCAAGGGCGCGTTT CCCACGCTCGATCCCCACTTACCATTGG

TGCACGCGATTGCGGCAAGCTGCTGAGGCAAGCTATTAAACGCCACACTG GGCCGGGGGGCGGTACCGGTGGGCGTGG

CAGGGGAGTCGACACATGTTGTGTGCCAGAGAACTTTGCTCCGATCCCCA GATCATCAAATAGTTGTCGCTGTCTGCT

CGTGCGCAAATTGCAATACTTTGCATACCCTTACTGCAGGGTATCTGAGC TTGGACTTTAAATAAGGGGGTATAACAT

AGCTTATACTCTCTATCTCTGTTATAAAGTCAATTTTCCTTAGATCTTTA GTACAGTGGGTAGTTAAGGAGACATAAC

TTCCAAAAAAAAAAACTATAAAATTGCAATAATTTATGCAAAATATGTAT TTTATTGAATGGGATGAATAATTTACCT

TATACGACTGTAAAACATTTCTAACGATTAAATGCACTTCTAAAAGTTTT CCCACAAGTAGGTGAGCTATTATGCTAA

GCGTTCCATGACTTGGAATCTAAGATCTTGTTTTGATCTTCGCTGATCTT TGAGAACTCGGGGATTACTTACACATTT

CTGGGCAGGCACAAGTGGGCCGAGGCAGTGTAGATTCATCACGTTTTCAC TCAACACACGCAGCTCATTAACAGCCCC

GCTGACAACTTGTCAGGACTTCCCCCTCGTGAATCCCCCTGCTACGCAAC CCCCATTCCCCGCCCATTCCAACACTTC

CCGCCGGGAGCGTGGGAAATTATGCGTGTTGGTGGGACGTCGGGCGGTGA AAATTGGCGCGCTCTTCGGGGGGCCACA

CCGCGTGGCATTGACAACTCTTCCACATTTCGCGCCCAACGATGCGTTGG CATCAGTGGGTCACAGGGATTACGGCTG

GCTGGGATTCCAGAGCCAGATCTTTTTCAGCCAAAACTTTCAGCTTTCGA AGACCTCAAGCGATAGGAGAGTGTCGGA

AGTCCAGAAATAGACGCGTAGCACATAAATTATGGATCGTATCGAGTATC GATTAGCCCGGGACAAGCGAAGCGATAG

GGAGACATATTTTTATTACCCTCTCGGGGACCTGCACTTGTTGGCTTCGC TTCTATGAAAGATCCCTCTACCATATCA

CGTATGTGGGCTCCCCCAATCGAACCGAGTTGTGGGAAATGTTTTCCCAG GCCAACAGCTAATTGTCACTCCAAGGGT

TGTCCCCGCAGCCCAGACGACAGATAAGCGGGCAAGTGAAGCCCAGCGAT CTGAGTCAAGTGAAGGGCTTCAATTTCT

TTCCCGAGTGGAACTGGGATATCGAAATTACATTTGTAACAGACGTTTTA GTCCGCAATCCTCAGCTAATGGGACTTA

CGAACATATATTCATCTGAAATTCAAGAACATGCGCACTTAAAGAGCAGG GAAGTCGCACACGCGCAAGTCAGGCGCT

CAAAAAGGGATCTTCGGAGGTACAGTGGGCAAAAGACTGTAAATAAATAA TATAAATAAAATAATATTTAGCTCTATG

TGTTTATATAATCTACAAAGTAGTTAACAAAAAATATAAAATGGATATAA AAATACATCTTATATATCCCTATAATAA

GAAATAAATAATAATTTTAGTAAATTAATTTTGTTACACAAAGTACCTGT ATTATTACCTCTTTTTTGTTGGTTGGTT

CTTTTTTGATGTGGCCCCACTGTGCTCTCTTATCAGTGCGACAATCAGGC ATTGCCTTTCCCCATCGGGGGATTCTAA

TTCCGTGGACGATGGGCCGAAACGCCTATAAAGTCGCTCATTAAAAATGT TTAATTATGGCCCATCTTGCATCTTGCA

CCGATGTGGATGGGGTTTGTCGGCAATGATTTACATTATAAAAATGCCCG TTATCTGAGCATTTTGTACGCTCCACTC

CCTCTTCCCCCCTCCAAAAAAAAAAAAAACAGATATGTATATTCCCCGAG ATATTCCCAAGCGGCCAAAAATAGACGC

AAATTGTAACGCACTTGAAGTGCACTCTGAAACATCTTGAAGTCCAAATA AAATAGCAGAGAGACCCACAATAATATA

CGTTGATATACACATGTATATATGTATGTATGTACATAAAGGGCCAGGAG CAGGAACGTTAGGCATGCGGTGGTACGA

GCACCGTGGTGCGAGCGAGAGCGCTGTGCTGCCTGAGGGAGAGGTAGCGA GTGGGTTGCATTGCGCACACAGAACATG

TGAATGCAGAGTTCAAGTGCATGCCGTGACACAGACACGCACACACACAC ACGCACACACAGATGAGTAGCCGCTGCA

AAGTGTTTTTTCCCAGGCGCTATTTATAATATGCATCCCGTCGCCGATCC GATCCGATCCAATCCAATCCGATTGGAT

CCCATCTTGCGGCACTACGATTATGACGCTCGACACGATGATGCATTCGC AGAGTTTCCCGATCGCAGAGTACCCTGT

ACTCGAGTAGTTTTTAGATGCAGTATTATTAAGTAGAAAATTGTAACCGT ATAATATTCCATTATATTAAATATTTTT

ATAGCACTAAAGAAATAAAAGCCCATTTTATAATTTATATTACAAAAATA CTTAACCATAGAAACTTATGATATGATA

CCAATATTTAAGTTCCAAAAAATGTAGAACATTTTTAAGTATATACTCGA AAATATTAATTTTCAAAATTGATATTCA

AGAGATATTATAAAAAGATCCCCATTCTAAATATCTAACATCATGCCATG CTTTCTAATGAGTATAGTATACCCCTGC

TACCCTGTCAATCCGCAAAACAGGCGCCGAAACATGCGGTTTCTCGCAGC AGACTGCCACGGGAAAAATTCGGTTCGA

GATTTGGGAATGGATGTATGACGGAGCAGAAGGAGCAGGACCCGGATTTC GGATTTCGGAATGGATATGGAAATGAAG

ATGGAAATGGGACTTTGACTGCGCGACGGCCACATGCGCCGCTGGCGATG CCGCTGGATGTTGCATGTGGCAGCGGTC

GGTGCAGCAGCGAAAGTGTTGCAGCTGTATGAGAGGGTCTATTTTTGGGG CGATTGTGCGGCGCTGGTGCTGCCACAT

GTGTTCTGTGTTGGGCTGCTAAAAGGCATTGTAATGAGAGCAGAAAATAG AATTGACTCCACTTGAGCAATGTCCCAT

AAAGCGGGAGTTTCGAGTTTGGCGCGCAATGTGCCGCACCAGCAAACGAA CAAAAGAAAAAAAAAAAAAAAAAACACA

GCCAGTAACACATGGGCCCACGAGTTATGTTTTATTTTTAATCCCACAAA GAGTCGATCTCCAAAACAAACCCGCAGA

GAGCACATATAAAGAGACTCGGTGGACGAGTGGTTCGAAACAGTCTTCCG CCGCAGCTCGACGCGCTCGCATATCGGG

AATATATAGATCGGAGATATCGCAGGACCCACAGCAGAGCAGAGCCGCAG AGCCACCAACCTCG

>Him_proximal|Drosophila melanogaster|Him|FBgn0030900|X:18041232..18043470

GCCCAGACGACAGATAAGCGGGCAAGTGAAGCCCAGCGATCTGAGTCAAG TGAAGGGCTTCAATTTCTTTCCCGAGTG

GAACTGGGATATCGAAATTACATTTGTAACAGACGTTTTAGTCCGCAATC CTCAGCTAATGGGACTTACGAACATATA

TTCATCTGAAATTCAAGAACATGCGCACTTAAAGAGCAGGGAAGTCGCAC ACGCGCAAGTCAGGCGCTCAAAAAGGGA

TCTTCGGAGGTACAGTGGGCAAAAGACTGTAAATAAATAATATAAATAAA ATAATATTTAGCTCTATGTGTTTATATA

ATCTACAAAGTAGTTAACAAAAAATATAAAATGGATATAAAAATACATCT TATATATCCCTATAATAAGAAATAAATA

ATAATTTTAGTAAATTAATTTTGTTACACAAAGTACCTGTATTATTACCT CTTTTTTGTTGGTTGGTTCTTTTTTGAT

GTGGCCCCACTGTGCTCTCTTATCAGTGCGACAATCAGGCATTGCCTTTC CCCATCGGGGGATTCTAATTCCGTGGAC

GATGGGCCGAAACGCCTATAAAGTCGCTCATTAAAAATGTTTAATTATGG CCCATCTTGCATCTTGCACCGATGTGGA

TGGGGTTTGTCGGCAATGATTTACATTATAAAAATGCCCGTTATCTGAGC ATTTTGTACGCTCCACTCCCTCTTCCCC

CCTCCAAAAAAAAAAAAAACAGATATGTATATTCCCCGAGATATTCCCAA GCGGCCAAAAATAGACGCAAATTGTAAC

GCACTTGAAGTGCACTCTGAAACATCTTGAAGTCCAAATAAAATAGCAGA GAGACCCACAATAATATACGTTGATATA

CACATGTATATATGTATGTATGTACATAAAGGGCCAGGAGCAGGAACGTT AGGCATGCGGTGGTACGAGCACCGTGGT

GCGAGCGAGAGCGCTGTGCTGCCTGAGGGAGAGGTAGCGAGTGGGTTGCA TTGCGCACACAGAACATGTGAATGCAGA

GTTCAAGTGCATGCCGTGACACAGACACGCACACACACACACGCACACAC AGATGAGTAGCCGCTGCAAAGTGTTTTT

TCCCAGGCGCTATTTATAATATGCATCCCGTCGCCGATCCGATCCGATCC AATCCAATCCGATTGGATCCCATCTTGC

GGCACTACGATTATGACGCTCGACACGATGATGCATTCGCAGAGTTTCCC GATCGCAGAGTACCCTGTACTCGAGTAG

TTTTTAGATGCAGTATTATTAAGTAGAAAATTGTAACCGTATAATATTCC ATTATATTAAATATTTTTATAGCACTAA

AGAAATAAAAGCCCATTTTATAATTTATATTACAAAAATACTTAACCATA GAAACTTATGATATGATACCAATATTTA

AGTTCCAAAAAATGTAGAACATTTTTAAGTATATACTCGAAAATATTAAT TTTCAAAATTGATATTCAAGAGATATTA

TAAAAAGATCCCCATTCTAAATATCTAACATCATGCCATGCTTTCTAATG AGTATAGTATACCCCTGCTACCCTGTCA

ATCCGCAAAACAGGCGCCGAAACATGCGGTTTCTCGCAGCAGACTGCCAC GGGAAAAATTCGGTTCGAGATTTGGGAA

TGGATGTATGACGGAGCAGAAGGAGCAGGACCCGGATTTCGGATTTCGGA ATGGATATGGAAATGAAGATGGAAATGG

GACTTTGACTGCGCGACGGCCACATGCGCCGCTGGCGATGCCGCTGGATG TTGCATGTGGCAGCGGTCGGTGCAGCAG

CGAAAGTGTTGCAGCTGTATGAGAGGGTCTATTTTTGGGGCGATTGTGCG GCGCTGGTGCTGCCACATGTGTTCTGTG

TTGGGCTGCTAAAAGGCATTGTAATGAGAGCAGAAAATAGAATTGACTCC ACTTGAGCAATGTCCCATAAAGCGGGAG

TTTCGAGTTTGGCGCGCAATGTGCCGCACCAGCAAACGAACAAAAGAAAA AAAAAAAAAAAAAACACAGCCAGTAACA

CATGGGCCCACGAGTTATGTTTTATTTTTAATCCCACAAAGAGTCGATCT CCAAAACAAACCCGCAGAGAGCACATAT

AAAGAGACTCGGTGGACGAGTGGTTCGAAACAGTCTTCCGCCGCAGCTCG ACGCGCTCGCATATCGGGAATATATAGA

TCGGAGATATCGCAGGACCCACAGCAGAGCAGAGCCGCAGAGCCACCAAC CTCG

>Obp18a_prom|Drosophila melanogaster|Obp18a|FBgn0030985|X:18969778..189727 46

ATGGCGAAAATCTGTTTCCCAACTAACAATGAGCGCATCATCACAGCTCT ATATATATAACCCATCGATTTGCTAATT

CAGCTCAAAAGTAGACAGGAGATTTTAATTAAATAATTGGATGCTACTTT ACATTCGCCACACACCAACAAATAAAGT

CTATAATTGAAATTTTAAGCGCAGTTCCCGATTATGAGCTACACGTATGT CGTATGCGCAATATCTGCATTACAATTG

CCAATAGTAAATTACCAACTTGGTTTTCTTCATATTTATTAAGATAGAAA ACATACAATTTTTGGCTTTTACACTCCA

AGCATCTCTGAAGTTTAAACAAAAAACATATGTGTAGCCTATCTACTGTA TTGGACTTTATTCGTATATTTTATATGG

TTCATTAATATAGGTATAAATACAAATTATATTCACGCTTTGCGATTTGC AGCGAATATCACATCTTATACACGATGT

AAAAAAAAAAAAAATATTTCGTCATGTTTTTAGGTTGGCCGCAGGCAGTG CTCACTGTACCGCCACAATGTTTATCGT

TTTGCATTTTTTTTTTCTTTGTTTTCTTGCGGTTTCCCCTAATTATCTTT AGTATAAACTTAGTCTACTGTCTTTTTT

GGTAAGTATTTTCGTGATGGGCTCGTCTATGCGAATTCCCATTTCCAATG AATAAATAAAGTAATTAGAACATTAAAA

TTAGCAATAAAACACGTACATTTAAAGCTGACAACAAAAAAAAAAAGTAT TCTTATGTTAAACTGTAGTATGTGCCTA

TGCAATATTAAGAACAATTAAATAAAATAGCATATTAACTTATGGCAGCA CTTTGTTGCTATGTTTATGTTTATGTTT

ATGCACGCAGTTAGGCCAGGGCGGATGTAACATGATCACCCACTCGAAGG CAAAAAGTATAAGTGCATGGTCAGCATT

CACACGCCGACCAAATACATATTACATACGTACATACATATCTCGCTCTC CCGATAAGCCTAGATATATAAGATATAC

ATAAGAACGCCGCTCCGCTGCTGGCGTACCCGGCAGCGCAGCTACGCGGA TTAGCCTAAGTCCAAATATATTAAAAAC

TGTAAAATCAGAGAGACTCTGTAGACGTTGAGCTGACAGAACCATTTCTG CCTACTCTAAAATCAAAAGAAGAAATTG

AATAAATATATGTCAGCCCGACGGCTGCCTTCAACTTAAAACGGACTTGT GTTCTGAATTGGAGTTCATCATTACATG

GCGACCGTGACAGTCGTCCAACGCTGGACGAATTGACCAAAGCTGGTGAA AACAAAGGAACAAAGGAACACTGGACTG

GAAGAAGACTGGACTAATTAAATGGAACTGCAAAAACCAAGGAAAAATCT GAGTGAGTAGAGTTCTATTGAGTATGGG

CAAACACCGTGGCGGTTTGAAAACTAAGCTGAATAAACGTATAGCCCACG TAAGGTGGCTAATATACGGTCAGCAAAC

GCCACCGGTTTGGTCGAAAGCTCTAAAGCTACATGCAGAGCTAGACCACT TGTTGCAATATCAGCAAGAATTAAAGAC

CCATAAGCTCGAGAAAACTCACTCAGATAATATTAAAAATATACCCACAA TTAATGAAGTTCCAAAATACCAGGCATG

TCCAGCACCAGCACCAGCATTAACAAAACCAAAGAAGTCCTGCCCCCCTG GCTGCGAAGGAATCTGGAGTCCCCACTG

CCTGGGGACTTGTGAGCGACCATCGACGTCTTCAGCGGCGAAGAAATAGA CAGCAGCGAGGGAGTGTCAGCGTGCCAC

CCCCGGCGACGCCCAGCTGACACCTGATGAGCATCATCAACAGCAGAATA TAATAATAAATATATATAAATATAAAGT

AAATATAAAATATATATAGATAAGAAAAATTGTAAGAAATATTGTAAAAC GGAGCATATACTATTATGCCCTGTTAAC

CCAATATGGCCCGTGAAGCCATAGCTAGAATCAGGCAGGCAACAATGTAA AATACAATTTTTTTTTACTCTTGCGAAC

ATTGAAAGATTTTATAAATAGATAATTCCAAACATAAATGTCTATAGAGA CAAATGAAATAAGTAAAACTGAAAATAA

AAGTATATACAAAGGAAATTTTCTATTCTATTCTCCAAAATATAAAATTA GTATACCCAAAATGGGTCTAATAGACAC

TAAAACTGTGGACTCTACAGCCAATGTAATAAATAAAGTAGAAGTCCAAA ATGCAGACTTGTTCTGGATAACCATAAT

ACTAATTGTAATTGCATTAATTATGGTATCCAATGCATTAATAAAAATAT ACAAACTGCATAACAAGTGTCTTAAGAA

ACGATACCGTAGCACTGCTAACGGTATAGATAATATTTAAGGAAGATCTT TAATAAAGTCAATTATGAATGAAAATAT

GAGAAAAATTATATGAAAAAAAAAAAATAATAAATAAAAAAAAAAATATA AAACGTAATATTGAATTTATCTACGTTA

AAAAAAAAAATATATACAAATGAATAAATTTGAAGTTATGAGTATACCAC AGCATGGACTGGGAAAAGCTTGTTGATC

AGATAAAAGATCAAAATGAAAATTTCAGAAAATCCTATAAGTGCTTAACG CAAAACAGATCAACACAAGCTGTAACAA

TCAATAGGAATGCCCAAGTCTTGGTAAATAGTTATAATGAAATCAGAGAG TTGATCCAACAAAATAGAAAGAATTTGG

AACGCAAACAGTGTGCTAAGGCTTTGAACCTACTGGTGACATTAAGAGAA AAATTAATATTTATAAAAAATAAATTCA

GTCTCCAGATAGAAATTCCAACCATAGTAAACACCCCACTAAGAATAAAT TTGAATGAAGACAGCACTAACTCTGACG

AGGAAGATAGGACTATAGTCAAGGAAGACATTAAAGAGGAAGATCTTCAC GATCTAACTATACCAGCAAAATTAATGC

TGAA

>Obp19a_prom|Drosophila melanogaster|Obp19a|FBgn0031109|X:20223943..202264 46

CCACCTGCGAAATGGGTCATAGTATATGTATTTGTAAAAAATGTATGTAA AAAAATGTTAAATTAATAATTTTGAATT

TCAATTTGGAGCTGAAAATAATATTTTGTGTCCATCAACAGCTCCAAAGC GATGGTTCATTTTATCTTGTGTGCGTTC

AATAGAATCACTCTTACGTTAGCGCGTCCATTGATGGTTGTCCCATTGAA GTACTTCTTAAAGCCGTCGGCCATTGCT

ACTGGACTGGATCTGGAGATCTGGAGATCTGGATTTGGGGTCGGGTCCGG GTGAGAGCTGAGTGTGTTCTGCCTATAG

CTCCGAGCGAGAACCTAATGACAAGCAGCGAAGTGCAAAGCTCGGCCAAC TAGATTACAAAGTCGATTCATTGGCAGG

ATTCGATTTTTATTGACTCAACGAGGTGGTACATGAGTTTGGTCCCCAAG CCTTTAACTGTGGCATCGAGGACCGGAA

AGGGGGTGCTGATTATAAATAGTTATGGATTGCTGACGGGTCGAATGGGT CGGAGCGGTGGGGAGCCATGACTTCAAT

GATTTGGCAGCATCGGCGCCCTAGCCATGGAGCATGGCCTGCTGGCAGCC CTTGCAGTAGAGCTTGGTCTCGCGCCGC

TTCGTGTTGCGGCGGTGCATCTTGACCAGGACGTAGACGAGTCCCAACGA GGCCCAGGTGGCCTTGGCTACCTGTGGG

TTTCGGTGGCGTATTTGGGCGCATCTTGTGTACTGCCGTGTACTGAATCA CTTACATTGGCGCGACCACGCATGGTCT

GGCTGTTGAAGGCTTCGTTGAAGTTGAAATGATCGGACATCTTTGGATCG TTGTTGACCGGATTGGCGTGGCTTTTAA

CAAAAGATTAAAATTTGGATTCGATATTCGACCTGTATTTTAGACCGGGA TTCGGATTGTGACTTTTAAACGTTCGAA

ATGAAAGGAATGTTACTGACAGTCGTCAAAGCCGACTCGGGTTTCCCAAC TAGAGAGAATGCTGAAGTCTAGTACCGA

CTAATGGGATACCCATTAATTACTGCTTAAATACTGTGATGAAAATTGAG ATATGCAAGAGGCAAATCGAAAGTTTTG

GACATTTTCATATTGTACCTTTAACCAACTTCAGAATTCATTGAGCTAAA TACCATTTACAATTTTATGAAATTTTTA

AGCATGTTACAGCTATAACTATTTTTAAACCAGTTACTAGATTCGTTGAA AATTGTATGTCACACAGAACTTCTTGCC

ATCCTGGTCGGAATTAGGATCACTAGCCAAGCCGATATGGCTATGTCTGT CCGTATGAAAGTCTTGGAATCTGATATT

AACATCGCATATCGATCGACCATTATATATCTAATATATCCTCTACAAAT GTATTTTATCACCTAGCTAGCATGTAAA

CATTCTGGCCTATTTAGCTGTACGCTTCAGTTATGCTAATGCAAACATAA GCCTTTTGTGATATTATAATTTACATTT

ATTATTTATTGCAGTTAGCTTTATCAGCGATTTGGGCTCATGCCACACGC AATACTACTTATTTCAACGTCATCAGTT

GTACTAAATGCACAAATGAAATACATTTCGCCAAATAAATGCCAACTTGC AACTAATTTGAATGCTAATCAAACCGAA

CTACTCATTTGCATACAAGGTAATAGGTGGTTAAAGTGAGTGTAATGGAC TTACTTAAGGGGTTACAAGGCTTATATT

TAAAATGCCTGCCTTGTAATTAAATTTTTAAATATATTGGAAAAAAATGG CCACTTGTTATGTGAGTCTCCAGAAAAA

AAACAAAAAAACAGCAACCATCTGGTATGCAAAATATCTGGTGGTAGCAA AATATCTGGTGGTATCTGGTGGACTATC

AAAATATAAAAACTTTTTTTTCCAGATAGTATATCTTAAAATCAGCATCT TGAAGGAGTATATGTAAATAGCAAACTA

TTTGTAAAAATAGATTTTATTTTATAATTTTTTAAGATATATACCAAACA TTATTACCGATTGTGATTATCTTTACAT

TGTTTGACCTCAAAACGGAAAACTGGATGCGCGGTATCCATGCGACCCTA ACTCTGGAACCGATTTTGGAACCGCCCC

GTTAGATCTCAGATTGAAACCTTATTTGCATTCGCATGATCGCTGATGAA CACTGGGGAAATGCGGCCCAGCAATGGG

ATTGTCAACGCATCTCGGCCAGAATCGCGCCTCGCATGCCACCTCGCACG GTGACCACATACCTGTGTACACTGTCAA

TTAACGTGGCAAGATTATAGCCCGGCCAGAAAGTAATCCGCCCCAGGAAC ACCACCCACCGCCCGCCCATTTGGATAT

GGAAATGGGCAGTGGGGGCGGCGATTGGCGCTAACCCATAATTCCCACAC CCACTTAGCGGTTCGATCGAACCAATAT

GAAGTCATTTGCATGTCGGGGGCCGTGTATAAAAGGAGTCGCCGATGGGT CTGGAGTCTGGAATCCGCCAAATCGTCT

CGGAAAT

>Obp19b_prom|Drosophila melanogaster|Obp19b|FBgn0031110|X:20224439..202274 40

ATTGCTGACGGGTCGAATGGGTCGGAGCGGTGGGGAGCCATGACTTCAAT GATTTGGCAGCATCGGCGCCCTAGCCAT

GGAGCATGGCCTGCTGGCAGCCCTTGCAGTAGAGCTTGGTCTCGCGCCGC TTCGTGTTGCGGCGGTGCATCTTGACCA

GGACGTAGACGAGTCCCAACGAGGCCCAGGTGGCCTTGGCTACCTGTGGG TTTCGGTGGCGTATTTGGGCGCATCTTG

TGTACTGCCGTGTACTGAATCACTTACATTGGCGCGACCACGCATGGTCT GGCTGTTGAAGGCTTCGTTGAAGTTGAA

ATGATCGGACATCTTTGGATCGTTGTTGACCGGATTGGCGTGGCTTTTAA CAAAAGATTAAAATTTGGATTCGATATT

CGACCTGTATTTTAGACCGGGATTCGGATTGTGACTTTTAAACGTTCGAA ATGAAAGGAATGTTACTGACAGTCGTCA

AAGCCGACTCGGGTTTCCCAACTAGAGAGAATGCTGAAGTCTAGTACCGA CTAATGGGATACCCATTAATTACTGCTT

AAATACTGTGATGAAAATTGAGATATGCAAGAGGCAAATCGAAAGTTTTG GACATTTTCATATTGTACCTTTAACCAA

CTTCAGAATTCATTGAGCTAAATACCATTTACAATTTTATGAAATTTTTA AGCATGTTACAGCTATAACTATTTTTAA

ACCAGTTACTAGATTCGTTGAAAATTGTATGTCACACAGAACTTCTTGCC ATCCTGGTCGGAATTAGGATCACTAGCC

AAGCCGATATGGCTATGTCTGTCCGTATGAAAGTCTTGGAATCTGATATT AACATCGCATATCGATCGACCATTATAT

ATCTAATATATCCTCTACAAATGTATTTTATCACCTAGCTAGCATGTAAA CATTCTGGCCTATTTAGCTGTACGCTTC

AGTTATGCTAATGCAAACATAAGCCTTTTGTGATATTATAATTTACATTT ATTATTTATTGCAGTTAGCTTTATCAGC

GATTTGGGCTCATGCCACACGCAATACTACTTATTTCAACGTCATCAGTT GTACTAAATGCACAAATGAAATACATTT

CGCCAAATAAATGCCAACTTGCAACTAATTTGAATGCTAATCAAACCGAA CTACTCATTTGCATACAAGGTAATAGGT

GGTTAAAGTGAGTGTAATGGACTTACTTAAGGGGTTACAAGGCTTATATT TAAAATGCCTGCCTTGTAATTAAATTTT

TAAATATATTGGAAAAAAATGGCCACTTGTTATGTGAGTCTCCAGAAAAA AAACAAAAAAACAGCAACCATCTGGTAT

GCAAAATATCTGGTGGTAGCAAAATATCTGGTGGTATCTGGTGGACTATC AAAATATAAAAACTTTTTTTTCCAGATA

GTATATCTTAAAATCAGCATCTTGAAGGAGTATATGTAAATAGCAAACTA TTTGTAAAAATAGATTTTATTTTATAAT

TTTTTAAGATATATACCAAACATTATTACCGATTGTGATTATCTTTACAT TGTTTGACCTCAAAACGGAAAACTGGAT

GCGCGGTATCCATGCGACCCTAACTCTGGAACCGATTTTGGAACCGCCCC GTTAGATCTCAGATTGAAACCTTATTTG

CATTCGCATGATCGCTGATGAACACTGGGGAAATGCGGCCCAGCAATGGG ATTGTCAACGCATCTCGGCCAGAATCGC

GCCTCGCATGCCACCTCGCACGGTGACCACATACCTGTGTACACTGTCAA TTAACGTGGCAAGATTATAGCCCGGCCA
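
How can I make lists of the individual sequences? A new sequence starts with the ">" symbol, so I want to put them in separate lists, and I cannot specify the number of sequences in the whole file. What should I do?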



My sequence file:
>CG9571_O-E|Drosophila melanogaster|CG9571|FBgn0031086|X:19926374..199271 33

CCAGTCCACCGGCCGCCGATCTATTTATACGAGAGGAAGAGGCTGAACTC GAGGATTACCCGTGTATCCTGGGACGCG

GATTAGCGATCCATTCCCCTTTTAATCGCCGCGCAAACAGATTCATGAAA GCCTTCGGATTCATTCATTGATCCACAT

.........................

GCGCGGTATCCATGCGACCCTAACTCTGGAACCGATTTTGGAACCGCCCC GTTAGATCTCAGATTGAAACCTTATTTG

CATTCGCATGATCGCTGATGAACACTGGGGAAATGCGGCCCAGCAATGGG ATTGTCAACGCATCTCGGCCAGAATCGC

GCCTCGCATGCCACCTCGCACGGTGACCACATACCTGTGTACACTGTCAA TTAACGTGGCAAGATTATAGCCCGGCCA
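
How can I make lists of the individual sequences? Here a new sequence starts with the ">" symbol, so I want to put them in separate lists, and I cannot specify the number of sequences in the whole file. What should I do?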



Python is great for this kind of thing:
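A minimal sketch of one way to do this, assuming the posted layout: a header line starting with `>` followed by sequence lines that contain embedded spaces. The function name `read_sequences` and the second sample record are made up for illustration. A dictionary grows as records are found, so the number of sequences never has to be known in advance.

```python
def read_sequences(lines):
    """Return {header: sequence}; a new record starts at each line beginning with '>'."""
    parts = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            header = line[1:]
            parts[header] = []
        elif header is not None:
            # drop the spaces embedded in the posted sequence lines
            parts[header].append(line.replace(" ", ""))
    return {name: "".join(chunks) for name, chunks in parts.items()}


sample = """\
>Cp36_PRR|Drosophila melanogaster|Cp36|FBgn0000359|X:8324430..8324513
TCTAGAGATCTGGGCACGATGGCGAGACAAAGATGCGGCGCAAAATCGGA AATGGAGATGGATCACGTAGCCGGCCAT
GGCGG
>toy|made-up record
ACGT
""".splitlines()

records = read_sequences(sample)
for name, seq in records.items():
    print(name, len(seq))
```

For a real file, passing the open file object (for example `read_sequences(open(path))`, where `path` is a placeholder) works the same way, because iterating a file yields its lines one at a time without loading everything into memory first.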



Hey!
I have a program that takes two input files, one in matrix form and one in sequence form. Now my problem is that I have to give it a matrix file (containing many matrices) and a sequence file containing many sequences, and calculate the same log score as I did for one matrix file and one sequence file.
How it should work exactly: for every sequence it should calculate log values for all the weight matrices, then go to the second sequence and calculate all the log values using the matrices.
My matrix file is huge and contains many matrices. A part of it is here.

//
NA Abd-B
PO A C G T
01 10.19 0.00 10.65 6.24
02 5.79 0.67 10.50 10.11
03 4.50 0.00 0.00 22.57
04 0.00 0.00 0.00 27.08
05 0.00 0.00 0.00 27.08
06 0.00 0.00 0.00 27.08
07 27.08 0.00 0.00 0.00
08 0.00 2.83 0.00 24.25
09 0.00 0.00 24.45 2.62
10 19.33 0.00 4.34 3.41
11 0.31 12.28 3.39 11.09
//
//
NA Adf1
PO A C G T
01 0.71 0.08 26.02 1.55
02 3.03 23.00 1.24 1.09
03 0.26 10.50 3.29 14.31
04 0.00 0.06 28.23 0.07
05 0.12 27.27 0.06 0.91
06 1.44 20.36 0.37 6.19
07 5.35 0.28 21.49 1.24
08 7.81 16.10 3.81 0.63
09 0.51 17.77 0.45 9.63
10 0.00 0.14 28.21 0.00
11 0.00 25.69 0.20 2.46
12 0.48 9.98 0.07 17.82
13 1.27 0.00 27.01 0.07
14 15.59 7.98 2.92 1.87
15 4.28 22.37 0.00 1.70
16 0.18 0.77 22.70 4.70
//
//
NA Aef1
PO A C G T
01 0.00 0.06 12.49 0.00
02 3.80 0.17 0.00 8.57
03 0.87 0.06 0.00 11.62
04 0.06 9.76 2.32 0.41
05 9.82 0.00 2.73 0.00
06 9.76 0.00 0.00 2.78
07 3.80 0.31 0.00 8.43
08 0.00 0.00 0.00 12.54
09 0.00 6.53 5.85 0.17
10 0.00 12.38 0.17 0.00
11 2.73 1.02 8.80 0.00
12 5.85 0.00 6.70 0.00
13 1.02 5.96 0.00 5.57
14 0.00 5.16 4.66 2.73
15 1.03 7.55 3.97 0.00
16 4.82 5.00 2.73 0.00
//
//
NA Antp
PO A C G T
01 5.52 14.49 27.56 0.49
02 8.17 14.02 11.42 14.47
03 18.18 27.29 1.31 1.29
04 40.26 5.66 1.83 0.32
05 19.05 12.67 0.43 15.91
06 9.94 0.07 0.20 37.86
07 26.63 15.17 0.00 6.27
08 47.45 0.06 0.00 0.56
09 0.81 0.48 0.00 46.79
10 26.46 19.05 1.81 0.75
11 48.07 0.00 0.00 0.00
12 30.51 0.00 0.00 17.56
13 43.45 0.00 0.00 4.62
14 30.06 5.98 0.00 12.03
15 0.38 0.64 0.00 47.05
16 22.14 0.29 7.15 18.49
//
//
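The excerpt above follows a simple block layout, so it can be split into individual matrices with a small parser. This is only a sketch, assuming every block has an `NA` name line, a `PO A C G T` header line, and numbered rows with four counts; the function name `parse_matrices` is hypothetical.

```python
def parse_matrices(lines):
    """Return {name: [[A, C, G, T], ...]} from lines in the '//'-delimited layout."""
    matrices = {}
    name = None
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "//":
            continue  # blank line or block delimiter
        if fields[0] == "NA":
            name = fields[1]
            matrices[name] = []
        elif fields[0] == "PO":
            continue  # column header: A C G T
        elif name is not None:
            # fields[0] is the position number; the next four are the counts
            matrices[name].append([float(x) for x in fields[1:5]])
    return matrices


sample = """\
//
NA Abd-B
PO A C G T
01 10.19 0.00 10.65 6.24
02 5.79 0.67 10.50 10.11
//
//
NA Adf1
PO A C G T
01 0.71 0.08 26.02 1.55
//
""".splitlines()

matrices = parse_matrices(sample)
for name, rows in matrices.items():
    print(name, len(rows))
```

Because the function only iterates over lines, an open file object can be passed in place of `sample`, which avoids reading a huge matrix file into memory at once.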

The sequence file is here (this is also only a part of my file). The actual sequence starts from "CC"; the line before it is just a heading, which we omit. This file contains two sequences.
>CG9571_O-E|Drosophila melanogaster|CG9571|FBgn0031086|X:19926374..199271 33
CCAGTCCACCGGCCGCCGATCTATTTATACGAGAGGAAGAGGCTGAACTC GAGGATTACCCGTGTATCCTGGGACGCG
GATTAGCGATCCATTCCCCTTTTAATCGCCGCGCAAACAGATTCATGAAA GCCTTCGGATTCATTCATTGATCCACAT
CTACGGGAACGGGAGTCGCAAACGTTTTCGGATTAGCGCTGGACTAGCGG TTTCTAAATTGGATTATTTCTACCTGAC
CCTGGAGCCATCGTCCTCGTCCTCC
>Cp36_DRR|Drosophila melanogaster|Cp36|FBgn0000359|X:8323349..8324136
AGTCGACCAGCACGAGATCTCACCTACCTTCTTTATAAGCGGGGTCTCTA GAAGCTAAATCCATGTCCACGTCAAACC
AAAGACTTGCGGTCTCCAGACCATTGAGTTCTATAAATGGGACTGAGCCA CACCATACACCACACACCACACATACAC
ACACGCCAACACATTACACACAACACGAACTACACAAACACTGAGATTAA GGAAATTATTAAAAAAAATAATAAAATT
AATACAAAAAAAATATATATATATA
This is my code, which works (it prints the log value for one sequence and one matrix):


Solution

Start with a list of input files (hopefully, generated automatically when there are many). Like:
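One way to generate such a list automatically is the standard `glob` module. The sketch below is an illustration under assumptions: the `*.matrix` pattern, the file names, and the helper `list_inputs` are all made up, and the demo runs against a throwaway directory so it does not depend on any real files.

```python
import glob
import os
import tempfile


def list_inputs(directory, pattern):
    """Return the sorted paths in `directory` that match `pattern`."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


# Demo on a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.matrix", "a.matrix", "notes.txt"):
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write("//\n")
    found = [os.path.basename(p) for p in list_inputs(tmp, "*.matrix")]

for name in found:
    print(name)  # each input file would be processed here in turn
```

Sorting the result keeps the processing order stable between runs, since `glob.glob` itself does not promise any particular order.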



My sequence file:
>CG9571_O-E|Drosophila melanogaster|CG9571|FBgn0031086|X:19926374..199271 33
CCAGTCCACCGGCCGCCGATCTATTTATACGAGAGGAAGAGGCTGAACTC GAGGATTACCCGTGTATCCTGGGACGCG
GATTAGCGATCCATTCCCCTTTTAATCGCCGCGCAAACAGATTCATGAAA GCCTTCGGATTCATTCATTGATCCACAT
CTACGGGAACGGGAGTCGCAAACGTTTTCGGATTAGCGCTGGACTAGCGG TTTCTAAATTGGATTATTTCTACCTGAC
CCTGGAGCCATCGTCCTCGTCCTCCGTCCCTTAGCGCCTCCTGCATGGAT GTCGTTTTTGGGTTTCATACCTTTTCAC
ACTGGAAAAATACGGAATTTGTTGTAAGCCCTTTCAAGACGAATGGGATT TAGCTTCGGATGTCAACGTCACCATAAT
CATATTAGGAATATTTCTACTCAATTGCAATATTGGTACTTTTCTGACTG TAAACGCGATGATAATTACAAATATGCC
TAATTTGCTGTCTTTATAATCAAATGGAGTTCTTTATATTTCCAAAATAT TGAAATTCCGATTCCCTAGAAAATAATA
CGTTTTTCTGTTATTAATAAAAAACCAATAGGAAAGTTCTCAAAAATTAC TCTGTTGTATTTGATCATTTCTTTTCCG
GTATAATCTTTTATTTTAAGCATTCCCATGTGAATAAATTTCAGACTAAT GTATTAATAAGATGTCGTGTTTTTCCAC
TTACAAATTTCTCATACAGCTGGATATATACTACGAGTACTATACACATG CTCTGGG
>Cp36_DRR|Drosophila melanogaster|Cp36|FBgn0000359|X:8323349..8324136
AGTCGACCAGCACGAGATCTCACCTACCTTCTTTATAAGCGGGGTCTCTA GAAGCTAAATCCATGTCCACGTCAAACC
AAAGACTTGCGGTCTCCAGACCATTGAGTTCTATAAATGGGACTGAGCCA CACCATACACCACACACCACACATACAC
ACACGCCAACACATTACACACAACACGAACTACACAAACACTGAGATTAA GGAAATTATTAAAAAAAATAATAAAATT
AATACAAAAAAAATATATATATATACAAAAATTTGTTGTGTTTGAATTGA ATTAAGAGCTTATCAAGAAAAAAATTTC
AGTGACTCATAATACACTACTCTACAAGTTTAAATTGAATCAACAATTTA ACTTTCATTGCTCAGGTTTTTAGTAACA
ATGTTTATATAAGTTTAGGTATAACAAATGATTTAAATATAAGATACTGT ATTTCACATTGAGACGAAACAATCCACC
GAAAATCATAAAATATAAGAATGTTGCATTTTATTTTTAAAAATAAAGAT GCCTTTTAAGAGGAATAACTTAAATGTC
TTTAATACCTTTGAATTTAATTATATGGCTAATAAACACAAACTTAAAGC TTAAAACTGCATCGAATTGAATGCGGTT
ATAAATGTACTTATATATCTAATATAATCTGCTAATATGGTTTACATGGT ATATCTTTCTCGGAAATTTTTACAAAAA
TTATCTATTCATATATCTCGAGCGTAAGATATTTATCAGTTTATAGATAA CATCTTTAAATTTGGGTGATTAAAAAAA
AACATTG
>Cp36_PRR|Drosophila melanogaster|Cp36|FBgn0000359|X:8324430..8324513
TCTAGAGATCTGGGCACGATGGCGAGACAAAGATGCGGCGCAAAATCGGA AATGGAGATGGATCACGTAGCCGGCCAT
GGCGG
>Him_distal|Drosophila melanogaster|Him|FBgn0030900|X:18039896..18043470
GGTTTTCTGCGATGGCTTCCGCGCCAGCTGAAGTATCTGATTTGCTGCCT TGTTTTTGTTGATATTTCTGCGAAGGGA
CTTGTGCTTTTCAAATGGCCTTTTTTTGGGATTACGGCAAGGGCGCGTTT CCCACGCTCGATCCCCACTTACCATTGG
TGCACGCGATTGCGGCAAGCTGCTGAGGCAAGCTATTAAACGCCACACTG GGCCGGGGGGCGGTACCGGTGGGCGTGG
CAGGGGAGTCGACACATGTTGTGTGCCAGAGAACTTTGCTCCGATCCCCA GATCATCAAATAGTTGTCGCTGTCTGCT
CGTGCGCAAATTGCAATACTTTGCATACCCTTACTGCAGGGTATCTGAGC TTGGACTTTAAATAAGGGGGTATAACAT
AGCTTATACTCTCTATCTCTGTTATAAAGTCAATTTTCCTTAGATCTTTA GTACAGTGGGTAGTTAAGGAGACATAAC
TTCCAAAAAAAAAAACTATAAAATTGCAATAATTTATGCAAAATATGTAT TTTATTGAATGGGATGAATAATTTACCT
TATACGACTGTAAAACATTTCTAACGATTAAATGCACTTCTAAAAGTTTT CCCACAAGTAGGTGAGCTATTATGCTAA
GCGTTCCATGACTTGGAATCTAAGATCTTGTTTTGATCTTCGCTGATCTT TGAGAACTCGGGGATTACTTACACATTT
CTGGGCAGGCACAAGTGGGCCGAGGCAGTGTAGATTCATCACGTTTTCAC TCAACACACGCAGCTCATTAACAGCCCC
GCTGACAACTTGTCAGGACTTCCCCCTCGTGAATCCCCCTGCTACGCAAC CCCCATTCCCCGCCCATTCCAACACTTC
CCGCCGGGAGCGTGGGAAATTATGCGTGTTGGTGGGACGTCGGGCGGTGA AAATTGGCGCGCTCTTCGGGGGGCCACA
CCGCGTGGCATTGACAACTCTTCCACATTTCGCGCCCAACGATGCGTTGG CATCAGTGGGTCACAGGGATTACGGCTG
GCTGGGATTCCAGAGCCAGATCTTTTTCAGCCAAAACTTTCAGCTTTCGA AGACCTCAAGCGATAGGAGAGTGTCGGA
AGTCCAGAAATAGACGCGTAGCACATAAATTATGGATCGTATCGAGTATC GATTAGCCCGGGACAAGCGAAGCGATAG
GGAGACATATTTTTATTACCCTCTCGGGGACCTGCACTTGTTGGCTTCGC TTCTATGAAAGATCCCTCTACCATATCA
CGTATGTGGGCTCCCCCAATCGAACCGAGTTGTGGGAAATGTTTTCCCAG GCCAACAGCTAATTGTCACTCCAAGGGT
TGTCCCCGCAGCCCAGACGACAGATAAGCGGGCAAGTGAAGCCCAGCGAT CTGAGTCAAGTGAAGGGCTTCAATTTCT
TTCCCGAGTGGAACTGGGATATCGAAATTACATTTGTAACAGACGTTTTA GTCCGCAATCCTCAGCTAATGGGACTTA
CGAACATATATTCATCTGAAATTCAAGAACATGCGCACTTAAAGAGCAGG GAAGTCGCACACGCGCAAGTCAGGCGCT
CAAAAAGGGATCTTCGGAGGTACAGTGGGCAAAAGACTGTAAATAAATAA TATAAATAAAATAATATTTAGCTCTATG
TGTTTATATAATCTACAAAGTAGTTAACAAAAAATATAAAATGGATATAA AAATACATCTTATATATCCCTATAATAA
GAAATAAATAATAATTTTAGTAAATTAATTTTGTTACACAAAGTACCTGT ATTATTACCTCTTTTTTGTTGGTTGGTT
CTTTTTTGATGTGGCCCCACTGTGCTCTCTTATCAGTGCGACAATCAGGC ATTGCCTTTCCCCATCGGGGGATTCTAA
TTCCGTGGACGATGGGCCGAAACGCCTATAAAGTCGCTCATTAAAAATGT TTAATTATGGCCCATCTTGCATCTTGCA
CCGATGTGGATGGGGTTTGTCGGCAATGATTTACATTATAAAAATGCCCG TTATCTGAGCATTTTGTACGCTCCACTC
CCTCTTCCCCCCTCCAAAAAAAAAAAAAACAGATATGTATATTCCCCGAG ATATTCCCAAGCGGCCAAAAATAGACGC
AAATTGTAACGCACTTGAAGTGCACTCTGAAACATCTTGAAGTCCAAATA AAATAGCAGAGAGACCCACAATAATATA
CGTTGATATACACATGTATATATGTATGTATGTACATAAAGGGCCAGGAG CAGGAACGTTAGGCATGCGGTGGTACGA
GCACCGTGGTGCGAGCGAGAGCGCTGTGCTGCCTGAGGGAGAGGTAGCGA GTGGGTTGCATTGCGCACACAGAACATG
TGAATGCAGAGTTCAAGTGCATGCCGTGACACAGACACGCACACACACAC ACGCACACACAGATGAGTAGCCGCTGCA
AAGTGTTTTTTCCCAGGCGCTATTTATAATATGCATCCCGTCGCCGATCC GATCCGATCCAATCCAATCCGATTGGAT
CCCATCTTGCGGCACTACGATTATGACGCTCGACACGATGATGCATTCGC AGAGTTTCCCGATCGCAGAGTACCCTGT
ACTCGAGTAGTTTTTAGATGCAGTATTATTAAGTAGAAAATTGTAACCGT ATAATATTCCATTATATTAAATATTTTT
ATAGCACTAAAGAAATAAAAGCCCATTTTATAATTTATATTACAAAAATA CTTAACCATAGAAACTTATGATATGATA
CCAATATTTAAGTTCCAAAAAATGTAGAACATTTTTAAGTATATACTCGA AAATATTAATTTTCAAAATTGATATTCA
AGAGATATTATAAAAAGATCCCCATTCTAAATATCTAACATCATGCCATG CTTTCTAATGAGTATAGTATACCCCTGC
TACCCTGTCAATCCGCAAAACAGGCGCCGAAACATGCGGTTTCTCGCAGC AGACTGCCACGGGAAAAATTCGGTTCGA
GATTTGGGAATGGATGTATGACGGAGCAGAAGGAGCAGGACCCGGATTTC GGATTTCGGAATGGATATGGAAATGAAG
ATGGAAATGGGACTTTGACTGCGCGACGGCCACATGCGCCGCTGGCGATG CCGCTGGATGTTGCATGTGGCAGCGGTC
GGTGCAGCAGCGAAAGTGTTGCAGCTGTATGAGAGGGTCTATTTTTGGGG CGATTGTGCGGCGCTGGTGCTGCCACAT
GTGTTCTGTGTTGGGCTGCTAAAAGGCATTGTAATGAGAGCAGAAAATAG AATTGACTCCACTTGAGCAATGTCCCAT
AAAGCGGGAGTTTCGAGTTTGGCGCGCAATGTGCCGCACCAGCAAACGAA CAAAAGAAAAAAAAAAAAAAAAAACACA
GCCAGTAACACATGGGCCCACGAGTTATGTTTTATTTTTAATCCCACAAA GAGTCGATCTCCAAAACAAACCCGCAGA
GAGCACATATAAAGAGACTCGGTGGACGAGTGGTTCGAAACAGTCTTCCG CCGCAGCTCGACGCGCTCGCATATCGGG
AATATATAGATCGGAGATATCGCAGGACCCACAGCAGAGCAGAGCCGCAG AGCCACCAACCTCG
>Him_proximal|Drosophila melanogaster|Him|FBgn0030900|X:18041232..18043470
GCCCAGACGACAGATAAGCGGGCAAGTGAAGCCCAGCGATCTGAGTCAAG TGAAGGGCTTCAATTTCTTTCCCGAGTG
GAACTGGGATATCGAAATTACATTTGTAACAGACGTTTTAGTCCGCAATC CTCAGCTAATGGGACTTACGAACATATA
TTCATCTGAAATTCAAGAACATGCGCACTTAAAGAGCAGGGAAGTCGCAC ACGCGCAAGTCAGGCGCTCAAAAAGGGA
TCTTCGGAGGTACAGTGGGCAAAAGACTGTAAATAAATAATATAAATAAA ATAATATTTAGCTCTATGTGTTTATATA
ATCTACAAAGTAGTTAACAAAAAATATAAAATGGATATAAAAATACATCT TATATATCCCTATAATAAGAAATAAATA
ATAATTTTAGTAAATTAATTTTGTTACACAAAGTACCTGTATTATTACCT CTTTTTTGTTGGTTGGTTCTTTTTTGAT
GTGGCCCCACTGTGCTCTCTTATCAGTGCGACAATCAGGCATTGCCTTTC CCCATCGGGGGATTCTAATTCCGTGGAC
GATGGGCCGAAACGCCTATAAAGTCGCTCATTAAAAATGTTTAATTATGG CCCATCTTGCATCTTGCACCGATGTGGA
TGGGGTTTGTCGGCAATGATTTACATTATAAAAATGCCCGTTATCTGAGC ATTTTGTACGCTCCACTCCCTCTTCCCC
CCTCCAAAAAAAAAAAAAACAGATATGTATATTCCCCGAGATATTCCCAA GCGGCCAAAAATAGACGCAAATTGTAAC
GCACTTGAAGTGCACTCTGAAACATCTTGAAGTCCAAATAAAATAGCAGA GAGACCCACAATAATATACGTTGATATA
CACATGTATATATGTATGTATGTACATAAAGGGCCAGGAGCAGGAACGTT AGGCATGCGGTGGTACGAGCACCGTGGT
GCGAGCGAGAGCGCTGTGCTGCCTGAGGGAGAGGTAGCGAGTGGGTTGCA TTGCGCACACAGAACATGTGAATGCAGA
GTTCAAGTGCATGCCGTGACACAGACACGCACACACACACACGCACACAC AGATGAGTAGCCGCTGCAAAGTGTTTTT
TCCCAGGCGCTATTTATAATATGCATCCCGTCGCCGATCCGATCCGATCC AATCCAATCCGATTGGATCCCATCTTGC
GGCACTACGATTATGACGCTCGACACGATGATGCATTCGCAGAGTTTCCC GATCGCAGAGTACCCTGTACTCGAGTAG
TTTTTAGATGCAGTATTATTAAGTAGAAAATTGTAACCGTATAATATTCC ATTATATTAAATATTTTTATAGCACTAA
AGAAATAAAAGCCCATTTTATAATTTATATTACAAAAATACTTAACCATA GAAACTTATGATATGATACCAATATTTA
AGTTCCAAAAAATGTAGAACATTTTTAAGTATATACTCGAAAATATTAAT TTTCAAAATTGATATTCAAGAGATATTA
TAAAAAGATCCCCATTCTAAATATCTAACATCATGCCATGCTTTCTAATG AGTATAGTATACCCCTGCTACCCTGTCA
ATCCGCAAAACAGGCGCCGAAACATGCGGTTTCTCGCAGCAGACTGCCAC GGGAAAAATTCGGTTCGAGATTTGGGAA
TGGATGTATGACGGAGCAGAAGGAGCAGGACCCGGATTTCGGATTTCGGA ATGGATATGGAAATGAAGATGGAAATGG
GACTTTGACTGCGCGACGGCCACATGCGCCGCTGGCGATGCCGCTGGATG TTGCATGTGGCAGCGGTCGGTGCAGCAG
CGAAAGTGTTGCAGCTGTATGAGAGGGTCTATTTTTGGGGCGATTGTGCG GCGCTGGTGCTGCCACATGTGTTCTGTG
TTGGGCTGCTAAAAGGCATTGTAATGAGAGCAGAAAATAGAATTGACTCC ACTTGAGCAATGTCCCATAAAGCGGGAG
TTTCGAGTTTGGCGCGCAATGTGCCGCACCAGCAAACGAACAAAAGAAAA AAAAAAAAAAAAAACACAGCCAGTAACA
CATGGGCCCACGAGTTATGTTTTATTTTTAATCCCACAAAGAGTCGATCT CCAAAACAAACCCGCAGAGAGCACATAT
AAAGAGACTCGGTGGACGAGTGGTTCGAAACAGTCTTCCGCCGCAGCTCG ACGCGCTCGCATATCGGGAATATATAGA
TCGGAGATATCGCAGGACCCACAGCAGAGCAGAGCCGCAGAGCCACCAAC CTCG
>Obp18a_prom|Drosophila melanogaster|Obp18a|FBgn0030985|X:18969778..189727 46
ATGGCGAAAATCTGTTTCCCAACTAACAATGAGCGCATCATCACAGCTCT ATATATATAACCCATCGATTTGCTAATT
CAGCTCAAAAGTAGACAGGAGATTTTAATTAAATAATTGGATGCTACTTT ACATTCGCCACACACCAACAAATAAAGT
CTATAATTGAAATTTTAAGCGCAGTTCCCGATTATGAGCTACACGTATGT CGTATGCGCAATATCTGCATTACAATTG
CCAATAGTAAATTACCAACTTGGTTTTCTTCATATTTATTAAGATAGAAA ACATACAATTTTTGGCTTTTACACTCCA
AGCATCTCTGAAGTTTAAACAAAAAACATATGTGTAGCCTATCTACTGTA TTGGACTTTATTCGTATATTTTATATGG
TTCATTAATATAGGTATAAATACAAATTATATTCACGCTTTGCGATTTGC AGCGAATATCACATCTTATACACGATGT
AAAAAAAAAAAAAATATTTCGTCATGTTTTTAGGTTGGCCGCAGGCAGTG CTCACTGTACCGCCACAATGTTTATCGT
TTTGCATTTTTTTTTTCTTTGTTTTCTTGCGGTTTCCCCTAATTATCTTT AGTATAAACTTAGTCTACTGTCTTTTTT
GGTAAGTATTTTCGTGATGGGCTCGTCTATGCGAATTCCCATTTCCAATG AATAAATAAAGTAATTAGAACATTAAAA
TTAGCAATAAAACACGTACATTTAAAGCTGACAACAAAAAAAAAAAGTAT TCTTATGTTAAACTGTAGTATGTGCCTA
TGCAATATTAAGAACAATTAAATAAAATAGCATATTAACTTATGGCAGCA CTTTGTTGCTATGTTTATGTTTATGTTT
ATGCACGCAGTTAGGCCAGGGCGGATGTAACATGATCACCCACTCGAAGG CAAAAAGTATAAGTGCATGGTCAGCATT
CACACGCCGACCAAATACATATTACATACGTACATACATATCTCGCTCTC CCGATAAGCCTAGATATATAAGATATAC
ATAAGAACGCCGCTCCGCTGCTGGCGTACCCGGCAGCGCAGCTACGCGGA TTAGCCTAAGTCCAAATATATTAAAAAC
TGTAAAATCAGAGAGACTCTGTAGACGTTGAGCTGACAGAACCATTTCTG CCTACTCTAAAATCAAAAGAAGAAATTG
AATAAATATATGTCAGCCCGACGGCTGCCTTCAACTTAAAACGGACTTGT GTTCTGAATTGGAGTTCATCATTACATG
GCGACCGTGACAGTCGTCCAACGCTGGACGAATTGACCAAAGCTGGTGAA AACAAAGGAACAAAGGAACACTGGACTG
GAAGAAGACTGGACTAATTAAATGGAACTGCAAAAACCAAGGAAAAATCT GAGTGAGTAGAGTTCTATTGAGTATGGG
CAAACACCGTGGCGGTTTGAAAACTAAGCTGAATAAACGTATAGCCCACG TAAGGTGGCTAATATACGGTCAGCAAAC
GCCACCGGTTTGGTCGAAAGCTCTAAAGCTACATGCAGAGCTAGACCACT TGTTGCAATATCAGCAAGAATTAAAGAC
CCATAAGCTCGAGAAAACTCACTCAGATAATATTAAAAATATACCCACAA TTAATGAAGTTCCAAAATACCAGGCATG
TCCAGCACCAGCACCAGCATTAACAAAACCAAAGAAGTCCTGCCCCCCTG GCTGCGAAGGAATCTGGAGTCCCCACTG
CCTGGGGACTTGTGAGCGACCATCGACGTCTTCAGCGGCGAAGAAATAGA CAGCAGCGAGGGAGTGTCAGCGTGCCAC
CCCCGGCGACGCCCAGCTGACACCTGATGAGCATCATCAACAGCAGAATA TAATAATAAATATATATAAATATAAAGT
AAATATAAAATATATATAGATAAGAAAAATTGTAAGAAATATTGTAAAAC GGAGCATATACTATTATGCCCTGTTAAC
CCAATATGGCCCGTGAAGCCATAGCTAGAATCAGGCAGGCAACAATGTAA AATACAATTTTTTTTTACTCTTGCGAAC
ATTGAAAGATTTTATAAATAGATAATTCCAAACATAAATGTCTATAGAGA CAAATGAAATAAGTAAAACTGAAAATAA
AAGTATATACAAAGGAAATTTTCTATTCTATTCTCCAAAATATAAAATTA GTATACCCAAAATGGGTCTAATAGACAC
TAAAACTGTGGACTCTACAGCCAATGTAATAAATAAAGTAGAAGTCCAAA ATGCAGACTTGTTCTGGATAACCATAAT
ACTAATTGTAATTGCATTAATTATGGTATCCAATGCATTAATAAAAATAT ACAAACTGCATAACAAGTGTCTTAAGAA
ACGATACCGTAGCACTGCTAACGGTATAGATAATATTTAAGGAAGATCTT TAATAAAGTCAATTATGAATGAAAATAT
GAGAAAAATTATATGAAAAAAAAAAAATAATAAATAAAAAAAAAAATATA AAACGTAATATTGAATTTATCTACGTTA
AAAAAAAAAATATATACAAATGAATAAATTTGAAGTTATGAGTATACCAC AGCATGGACTGGGAAAAGCTTGTTGATC
AGATAAAAGATCAAAATGAAAATTTCAGAAAATCCTATAAGTGCTTAACG CAAAACAGATCAACACAAGCTGTAACAA
TCAATAGGAATGCCCAAGTCTTGGTAAATAGTTATAATGAAATCAGAGAG TTGATCCAACAAAATAGAAAGAATTTGG
AACGCAAACAGTGTGCTAAGGCTTTGAACCTACTGGTGACATTAAGAGAA AAATTAATATTTATAAAAAATAAATTCA
GTCTCCAGATAGAAATTCCAACCATAGTAAACACCCCACTAAGAATAAAT TTGAATGAAGACAGCACTAACTCTGACG
AGGAAGATAGGACTATAGTCAAGGAAGACATTAAAGAGGAAGATCTTCAC GATCTAACTATACCAGCAAAATTAATGC
TGAA
>Obp19a_prom|Drosophila melanogaster|Obp19a|FBgn0031109|X:20223943..20226446
CCACCTGCGAAATGGGTCATAGTATATGTATTTGTAAAAAATGTATGTAA AAAAATGTTAAATTAATAATTTTGAATT
TCAATTTGGAGCTGAAAATAATATTTTGTGTCCATCAACAGCTCCAAAGC GATGGTTCATTTTATCTTGTGTGCGTTC
AATAGAATCACTCTTACGTTAGCGCGTCCATTGATGGTTGTCCCATTGAA GTACTTCTTAAAGCCGTCGGCCATTGCT
ACTGGACTGGATCTGGAGATCTGGAGATCTGGATTTGGGGTCGGGTCCGG GTGAGAGCTGAGTGTGTTCTGCCTATAG
CTCCGAGCGAGAACCTAATGACAAGCAGCGAAGTGCAAAGCTCGGCCAAC TAGATTACAAAGTCGATTCATTGGCAGG
ATTCGATTTTTATTGACTCAACGAGGTGGTACATGAGTTTGGTCCCCAAG CCTTTAACTGTGGCATCGAGGACCGGAA
AGGGGGTGCTGATTATAAATAGTTATGGATTGCTGACGGGTCGAATGGGT CGGAGCGGTGGGGAGCCATGACTTCAAT
GATTTGGCAGCATCGGCGCCCTAGCCATGGAGCATGGCCTGCTGGCAGCC CTTGCAGTAGAGCTTGGTCTCGCGCCGC
TTCGTGTTGCGGCGGTGCATCTTGACCAGGACGTAGACGAGTCCCAACGA GGCCCAGGTGGCCTTGGCTACCTGTGGG
TTTCGGTGGCGTATTTGGGCGCATCTTGTGTACTGCCGTGTACTGAATCA CTTACATTGGCGCGACCACGCATGGTCT
GGCTGTTGAAGGCTTCGTTGAAGTTGAAATGATCGGACATCTTTGGATCG TTGTTGACCGGATTGGCGTGGCTTTTAA
CAAAAGATTAAAATTTGGATTCGATATTCGACCTGTATTTTAGACCGGGA TTCGGATTGTGACTTTTAAACGTTCGAA
ATGAAAGGAATGTTACTGACAGTCGTCAAAGCCGACTCGGGTTTCCCAAC TAGAGAGAATGCTGAAGTCTAGTACCGA
CTAATGGGATACCCATTAATTACTGCTTAAATACTGTGATGAAAATTGAG ATATGCAAGAGGCAAATCGAAAGTTTTG
GACATTTTCATATTGTACCTTTAACCAACTTCAGAATTCATTGAGCTAAA TACCATTTACAATTTTATGAAATTTTTA
AGCATGTTACAGCTATAACTATTTTTAAACCAGTTACTAGATTCGTTGAA AATTGTATGTCACACAGAACTTCTTGCC
ATCCTGGTCGGAATTAGGATCACTAGCCAAGCCGATATGGCTATGTCTGT CCGTATGAAAGTCTTGGAATCTGATATT
AACATCGCATATCGATCGACCATTATATATCTAATATATCCTCTACAAAT GTATTTTATCACCTAGCTAGCATGTAAA
CATTCTGGCCTATTTAGCTGTACGCTTCAGTTATGCTAATGCAAACATAA GCCTTTTGTGATATTATAATTTACATTT
ATTATTTATTGCAGTTAGCTTTATCAGCGATTTGGGCTCATGCCACACGC AATACTACTTATTTCAACGTCATCAGTT
GTACTAAATGCACAAATGAAATACATTTCGCCAAATAAATGCCAACTTGC AACTAATTTGAATGCTAATCAAACCGAA
CTACTCATTTGCATACAAGGTAATAGGTGGTTAAAGTGAGTGTAATGGAC TTACTTAAGGGGTTACAAGGCTTATATT
TAAAATGCCTGCCTTGTAATTAAATTTTTAAATATATTGGAAAAAAATGG CCACTTGTTATGTGAGTCTCCAGAAAAA
AAACAAAAAAACAGCAACCATCTGGTATGCAAAATATCTGGTGGTAGCAA AATATCTGGTGGTATCTGGTGGACTATC
AAAATATAAAAACTTTTTTTTCCAGATAGTATATCTTAAAATCAGCATCT TGAAGGAGTATATGTAAATAGCAAACTA
TTTGTAAAAATAGATTTTATTTTATAATTTTTTAAGATATATACCAAACA TTATTACCGATTGTGATTATCTTTACAT
TGTTTGACCTCAAAACGGAAAACTGGATGCGCGGTATCCATGCGACCCTA ACTCTGGAACCGATTTTGGAACCGCCCC
GTTAGATCTCAGATTGAAACCTTATTTGCATTCGCATGATCGCTGATGAA CACTGGGGAAATGCGGCCCAGCAATGGG
ATTGTCAACGCATCTCGGCCAGAATCGCGCCTCGCATGCCACCTCGCACG GTGACCACATACCTGTGTACACTGTCAA
TTAACGTGGCAAGATTATAGCCCGGCCAGAAAGTAATCCGCCCCAGGAAC ACCACCCACCGCCCGCCCATTTGGATAT
GGAAATGGGCAGTGGGGGCGGCGATTGGCGCTAACCCATAATTCCCACAC CCACTTAGCGGTTCGATCGAACCAATAT
GAAGTCATTTGCATGTCGGGGGCCGTGTATAAAAGGAGTCGCCGATGGGT CTGGAGTCTGGAATCCGCCAAATCGTCT
CGGAAAT
>Obp19b_prom|Drosophila melanogaster|Obp19b|FBgn0031110|X:20224439..20227440
ATTGCTGACGGGTCGAATGGGTCGGAGCGGTGGGGAGCCATGACTTCAAT GATTTGGCAGCATCGGCGCCCTAGCCAT
GGAGCATGGCCTGCTGGCAGCCCTTGCAGTAGAGCTTGGTCTCGCGCCGC TTCGTGTTGCGGCGGTGCATCTTGACCA
GGACGTAGACGAGTCCCAACGAGGCCCAGGTGGCCTTGGCTACCTGTGGG TTTCGGTGGCGTATTTGGGCGCATCTTG
TGTACTGCCGTGTACTGAATCACTTACATTGGCGCGACCACGCATGGTCT GGCTGTTGAAGGCTTCGTTGAAGTTGAA
ATGATCGGACATCTTTGGATCGTTGTTGACCGGATTGGCGTGGCTTTTAA CAAAAGATTAAAATTTGGATTCGATATT
CGACCTGTATTTTAGACCGGGATTCGGATTGTGACTTTTAAACGTTCGAA ATGAAAGGAATGTTACTGACAGTCGTCA
AAGCCGACTCGGGTTTCCCAACTAGAGAGAATGCTGAAGTCTAGTACCGA CTAATGGGATACCCATTAATTACTGCTT
AAATACTGTGATGAAAATTGAGATATGCAAGAGGCAAATCGAAAGTTTTG GACATTTTCATATTGTACCTTTAACCAA
CTTCAGAATTCATTGAGCTAAATACCATTTACAATTTTATGAAATTTTTA AGCATGTTACAGCTATAACTATTTTTAA
ACCAGTTACTAGATTCGTTGAAAATTGTATGTCACACAGAACTTCTTGCC ATCCTGGTCGGAATTAGGATCACTAGCC
AAGCCGATATGGCTATGTCTGTCCGTATGAAAGTCTTGGAATCTGATATT AACATCGCATATCGATCGACCATTATAT
ATCTAATATATCCTCTACAAATGTATTTTATCACCTAGCTAGCATGTAAA CATTCTGGCCTATTTAGCTGTACGCTTC
AGTTATGCTAATGCAAACATAAGCCTTTTGTGATATTATAATTTACATTT ATTATTTATTGCAGTTAGCTTTATCAGC
GATTTGGGCTCATGCCACACGCAATACTACTTATTTCAACGTCATCAGTT GTACTAAATGCACAAATGAAATACATTT
CGCCAAATAAATGCCAACTTGCAACTAATTTGAATGCTAATCAAACCGAA CTACTCATTTGCATACAAGGTAATAGGT
GGTTAAAGTGAGTGTAATGGACTTACTTAAGGGGTTACAAGGCTTATATT TAAAATGCCTGCCTTGTAATTAAATTTT
TAAATATATTGGAAAAAAATGGCCACTTGTTATGTGAGTCTCCAGAAAAA AAACAAAAAAACAGCAACCATCTGGTAT
GCAAAATATCTGGTGGTAGCAAAATATCTGGTGGTATCTGGTGGACTATC AAAATATAAAAACTTTTTTTTCCAGATA
GTATATCTTAAAATCAGCATCTTGAAGGAGTATATGTAAATAGCAAACTA TTTGTAAAAATAGATTTTATTTTATAAT
TTTTTAAGATATATACCAAACATTATTACCGATTGTGATTATCTTTACAT TGTTTGACCTCAAAACGGAAAACTGGAT
GCGCGGTATCCATGCGACCCTAACTCTGGAACCGATTTTGGAACCGCCCC GTTAGATCTCAGATTGAAACCTTATTTG
CATTCGCATGATCGCTGATGAACACTGGGGAAATGCGGCCCAGCAATGGG ATTGTCAACGCATCTCGGCCAGAATCGC
GCCTCGCATGCCACCTCGCACGGTGACCACATACCTGTGTACACTGTCAA TTAACGTGGCAAGATTATAGCCCGGCCA
"How can I make a list of the individual sequences? Each new sequence starts with a '>' symbol, so I want to put each sequence in its own list. I can't specify the number of sequences in the file in advance. How can I do this?"


My sequence file:
>CG9571_O-E|Drosophila melanogaster|CG9571|FBgn0031086|X:19926374..19927133
CCAGTCCACCGGCCGCCGATCTATTTATACGAGAGGAAGAGGCTGAACTC GAGGATTACCCGTGTATCCTGGGACGCG
GATTAGCGATCCATTCCCCTTTTAATCGCCGCGCAAACAGATTCATGAAA GCCTTCGGATTCATTCATTGATCCACAT
.........................
GCGCGGTATCCATGCGACCCTAACTCTGGAACCGATTTTGGAACCGCCCC GTTAGATCTCAGATTGAAACCTTATTTG
CATTCGCATGATCGCTGATGAACACTGGGGAAATGCGGCCCAGCAATGGG ATTGTCAACGCATCTCGGCCAGAATCGC
GCCTCGCATGCCACCTCGCACGGTGACCACATACCTGTGTACACTGTCAA TTAACGTGGCAAGATTATAGCCCGGCCA

Python is great for this kind of stuff:
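
One way to approach this, as a minimal sketch rather than the original poster's code: walk the file line by line, start a new record whenever a line begins with ">", and append every other line to the current record. The function name `read_sequences` and the file name in the comment are hypothetical. The sketch assumes that spaces inside sequence lines (as in the pasted excerpt) are wrapping artifacts and can be dropped, and that any lines before the first ">" header should be skipped.

```python
def read_sequences(lines):
    """Group FASTA-style lines into a list of (header, sequence) pairs.

    `lines` can be any iterable of strings, for example an open file.
    The number of sequences does not need to be known in advance.
    """
    records = []
    header = None   # header of the record currently being built
    chunks = []     # sequence pieces collected for that record

    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Assumption: whitespace inside a sequence line is not meaningful.
            chunks.append("".join(line.split()))
        # Lines before the first ">" header are skipped.

    # Store the last record once the input is exhausted.
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    sample = [
        ">seq1|demo",
        "ACGTAC GTTA",
        "GGCA",
        ">seq2|demo",
        "TTGACC",
    ]
    for name, seq in read_sequences(sample):
        print(name, len(seq))

    # With a real file (hypothetical name):
    # with open("sequences.txt") as handle:
    #     records = read_sequences(handle)
```

Because the function returns one list entry per header, the outer loop over sequences and an inner loop over the matrices can then be written as two ordinary nested `for` loops.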
