Bio.Entrez raises IncompleteRead when fetching sequences

Question


While trying to retrieve the CO1 gene from GenBank, I ran into the following error using the Biopython Entrez package:

Traceback (most recent call last):
  File "<ipython-input-3-8de3088e0909>", line 1, in <module>
    seq2 = fetchDb("nuccore", "CO1[Gene]", "xxx@xxxx", "fasta", "500", "CO1_db.txt")
  File "<ipython-input-1-58ad2e955094>", line 49, in fetchDb
    for line in fetch:
  File "/usr/lib/python3.5/http/client.py", line 478, in readinto
    return self._readinto_chunked(b)
  File "/usr/lib/python3.5/http/client.py", line 589, in _readinto_chunked
    raise IncompleteRead(bytes(b[0:total_bytes]))
IncompleteRead: IncompleteRead(2154 bytes read)
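`IncompleteRead` is raised by Python's `http.client` when a response ends before the expected number of bytes has been read; here it surfaces while the `for line in fetch:` loop is still streaming the efetch response. A common workaround is to read each response completely in one call and request it again if the read is cut short. A minimal sketch using only the standard library follows; `read_with_retry`, `max_attempts` and `delay` are hypothetical names, and `open_handle` is assumed to be any zero-argument callable that returns a fresh handle, for example `lambda: Entrez.efetch(db="nuccore", id=ids, rettype="fasta")`.

```python
import http.client
import time


def read_with_retry(open_handle, max_attempts=3, delay=5.0):
    """Open a handle, read the whole response, and retry on IncompleteRead.

    open_handle: hypothetical zero-argument callable returning a fresh
    file-like handle (e.g. a lambda wrapping Entrez.efetch).
    """
    for attempt in range(1, max_attempts + 1):
        handle = open_handle()
        try:
            # read() consumes the full response; a truncated chunked
            # transfer surfaces here as http.client.IncompleteRead
            return handle.read()
        except http.client.IncompleteRead:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            time.sleep(delay)  # back off before requesting the batch again
        finally:
            handle.close()  # runs on success, retry, and final failure
```

Retrying the whole request keeps the parsing simple: nothing is parsed until a complete response is in hand.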

Here is my code:

import time

from Bio import Entrez


def fetchDb(db, query, email, rettype, reMax, savefile):
    """
    This function uses the Biopython Entrez API to retrieve records from an NCBI database.

    db: database to search, e.g. "nuccore"
    query: the search term
    email: your email address (NCBI asks for one)
    rettype: the format of the records to retrieve, for example: fasta
    reMax: number of records to request per batch

    Special guideline:
        1. Run any series of more than 100 requests at weekends.

    Writes the retrieved records to savefile in FASTA format and returns the list of accessions.
    """

    Entrez.email = email
    accID = []
    db_dict = {}

    n = 0        # retstart offset of the next batch
    counter = 0  # number of sequences parsed so far

    while True:
        # search NCBI for the next batch of IDs
        search_hd = Entrez.esearch(db=db, term=query, retstart=n, retmax=reMax)
        record = Entrez.read(search_hd)
        search_hd.close()

        # if the ID list is empty, every match has been fetched
        if len(record["IdList"]) == 0:
            break

        # fetch the records for this batch of IDs from the same database
        fetch = Entrez.efetch(db=db, id=record["IdList"], rettype=rettype)
        n += int(reMax)

        header = None
        # iterating over the handle streams the HTTP response;
        # a truncated transfer raises IncompleteRead here
        for line in fetch:
            line = line.strip()

            if line.startswith(">"):
                counter += 1
                accession = line[1:].split(" ")[0]  # drop the leading ">"
                header = line
                accID.append(accession)
                db_dict[header] = ""
                print("Total seq:{}".format(counter))
            elif header is not None:
                # sequence line: append it to the current record
                db_dict[header] += line

        fetch.close()
        time.sleep(10)

    # save the dictionary in FASTA format
    with open(savefile, "w") as output:
        for k, v in db_dict.items():
            print(k, file=output)
            print(v, file=output)

    return accID
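Parsing line by line while the response is still streaming means that a cut-off read aborts in the middle of a batch. If each batch is first read completely (for example with `fetch.read()`), the text can be parsed afterwards. Biopython's `Bio.SeqIO.parse(handle, "fasta")` is the usual tool for that; the standard-library sketch below (`parse_fasta` is a hypothetical helper) only makes the same header/sequence logic explicit. It skips blank lines and any sequence line that appears before the first header, and stores accessions without the leading `>`.

```python
def parse_fasta(text):
    """Parse FASTA text into (accessions, {header: sequence}).

    Hypothetical helper: headers are kept whole as dictionary keys,
    as in fetchDb, and accessions are the first word after ">".
    """
    records = {}
    accessions = []
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank separator line between records
        if line.startswith(">"):
            header = line
            accessions.append(line[1:].split()[0])  # e.g. "X1.1"
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence line of current record
        # sequence lines before the first header are ignored
    # join the collected sequence lines once per record
    return accessions, {h: "".join(parts) for h, parts in records.items()}
```

Collecting sequence lines in a list and joining them once avoids rebuilding a growing string for every line of a long sequence.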

Running the program:

seq2 = fetchDb("nuccore", "CO1[Gene]", "xxxx@xxxx", "fasta", "500", "CO1_db.txt")

It seems that record no. 5402 chokes the program, but I don't know how to work around this problem.
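Two changes may help with a run of this size. First, the loop above repeats `esearch` for every batch; NCBI's history server (`usehistory="y"` in `esearch`, then `webenv` and `query_key` in `efetch`) lets you search once and page through the stored result set with `retstart`/`retmax`. Second, each batch can be read in full and re-requested if the read is cut short. The sketch below combines both and is close to the history-based download pattern shown in the Biopython Tutorial; it is a sketch under assumptions, not a verified fix. `fetch_in_batches` and its parameters are hypothetical, and the Entrez module is passed in as `entrez` only so the function can be exercised without network access.

```python
import http.client
import time


def fetch_in_batches(entrez, db, query, batch_size=500, max_attempts=3, delay=1.0):
    """Search once, then fetch every hit in batches via the NCBI history server.

    entrez: the Bio.Entrez module (passed in as a hypothetical design choice
    so a fake can stand in for it). Returns the concatenated FASTA text.
    """
    # a single esearch; usehistory="y" stores the result set on NCBI's side
    handle = entrez.esearch(db=db, term=query, usehistory="y")
    result = entrez.read(handle)
    handle.close()
    count = int(result["Count"])
    webenv, query_key = result["WebEnv"], result["QueryKey"]

    batches = []
    for start in range(0, count, batch_size):
        for attempt in range(1, max_attempts + 1):
            fetch = entrez.efetch(db=db, rettype="fasta", retmode="text",
                                  retstart=start, retmax=batch_size,
                                  webenv=webenv, query_key=query_key)
            try:
                batches.append(fetch.read())  # read the whole batch at once
                break  # batch complete, move on to the next offset
            except http.client.IncompleteRead:
                if attempt == max_attempts:
                    raise
                time.sleep(delay)  # wait, then re-request only this batch
            finally:
                fetch.close()
        time.sleep(delay)  # pause between batches to stay under NCBI's rate limit
    return "".join(batches)
```

Typical use, assuming Biopython is installed: `from Bio import Entrez`, set `Entrez.email`, call `fasta_text = fetch_in_batches(Entrez, "nuccore", "CO1[Gene]")`, and write `fasta_text` to `CO1_db.txt`. NCBI allows at most three requests per second without an API key, so the pause between batches can be much shorter than the original 10 seconds.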

I would appreciate some help with this.

Thanks,

Xio


biopython genbank


user7392741, Jan 9 '18 at 18:53

0 answers
