Biopython Entrez raised an IncompleteRead error when fetching sequences
While trying to retrieve the CO1 gene from GenBank, I ran into the following error when using the Biopython Entrez package:
Traceback (most recent call last):
  File "<ipython-input-3-8de3088e0909>", line 1, in <module>
    seq2 = fetchDb("nuccore", "CO1[Gene]", "xxx@xxxx", "fasta", "500", "CO1_db.txt")
  File "<ipython-input-1-58ad2e955094>", line 49, in fetchDb
    for line in fetch:
  File "/usr/lib/python3.5/http/client.py", line 478, in readinto
    return self._readinto_chunked(b)
  File "/usr/lib/python3.5/http/client.py", line 589, in _readinto_chunked
    raise IncompleteRead(bytes(b[0:total_bytes]))
IncompleteRead: IncompleteRead(2154 bytes read)
Here is my code:
import time
from Bio import Entrez

def fetchDb(db, query, email, rettype, reMax, savefile):
    """
    Use the Biopython Entrez API to retrieve records from an NCBI database.
    db: database
    query: the term you are searching for
    email: your email address
    rettype: the type of file you want to retrieve, for example: fasta
    reMax: maximum number of records to request per search (passed as retmax)
    special guideline:
    1. for any series of more than 100 requests, do this at weekends
    saves the retrieved records to savefile and returns the list of accession IDs
    """
    en_db = db
    en_query = query
    en_email = email
    en_rettype = rettype
    en_max = reMax
    Entrez.email = en_email
    accID = []
    db_dict = {}
    n = 0  # offset of the next batch (retstart)
    counter = 0  # number of sequences seen so far
    while True:
        # request a search in NCBI
        search_hd = Entrez.esearch(db=en_db, term=en_query, retstart=n, retmax=en_max)
        record = Entrez.read(search_hd)
        # fetch the records; if there is nothing in the ID list, break the loop
        if len(record["IdList"]) == 0:
            break
        fetch = Entrez.efetch(db="nuccore", id=record["IdList"], rettype=en_rettype)
        n += int(en_max)
        for line in fetch:
            line = line.strip()
            if line.startswith('>'):
                counter += 1
                accession = line.split(" ")[0]
                header = line
                accID.append(accession)
                db_dict[header] = ""
                print("Total seq:{}".format(counter))
            else:
                db_dict[header] += line
        time.sleep(10)
    # save the dictionary in FASTA format
    output = open(savefile, 'w')
    for k, v in db_dict.items():
        print(k, sep="", file=output)
        print(v, sep="", file=output)
    output.close()
    search_hd.close()
    fetch.close()
    return accID
Running the program:
seq2 = fetchDb("nuccore", "CO1[Gene]", "xxxx@xxxx", "fasta", "500", "CO1_db.txt")
It seems that sequence no. 5402 chokes the program, but I do not know how to work around this problem.
I would appreciate some help with this.
Thanks,
Xio
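
One common way to work around an IncompleteRead is to download each batch completely inside a retry loop, so that a dropped connection only costs one batch rather than the whole run. The following is a minimal sketch, not a confirmed fix for this case: fetch_with_retry and its parameters (fetch_fn, retries, delay) are hypothetical names introduced here, and it assumes the error is transient (the server closing the connection early) rather than caused by one specific record.

```python
import time
from http.client import IncompleteRead


def fetch_with_retry(fetch_fn, retries=3, delay=5.0):
    """Call fetch_fn() until it returns without raising IncompleteRead.

    fetch_fn: zero-argument callable that downloads and returns one whole batch
    retries:  maximum number of attempts before giving up
    delay:    seconds to wait between attempts
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch_fn()
        except IncompleteRead as err:
            # The connection ended before the full response was received.
            print("Attempt {} failed: {!r}".format(attempt, err))
            if attempt == retries:
                raise  # give up and let the caller see the error
            time.sleep(delay)


# Hypothetical usage inside the while loop of fetchDb (requires Biopython):
# text = fetch_with_retry(
#     lambda: Entrez.efetch(db="nuccore", id=record["IdList"],
#                           rettype=en_rettype, retmode="text").read())
# for line in text.splitlines():
#     ...
```

Reading the whole batch with read() inside the retried callable matters here: if the handle were iterated line by line outside the retry, the exception would be raised after part of the batch had already been stored in db_dict.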