Tag description for protein-database
A file containing protein sequences together with their corresponding metadata.
Classical protein databases are text files containing a large number of protein sequences.
Protein sequences are represented as strings of uppercase letters, each corresponding to a different amino acid. Each protein sequence is preceded by a header line containing metadata (protein reference number, name, description, ...).
The standard FASTA format looks like:
>P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS
YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD
AGEGEN
>P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
YYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVF
YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGE
EQNKEALQDVEDENQ
>.........................................................
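The header-plus-sequence layout shown above can be read with a few lines of code. The following is a minimal sketch in Python, not a reference implementation: the function name `read_fasta` is hypothetical, the sample sequences are shortened copies of the records above, and splitting the header on `|` to get the reference number is an assumption that matches this example only.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated into one string per record.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Shortened sample data (sequences abbreviated for brevity).
example = """\
>P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS
AGEGEN
>P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
EQNKEALQDVEDENQ
"""

records = list(read_fasta(example.splitlines()))
for header, sequence in records:
    # Assumption: the reference number is the text before the first "|".
    accession = header.split("|")[0]
    print(accession, len(sequence))
```

For a real file, the same function can be given an open file object, since it only needs an iterable of lines. Established libraries such as Biopython also provide FASTA parsers and are usually the better choice for production work.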
A great deal of work in bioinformatics relates to storing (annotating), searching, and analyzing the sequences in these databases.