Because EMBL is a data bank format, EMBL files contain much more information than FASTA or PHYLIP files.
For instance, for each sequence, you can find:
- Identification and accession number;
- Reference information;
- Date;
- Organism species, classification;
- Cellular localization of the sequence;
- etc.
You may or may not need these data.
Because loading all of this information can use a lot of memory, you are asked which kinds of information to retrieve when you read a sequence file.
An EMBL Options dialog box appears, in which you can check the data you want to load:
Figure: The EMBL options dialog.
Each kind of additional data is stored as a sequence attribute, whose name is given in the second column of the table.
You can change this name if necessary.
The first column shows the EMBL tag for each kind of data.
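
The idea of loading only the checked tags and storing each one under a chosen attribute name can be sketched in a few lines of Python. This is only an illustration, not the program's actual code: the function name read_embl_attributes, the attribute names and the sample record are all hypothetical. It relies only on the fact that an EMBL flat file starts each line with a two-letter tag and ends each entry with "//".

```python
from collections import defaultdict

# Made-up sample entry, for illustration only.
SAMPLE = """\
ID   X00001; SV 1; linear; mRNA; STD; HUM; 120 BP.
AC   X00001;
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa;
OC   Chordata.
//
"""

def read_embl_attributes(lines, wanted):
    """Collect only the requested EMBL tags from each entry.

    wanted maps an EMBL tag (first column of the dialog) to the
    attribute name under which its data is stored (second column).
    Returns one dict of attributes per entry.
    """
    records = []
    current = defaultdict(list)
    for line in lines:
        if line.startswith("//"):  # end of the current entry
            records.append({name: " ".join(parts) for name, parts in current.items()})
            current = defaultdict(list)
            continue
        tag = line[:2]  # two-letter line code
        if tag in wanted:
            # Data starts after the tag and its padding; repeated
            # lines with the same tag are joined with a space.
            current[wanted[tag]].append(line[5:].strip())
    if current:  # entry without a closing "//"
        records.append({name: " ".join(parts) for name, parts in current.items()})
    return records

# Hypothetical selection: the user checked AC, OS and OC only.
selection = {"AC": "accession", "OS": "species", "OC": "classification"}
attributes = read_embl_attributes(SAMPLE.splitlines(), selection)
```

Tags that are not in the selection (here the ID line) are skipped while reading, so they never take up memory, and renaming an attribute only means changing the value associated with its tag.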