Introduction.
More complex tasks involving additional data can also be performed, such as reading
the native formats of databanks, managing sequences together with trees, and displaying and
analyzing various parameters.
To do so, the program manipulates trees through a generalization of the
tree object called XTree.
The first part of this help explains what XTrees are and how they relate
to XML.
The second part describes the APIs that let the user work with XTrees.
I) Extensible trees: XTrees
1) What are XTrees?
XTree stands for eXtensible Tree.
Traditional trees are made of nodes, leaves (nodes without children) and
branches (links between nodes): these elements determine the topology
of the tree.
A few more parameters are needed to define a tree: branch lengths and bootstrap
values are the most commonly used.
With the rapidly growing amount of data available, more parameters may be
attached to a tree: Ka/Ks ratios, GC contents at nodes, sequences, taxonomic
information, cellular localization of the sequence, etc.
One cannot predict all the parameters that will need to be attached to a tree in the
years to come.
2) Dealing with attributes...
In an XTree, parameters are stored as attributes, each with a name and a
value.
Attributes are already found in sequence files: a sequence in a file usually has a name, a 'core' sequence and some comment lines.
More complex file formats, such as those of the databanks (GenBank, EMBL, SwissProt, etc.), may contain additional data:
origin of the sequence, bibliographic references, cellular localization, taxonomic information...
The same holds for trees: you can define attributes for each leaf in the tree, and for each node too.
Leaf names, branch lengths and bootstrap values are the usual parameters expected in a tree, but the user can add as many other attributes as necessary.
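To make the idea concrete, here is a minimal Python sketch of a tree node that carries free-form attributes. It is only an illustration: the class name XNode and its fields are hypothetical and are not the program's actual XTree API. The attribute names Name, Branch length and Bootstrap value are the usual ones mentioned above; GC content and Ka/Ks stand for user-added attributes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class XNode:
    """Hypothetical extensible tree node: topology plus named attributes."""
    attributes: Dict[str, Any] = field(default_factory=dict)
    children: List["XNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # A leaf is a node without children.
        return not self.children


# Leaves carry the usual attributes, plus any extra ones the user needs.
human = XNode({"Name": "Human", "Branch length": 0.12, "GC content": 0.41})
mouse = XNode({"Name": "Mouse", "Branch length": 0.15})

# Inner nodes can carry attributes too.
ancestor = XNode({"Bootstrap value": 98, "Branch length": 0.03}, [human, mouse])

# A new attribute can be attached at any time without changing the class.
ancestor.attributes["Ka/Ks"] = 0.21
```

Storing attributes in a name/value dictionary, rather than in fixed fields, is what lets new parameters be added later without redefining the node.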
3) XTrees and XML.
XML stands for eXtensible Markup Language.
It allows data to be stored in a simple and intuitive way.
XML is particularly suitable for storing tree data and supports extension, i.e. the addition of further data while keeping the format intact, so that new files can still be read by old applications.
Hence there is a tight relation between XTrees and the XML data format.
In practice, an XTree can store all the information contained in an XML
file, and vice versa.
Since each leaf in a tree is a sequence, a tree file is no more than a sequence file with sequences arranged in a 'phylogenetic' way.
Nevertheless, 'classical' tree files do not go that far, since they only record the phylogenetic information and the sequence names.
With XML, it is possible to write as much data as needed without making the whole file format obsolete.
4) Data IO: the tag definition.
Many XML Document Type Definitions (DTDs) have already been proposed to
store phylogenetic data.
The main difficulty with a DTD is choosing the tag names for the data to write, and these names vary from one DTD to another.
Some tags are expected in an XML file containing tree data: branch lengths,
bootstrap values, leaf names, but also node and root.
To deal with such "variations", the algorithms used in XTrees need a
tag definition map, which links the name of each attribute to the
corresponding tag name.
The user may configure the tag definition via an XMLParserPanel, generally displayed in the read/save dialog box.
See Tree files in XML for a description of the XML tree file format.
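As an illustration of what a tag definition map does, the following Python sketch writes a small tree to XML using such a map. It is not the program's implementation: the tag names (clade, name, branch_length, confidence), the dictionary-based tree and the function node_to_xml are all assumptions chosen for the example. Skipping attributes that have no entry in the map is also an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag definition map: attribute name -> tag name in the file.
TAG_MAP = {
    "Node": "clade",
    "Name": "name",
    "Branch length": "branch_length",
    "Bootstrap value": "confidence",
}


def node_to_xml(node, tag_map):
    """Convert a node ({"attributes": {...}, "children": [...]}) to an element."""
    element = ET.Element(tag_map["Node"])
    for attr_name, value in node["attributes"].items():
        tag = tag_map.get(attr_name)
        if tag is None:
            continue  # assumption: attributes without a tag are not written
        ET.SubElement(element, tag).text = str(value)
    for child in node["children"]:
        element.append(node_to_xml(child, tag_map))
    return element


tree = {
    "attributes": {"Bootstrap value": 98},
    "children": [
        {"attributes": {"Name": "Human", "Branch length": 0.12}, "children": []},
        {"attributes": {"Name": "Mouse", "Branch length": 0.15}, "children": []},
    ],
}

xml_text = ET.tostring(node_to_xml(tree, TAG_MAP), encoding="unicode")
print(xml_text)
```

Changing only the values of TAG_MAP is enough to write the same tree with the tag names of another DTD, which is the purpose of keeping the map separate from the tree data.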
II) APIs that deal with XTrees.
1) The attributes editor.
The attributes editor contains a table and a toolbar.
The table has two columns:
- The name of each attribute,
- The value of each attribute.
A few actions are available from the toolbar, such as adding a new attribute or removing the selected attributes.
Editing can also be performed directly by clicking in the cell to edit.
Note that some editing operations may be forbidden: you can't edit the reserved attribute names (Name, Branch length and Bootstrap value).
Moreover, you can't edit the length and bootstrap values unless you have selected the option that enables you to.
2) Configuring the XML parser.
The XML parser is where you describe the correspondence between internal attributes and the attributes that may be found in your XML file.
A two-column table and a toolbar make this task easy to perform.
The attributes editor and the parser are quite similar, except that the two columns don't have the same meaning:
- In the attributes editor, the columns hold the attribute names and their corresponding values.
- In the XML parser, the columns hold the attribute names and their corresponding names in the file.
Further notes:
- Attributes not found in the file will be ignored.
- Attributes found in the file but not specified in the table will be ignored too.
- You must specify a few attributes: root element name, nodes, lengths, etc.
These attributes are expected to be found in the file because they describe the topology or are frequently found.
See XML Tree Files for a description of the XML tree file format.
- The XML parser is very similar for sequences in the XML format.
See XML Sequence Files for a description of the XML sequence file format.
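The parser rules listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program's parser: the tag names (node, label, len, boot, gc), the dictionary-based result and the function read_node are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical parser table: internal attribute name -> name in the file.
TAG_MAP = {
    "Node": "node",
    "Name": "label",
    "Branch length": "len",
    "Bootstrap value": "boot",
}


def read_node(element, tag_map):
    """Read one node element, keeping only the attributes listed in the table."""
    tag_to_attr = {tag: attr for attr, tag in tag_map.items() if attr != "Node"}
    node = {"attributes": {}, "children": []}
    for child in element:
        if child.tag == tag_map["Node"]:
            node["children"].append(read_node(child, tag_map))
        elif child.tag in tag_to_attr:
            node["attributes"][tag_to_attr[child.tag]] = child.text
        # Any other tag is not specified in the table and is ignored.
    return node


XML = (
    "<node><boot>98</boot>"
    "<node><label>Human</label><len>0.12</len><gc>0.41</gc></node>"
    "<node><label>Mouse</label></node>"
    "</node>"
)

root = read_node(ET.fromstring(XML), TAG_MAP)
```

Here the gc tag is present in the file but absent from the table, so it is dropped, and the second leaf has no len tag, so it simply gets no Branch length attribute. Values are kept as text; converting them to numbers is left out of the sketch.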