How to create a valid XML file for jSquid
This tutorial explains how to create a valid input file for the jSquid Applet. You can find the corresponding XML Schema in the Download section. Rather than working through the Schema directly, we walk through an example of a valid XML file, since that is more intuitive and easier to understand.
The first tag in the XML file is the opening 'jsquid' tag. The 'jsquid' element surrounds the input data and has two optional attributes, 'x' and 'y'. These attributes do not describe the size of the Applet; they are used to distribute the nodes within the panel in which they are displayed. Ideally, their values are approx. 300px smaller than the size of the Applet, but you can also omit them, in which case default values are applied.
<jsquid x="500" y="400">
The next element is the optional 'settings' element. Within this element you can define the initial graph appearance. The following optional attributes can be used (their data type is boolean): 'distributeOnInit' (determines whether the initial node distribution is applied), 'showMedusaStyle' (uses the same drawing style as the Medusa Applet, such as thinner edges, when set to true), 'showNodeLabels' (toggles the appearance of the nodes' labels), 'showEdgeLabels' (toggles the appearance of the edges' labels, which are the confidence values), 'showNodeSize' (indicates whether the size of a node should correspond to the number of its edges), 'showUniformNodes' (displays all nodes with the default shape and color, if set to true), 'showEdgeConf' (determines whether the strength or thickness of an edge corresponds to its confidence value), 'showGroupLabels' (toggles the appearance of the groups' labels).
<settings showNodeLabels="false" distributeOnInit="false"/>
The optional element 'confidenceSettings' sets the initial values for the confidence grouping. These values can also be set within the Applet, so if you are unsure about their meaning, just leave them out. The attribute 'connectionCutoff' is a threshold value for the interconnection of nodes; 'confidenceCutoff' is a threshold for the confidence of edges.
<confidenceSettings connectionCutoff="0.6" confidenceCutoff="0.4"/>
The optional 'legend' element describes the meaning of the nodes' shapes and colors. It can contain several 'legendItem' elements, each of which has three required attributes: 'shape' (the id of a shape, currently 0 - 9), 'color' (in R,G,B notation) and 'name' (e.g. the name of a KEGG pathway).
<legend>
<legendItem shape="2" color="0,0,170" name="Arginine and proline metabolism"/>
<legendItem shape="3" color="255,255,0" name="Alanine and aspartate metabolism"/>
</legend>
The next element is the optional 'hyperlinkStubs' element. It consists of several 'stub' elements, each of which represents a URL made of one or two parts. The two attributes 'name' (the identifier for a link) and 'urlbegin' (the first part of the link's URL) are required; 'urlend' (the second part of the link's URL) is optional. The idea is to build a URL of the form 'urlbegin' + id (e.g. a protein name) + 'urlend', although 'urlend' will not always be necessary. The id is defined within the 'node' element below.
<hyperlinkStubs>
<stub name="ENSEMBL" urlbegin="http://www.ensembl.org/Homo_sapiens/searchview?q="/>
<stub name="google" urlbegin="http://www.google.at/search?source=ig&amp;hl=de&amp;rlz=&amp;q=" urlend="&amp;btnG=Google-Suche&amp;meta="/>
</hyperlinkStubs>
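The URL composition described above can be sketched in Python. This is only an illustration: the helper names build_url and stub_element are hypothetical and not part of jSquid, and the sketch assumes the Applet joins the three parts by plain concatenation, as the text describes. It also shows that a literal '&' in a URL has to be written as '&amp;' inside an XML attribute value.

```python
from xml.sax.saxutils import quoteattr


def build_url(urlbegin, node_id, urlend=""):
    """Join the parts as 'urlbegin' + id + 'urlend' (plain concatenation)."""
    return urlbegin + node_id + urlend


def stub_element(name, urlbegin, urlend=None):
    """Serialize a 'stub' element; quoteattr escapes '&', '<' and '>'."""
    attrs = f"name={quoteattr(name)} urlbegin={quoteattr(urlbegin)}"
    if urlend is not None:  # 'urlend' is optional
        attrs += f" urlend={quoteattr(urlend)}"
    return f"<stub {attrs}/>"


ensembl_url = build_url(
    "http://www.ensembl.org/Homo_sapiens/searchview?q=", "ENSG00000036473"
)
print(ensembl_url)
print(stub_element("example", "http://example.org/search?a=1&q=", "&lang=en"))
```

The second call uses a made-up example.org address to show the escaping only.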
The optional element 'nodegroups' and its sub-elements 'nodegroup' are used to pre-define clusters of nodes. Such groups (clusters) can be, for example, different KEGG pathways or sub-cellular locations. The latter is a special one, since grouping by sub-cell location distributes the nodes within the various cell compartments of a cartoon cell. To take advantage of that, some strict naming conventions have to be followed. For now, it is only important that the attribute 'name' has exactly the value "Sub-Cell Location". All other groups, like KEGG pathways, can be named freely.
<nodegroups>
<nodegroup name="Metabolism"/>
<nodegroup name="Sub-Cell Location"/>
</nodegroups>
Within the required 'interaction' element one has to define the different types of edges. It is important that the ids (required attribute 'ID') are numbered serially, starting with the number 1. An edge can also have the interaction id 0; this number is reserved for the summed-up edges, which are displayed in the Applet at the beginning. If there are no edges of this summed-up type with the id 0, the initial screen will be empty and you will have to switch to "Detailed Links" in the "View" menu.
The 'interaction' element consists of four meta-type elements: the 'types' element (representing type interactions - see examples below), the 'species' element (representing species interactions - e.g. human), the 'predictedLinks' element (representing result interactions - e.g. results from the FunCoup program) and the 'nonConfidence' element (representing types which do not contribute to the confidence score - e.g. Paralogs). Each of their sub-elements (called 'type', 'spec', 'link' and 'nonConf', respectively) has three required attributes, 'ID' (the serially numbered identifier), 'name' (the name of the interaction) and 'color' (the color which represents this type, in R,G,B notation), plus the optional 'checked' attribute (indicates whether edges of this type are displayed initially - true is the default value).
<interaction>
<types>
<type ID="1" name="Sub-cellular co-localization" color="255,0,0"/>
<type ID="2" name="mRNA co-expression" color="0,0,255"/>
</types>
<species>
<spec ID="3" name="human" color="255,0,128"/>
</species>
<predictedLinks>
<link ID="4" name="Protein-protein interaction" color="0,255,0"/>
</predictedLinks>
<nonConfidence>
<nonConf ID="5" name="Paralogs" color="80,0,80" checked="false"/>
</nonConfidence>
</interaction>
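The rule that interaction ids are numbered serially starting with 1 can be checked before loading a file into the Applet. The following is a minimal sketch using Python's standard library. The function names are hypothetical, and it assumes "serially" means the ids across all four meta-type elements together form 1, 2, ..., n without gaps or duplicates.

```python
import xml.etree.ElementTree as ET


def interaction_ids(interaction):
    """Collect the integer 'ID' attributes of all sub-elements of 'interaction'."""
    return [int(child.get("ID")) for group in interaction for child in group]


def ids_are_serial(ids):
    """True if the ids are exactly 1, 2, ..., n (in any order)."""
    return sorted(ids) == list(range(1, len(ids) + 1))


sample = """<interaction>
  <types>
    <type ID="1" name="Sub-cellular co-localization" color="255,0,0"/>
    <type ID="2" name="mRNA co-expression" color="0,0,255"/>
  </types>
  <species>
    <spec ID="3" name="human" color="255,0,128"/>
  </species>
</interaction>"""

ids = interaction_ids(ET.fromstring(sample))
print(ids, ids_are_serial(ids))
```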
The 'edges' element is required and consists of several 'edge' elements. An 'edge' element has five required attributes: 'n1' (the name of the start node), 'n2' (the name of the end node), 'iType' (the id of the interaction type), 'conf' (the edge's confidence value), 'ortn' (the orientation of the edge, 0.0 is straight, greater or smaller than that is curved).
<edges>
<edge n1="A" n2="B" iType="0" conf="0.54" ortn="0.0"/>
<edge n1="A" n2="B" iType="1" conf="0.3" ortn="1.0"/>
<edge n1="A" n2="B" iType="2" conf="0.4" ortn="-1.0"/>
<edge n1="B" n2="C" iType="0" conf="0.54" ortn="0.0"/>
<edge n1="B" n2="C" iType="1" conf="0.54" ortn="1.0"/>
<edge n1="A" n2="C" iType="0" conf="0.4" ortn="0.0"/>
<edge n1="A" n2="C" iType="4" conf="0.4" ortn="-0.5"/>
<edge n1="A" n2="C" iType="5" conf="0.9" ortn="-1.5"/>
<edge n1="A" n2="C" iType="3" conf="0.2" ortn="2.5"/>
</edges>
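Two properties of the 'edges' element are easy to check automatically: whether at least one summed-up edge (iType 0) exists, since otherwise the initial screen is empty, and whether every other iType refers to an id defined in the 'interaction' element. The sketch below is an illustration only. check_edges is a hypothetical helper, and treating an iType without a matching interaction id as a mistake is an assumption, not documented Applet behavior.

```python
import xml.etree.ElementTree as ET


def check_edges(edges, defined_ids):
    """Return (has_summed_up, unknown_types) for an 'edges' element."""
    types = [int(edge.get("iType")) for edge in edges.findall("edge")]
    has_summed_up = 0 in types  # id 0 is reserved for summed-up edges
    unknown = sorted({t for t in types if t != 0 and t not in defined_ids})
    return has_summed_up, unknown


sample = ET.fromstring(
    "<edges>"
    '<edge n1="A" n2="B" iType="0" conf="0.54" ortn="0.0"/>'
    '<edge n1="A" n2="B" iType="1" conf="0.3" ortn="1.0"/>'
    '<edge n1="A" n2="C" iType="7" conf="0.2" ortn="2.5"/>'
    "</edges>"
)
result = check_edges(sample, defined_ids={1, 2, 3, 4, 5})
print(result)
```

Here the iType 7 is deliberately undefined, so it is reported in the second part of the result.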
The required 'nodes' element consists of several 'node' elements, which can have many attributes and sub-elements. The following attributes are available: 'name' (the name of the node), 'x' (the relative horizontal node position on the screen, between 0.0 and 1.0), 'y' (the relative vertical node position on the screen, between 0.0 and 1.0), 'color' (the node's color in R,G,B notation), 'shape' (the node's shape, 0 - 9 available), 'size' (the size of the node, 0 - 9 available; 0 does not mean that the node is invisible, it is just the smallest possible node). All attributes except 'size' are required.
The following sub-elements are optionally available: the element 'att' surrounds the annotation of a node (note that some characters are not allowed within an XML file), and the element 'groups' contains 'group' elements with the attributes 'ref' (the reference to one of the groups defined above as 'nodegroup') and 'name' (the name of the sub group this node belongs to). As mentioned above, the sub-cell grouping is a special feature and needs special naming conventions. The group must have the exact name "Sub-Cell Location" (the 'ref' attribute), and the sub groups must also be named in a strictly predefined way (the 'name' attribute). The following sub groups are available for sub-cell location grouping: "Mitochondrion", "ER", "Golgi", "Nucleus", "Cytoplasm", "Membrane" and "Extracellular".
The next element is the 'hlIDs' element, which contains 'ID' elements. The 'name' attribute represents the id needed for the URL (discussed above), and the 'ref' attribute is the reference to one of the hyperlink 'stub' elements defined above.
<nodes>
<node name="A" x="0.12" y="0.71" color="0,0,170" shape="2" size="7">
<att>Ornithine carbamoyltransferase, mitochondrial precursor </att>
<groups>
<group ref="Metabolism" name="Arginine and proline metabolism"/>
<group ref="Sub-Cell Location" name="Mitochondrion"/>
</groups>
<hlIDs>
<ID name="ENSG00000036473" ref="ENSEMBL"/>
<ID name="ENSG00000036473" ref="google"/>
</hlIDs>
</node>
<node name="B" x="0.61" y="0.77" color="255,255,0" shape="3" size="7">
<att>Argininosuccinate synthase </att>
<groups>
<group ref="Metabolism" name="Alanine and aspartate metabolism"/>
<group ref="Sub-Cell Location" name="ER"/>
</groups>
<hlIDs>
<ID name="ENSG00000130707" ref="ENSEMBL"/>
</hlIDs>
</node>
<node name="C" x="0.94" y="0.54" color="255,255,0" shape="3" size="7">
<att>Inositol monophosphatase 2 </att>
<groups>
<group ref="Metabolism" name="Alanine and aspartate metabolism"/>
</groups>
<hlIDs>
<ID name="ENSG00000141401" ref="google"/>
</hlIDs>
</node>
</nodes>
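The constraints on 'node' elements (required attributes, value ranges and the fixed sub-cell location names) lend themselves to a small check. The following Python sketch is one possible way to do it. node_problems is a hypothetical helper, and it only covers the rules stated in this tutorial, not the full XML Schema.

```python
import xml.etree.ElementTree as ET

SUBCELL_GROUP = "Sub-Cell Location"
SUBCELL_NAMES = {"Mitochondrion", "ER", "Golgi", "Nucleus",
                 "Cytoplasm", "Membrane", "Extracellular"}


def node_problems(node):
    """Return a list of human-readable problems found in one 'node' element."""
    problems = []
    for attr in ("name", "x", "y", "color", "shape"):  # 'size' is optional
        if node.get(attr) is None:
            problems.append(f"missing attribute: {attr}")
    for axis in ("x", "y"):  # relative positions between 0.0 and 1.0
        raw = node.get(axis)
        if raw is not None and not 0.0 <= float(raw) <= 1.0:
            problems.append(f"{axis} outside 0.0-1.0: {raw}")
    for attr in ("shape", "size"):  # both range over 0 - 9
        raw = node.get(attr)
        if raw is not None and int(raw) not in range(10):
            problems.append(f"{attr} outside 0-9: {raw}")
    for group in node.findall("groups/group"):
        if group.get("ref") == SUBCELL_GROUP and group.get("name") not in SUBCELL_NAMES:
            problems.append(f"unknown sub-cell location: {group.get('name')}")
    return problems


good = ET.fromstring(
    '<node name="A" x="0.12" y="0.71" color="0,0,170" shape="2" size="7">'
    '<groups><group ref="Sub-Cell Location" name="Mitochondrion"/></groups>'
    "</node>"
)
bad = ET.fromstring(
    '<node name="Z" x="1.2" y="0.5" shape="3">'
    '<groups><group ref="Sub-Cell Location" name="Lysosome"/></groups>'
    "</node>"
)
print(node_problems(good))
print(node_problems(bad))
```

The node "Z" is invented for the example: it lacks 'color', has an 'x' value above 1.0 and uses a sub group name that is not in the predefined list.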
At the end of the file, close the 'jsquid' tag.
</jsquid>
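Before loading a finished file into the Applet, it can help to confirm that it is well-formed and contains the three required elements named in this tutorial ('interaction', 'edges' and 'nodes'). The sketch below uses only Python's standard library. missing_required is a hypothetical helper and is no substitute for validating against the XML Schema from the Download section.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("interaction", "edges", "nodes")


def missing_required(xml_text):
    """Parse a jSquid document and list the required elements that are absent.

    Raises ET.ParseError if the text is not well-formed XML, for example
    when an attribute value contains an unescaped '&'.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "jsquid":
        raise ValueError(f"root element is '{root.tag}', expected 'jsquid'")
    return [tag for tag in REQUIRED if root.find(tag) is None]


print(missing_required('<jsquid x="500" y="400"><interaction/><edges/></jsquid>'))

try:
    missing_required('<jsquid><stub urlbegin="http://example.org/?a=1&q="/></jsquid>')
except ET.ParseError as error:
    print("not well-formed:", error)
```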