Maximum Parsimony on Networks
Can be used to compute the softwired and hardwired parsimony score of a phylogenetic network.
MPNet uses Integer Linear Programming (ILP). There are two versions. One version uses GLPK, which is free. The other version uses CPLEX, which is much faster than GLPK and free for academic use.
Mareike Fischer, Leo van Iersel, Steven Kelk and Celine Scornavacca. On Computing the Maximum Parsimony Score of a Phylogenetic Network. SIAM Journal on Discrete Mathematics, 29 (1): 559-585 (2015). (arXiv)
OPTION ONE: CPLEX VERSION
This is the recommended option.
Download the source file “MPNet.java” of the CPLEX version.
Install CPLEX, which is free for academic use, see IBM academic initiative.
Open a terminal window and navigate to the folder of MPNet.
Type “javac MPNet.java -classpath [path to cplex.jar]” to compile the source on your computer, where [path to cplex.jar] is the location of cplex.jar on your computer.
Type “java -Djava.library.path=[path to shared library] MPNet network.tree sequences.fasta” to run the program. Here, [path to shared library] should indicate the path to the shared library, usually the path to cplex.jar will work.
OPTION TWO: GLPK VERSION
This option is much slower.
Download the source file “MPNetGLPK.java” of the GLPK version.
Download GLPK. Under Linux, you put glpsol in the directory of MPNetGLPK. Under windows, you put glpsol.exe and all the dll-files from its directory into the directory of MPNetGLPK.
Open a terminal window and navigate to the folder of MPNetGLPK.
Type “javac MPNetGLPK.java” to compile the source.
Type “java MPNetGLPK network.tree sequences.fasta” to run the program.
- The file “network.tree” should contain a network in e-newick format, as in the example files.
- The file “sequences.fasta” should contain, on each line, a taxon name followed by a space and a character state, or sequence of character states, as in the example files. Symbols ‘-‘ and ‘?’ can be used to indicate gaps.
- The program will construct dot-files containing the networks with assigned character states.
- If Graphviz is installed, the program will also convert the dot-files to pdf-files.
Network with character-states that are optimal for the hardwired parsimony score. Labels on the edges indicate the number of changes on that edge. The hardwired parsimony score is 15.
Network with character-states that are optimal for the softwired parsimony score. Labels on the reticulation edges indicate the support for that edge, see below. The softwired parsimony score 12.
Computation takes 0.0 seconds.
- Add –nolabels to hide taxon labels in the constructed network.
- Add –nostates to hide character states in the constructed network.
- Add –softwired to only compute the softwired parsimony score, not the hardwired one.
- Add –hardwired to only compute the hardwired parsimony score, not the softwired one.
- Add –silent to not show any intermediate results.
- Add –rand k instead of a sequence-file, to randomly generate a single k-state character.
- Roughly speaking the edge-support values indicate the fraction of the genome that is inherited over that edge. More precisely, the values are calculated as follows:For a reticulation edge e=(u,r), define s(u) as the number of characters for which u is the only parent of r with the same character state as r. The support of edge e is now obtained by dividing s(u) by the sum of s(p)over all parents p of r. This makes sure that the support of all reticulation edges of a reticulation add up to 1, unless they are all 0. Moreover, the support of a reticulation-edge is 0 if and only if the edge can be removed from the network without increasing the parsimony score.
- To reproduce the experiments from the paper, use “java MPNet networks.txt –rand k –silent” for k=2,3,4 and with this input file (adding “-Djava.library.path=[path to shared library]” if necessary).