-----------------------------------------------------------------------------
ProtTest 3 Readme -	25/Oct/2015

Diego Darriba (ddarriba@udc.es)
Guillermo L. Taboada (taboada@udc.es)
Ramón Doallo (doallo@udc.es)
David Posada (dposada@uvigo.es)

(c) Universidad de Vigo, Vigo, Spain
(c) Grupo de Arquitectura de Computadores, UDC, Spain

Citation:

    Darriba D, Taboada GL, Doallo R, Posada D. ProtTest 3: fast selection of
    best-fit models of protein evolution. Bioinformatics, 27:1164-1165, 2011 

  In addition, given that ProtTest uses Phyml intensively, we encourage users to 
  cite this program as well when using ProtTest:

    Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm to 
    estimate large phylogenies by maximum likelihood. Syst Biol. 52: 696-704. 
    [Phyml]

-----------------------------------------------------------------------------

TOC

0. Quick Start
1. Introduction
2. Installation
3. Structure
4. How to run
  4.1. Required conditions
  4.2. Console execution
    4.2.1. Sequential execution
    4.2.2. Multithread execution
    4.2.3. Cluster execution
    4.2.4. Cluster hybrid execution
  4.3. Graphical user interface execution
5. Frequently Asked Questions

-----------------------------------------------------------------------------
0. Quick Start
-----------------------------------------------------------------------------

Console version

$ tar zvxf prottest-3.4.1-yyyymmdd.tar.gz
(Linux/Unix/OS X)
$ export PROTTEST_HOME=$PWD/prottest3
(Windows)
$ set PROTTEST_HOME=%CD%\prottest3
$ cd $PROTTEST_HOME
$ java -jar prottest-3.4.1.jar -i examples/COX2_PF0016/alignment -all-matrices -all-distributions -threads 2

Graphical version

$ tar zvxf prottest-3.4.1-yyyymmdd.tar.gz
(Linux/Unix/OS X)
$ export PROTTEST_HOME=$PWD/prottest3
(Windows)
$ set PROTTEST_HOME=%CD%\prottest3
$ cd $PROTTEST_HOME
$ ./runXProtTestHPC.sh

-----------------------------------------------------------------------------
1. Introduction
-----------------------------------------------------------------------------

ProtTest 3 is a high-performance computing implementation of ProtTest for
selecting best-fit models of protein evolution. It is still under development,
so it may contain errors, and the documentation is incomplete.

Please report bugs and send enquiries about ProtTest 3 to ddarriba@udc.es and
dposada@uvigo.es.

The main page of the ProtTest project is located at
https://bitbucket.org/diegodl/prottest3

-----------------------------------------------------------------------------
2. Installation
-----------------------------------------------------------------------------

All required software is included in the distribution. For further information,
please refer to the 'How to run' section.

-----------------------------------------------------------------------------
3. Structure
-----------------------------------------------------------------------------

Inside the ProtTest 3 directory you should find the following structure:

$PROTTEST_HOME/
  |
  |
  |--bin/
     # auxiliary binaries used to compute model likelihoods
  |--examples/
     # some test alignments and trees
  |
  | filecluster8.conf.template  # machines file template for 8 node cluster
  | machines			# machines file for cluster execution
  | prottest.properties		# ProtTest3 configuration file (always required)
  | prottest-3.4.1.jar		# ProtTest3 package
  | README			# this README file
  | COPYING                     # ProtTest3 license
  | runProtTestHPC.sh		# basic script for ProtTest3 cluster execution
  | runXProtTestHPC.sh		# basic script for ProtTest3 graphical version
  
- The ProtTest properties file (prottest.properties) must be located in the
ProtTest root directory. It holds the execution preferences; refer to that
file for details on the execution properties.

- The external applications must be located in the "bin" directory under the
ProtTest root. At least the binary that matches your computer architecture
must be present.
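
The two requirements above can be checked before a run. The following is a
minimal sketch, not part of the distribution: the function name
check_prottest_layout is hypothetical, and it only tests for the properties
file and a non-empty "bin" directory as described above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with ProtTest): checks that the directory
# given as $1 contains prottest.properties and a non-empty bin/ directory.
check_prottest_layout() {
    root="$1"
    status=0
    # prottest.properties is documented above as always required.
    if [ ! -f "$root/prottest.properties" ]; then
        echo "missing: $root/prottest.properties" >&2
        status=1
    fi
    # bin/ must hold at least one auxiliary binary.
    if [ ! -d "$root/bin" ] || [ -z "$(ls -A "$root/bin" 2>/dev/null)" ]; then
        echo "missing or empty: $root/bin" >&2
        status=1
    fi
    return "$status"
}
```

It could be called as: check_prottest_layout "$PROTTEST_HOME" && echo "layout ok"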

-----------------------------------------------------------------------------
4. How to run
-----------------------------------------------------------------------------

There are several ways to run ProtTest according to the resources you want
to use.

JRE 1.6 or newer is recommended to run ProtTest. The graphical user interface
only works with JRE 1.6 or a newer version.

You can download the newest version at http://java.sun.com/javase/
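
To see which Java version is installed, something like the sketch below may
help. The helper name java_major is hypothetical; it assumes the usual
`java -version` output, where the version string appears in double quotes
and releases up to Java 8 are numbered 1.x.

```shell
#!/bin/sh
# Hypothetical helper (not part of ProtTest): prints the major Java release
# for a version string such as 1.6.0_45 (-> 6) or 11.0.2 (-> 11).
java_major() {
    v="$1"
    case "$v" in
        1.*) v="${v#1.}" ;;    # legacy numbering: 1.6 means Java 6
    esac
    echo "${v%%[!0-9]*}"       # keep only the leading digits
}

# Read the quoted version string that `java -version` prints, if java exists.
if command -v java >/dev/null 2>&1; then
    ver=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    major=$(java_major "$ver")
    if [ -n "$major" ] && [ "$major" -ge 6 ]; then
        echo "Java $major found: meets the JRE 1.6 requirement"
    else
        echo "Java 1.6 or newer not detected" >&2
    fi
fi
```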

-----------------------------------------------------------------------------
	4.1. Required conditions
-----------------------------------------------------------------------------

To run ProtTest, the following general constraint must be satisfied:

- Run it from the ProtTest root directory (i.e. the directory where you
unpacked the distribution).

-----------------------------------------------------------------------------
	4.2. Console execution
-----------------------------------------------------------------------------

-----------------------------------------------------------------------------
	4.2.1. Sequential execution
-----------------------------------------------------------------------------

ProtTest supports several kinds of parallel execution from the command line,
but it is important to first understand how to run it sequentially.

These are the parameters of the sequential execution:

     -i alignment_filename
            Alignment input file (required)
     -t tree_filename
            Tree file       (optional) [default: NJ tree]
     -o output_filename
            Output file     (optional) [default: standard output]
     -log enabled/disabled
            Enables / Disables PhyML logging into log directory (see prottest.properties)
     -[matrix]
            Include matrix (Amino-acid) = JTT LG DCMut MtREV MtMam MtArt Dayhoff WAG 
	                                  RtREV CpREV Blosum62 VT HIVb HIVw FLU 
                If you don't specify any matrix, all matrices displayed above will
                be included.
     -I
            Include models with a proportion of invariable sites
     -G
            Include models with rate variation among sites over a number of
            rate categories (+G)
     -IG
            Include models with both +I and +G properties
     -all-distributions
            Include +I models, +G models, and models with both (+I+G)
     -ncat number_of_categories
            Define number of categories for +G and +I+G models [default: 4]
     -F
            Include models with empirical frequency estimation 
     -AIC
            Display models sorted by Akaike Information Criterion (AIC)
     -BIC
            Display models sorted by Bayesian Information Criterion (BIC)
     -AICC
            Display models sorted by Corrected Akaike Information Criterion (AICc)
     -DT
            Display models sorted by Decision Theory Criterion
     -all
            Displays a 7-framework comparison table
     -S optimization_strategy
            Optimization strategy mode: [default: 0]
             		0: Fixed BIONJ JTT
             		1: BIONJ Tree
             		2: Maximum Likelihood tree
             		3: User defined topology
     -s moves
            Tree search operation for ML search: 
            NNI (fastest), SPR (slowest), BEST (best of NNI and SPR) [default: NNI]
     -t1      				
            Display best-model's newick tree [default: false]
     -t2      				
            Display best-model's ASCII tree  [default: false]
     -tc consensus_threshold 
            Display consensus tree with the specified threshold, between 0.5 and 1.0
            [0.5 = majority rule consensus ; 1.0 = strict consensus]
     -threads number_of_threads
            Number of threads requested to compute (only if MPJ is not used) [default: 1]
     -verbose
            Verbose mode [default: false]


Basic execution example:

	$ cd $PROTTEST_HOME
	$ java -jar prottest-3.4.1.jar -i examples/COX2_PF0016/alignment -all-distributions -F -AIC -BIC -tc 0.5


-----------------------------------------------------------------------------
	4.2.2. Multithread execution
-----------------------------------------------------------------------------

This works just like the sequential execution, with the addition of the
-threads option. You can request as many threads as your machine has cores, or
more. Following the basic execution example, a recommended multithread
execution on a quad-core machine would be:

	$ cd $PROTTEST_HOME
	$ java -jar prottest-3.4.1.jar -i examples/COX2_PF0016/alignment -all-distributions -F -AIC -BIC -tc 0.5 -threads 4

Note that the substitution models are not computed in a specific order, so the
full output is shown only once all models have been computed. Meanwhile, the
main thread reports each completed model on standard output.
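
Rather than hard-coding the thread count, it can be derived from the machine.
The sketch below is only an illustration: detect_cores is a hypothetical
helper, and it assumes a system where `getconf _NPROCESSORS_ONLN` is
available (as on Linux and OS X), falling back to 1 otherwise.

```shell
#!/bin/sh
# Hypothetical helper (not part of ProtTest): prints the number of online
# cores, or 1 if it cannot be determined.
detect_cores() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=""
    case "$n" in
        ''|*[!0-9]*) n=1 ;;    # empty or non-numeric answer: fall back to 1
    esac
    echo "$n"
}

# Possible use with the example above (run from $PROTTEST_HOME):
#   java -jar prottest-3.4.1.jar -i examples/COX2_PF0016/alignment \
#        -all-distributions -F -AIC -BIC -tc 0.5 -threads "$(detect_cores)"
```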

-----------------------------------------------------------------------------
	4.2.3. Cluster execution
-----------------------------------------------------------------------------

1. Besides the multithreading support, it is possible to run ProtTest on
a cluster. This support is implemented using a Java message-passing
(MPJ) library, MPJ Express (http://mpj-express.org/). To execute ProtTest
in a cluster environment, first set up the MPJ environment:

$ cd $PROTTEST_HOME
$ export MPJ_HOME=$PROTTEST_HOME/mpj
$ export PATH=$MPJ_HOME/bin:$PATH

You can also add the last two lines to ~/.bashrc to set these variables
automatically at console startup.

2. The $PROTTEST_HOME/machines file lists the compute nodes on which the MPJ
processes will be executed. By default it points to the localhost machine,
so to run a parallel execution over a cluster you should edit it, writing
one compute node per line (e.g. see filecluster8.conf.template).

3. Start the MPJ Express daemons:

$ mpjboot machines

The "mpjboot" application should be in the execution path (it is located in
$MPJ_HOME/bin). An SSH service must be running on the machines listed in the
machines file. Moreover, port 10000 should be free. For more details refer to
the MPJ Express documentation.

4. Run ProtTest. The distribution provides a bash script for this purpose:
'runProtTestHPC.sh'

The basic syntax is:
	./runProtTestHPC.sh $NUMBER_OF_PROCESSORS $APPLICATION_PARAMETERS

$ ./runProtTestHPC.sh 2 -i examples/COX2_PF0016/alignment -all-distributions -F -AIC -BIC -tc 0.5
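
Step 2 above can be scripted when the node list is known. The following is a
minimal sketch: the function name write_machines_file and the node names are
hypothetical, and the one-node-per-line format is taken from the description
of the machines file in step 2.

```shell
#!/bin/sh
# Hypothetical helper (not part of ProtTest): writes the host names given as
# $2, $3, ... to the file named by $1, one per line.
write_machines_file() {
    out="$1"
    shift
    # Refuse to write an empty machines file.
    [ "$#" -gt 0 ] || { echo "no host names given" >&2; return 1; }
    printf '%s\n' "$@" > "$out"
}

# Example with made-up node names:
#   write_machines_file "$PROTTEST_HOME/machines" node01 node02 node03 node04
```

After writing the file, continue with steps 3 and 4 as described above.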

-----------------------------------------------------------------------------
	4.2.4. Cluster hybrid execution
-----------------------------------------------------------------------------

This strategy combines a thread-based implementation of PhyML with the
distributed-memory version of ProtTest. Please contact us to request the
thread-based PhyML source code and further documentation.

-----------------------------------------------------------------------------
	4.3. Graphical user interface execution
-----------------------------------------------------------------------------

If you are running ProtTest on your desktop computer, we recommend the
graphical user interface, provided by the XProtTest package. It presents the
results more clearly and makes them easier to manipulate.

The graphical interface can use a sequential or a shared memory execution
of the application, so it will be the best choice for running ProtTest on
a local multicore machine.

$ cd $PROTTEST_HOME
$ ./runXProtTestHPC.sh

-----------------------------------------------------------------------------
5. Frequently Asked Questions
-----------------------------------------------------------------------------

 5.1. I got this error when running ProtTest 3

      ./phyml-prottest-linux: /lib/libc.so.6: version `GLIBC_2.7' not found 

    ---

    You can try to update GLIBC to the required version (2.7). However, this is
    not always possible, for example when running ProtTest on a cluster. If
    you have an older version of PhyML, you can try to use it by linking its
    binary file to $PROTTEST_HOME/bin/phyml

    If this does not work, it is possible that your "old" version of PhyML
    does not support the --no-memory-check argument, included in the latest
    PhyML versions. In this case, edit the prottest.properties file and change
    the "no-memory-check" property to "no".
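
    The linking step could look like the sketch below. The helper name
    link_phyml and the example path are hypothetical; only the link target
    $PROTTEST_HOME/bin/phyml comes from the answer above.

```shell
#!/bin/sh
# On glibc-based Linux systems the installed C library version can be shown
# with:  getconf GNU_LIBC_VERSION

# Hypothetical helper (not part of ProtTest): links an existing PhyML binary
# ($1) into the bin directory of the ProtTest installation ($2) as "phyml".
link_phyml() {
    src="$1"
    home="$2"
    # Only link something that can actually be executed.
    [ -x "$src" ] || { echo "not executable: $src" >&2; return 1; }
    ln -sf "$src" "$home/bin/phyml"
}

# Example with a made-up location for the older PhyML binary:
#   link_phyml /opt/phyml-old/phyml "$PROTTEST_HOME"
```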
