<tool id="reads2snp" name="Reads2SNP" version="1.8">
	<description>- Genotype prediction with paralog detection -</description>

	<requirements>
		<requirement type="binary">reads2snp</requirement>
	</requirements>
	
	<command interpreter="perl">

		reads2snp_wrapper.pl -alr $alr_file -out Reads2SNP_output -all Geno2pfas_output
		-mth $nb_thread
		-data Reads2SNP_species_data
		-mod $model 
		-sharing $parameters_sharing 
		-geno $output_genotype 
		-mat $multi_alleles 
		-min $min_coverage 
		-fis $fis 
		-gpv $min_post_probability
		
		-par $paraclean.paraclean_switch 
		-dir $paraclean.dirichlet 
		-thr $paraclean.threshold 
		-spa $paraclean.skip_paralogous
		-err $paraclean.error
		-pre $paraclean.precision
		-opt $paraclean.optimizer
		-kun $paraclean.mask
		      
		-pfc $pfasclean.pfasclean_switch
		-win $pfasclean.window_width
		-het $pfasclean.SNPdens_threshold
		-snp $pfasclean.H_threshold
		
		-exec galaxy -del yes > $Wrapper_log

	</command>
	
	<inputs>
		<param name="alr_file" type="data" label="Input multi .alr file" format="alr" help="Required file type: ALR" />
		
		<param name="nb_thread" type="integer" label="Parallelization level - Number of threads to use" value="4" help="Field type: positive integer - Use 1 to disable multithreading" />
		
		<param name="model" type="select" label="Reads2SNP Model" help="Default value: M1">
			<!--<option value="M0">M0</option>-->
			<option value="M1" selected="true">M1</option>
			<option value="M2">M2</option>
		</param>
		
		<param name="parameters_sharing" type="select" label="Parameters sharing between species" help="Default value: Separate">
			<option value="all" >All</option>
			<option value="sep" selected="true">Separate</option>
			<option value="sh">Shared</option>
		</param>
		
		<param name="output_genotype" type="select" label="Genotype selection" help="Default value: Best">
			<option value="best" selected="true" >Best</option>
			<option value="rand" >Random</option>
		</param>
		
		<param name="multi_alleles" type="select" label="Multi alleles tolerance" help="Default value: Tolerated">
			<option value="acc" >Accepted</option>
			<option value="for" >Forbidden</option>
			<option value="tol" selected="true" >Tolerated</option>
		</param>
		
		<param name="min_coverage" type="integer" label="Minimum number of reads per individual needed for a prediction" value="10" help="Field type: positive integer" />
		
		<param name="fis" type="float" label="F statistic (fis)" value="0" help="Field type: float. Range of possible values [0..1]" />
		
		<param name="min_post_probability" type="float" label="Minimum posterior probability required for a genotype to be written instead of N in the generated sequences" value="0.95" help="Field type: float" />
		
		<conditional name="paraclean">
			<param name="paraclean_switch" type="select" label="Use paraclean to detect paralogs?" help="Choose Yes to search for paralogous positions and replace them with N">
				<option value="true" selected="true">Yes</option>
				<option value="false">No</option> 
			</param>
			<when value="true"> 
				<param name="threshold" type="float" value="0.01" label="Ln Likelihood threshold" help="Threshold above which a position is considered paralogous"/>
				<param name="dirichlet" type="select" label="Use Dirichlet multinomial model?" help="If set to Yes, overdispersion is taken into account">
					<option value="true" selected="true">Yes</option>
					<option value="false">No</option> 
				</param>
				<param name="skip_paralogous" type="select" label="Remove contigs containing paralogous positions?" help="Skip contigs with paralogous positions">
					<option value="true">Yes</option>
					<option value="false" selected="true">No</option> 
				</param>
				<param name="error" type="float" value="0.01" label="Maximum transition/transversion error" help="Range of possible values: ]0, 0.3333["/>
				<param name="precision" type="float" value="0.001" label="Precision on parameters"/>
				<param name="optimizer" type="select" label="Optimizer" help="Algorithm used for optimization">
					<option value="newton" selected="true">PseudoNewtonOptimizer</option>
					<option value="bfgs">BFGS</option> 
				</param>
				<param name="mask" type="select" label="Mask position when the optimizer fails">
					<option value="true" selected="true">Yes</option>
					<option value="false">No</option> 
				</param>
			</when>
			<when value="false" />
		</conditional>
		
		<conditional name="pfasclean">
			<param name="pfasclean_switch" type="select" label="Use pfasclean?" help="Mask heterozygosity and SNP density bias">
				<option value="true" selected="true">Yes</option>
				<option value="false">No</option>
			</param>
			<when value="true"> 
				<param name="window_width" type="integer" label="Width of the sliding window" value="4" help="Field type: positive integer" />
				<param name="SNPdens_threshold" type="float" label="SNP density threshold" value="0.333" help="Field type: float" />
				<param name="H_threshold" type="float" label="Heterozygosity threshold" value="0.6" help="Field type: float" />
			</when>
			<when value="false" />
		</conditional>
		
	</inputs>
	
	<outputs>
		<data name="Wrapper_log" label="Reads2SNP wrapper - Log file" format="txt" />
		
		<data name="Reads2SNP_output" from_work_dir="Reads2SNP_output" label="Genotype prediction" format="alr_gen" />
		<data name="Geno2pfas_output" from_work_dir="Geno2pfas_output" label="Alleles sequences" format="fasta" />
		<data name="Reads2SNP_species_data" from_work_dir="Reads2SNP_species_data" label="Summarized species data" format="csv" />
		<data name="Reads2SNP_error" from_work_dir="Reads2SNP_error" label="Reads2SNP wrapper - Error file" format="txt" />
	</outputs>
	
	<help>
**Reads2SNP Description**

-----

This program is a Reads2SNP wrapper. It splits a multi .alr file into simple .alr files and runs the Reads2SNP C++ executable on each of the resulting sub-files in turn.

All the individual result files are concatenated into a multi .alr_gen file, which contains all the predicted genotypes (by position and individual) and the posterior probability of each prediction.

-----

**ParaClean Description**

-----

ParaClean comes from the genofiltre.pl script written by Benoit Nabholz and Sylvain Glemin.

ParaClean analyzes each position of an input sequence and tries to determine whether a given heterozygous position is a real heterozygous position or a paralogous one (resulting from the erroneous assembly of two paralogs into a single contig).

The real heterozygous status of a position is assessed through a likelihood ratio test between the two alternative hypotheses (paralogs vs heterozygotes).

The parameters of the models (multinomial or Dirichlet multinomial) are the counts of each base at a position in every individual.

The further the counts of two bases at a given position deviate from 50/50 across individuals (assuming there is no allelic expression bias), the more likely the paralogous model is.
	
----

**Pfasclean Description**

This program can be used to clean the multiple alignment, i.e. to mask heterozygosity and SNP density bias.

----	
	</help>
</tool>
