.. Mikado documentation master file, created by
   sphinx-quickstart on Mon Jul 18 14:33:33 2016.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

.. image:: mikado-logo.png
    :align: center
    :scale: 75%

.. _Portcullis: https://github.com/maplesond/portcullis
.. _Transdecoder: http://transdecoder.github.io/
.. |python_badge| image:: https://img.shields.io/pypi/pyversions/snakemake.svg?style=flat-square
   :target: https://www.python.org/
.. |snake_badge| image:: https://img.shields.io/badge/snakemake-≥3.5.2-brightgreen.svg?style=flat-square
   :target: https://bitbucket.org/snakemake/snakemake/wiki/Home

============================
Mikado: pick your transcript
============================

|python_badge| |snake_badge|

:Authors:
    Venturini Luca,
    Caim Shabhonam,
    Mapleson Daniel,
    Kaithakottil Gemy George,
    Swarbreck David
:Version: 1.0.1 (April 2017)

Mikado is a lightweight Python3 pipeline to identify the most useful or “best” set of transcripts from multiple transcript assemblies. Our approach leverages transcript assemblies generated by multiple methods to define expressed loci, assign a representative transcript to each locus and return a set of gene models that selects against transcripts that are chimeric, fragmented or with a short or disrupted CDS. Loci are first defined based on overlap criteria, and each transcript therein is scored based on up to 50 available metrics relating to ORF and cDNA size, relative position of the ORF within the transcript, UTR length and presence of multiple ORFs. Mikado can also use BLAST data to score transcripts based on protein similarity and to identify and split chimeric transcripts. Optionally, junction confidence data as provided by Portcullis_ [Portcullis]_ can be used to improve the assessment. The best-scoring transcripts are selected as the primary transcripts of their respective gene loci; additionally, Mikado can bring back other valid splice variants that are compatible with the primary isoform.
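
The general idea of grouping overlapping transcripts into loci and keeping the best-scoring one can be illustrated with a minimal sketch. This is not Mikado's implementation or API: the ``Transcript`` class, the ``group_into_loci`` and ``pick_primary`` functions and the single-metric ``toy_score`` are hypothetical names invented for illustration, and the sketch assumes all transcripts lie on the same chromosome and strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical, simplified transcript record (not a Mikado class)."""
    name: str
    start: int
    end: int
    cdna_length: int
    cds_length: int


def group_into_loci(transcripts):
    """Group transcripts whose genomic intervals overlap.

    Assumption: all transcripts are on the same chromosome and strand.
    """
    loci = []
    current = []
    current_end = None
    for t in sorted(transcripts, key=lambda t: (t.start, t.end)):
        if current and t.start <= current_end:
            # Overlaps the locus being built: extend it.
            current.append(t)
            current_end = max(current_end, t.end)
        else:
            # No overlap: close the previous locus and start a new one.
            if current:
                loci.append(current)
            current = [t]
            current_end = t.end
    if current:
        loci.append(current)
    return loci


def toy_score(t):
    """Toy metric: fraction of the cDNA covered by the CDS.

    Mikado itself combines many configurable metrics; this single
    ratio is only a stand-in for illustration.
    """
    if t.cdna_length == 0:
        return 0.0
    return t.cds_length / t.cdna_length


def pick_primary(transcripts):
    """Return the highest-scoring transcript of each locus."""
    return [max(locus, key=toy_score) for locus in group_into_loci(transcripts)]


transcripts = [
    Transcript("t1", 100, 900, 800, 600),
    Transcript("t2", 150, 950, 700, 200),
    Transcript("t3", 2000, 2600, 600, 300),
    Transcript("t4", 2500, 3200, 650, 550),
]
primaries = pick_primary(transcripts)
print([t.name for t in primaries])
```

In this toy data set the four transcripts fall into two loci, and one representative is kept per locus. Chimera splitting, junction support and the recovery of alternative splice variants are deliberately left out of the sketch.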

Mikado uses GTF or GFF files as mandatory input. Non-mandatory but highly recommended input data can be generated by obtaining a set of reliable splicing junctions with Portcullis_, by locating coding ORFs on the transcripts using Transdecoder_, and by obtaining homology information through BLASTX [Blastplus]_.

Our approach can also accommodate sequences generated by *de novo* Illumina assemblers or reads generated by long-read technologies such as PacBio.

Our tool was presented at `Genome Science 2016 <http://genomescience.org.uk/>`_, both as a :download:`poster <assets/Mikado_GenomeScience2016_poster.pdf>` and in :download:`a talk during the Bioinformatics showcase session <assets/Mikado_GenomeScience2016.pdf>`.

Citing
~~~~~~

We are currently working on our paper, and we will be releasing a pre-print shortly.
In the meantime, if you use Mikado please reference our GitHub page: `https://github.com/lucventurini/mikado <https://github.com/lucventurini/mikado>`_


Availability and License
~~~~~~~~~~~~~~~~~~~~~~~~

The open-source code is available on GitHub: `https://github.com/lucventurini/mikado <https://github.com/lucventurini/mikado>`_

This documentation is hosted publicly on Read the Docs: `https://mikado.readthedocs.org/en/latest/ <https://mikado.readthedocs.org/en/latest/>`_

Mikado is available under `GNU LGPL v3 <http://www.gnu.org/licenses/lgpl.txt>`_.

Acknowledgements
~~~~~~~~~~~~~~~~

Mikado has greatly benefitted from public libraries, in particular the [NetworkX]_ library, SciPy and NumPy ([Scipy]_, [Numpy]_), Intervaltree [PYinterval]_, and the BX library for a Cython implementation of interval trees. Mikado has also been constantly optimised using Snakeviz, a tool which proved invaluable during the development process.


Credits
~~~~~~~

 - Luca Venturini (The software architect and developer)
 - Shabhonam Caim (Primary tester and analytic consultancy)
 - Daniel Mapleson (Developer of Portcullis and of the Daijin pipeline)
 - Gemy Kaithakottil (Tester and analytic consultancy)
 - David Swarbreck (Annotation guru and ideator of the pipeline)

Contents
--------

.. toctree::
   :maxdepth: 2
   :numbered:

   Introduction
   Installation
   Tutorial/index
   Tutorial/Daijin_tutorial
   Usage/index
   Algorithms
   References
   Library/modules