.. Mikado documentation master file, created by
   sphinx-quickstart on Mon Jul 18 14:33:33 2016.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

.. image:: mikado-logo.png
    :align: center
    :scale: 75%

.. _Portcullis: https://github.com/maplesond/portcullis
.. _Minos: https://github.com/EI-CoreBioinformatics/minos
.. _Transdecoder: http://transdecoder.github.io/
.. |python_36_badge| image:: https://img.shields.io/badge/python-3.6-blue.svg
   :target: https://www.python.org/downloads/release/python-360/
.. |python_37_badge| image:: https://img.shields.io/badge/python-3.7-blue.svg
   :target: https://www.python.org/downloads/release/python-372/
.. |python_38_badge| image:: https://img.shields.io/badge/python-3.8-blue.svg
   :target: https://www.python.org/downloads/release/python-386/
.. |python_39_badge| image:: https://img.shields.io/badge/python-3.9-blue.svg
   :target: https://www.python.org/downloads/release/python-391/
.. |gh_action_status| image:: https://github.com/EI-CoreBioinformatics/mikado/workflows/Mikado/badge.svg
    :target: https://github.com/EI-CoreBioinformatics/mikado/actions?query=workflow%3A%22Mikado%22+branch%3Amaster
.. |coverage| image:: https://codecov.io/gh/EI-CoreBioinformatics/mikado/branch/master/graph/badge.svg
  :target: https://codecov.io/gh/lucventurini/mikado
.. |releases| image:: https://img.shields.io/github/release/EI-CoreBioinformatics/mikado.svg
  :target: https://github.com/EI-CoreBioinformatics/mikado/releases
.. |downloads| image:: https://img.shields.io/github/downloads/EI-CoreBioinformatics/mikado/total.svg?style=social&logo=github&label=download
  :target: https://github.com/EI-CoreBioinformatics/mikado/releases

.. |conda downloads| image:: https://anaconda.org/bioconda/mikado/badges/downloads.svg
  :target: https://anaconda.org/bioconda/mikado

############################
Mikado: pick your transcript
############################

|releases| |downloads| |conda downloads| |python_36_badge| |python_37_badge| |python_38_badge| |python_39_badge| |gh_action_status| |coverage|

:Authors:
    Venturini Luca,
    Yanes Luis,
    Caim Shabhonam,
    Mapleson Daniel,
    Kaithakottil Gemy George,
    Swarbreck David
:Version: |ProjectVersion| (|today|)

Mikado is a lightweight Python3 pipeline to identify the most useful or “best” set of transcripts from multiple transcript assemblies. Our approach leverages transcript assemblies generated by multiple methods to define expressed loci, assign a representative transcript to each, and return a set of gene models that selects against transcripts that are chimeric, fragmented, or have a short or disrupted CDS. Loci are first defined based on overlap criteria, and each transcript therein is scored based on up to 50 available metrics relating to ORF and cDNA size, relative position of the ORF within the transcript, UTR length and presence of multiple ORFs. Mikado can also use BLAST data to score transcripts based on protein similarity and to identify and split chimeric transcripts. Optionally, junction confidence data as provided by Portcullis_ [Portcullis]_ can be used to improve the assessment. The best-scoring transcripts are selected as the primary transcripts of their respective gene loci; additionally, Mikado can bring back other valid splice variants that are compatible with the primary isoform.
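The idea of defining loci by overlap and then keeping the best-scoring transcript in each can be illustrated with a toy sketch. This is not Mikado's implementation or API: the ``Transcript`` class, the ``group_into_loci`` and ``pick_primary`` helpers, and the pre-computed ``score`` values are all hypothetical, and strand, chromosome and the real scoring metrics are ignored.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    tid: str
    start: int
    end: int
    score: float  # hypothetical, pre-computed score (not a real Mikado metric)


def group_into_loci(transcripts):
    """Group transcripts whose genomic intervals overlap.

    Assumes all transcripts lie on the same chromosome and strand.
    """
    loci = []
    current = []
    current_end = None
    for t in sorted(transcripts, key=lambda t: (t.start, t.end)):
        if current and t.start <= current_end:
            # Overlaps the locus being built: add it and extend the locus end.
            current.append(t)
            current_end = max(current_end, t.end)
        else:
            # No overlap: close the previous locus (if any) and start a new one.
            if current:
                loci.append(current)
            current = [t]
            current_end = t.end
    if current:
        loci.append(current)
    return loci


def pick_primary(locus):
    """Return the highest-scoring transcript of a locus."""
    return max(locus, key=lambda t: t.score)


transcripts = [
    Transcript("t1", 100, 500, 12.0),
    Transcript("t2", 450, 900, 15.5),
    Transcript("t3", 2000, 2600, 8.0),
    Transcript("t4", 2100, 2500, 9.5),
]
primaries = [pick_primary(locus).tid for locus in group_into_loci(transcripts)]
print(primaries)
```

In this toy input, ``t1`` and ``t2`` overlap and form one locus, while ``t3`` and ``t4`` form a second; the sketch keeps one transcript per locus.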

Mikado uses GTF or GFF files as mandatory input. Non-mandatory but highly recommended input data can be generated by obtaining a set of reliable splicing junctions with Portcullis_, by locating coding ORFs on the transcripts using Transdecoder_, and by obtaining homology information through BLASTX [Blastplus]_.

Our approach can also incorporate sequences generated by *de novo* Illumina assemblers or reads generated by long-read technologies such as PacBio.

Our tool was presented at `Genome Science 2016 <http://genomescience.org.uk/>`_, both as a :download:`poster <assets/Mikado_GenomeScience2016_poster.pdf>` and in :download:`a talk during the Bioinformatics showcase session <assets/Mikado_GenomeScience2016.pdf>`.

Mikado was published in GigaScience in August 2018 [Mikado]_. We provide a :download:`PDF copy <assets/Mikado_paper_2018_giy093.pdf>` of the open access paper on this website, for reference.

Development is currently active, and Mikado is tightly integrated in an upcoming pipeline for genome annotation refinement, Minos_.


Mikado version 2: integrating multiple gene predictions
=======================================================

During the summer of 2019, we finished work on the new version of Mikado. The focus of the work was to make Mikado a software product
capable of integrating the results of multiple gene annotations, similarly to [PASA]_ or [Maker2]_. Unlike Maker2, Mikado is not
in itself a full annotation pipeline; we are currently working on one, which will use Mikado first to clean the transcript assemblies,
and then to create a final gene annotation by comparing multiple *ab initio* annotations together with protein alignments and transcript assemblies or cDNA alignments.

Starting from this version, Mikado is therefore capable of considering arbitrary measures of transcript quality (e.g. transcript quantification or similarity of the
predicted ORF against a known protein database); moreover, it is capable of reconciling the structures of the transcripts present in a single locus. This makes it possible, for example,
to add an inferred UTR to *ab initio* predictions using complementary RNAseq data. Through this mechanism, Mikado is also capable of reconstructing the correct ORF of transcripts
present only in fragmentary form - as long as there is at least one other transcript in the locus that can provide the missing data.
This mechanism is similar to the one implemented in [PASA]_. Please see the :ref:`relevant section in Algorithms <padding>` for details.
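As a rough intuition for this kind of reconciliation, the toy sketch below extends a fragmentary transcript using a more complete "template" transcript from the same locus. It is an illustration under simplifying assumptions, not Mikado's padding algorithm: the ``pad_with_template`` helper is hypothetical, transcripts are reduced to sorted lists of ``(start, end)`` exons on one strand, and no checks on intron compatibility, distance limits or ORF recalculation are made.

```python
def pad_with_template(fragment, template):
    """Extend the ends of `fragment` using the exons of `template`.

    Both arguments are sorted lists of (start, end) exon tuples.
    Hypothetical helper for illustration only.
    """
    padded = list(fragment)

    # Left end: if the fragment starts inside a template exon, stretch the
    # first exon to that exon's start and prepend the upstream template exons.
    first_start, first_end = padded[0]
    for i, (t_start, t_end) in enumerate(template):
        if t_start <= first_start <= t_end:
            padded[0] = (t_start, first_end)
            padded = template[:i] + padded
            break

    # Right end: if the fragment ends inside a template exon, stretch the
    # last exon to that exon's end and append the downstream template exons.
    last_start, last_end = padded[-1]
    for i, (t_start, t_end) in enumerate(template):
        if t_start <= last_end <= t_end:
            padded[-1] = (last_start, t_end)
            padded = padded + template[i + 1:]
            break

    return padded


template = [(100, 200), (300, 400), (500, 700)]
fragment = [(350, 400), (500, 620)]
padded = pad_with_template(fragment, template)
print(padded)
```

Here the fragment gains the missing upstream exon and the full extent of its terminal exons from the template; a fragment whose ends fall outside every template exon is returned unchanged.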

Citing
~~~~~~

If you use Mikado in your work, please consider citing:

    Venturini L., Caim S., Kaithakottil G., Mapleson D.L., Swarbreck D.
    Leveraging multiple transcriptome assembly methods for improved gene structure annotation.
    GigaScience, Volume 7, Issue 8, 1 August 2018, giy093, `doi:10.1093/gigascience/giy093 <https://doi.org/10.1093/gigascience/giy093>`_


If you also use Portcullis to provide reliable junctions to Mikado, either independently or as part of the Daijin pipeline, please consider citing:

    Mapleson D.L., Venturini L., Kaithakottil G., Swarbreck D.
    Efficient and accurate detection of splice junctions from RNAseq with Portcullis.
    GigaScience, Volume 7, Issue 12, 12 December 2018, giy131, `doi:10.1093/gigascience/giy131 <https://doi.org/10.1093/gigascience/giy131>`_


Availability and License
~~~~~~~~~~~~~~~~~~~~~~~~

The open source code is available on GitHub: `https://github.com/EI-CoreBioinformatics/mikado <https://github.com/EI-CoreBioinformatics/mikado>`_

For Linux and OSX (the latter only since v2.2.3) we also provide installation through Conda: `https://anaconda.org/bioconda/mikado <https://anaconda.org/bioconda/mikado>`_.

Please report any issue you might encounter to the `EI-CoreBioinformatics issue tracker <https://github.com/EI-CoreBioinformatics/mikado/issues>`_.

This documentation is hosted publicly on Read the Docs: `https://mikado.readthedocs.org/en/latest/ <https://mikado.readthedocs.org/en/latest/>`_

Mikado is available under `GNU LGPL v3 <http://www.gnu.org/licenses/lgpl.txt>`_.

Acknowledgements
~~~~~~~~~~~~~~~~

Mikado has greatly benefited from public libraries, in particular [Cython]_, the [NetworkX]_ library, Scipy, Numpy and Pandas ([Scipy]_, [Numpy]_, [Pandas]_), BioPython [BioPython]_, Intervaltree [PYinterval]_, and the BX library for a Cython implementation of interval trees [BXPython]_. Moreover, Mikado makes liberal use of the PySAM [PySAM]_ library for analysing SAM/BAM files as well as for working with FASTA files. Mikado has also been continually optimised using Snakeviz [Snakeviz]_, a tool which proved invaluable during the development process.


Credits
~~~~~~~

 - Luca Venturini (The software architect and developer)
 - Shabhonam Caim (Primary tester and analytic consultancy)
 - Daniel Mapleson (Developer of Portcullis and of the Daijin pipeline)
 - Luis Yanes (Software developer)
 - Gemy Kaithakottil (Tester and analytic consultancy)
 - David Swarbreck (Annotation guru and ideator of the pipeline)

Contents
--------

.. toctree::
   :maxdepth: 2
   :numbered:

   Introduction
   Installation
   Tutorial/index
   Tutorial/Daijin_tutorial
   Tutorial/Scoring_tutorial
   Tutorial/Adapting
   Usage/index
   Algorithms
   Scoring_files
   References
   Library/modules