
Mikado: pick your transcript


Authors: Venturini Luca, Caim Shabhonam, Mapleson Daniel, Kaithakottil Gemy George, Swarbreck David
Version: 1.0.1 (April 2017)

Mikado is a lightweight Python3 pipeline that identifies the most useful or “best” set of transcripts from multiple transcript assemblies. Our approach leverages transcript assemblies generated by multiple methods to define expressed loci, assign a representative transcript to each locus, and return a set of gene models that selects against transcripts that are chimeric, fragmented, or have a short or disrupted CDS. Loci are first defined based on overlap criteria, and each transcript therein is scored using up to 50 available metrics relating to ORF and cDNA size, the relative position of the ORF within the transcript, UTR length, and the presence of multiple ORFs. Mikado can also use BLAST data to score transcripts based on protein similarity and to identify and split chimeric transcripts. Optionally, junction confidence data as provided by Portcullis [Portcullis] can be used to improve the assessment. The best-scoring transcripts are selected as the primary transcripts of their respective gene loci; additionally, Mikado can retain other valid splice variants that are compatible with the primary isoform.
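The locus definition and selection steps described above can be illustrated with a deliberately simplified sketch. It is not Mikado's implementation or API: the `Transcript` class, the `WEIGHTS` table, the metric names and the functions below are hypothetical, strand is ignored, and the multi-metric scoring, BLAST-based chimera splitting and recovery of alternative splice variants are all left out. Loci are built by transitive overlap of genomic coordinates, and the highest-scoring transcript in each locus is taken as its primary transcript.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Transcript:
    """Hypothetical, minimal transcript record (not Mikado's own class)."""
    tid: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    metrics: Dict[str, float] = field(default_factory=dict)


# Hypothetical weights: Mikado's real scoring uses many more metrics and a
# configurable scoring scheme; these two entries are purely illustrative.
WEIGHTS = {"cdna_length": 0.001, "selected_cds_fraction": 1.0}


def score(t: Transcript) -> float:
    """Toy score: weighted sum of the metrics listed in WEIGHTS."""
    return sum(w * t.metrics.get(name, 0.0) for name, w in WEIGHTS.items())


def cluster_loci(transcripts: List[Transcript]) -> List[List[Transcript]]:
    """Group transcripts into loci: transcripts on the same chromosome whose
    intervals overlap, directly or through a chain of overlaps, end up in the
    same locus. Strand is ignored in this sketch."""
    loci: List[List[Transcript]] = []
    current: List[Transcript] = []
    current_chrom, current_end = None, -1
    for t in sorted(transcripts, key=lambda t: (t.chrom, t.start, t.end)):
        if current and t.chrom == current_chrom and t.start <= current_end:
            current.append(t)
            current_end = max(current_end, t.end)
        else:
            if current:
                loci.append(current)
            current = [t]
            current_chrom, current_end = t.chrom, t.end
    if current:
        loci.append(current)
    return loci


def pick_primary(transcripts: List[Transcript]) -> Dict[str, str]:
    """Map a 'chrom:start-end' label for each locus to the ID of its
    best-scoring transcript."""
    result: Dict[str, str] = {}
    for locus in cluster_loci(transcripts):
        best = max(locus, key=score)
        label = f"{locus[0].chrom}:{locus[0].start}-{max(t.end for t in locus)}"
        result[label] = best.tid
    return result


if __name__ == "__main__":
    txs = [
        Transcript("a.1", "chr1", 100, 900, {"cdna_length": 800, "selected_cds_fraction": 0.6}),
        Transcript("b.1", "chr1", 500, 1500, {"cdna_length": 1000, "selected_cds_fraction": 0.7}),
        Transcript("c.1", "chr1", 3000, 4000, {"cdna_length": 1000, "selected_cds_fraction": 0.2}),
    ]
    print(pick_primary(txs))
    # {'chr1:100-1500': 'b.1', 'chr1:3000-4000': 'c.1'}
```

Here `a.1` and `b.1` overlap and form one locus, in which `b.1` wins on the toy score; `c.1` sits alone in a second locus. Sorting by start and tracking the running maximum end is what makes the overlap transitive, so two transcripts that do not touch each other can still share a locus through a third one bridging them.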

Mikado requires transcript assemblies in GTF or GFF format as mandatory input. Optional but highly recommended additional inputs are a set of reliable splice junctions obtained with Portcullis, coding ORFs located on the transcripts with TransDecoder, and homology information obtained with BLASTX [Blastplus].

Our approach can incorporate sequences generated by de novo assemblers from Illumina data, as well as reads from long-read technologies such as PacBio.

Our tool was presented at Genome Science 2016, both as a poster and in a talk during the Bioinformatics showcase session.

Citing

We are currently working on our paper and will be releasing a pre-print shortly. In the meantime, if you use Mikado, please cite our GitHub page: https://github.com/lucventurini/mikado

Availability and License

The open-source code is available on GitHub: https://github.com/lucventurini/mikado

This documentation is hosted publicly on Read the Docs: https://mikado.readthedocs.org/en/latest/

Mikado is available under the GNU LGPL v3.

Acknowledgements

Mikado has greatly benefited from publicly available libraries, in particular NetworkX [NetworkX], SciPy and NumPy ([Scipy], [Numpy]), Intervaltree [PYinterval], and the BX library for its Cython implementation of interval trees. Mikado has also been continually profiled and optimised with SnakeViz, a tool which proved invaluable during the development process.