Documents, publications and ontologies

ACMG Standards and Guidelines for the Interpretation of Sequence Variants

A Joint Consensus Recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology
Genet Med. 2015 May; 17(5): 405–424.

Best practice guidelines

ACGS/VKGL (2013). Practice Guidelines for the Evaluation of Pathogenicity and the Reporting of Sequence Variants in Clinical Molecular Genetics.

HPO Human Phenotype Ontology

An ontology is a computational representation of a domain of knowledge based upon a controlled, standardized vocabulary for describing entities and the semantic relationships between them. The Human Phenotype Ontology (HPO) aims to provide a standardized vocabulary of phenotypic abnormalities encountered in human disease. Each term in the HPO describes a phenotypic abnormality, such as atrial septal defect.
The HPO was initially developed using information from Online Mendelian Inheritance in Man (OMIM), a major data resource in human genetics and beyond. It is currently developed using information from OMIM and the medical literature and contains approximately 10,000 terms. Over 50,000 annotations to hereditary diseases are available for download or can be browsed using PhenExplorer.
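Because the semantic relationships in the HPO are mostly "is a" links forming a directed acyclic graph, an annotation to a specific term implicitly applies to all of its ancestors (the "true path rule"). The following is a minimal sketch of that propagation, assuming a small hand-made hierarchy; the term names are illustrative and are not copied from an actual HPO release, and the function names are hypothetical.

```python
from collections import deque

# Toy "is a" hierarchy: child term -> list of parent terms.
# Illustrative only; not taken from an HPO release.
IS_A = {
    "Atrial septal defect": ["Abnormal cardiac septum morphology"],
    "Ventricular septal defect": ["Abnormal cardiac septum morphology"],
    "Abnormal cardiac septum morphology": ["Abnormal heart morphology"],
    "Abnormal heart morphology": ["Phenotypic abnormality"],
    "Phenotypic abnormality": [],
}


def ancestors(term: str) -> set[str]:
    """Return all ancestors of `term` (excluding itself) by breadth-first search."""
    seen: set[str] = set()
    queue = deque(IS_A.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(IS_A.get(parent, []))
    return seen


def propagate(annotations: set[str]) -> set[str]:
    """Expand annotated terms with all of their ancestors (true path rule)."""
    expanded = set(annotations)
    for term in annotations:
        expanded |= ancestors(term)
    return expanded


if __name__ == "__main__":
    print(sorted(ancestors("Atrial septal defect")))
```

Propagating annotations this way is what allows a patient annotated with a specific term to match a disease annotated with a more general one.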

Bioinformatic tool and resource analysis

Clinical scientists assessing genetic variants routinely use a number of different resources to interpret them and their influence on disease. A number of in silico tools have been developed to assess the impact of variants on the structure and function of the encoded protein, as well as on correct splicing. NGRL Manchester aims to provide information about the available tools and guidance on their appropriate use in clinical diagnostics.

Splice prediction tools

Splice Site Tools: A Comparative Analysis Report

NGRL, Beth Hellen, 2009

Alternative Splice Site Predictor (ASSP)

ASSP predicts putative alternative exon isoform, cryptic, and constitutive splice sites of internal (coding) exons. Skipped splice sites are not differentiated from constitutive sites, and non-canonical splice sites are not detected. Alternative splicing is predicted from the DNA/RNA sequence alone. For splice site prediction within a sequence, putative splice sites are first preprocessed using position-specific score matrices.
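As an illustration of position-specific score matrix (PSSM) preprocessing, the sketch below builds a log2-odds matrix from a few aligned donor sites and scans a sequence with it. This is a generic PSSM example, not ASSP's own matrices or code: the training sites are made up, and the uniform 0.25 background, the pseudocount and the function names are assumptions.

```python
import math

BASES = "ACGT"


def build_pssm(sites: list[str], pseudocount: float = 1.0,
               background: float = 0.25) -> list[dict[str, float]]:
    """Build a log2-odds PSSM from equal-length, aligned example sites."""
    pssm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pssm


def score(pssm: list[dict[str, float]], window: str) -> float:
    """Sum the per-position log-odds scores of one candidate window."""
    return sum(column[base] for column, base in zip(pssm, window))


def scan(pssm: list[dict[str, float]], sequence: str) -> list[tuple[int, float]]:
    """Score every window of matrix length along `sequence` (0-based starts)."""
    width = len(pssm)
    return [(i, score(pssm, sequence[i:i + width]))
            for i in range(len(sequence) - width + 1)]


# Made-up 9-nt donor-like sites (3 exonic + 6 intronic bases), for illustration.
TRAINING = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGG"]
PSSM = build_pssm(TRAINING)
```

For example, `max(scan(PSSM, "TTCAGGTAAGTCC"), key=lambda hit: hit[1])` picks the window starting at index 2, which matches the consensus-like site.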

ESEfinder: Exonic Splicing Enhancers finder

Analyzes exonic sequences to identify putative exonic splicing enhancer (ESE) elements.


EX-SKIP

EX-SKIP is a simple utility that compares the ESE/ESS profiles of a wild-type and a mutated allele to quickly determine which exonic variant is most likely to cause skipping of the exon. It calculates the total numbers of ESSs and ESEs and their ratio. Specifically, it computes the number of RESCUE-ESEs (Fairbrother 2004; Fairbrother 2002), FAS-ESSs (Wang 2004), PESEs/PESSs (Zhang 2004), neighbourhood inference (Stadler 2006) and EIE/IIEs (Zhang 2008) for each segment.
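The comparison EX-SKIP performs can be sketched as counting enhancer and silencer hexamers in both alleles. This is a minimal illustration, not EX-SKIP's implementation: the motif sets below are tiny placeholders rather than the published RESCUE-ESE, FAS-ESS or PESE/PESS sets, and reporting the ratio as ESS/ESE (higher meaning a more silencer-rich exon) is an assumption.

```python
# Placeholder motif sets for illustration only; EX-SKIP uses published sets.
ESE_HEXAMERS = {"GAAGAA", "GACGAA", "AAGAAC"}
ESS_HEXAMERS = {"TAGTTA", "TTAGGG", "GGTAAG"}


def count_motifs(seq: str, motifs: set[str], k: int = 6) -> int:
    """Count overlapping k-mer occurrences of any motif in `seq`."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in motifs)


def profile(seq: str) -> dict[str, float]:
    """ESE count, ESS count and ESS/ESE ratio (infinite when no ESE is found)."""
    ese = count_motifs(seq, ESE_HEXAMERS)
    ess = count_motifs(seq, ESS_HEXAMERS)
    ratio = ess / ese if ese else float("inf")
    return {"ESE": ese, "ESS": ess, "ESS/ESE": ratio}


def compare(wild_type: str, mutant: str) -> dict[str, dict[str, float]]:
    """Side-by-side motif profiles of the wild-type and mutated exon."""
    return {"wild_type": profile(wild_type), "mutant": profile(mutant)}


if __name__ == "__main__":
    # Hypothetical exon fragment and a single-base substitution (A>T).
    print(compare("CCGAAGAACTTAGGGA", "CCGATGAACTTAGGGA"))
```

In this toy example the substitution destroys both enhancer hexamers while the silencer remains, so the mutant's ESS/ESE ratio rises.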

Fruit Fly Splice Predictor

Fruit fly and human splice predictor.


A fast, flexible system for detecting splice sites in the genomic DNA of various eukaryotes. The system has been trained and tested successfully on Plasmodium falciparum (malaria), Arabidopsis thaliana, human, Drosophila and rice. Training data sets for human and Arabidopsis thaliana are included, and the complete system, including source code, is available for download.

MaxEntScan: predicting splice sites using 'Maximum Entropy Principle'

MaxEntScan is based on an approach for modeling short sequence motifs, such as those involved in RNA splicing, that simultaneously accounts for non-adjacent as well as adjacent dependencies between positions. The method is based on the 'Maximum Entropy Principle' and generalizes most previous probabilistic models of sequence motifs, such as weight matrix models and inhomogeneous Markov models.
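MaxEntScan scores fixed-length windows: 9 nt for a 5' splice site (the last 3 exonic and first 6 intronic bases) and 23 nt for a 3' splice site (the last 20 intronic and first 3 exonic bases). The helpers below only extract those windows from a genomic sequence so they can be passed to MaxEntScan's scoring programs. They do not implement the maximum-entropy model itself, and their names and 0-based coordinate convention are assumptions.

```python
def donor_window(seq: str, exon_end: int) -> str:
    """9-mer for a 5' splice site: 3 exonic + 6 intronic bases.

    `exon_end` is the 0-based index of the last exonic base.
    """
    start = exon_end - 2
    if start < 0 or exon_end + 7 > len(seq):
        raise ValueError("sequence too short around the donor site")
    return seq[start:exon_end + 7].upper()


def acceptor_window(seq: str, exon_start: int) -> str:
    """23-mer for a 3' splice site: 20 intronic + 3 exonic bases.

    `exon_start` is the 0-based index of the first exonic base.
    """
    start = exon_start - 20
    if start < 0 or exon_start + 3 > len(seq):
        raise ValueError("sequence too short around the acceptor site")
    return seq[start:exon_start + 3].upper()
```

With this layout the canonical GT dinucleotide sits at positions 4 to 5 of the donor 9-mer, and the AG sits at positions 19 to 20 of the acceptor 23-mer (1-based).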

MIT splice predictor

GENSCAN Web Server at MIT: identification of complete gene structures in genomic DNA.

RESCUE-ESE: online tool for identifying candidate ESEs in vertebrate exons

Specific short oligonucleotide sequences that enhance pre-mRNA splicing when present in exons, termed exonic splicing enhancers (ESEs), play important roles in constitutive and alternative splicing. A hybrid computational/experimental method, RESCUE-ESE, was recently developed for identifying sequences with ESE activity. In this approach, specific hexanucleotide sequences are identified as candidate ESEs on the basis that they have both a significantly higher frequency of occurrence in exons than in introns and a significantly higher frequency in exons with weak (non-consensus) splice sites than in exons with strong (consensus) splice sites. Representative hexamers from ten different classes of candidate ESEs, together with 6 or 7 bases of flanking sequence context on each side, were introduced into a weak (poorly spliced) exon in a splicing reporter construct. These reporter minigenes were then transfected into cultured cells, where they are transcribed and spliced, and the relative level of inclusion of the test exon was assayed by quantitative (radio-labeled) RT-PCR. Point mutants of these sequences were also analyzed to confirm the precise motifs responsible for ESE activity.
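The computational half of RESCUE-ESE rests on two enrichment comparisons. A simplified sketch, assuming plain fold-enrichment cut-offs in place of the statistical significance tests the method actually uses; the threshold, pseudocount, function names and toy sequences are all hypothetical.

```python
from collections import Counter


def hexamer_freqs(seqs: list[str], k: int = 6) -> dict[str, float]:
    """Relative frequency of each overlapping k-mer across a set of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        counts.update(s[i:i + k] for i in range(len(s) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def candidate_eses(exons, introns, weak_exons, strong_exons,
                   min_fold: float = 2.0, pseudo: float = 1e-6) -> set[str]:
    """Hexamers enriched in exons vs introns AND in weak-site vs strong-site exons."""
    exon_f = hexamer_freqs(exons)
    intron_f = hexamer_freqs(introns)
    weak_f = hexamer_freqs(weak_exons)
    strong_f = hexamer_freqs(strong_exons)
    hits = set()
    for kmer, freq in exon_f.items():
        exon_vs_intron = (freq + pseudo) / (intron_f.get(kmer, 0.0) + pseudo)
        weak_vs_strong = (weak_f.get(kmer, 0.0) + pseudo) / (strong_f.get(kmer, 0.0) + pseudo)
        if exon_vs_intron >= min_fold and weak_vs_strong >= min_fold:
            hits.add(kmer)
    return hits
```

Only hexamers passing both filters become candidates; in RESCUE-ESE these candidates were then tested experimentally with the minigene assay described above.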

Splice Predictor (DK)

SplicePort: An Interactive Splice Site Analysis Tool

SplicePort is a web-based tool for splice-site analysis that allows the user to make splice-site predictions for submitted sequences. In addition, the user can also browse the rich catalog of features that underlies these predictions, and which we (the authors) have found capable of providing high classification accuracy on human splice sites. Feature selection is optimized for human splice sites, but the selected features are likely to be predictive for other mammals as well.

Various prediction tools


The use of bioinformatics is ubiquitous within the life sciences. We are striving to provide a comprehensive registry of software and databases, helping researchers from across the spectrum of biological and biomedical science to find, understand, utilise and cite the resources they need in their day-to-day work.
Everything from simple command-line tools and online services, through to databases and complex, multi-functional analysis workflows is included. Resources are described using rigorous semantics and syntax, providing end-users with concise, consistent and therefore comparable information.

Various tools, some specific for HIV

Analysis and Quality Control, Phylogenetics, Alignment and sequence manipulation, Immunology, Database search interfaces, Format and display,...


Align-GVGD

Align-GVGD is a freely available, web-based program that combines the biophysical characteristics of amino acids and protein multiple sequence alignments to predict where missense substitutions in genes of interest fall in a spectrum from enriched deleterious to enriched neutral. Align-GVGD extends the original Grantham difference to multiple sequence alignments and true simultaneous multiple comparisons.
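The idea behind the two Align-GVGD quantities can be sketched as follows: the Grantham variation (GV) measures how widely the composition (c), polarity (p) and volume (v) of the residues observed at an alignment column vary, and the Grantham deviation (GD) measures how far the substituting residue lies outside that range. This is a sketch of that idea, not Align-GVGD's code. The weights are Grantham's (1974) composition, polarity and volume constants, his overall scaling factor is omitted, and the amino-acid property values must be supplied by the caller. The numbers in the usage note are toy values, not real amino-acid properties.

```python
import math

# Grantham (1974) weights for composition (c), polarity (p) and volume (v).
# Grantham's overall scaling factor is omitted, so results are unscaled.
WEIGHTS = (1.833, 0.1018, 0.000399)


def _weighted_norm(deltas) -> float:
    return math.sqrt(sum(w * d * d for w, d in zip(WEIGHTS, deltas)))


def grantham_variation(column) -> float:
    """GV: weighted spread of (c, p, v) over residues seen at one column."""
    ranges = [max(r[i] for r in column) - min(r[i] for r in column) for i in range(3)]
    return _weighted_norm(ranges)


def grantham_deviation(column, substitute) -> float:
    """GD: weighted distance from the substitute's (c, p, v) to the column range.

    Zero when every property of the substitute lies inside the observed range.
    """
    deltas = []
    for i in range(3):
        lo = min(r[i] for r in column)
        hi = max(r[i] for r in column)
        deltas.append(max(substitute[i] - hi, 0.0, lo - substitute[i]))
    return _weighted_norm(deltas)
```

For a toy column `[(0.0, 8.0, 50.0), (0.0, 10.0, 60.0)]`, a substitute with properties `(0.0, 9.0, 55.0)` falls inside the range and has a GD of 0. A conserved column (small GV) combined with a distant substitute (large GD) points towards the "enriched deleterious" end of the spectrum.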

BLAT Search Genome

BLAT quickly maps your sequence to the genome.
Directly callable from our sequencing software Gensearch.

Direct haplotyping using trace files with indels

This program genotypes a trace file resulting from sequencing two chromosomes, one of which carries a deletion preceding a SNP. In such cases, the phase of these markers can be determined.

Shift Detector

This program checks whether a trace file may result from sequencing two similar sequences, one of which carries a deletion of 1 to 25 bases. Such a deletion produces a superimposed trace downstream of the deletion site.


eMLPA

Online tool developed by Charles University, Prague, to analyse MLPA data. The eMLPA web system performs processing and computational analysis of raw MLPA data from genetic analyzers.
eMLPA is a universal user interface for various types of MLPA kits. It offers several methods of MLPA data normalization and analysis. No local program installation is needed; the only prerequisite is Internet access.
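A common way to normalize MLPA data is a two-step dosage quotient: each probe's signal is first divided by the mean signal of the sample's reference probes, and the result is then divided by the same ratio averaged over normal control samples. The sketch below shows this generic scheme; it is not necessarily one of eMLPA's methods. The probe names are hypothetical, and the 0.7/1.3 cut-offs are assumed rules of thumb rather than eMLPA settings.

```python
from statistics import mean


def normalize(peaks: dict[str, float], reference_probes: list[str]) -> dict[str, float]:
    """Intra-sample normalization: divide each probe by the mean of the reference probes."""
    ref = mean(peaks[p] for p in reference_probes)
    return {probe: value / ref for probe, value in peaks.items()}


def dosage_quotients(sample: dict[str, float], controls: list[dict[str, float]],
                     reference_probes: list[str]) -> dict[str, float]:
    """Per-probe dosage quotient: normalized sample over mean normalized controls."""
    s = normalize(sample, reference_probes)
    c = [normalize(ctrl, reference_probes) for ctrl in controls]
    return {probe: s[probe] / mean(ctrl[probe] for ctrl in c) for probe in s}


def classify_dq(dq: float, del_below: float = 0.7, dup_above: float = 1.3) -> str:
    """Rough copy-number call; the default thresholds are assumed, not eMLPA's."""
    if dq < del_below:
        return "deletion"
    if dq > dup_above:
        return "duplication"
    return "normal"
```

For example, a probe with half the normalized signal of the controls gets a dosage quotient of about 0.5, consistent with a heterozygous deletion.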

Genetic Variant Interpretation Tool

To aid our variant interpretation process, we created an openly available online tool to efficiently classify variants based on the evidence categories outlined in Richards et al., Standards and guidelines for the interpretation of sequence variants (2015). The site displays the evidence categories and descriptions from Tables 3 and 4 with simple checkboxes for selecting the appropriate criteria. It then applies the rules in Table 5 to automatically assign a pathogenic or benign classification based on the selected evidence categories.
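The Table 5 combining rules lend themselves to a short function. The sketch below reflects our reading of Richards et al. (2015) and is not the tool's own code. It takes criterion codes such as `PVS1`, `PM2` or `BA1`, counts them by strength, and treats any case where both pathogenic and benign rules are met as contradictory, returning "Uncertain significance". Later ClinGen refinements of the rules are not included.

```python
import re
from collections import Counter

_CODE = re.compile(r"^(PVS|PS|PM|PP|BA|BS|BP)\d$")


def _counts(criteria) -> Counter:
    counts = Counter()
    for code in criteria:
        match = _CODE.match(code)
        if not match:
            raise ValueError(f"unrecognised criterion: {code}")
        counts[match.group(1)] += 1
    return counts


def _pathogenic_side(pvs: int, ps: int, pm: int, pp: int):
    if pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2):
        return "Pathogenic"
    if ps >= 2:
        return "Pathogenic"
    if ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)):
        return "Pathogenic"
    if (pvs >= 1 and pm >= 1) or (ps >= 1 and pm >= 1) or (ps >= 1 and pp >= 2):
        return "Likely pathogenic"
    if pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4):
        return "Likely pathogenic"
    return None


def _benign_side(ba: int, bs: int, bp: int):
    if ba >= 1 or bs >= 2:
        return "Benign"
    if (bs >= 1 and bp >= 1) or bp >= 2:
        return "Likely benign"
    return None


def classify(criteria) -> str:
    """Combine ACMG/AMP criterion codes into a five-tier classification."""
    c = _counts(criteria)
    path = _pathogenic_side(c["PVS"], c["PS"], c["PM"], c["PP"])
    benign = _benign_side(c["BA"], c["BS"], c["BP"])
    if path and benign:
        return "Uncertain significance"  # contradictory evidence
    return path or benign or "Uncertain significance"
```

For instance, `classify(["PVS1", "PM2"])` gives "Likely pathogenic", while adding `PP3` to the same evidence gives "Pathogenic".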


Hansa

Hansa is a tool that predicts the deleterious effects of a mutation using 10 Neutral-Disease Mis-Sense Mutation Discriminatory (NDMSMD) features. It classifies each mutation as either “DISEASE” or “NEUTRAL”.

In Silico PCR

In-Silico PCR searches a sequence database with a pair of PCR primers, using an indexing strategy for fast performance.
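To show how an index speeds up primer searching, the sketch below indexes every k-mer of a template, uses the first k bases of each primer as a seed, verifies full exact matches, and pairs forward sites with downstream reverse-primer sites. It is a simplified stand-in, not UCSC In-Silico PCR's indexing scheme. It searches the plus strand only, allows no mismatches, and uses hypothetical function names and limits.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def build_index(template: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the template to its 0-based start positions."""
    index = defaultdict(list)
    for i in range(len(template) - k + 1):
        index[template[i:i + k]].append(i)
    return index


def find_exact(template: str, index, k: int, primer: str) -> list[int]:
    """Start positions where `primer` matches exactly, seeded by its first k bases."""
    return [i for i in index.get(primer[:k], []) if template.startswith(primer, i)]


def in_silico_pcr(template: str, forward: str, reverse: str,
                  k: int = 8, max_size: int = 4000) -> list[tuple[int, int, str]]:
    """Return (start, end, product) for plus-strand amplicons; `end` is exclusive."""
    template = template.upper()
    index = build_index(template, k)
    reverse_site = revcomp(reverse.upper())  # where the reverse primer anneals
    products = []
    for f in find_exact(template, index, k, forward.upper()):
        for r in find_exact(template, index, k, reverse_site):
            end = r + len(reverse_site)
            if r >= f and end - f <= max_size:
                products.append((f, end, template[f:end]))
    return products
```

Building the index once means that each primer lookup costs a dictionary access plus verification, rather than a full scan of the database.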

Mutalyzer HGVS nomenclature checker

The aim of this program suite is to support checks of sequence variant nomenclature according to the guidelines of the Human Genome Variation Society.


MutationTaster

MutationTaster evaluates disease-causing potential of sequence alterations.


Estimates the likelihood of a particular nonsynonymous (amino-acid changing) coding SNP to cause a functional impact on the protein. It calculates the subPSEC (substitution position-specific evolutionary conservation) score based on an alignment of evolutionarily related proteins, as described in Thomas et al., 2003 and Thomas & Kejariwal, 2004.

PMUT – Pathogenic mutation prediction

Pmut is software aimed at the annotation and prediction of pathological mutations, and in particular at answering the following question: given a mutation at a specific location in a protein sequence, can we say whether it will be pathological (that is, a mutation that can lead to disease in the carrier) or non-pathological/neutral (no effect on the carrier's health)? Pmut uses different kinds of sequence information to label mutations, and neural networks to process this information (Ferrer-Costa et al., 2004). It provides a very simple output: a yes/no answer and a reliability index.


PolyPhen-2

PolyPhen-2 (Polymorphism Phenotyping v2) is a tool that predicts the possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations.

Predict Protein

PredictProtein integrates feature prediction for secondary structure, solvent accessibility, transmembrane helices, globular regions, coiled-coil regions, structural switch regions, B-values, disorder regions, intra-residue contacts, protein-protein and protein-DNA binding sites, sub-cellular localization, domain boundaries, beta-barrels, cysteine bonds, metal binding sites and disulphide bridges.


SIFT

SIFT predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids. SIFT can be applied to naturally occurring nonsynonymous polymorphisms and laboratory-induced missense mutations.
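The core of the homology-based idea can be sketched for a single alignment column: estimate how often each amino acid occurs at the position, scale those probabilities by the most frequent residue, and flag substitutions whose scaled value falls below 0.05, the cut-off SIFT reports. This is a deliberately simplified illustration. SIFT itself builds and weights the alignment and uses Dirichlet-mixture pseudocounts, none of which is reproduced here, and the flat pseudocount parameter and function names are assumptions.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def scaled_probabilities(column: str, pseudocount: float = 0.0) -> dict[str, float]:
    """Per-residue frequency at one alignment column, scaled by the maximum."""
    residues = [aa for aa in column.upper() if aa in AMINO_ACIDS]  # skip gaps
    if not residues:
        raise ValueError("column contains no amino acids")
    counts = Counter(residues)
    total = len(residues) + pseudocount * len(AMINO_ACIDS)
    probs = {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
    top = max(probs.values())
    return {aa: p / top for aa, p in probs.items()}


def predict(column: str, substitute: str, threshold: float = 0.05,
            pseudocount: float = 0.0) -> tuple[float, str]:
    """Return (scaled score, 'deleterious' or 'tolerated') for one substitution."""
    score = scaled_probabilities(column, pseudocount)[substitute.upper()]
    return score, ("deleterious" if score < threshold else "tolerated")
```

For a column that is mostly leucine with one isoleucine (e.g. `"LLLLLLLLLI"`), an isoleucine substitution is tolerated, whereas a proline, never observed at that position, is predicted to be deleterious.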


SNPs3D

SNPs3D is a website that assigns molecular functional effects of non-synonymous SNPs based on structure and sequence analysis.


topoSNP

Stitziel NO, Binkowski TA, Tseng YY, Kasif S, Liang J. 2004. topoSNP: a topographic database of non-synonymous single nucleotide polymorphisms with and without known disease association. Nucleic Acids Research. 32(suppl_1), D520-D522.

VariantValidator nomenclature checker

We validate HGVS sequence variation descriptions, accurately mapping between transcript and genomic variants. We also automate conversion of genomic (VCF) sequence variation descriptions into the HGVS format and vice-versa.
VariantValidator auto-corrects your mistakes if it can and helps you correct your own if it can't. We provide a range of tools to meet your needs including batch processing, a VCF file converter and API access.
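To make the VCF-to-HGVS direction concrete, the sketch below converts simple VCF records (an SNV, or an indel with the usual shared leading anchor base) into HGVS genomic (g.) descriptions. It is a minimal illustration under stated assumptions, not VariantValidator's logic. It does not apply the HGVS 3' rule (right-shifting within repeats), does not describe duplications as `dup`, and does not map to transcripts, all of which VariantValidator handles. The accession and coordinates in the usage note are placeholders.

```python
def vcf_to_hgvs_g(accession: str, pos: int, ref: str, alt: str) -> str:
    """Convert a simple VCF record to an HGVS g. description (no 3' shifting)."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) == 1 and len(alt) == 1:
        return f"{accession}:g.{pos}{ref}>{alt}"
    # VCF indels share a leading anchor base; drop it before describing the change.
    if len(alt) == 1 and len(ref) > 1 and ref[0] == alt:
        start, end = pos + 1, pos + len(ref) - 1
        if start == end:
            return f"{accession}:g.{start}del"
        return f"{accession}:g.{start}_{end}del"
    if len(ref) == 1 and len(alt) > 1 and alt[0] == ref:
        return f"{accession}:g.{pos}_{pos + 1}ins{alt[1:]}"
    raise ValueError("only SNVs and anchored simple indels are handled in this sketch")
```

For example, `vcf_to_hgvs_g("NC_000001.11", 100, "CTG", "C")` yields `NC_000001.11:g.101_102del`.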

Bioinformatics tools for HIV

Geno2pheno [coreceptor] 2.5

Given a sequence containing the V3 region of the HIV-1 envelope protein gp120, the tool returns a sequence alignment to the consensus sequence given in Pfeifer and Lengauer and a prediction of whether the corresponding virus is capable of using CXCR4 as a coreceptor (R5/X4 or X4 variants) or not (R5 variants).


Identifies HIV strains using BLAST.
Directly callable from our HIV sequencing software GensearchHIV.


WebPSSM

WebPSSM is a bioinformatic tool for predicting HIV-1 coreceptor usage from the amino acid sequence of the third variable loop (V3) of the envelope gene. Directly callable from our HIV sequencing software GensearchHIV.

Locus specific database software (LSDB)

Cafe Variome

Health Data Discovery
Café Variome is a flexible web-based, data discovery tool that can be quickly installed by any biomedical data owner to enable the “existence” rather than the “substance” of the data to be discovered.

Leiden Open Variation Database

LOVD is the software powering the largest network of curated gene variant databases in the world. Directly callable from our software Gensearch.


UMD

The UMD software was developed as generic software to create locus-specific databases (LSDBs) with the 4th Dimension® package from 4D. It includes an optimized structure to assist and secure data entry and to allow the input of a wide range of clinical data. In addition, various analysis tools have been specifically designed to assist clinicians (phenotype-genotype correlations...), geneticists (distribution and frequency of mutations...) and research biologists (structural domains, molecular epidemiology...).

NGS tools


Exomiser

The Exomiser is a Java program that functionally annotates variants from whole-exome sequencing data in VCF 4 format. The functional annotation is performed with Jannovar and uses UCSC KnownGene transcript definitions and hg19 genomic coordinates.


This application performs phenotype-driven prioritization of candidate diseases and genes in the setting of genomic diagnostics (exome or genome) in which the phenotypic abnormalities are described as Human Phenotype Ontology (HPO) terms.


FastQC

FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis.
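As an example of the kind of check FastQC performs, its "Per base sequence quality" module summarizes Phred quality scores at each read position. The sketch below computes only the mean per position from FASTQ quality strings. It assumes Phred+33 encoding and unwrapped four-line records, whereas FastQC also detects the encoding, plots the full distribution and groups positions for long reads. The function names are hypothetical.

```python
def mean_quality_per_position(quality_lines: list[str], offset: int = 33) -> list[float]:
    """Mean Phred quality at each read position (Phred+33 assumed by default)."""
    totals: list[int] = []
    counts: list[int] = []
    for qual in quality_lines:
        for i, ch in enumerate(qual.rstrip("\n")):
            if i == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def read_quality_lines(path: str) -> list[str]:
    """Return the 4th line of each 4-line FASTQ record (assumes no line wrapping)."""
    with open(path) as handle:
        return [line.rstrip("\n") for i, line in enumerate(handle) if i % 4 == 3]
```

A typical use is `mean_quality_per_position(read_quality_lines("reads.fastq"))`, where a steady decline towards the 3' end of the reads is the familiar pattern the FastQC plot makes visible.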