Phenopolis - An Open Platform for Harmonization & Analysis of Sequencing & Phenotype Data.

by Ginger Tsueng

One of our points of pride in pooling, standardizing, and sharing gene, variant, and other “BioThings” annotation data as a service is that the service is fast! Our services are built with speed in mind because we want them to be useful to bioinformaticians and tool/resource developers alike. How can we tell if we’ve successfully provided a useful service?

One measure we LOVE is when a user builds something useful or amazing with our service, especially when the product is an openly available resource or tool such as Phenopolis. The team behind Phenopolis is a multidisciplinary group spanning multiple institutions with a keen interest in improving rare disease research and treatment. So keen was their interest that in just a year, the Phenopolis team made significant progress in developing an open-source research and diagnosis tool, even though the project lacked dedicated funding. Dr Nikolas Pontikos and other collaborators involved in Phenopolis were kind enough to answer our questions on this phenomenal resource.

In one tweet or less, introduce us to Phenopolis:

Phenopolis is a searchable database of exomes from 6,000 rare and common disease patients with detailed clinical features encoded using the Human Phenotype Ontology.

What was the original intent behind Phenopolis (how did Phenopolis come about, how was the collaborative effort started)?

Phenopolis is a multi-institutional effort that emerged in 2015 from the work of Dr Nikolas Pontikos from UCL and Dr Jing Yu from the University of Oxford on the UK Inherited Retinal Disease consortium (UCL, Oxford, Leeds, Manchester), funded by the Fight for Sight and Retina UK charities and the NIHR Moorfields BRC. Ismail Moghul from UCL was interested in the project and joined in 2016 to improve the design and the interface of Phenopolis. We wanted to build a tool similar to the ExAC browser so that researchers and clinicians could browse the wealth of data we had accumulated over the previous 5 years in an easy and intuitive way. The aim was to facilitate identification of the likely causal mutations responsible for a patient’s phenotype, and also to uncover new gene-to-phenotype relationships.

Phenopolis was published in Bioinformatics in 2017. Sajid Mughal and Dr Pontikos suggested a novel database design, known as a graph database, to integrate both genetic and phenotypic data. This work led to the creation of the Pheno4J database, also published in Bioinformatics in 2017. In August 2018, Phenopolis became a UK limited company in order to secure further funding from the UK Government and support further development.

You mentioned that Phenopolis was developed in order to browse data you had accumulated. Was that data accumulated as part of a project? If it was accumulated as part of a project, can you provide a little detail about that project’s aims and the funders who supported it?

The UCL exomes initiative was started in 2013 by Dr Vincent Plagnol at UCL in an effort to build a robust bioinformatics pipeline to identify disease-causing mutations in rare and complex disease patient cohorts from research groups at UCL, Cambridge, QMUL and Oxford. The disease groups broadly cover inflammatory bowel disorder, cardiology, neurology, rare bone marrow failures, mitochondrial disorders and rare eye disorders.

The original project was funded by a number of NIHR grants. In 2015, Dr Nikolas Pontikos joined UCL and took over the development of the pipeline. While providing the bioinformatics support for the UK Inherited Retinal Disease consortium, he also started collecting Human Phenotype Ontology terms for the patient cohorts and started building the Phenopolis web platform along with Dr Jing Yu, a collaborator from Oxford. Ismail Moghul joined the team shortly after and greatly improved the web interface. Nikolas and Ismail are now co-founders of the Phenopolis company and are in the process of seeking seed funding from UKRI.

How has Phenopolis since improved (key improvements, not just GitHub commits)?

The Phenopolis code has gone through many iterations, and we have tried different database technologies, from Mongo to graph databases. We have finally gone back to tried-and-tested SQL, as it offers the cleanest and easiest interface and makes it particularly easy to import new data.
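To illustrate why a relational layout suits this kind of data, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the actual Phenopolis schema: the table names (`patient_hpo`, `patient_variant`), the column names, the gene labels and the sample rows are all hypothetical, and the HPO-style identifiers are used purely as placeholders.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE patient_hpo (patient_id TEXT, hpo_id TEXT);
        CREATE TABLE patient_variant (patient_id TEXT, gene TEXT, variant TEXT);
        """
    )
    # Importing new data is a plain bulk insert (all rows are made up).
    conn.executemany(
        "INSERT INTO patient_hpo VALUES (?, ?)",
        [("P1", "HP:0000510"), ("P2", "HP:0000510"), ("P3", "HP:0001250")],
    )
    conn.executemany(
        "INSERT INTO patient_variant VALUES (?, ?, ?)",
        [
            ("P1", "GENE_A", "1-100-A-G"),
            ("P2", "GENE_A", "1-250-C-T"),
            ("P2", "GENE_B", "2-300-G-A"),
            ("P3", "GENE_B", "2-300-G-A"),
        ],
    )
    return conn


def genes_for_phenotype(conn, hpo_id):
    """Count, per gene, the distinct patients with the phenotype who carry a variant."""
    return conn.execute(
        """
        SELECT v.gene, COUNT(DISTINCT v.patient_id) AS n_patients
        FROM patient_variant AS v
        JOIN patient_hpo AS h ON h.patient_id = v.patient_id
        WHERE h.hpo_id = ?
        GROUP BY v.gene
        ORDER BY n_patients DESC, v.gene
        """,
        (hpo_id,),
    ).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    print(genes_for_phenotype(demo, "HP:0000510"))
```

In this toy layout a single join links phenotype terms to variants, which is the kind of gene-to-phenotype question the platform is meant to answer.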

Another major design improvement was to split the frontend and the backend into two separate apps. We built a separate web interface using the Hugo framework, which is hosted on AWS. The backend is a REST API that sits behind an institutional firewall.

We are now also building multilingual support for Phenopolis, adding Chinese, Japanese, Greek, German and French interfaces.

Who is currently the intended audience for Phenopolis?

Phenopolis is designed for researchers and clinicians who have an understanding of genetics but may not have the programming expertise to query their data.

How does Phenopolis use our services?

When we first developed Phenopolis, we offered a live API that would fetch additional annotation from your services. However, we now generate our results offline using the Variant Effect Predictor (VEP) and store them in our database. In the future, we may complement the information from VEP with additional information from your services.
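The offline approach can be pictured roughly as follows. This is only a sketch, assuming VEP's default tab-delimited output, where lines starting with `##` are metadata and the column header line starts with `#Uploaded_variation`. The `vep_annotation` table, the `load_vep_tab` helper, and the sample record (which shows only a subset of VEP's columns) are hypothetical and not taken from the Phenopolis code.

```python
import io
import sqlite3

# Made-up sample showing a subset of VEP's default tab-delimited columns.
SAMPLE = (
    "## ENSEMBL VARIANT EFFECT PREDICTOR\n"
    "#Uploaded_variation\tLocation\tAllele\tGene\tFeature\tFeature_type\tConsequence\n"
    "1_100_A/G\t1:100\tG\tENSG_DEMO\tENST_DEMO\tTranscript\tmissense_variant\n"
)


def load_vep_tab(handle, conn):
    """Store VEP tab-delimited annotations in a hypothetical SQLite table.

    Returns the number of annotation rows inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vep_annotation "
        "(variant TEXT, gene TEXT, consequence TEXT)"
    )
    header = None
    inserted = 0
    for raw in handle:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and metadata lines
        if line.startswith("#"):
            header = line[1:].split("\t")  # column names without the leading '#'
            continue
        if header is None:
            raise ValueError("data row found before the column header line")
        record = dict(zip(header, line.split("\t")))
        conn.execute(
            "INSERT INTO vep_annotation VALUES (?, ?, ?)",
            (record["Uploaded_variation"], record["Gene"], record["Consequence"]),
        )
        inserted += 1
    conn.commit()
    return inserted


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    print(load_vep_tab(io.StringIO(SAMPLE), db))
```

Precomputing annotations this way trades freshness for speed and independence from external services at query time.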

What are some of Phenopolis’s successes (news releases, papers published)?

Phenopolis was published in Bioinformatics.
We have 9 citations so far.
We have teamed up with a partner to run our genomics pipelines in the cloud.

What improvements are planned for Phenopolis?

Our goals are to obtain funding, either from grants or from customers, in order to hire one or two programmers to work on Phenopolis part-time or full-time. We are considering moving from Python to Ruby on Rails to facilitate deployment.
We wish to integrate more data sources and analyses and make the server more scalable. We also plan to extend our capacity to do whole genome analysis.

Additional information/links:

We are preparing a manuscript, led by Dr Jing Yu from Oxford, on Phenogenon, a novel method for analysing HPO terms and rare genetic variants.

We have also developed a graph database, Pheno4J.