We're proud to announce the next version of MyGene.info! This third version brings new features, fixes some issues, and can be reached at http://mygene.info/v3. MyGene.info v2 will remain up and active during the transition to v3. Stay tuned: we'll also post a step-by-step guide to migrating from v2 to v3.
Here's a brief list of changes; we'll discuss some of them in depth in upcoming posts:
As some of you requested (see "We need your opinion"), RefSeq accession numbers are now stored with their version. You can search with or without the version, so the following requests give the same results:
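http://mygene.info/v3/query?q=NM_001798.4 (with version)
http://mygene.info/v3/query?q=NM_001798 (no version)
http://mygene.info/v3/query?q=refseq.rna:NM_001798.4 (refseq.rna key with version)
http://mygene.info/v3/query?q=refseq.rna:NM_001798 (refseq.rna key, no version)
...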
"refseq": {
"genomic": [
"NC_000012.12",
"NC_018923.2",
"NG_034014.1"
],
"protein": [
"NP_001277159.1",
"NP_001789.2",
"NP_439892.2",
"XP_011536034.1"
],
"rna": [
"NM_001290230.1",
"NM_001798.4",
"NM_052827.3",
"XM_011537732.1"
],
...
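To make the matching rule concrete, here is a minimal local sketch in Python. It assumes an accession is a base identifier followed by a numeric ".<version>" suffix and uses the refseq.rna values from the excerpt above; strip_version and contains_accession are hypothetical helpers, not part of MyGene.info or any client library. It only mimics the with/without-version matching on the client side and makes no claim about how the server indexes accessions.

```python
def strip_version(accession: str) -> str:
    """Return the accession without its trailing ".<version>" suffix, if any."""
    base, sep, version = accession.rpartition(".")
    # Only strip a numeric suffix, so an unversioned "NM_001798" stays intact.
    return base if sep and version.isdigit() else accession


def contains_accession(accessions: list[str], query: str) -> bool:
    """Match a query, with or without version, against versioned accessions."""
    if "." in query:
        # Versioned query: require an exact match.
        return query in accessions
    # Unversioned query: compare against each accession's base identifier.
    return any(strip_version(acc) == query for acc in accessions)


# refseq.rna values for gene 1017, copied from the v3 excerpt above.
rna = ["NM_001290230.1", "NM_001798.4", "NM_052827.3", "XM_011537732.1"]

print(contains_accession(rna, "NM_001798.4"))  # True
print(contains_accession(rna, "NM_001798"))    # True
```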
Note: v2 doesn't store the version; see http://mygene.info/v2/query?q=refseq.rna:NM_001798&fields=refseq
The "refseq", "accession" and "ensembl" fields now contain the association between each RNA and its protein product, stored in a new inner key, "translation", as shown in the following example for gene ID 1017.
Note: if either the RNA or the protein accession number of a pair isn't available, that pair is not added to this list.
http://mygene.info/v3/gene/1017?fields=refseq
{
"_id": "1017",
"refseq": {
...
"translation": [
{
"protein": "XP_011536034.1",
"rna": "XM_011537732.1"
},
{
"protein": "NP_001789.2",
"rna": "NM_001798.4"
},
{
"protein": "NP_439892.2",
"rna": "NM_052827.3"
},
{
"protein": "NP_001277159.1",
"rna": "NM_001290230.1"
}
]
}
}
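If you need to go from a transcript to its protein (or back), the "translation" pairs convert naturally into lookup tables. This minimal Python sketch works on the gene 1017 response shown above (only the "translation" key is kept) and assumes each RNA and each protein appears in at most one pair; otherwise, later pairs silently overwrite earlier ones.

```python
# Gene 1017 response, abridged to the "translation" pairs shown above.
doc = {
    "_id": "1017",
    "refseq": {
        "translation": [
            {"protein": "XP_011536034.1", "rna": "XM_011537732.1"},
            {"protein": "NP_001789.2", "rna": "NM_001798.4"},
            {"protein": "NP_439892.2", "rna": "NM_052827.3"},
            {"protein": "NP_001277159.1", "rna": "NM_001290230.1"},
        ]
    },
}

# Genes without pairs may lack the key, so default to an empty list.
pairs = doc.get("refseq", {}).get("translation", [])

# RNA accession -> protein accession.
rna_to_protein = {pair["rna"]: pair["protein"] for pair in pairs}
# Reverse lookup: protein accession -> RNA accession.
protein_to_rna = {protein: rna for rna, protein in rna_to_protein.items()}

print(rna_to_protein["NM_001798.4"])  # NP_001789.2
print(protein_to_rna["NP_439892.2"])  # NM_052827.3
```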
Note: v2 doesn't provide this information; see http://mygene.info/v2/gene/1017?fields=refseq
The inner structure of "exons" is now a list of dictionaries. Each dictionary describes the exons of one transcript: a "transcript" key contains the transcript's accession number, and a "position" inner key contains the positions of its individual exons.
http://mygene.info/v3/gene/1017?fields=exons
{
"_id": "1017",
"_score": 21.731894,
"exons": [
{
"cdsend": 55971625,
"cdsstart": 55967008,
"chr": "12",
"position": [
[
55966768,
55967124
],
[
55968048,
55968169
],
[
55968777,
55968948
],
[
55971043,
55971247
],
[
55971520,
55972789
]
],
"strand": 1,
"transcript": "NM_001290230",
"txend": 55972789,
"txstart": 55966768
},
...
}
Note: you can compare this structure with the current v2 one, which uses a dictionary instead of a list of dictionaries: http://mygene.info/v2/gene/1017?fields=exons
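Since "exons" is now a list, code that looks exons up by transcript needs a small indexing step. The Python sketch below builds a dictionary from transcript accession to (start, end) tuples, using the first entry of the gene 1017 example; exons_by_transcript is a hypothetical helper, and it assumes each transcript appears only once in the list (otherwise the last entry wins).

```python
# First "exons" entry of the gene 1017 v3 example above.
doc = {
    "_id": "1017",
    "exons": [
        {
            "cdsend": 55971625,
            "cdsstart": 55967008,
            "chr": "12",
            "position": [
                [55966768, 55967124],
                [55968048, 55968169],
                [55968777, 55968948],
                [55971043, 55971247],
                [55971520, 55972789],
            ],
            "strand": 1,
            "transcript": "NM_001290230",
            "txend": 55972789,
            "txstart": 55966768,
        },
    ],
}


def exons_by_transcript(doc):
    """Map each transcript accession to its list of (start, end) exon positions."""
    return {
        entry["transcript"]: [tuple(pos) for pos in entry["position"]]
        for entry in doc.get("exons", [])
    }


exons = exons_by_transcript(doc)
nm = exons["NM_001290230"]
print(len(nm))  # 5
print(nm[0])    # (55966768, 55967124)
```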
There are some annoying cases of one-to-many matches between Ensembl IDs and Entrez IDs in the mapping provided by Ensembl. For example, Ensembl gene ID ENSMUSG00000071350 is associated with Entrez gene IDs 628705 and 239122. While these ambiguous mappings won't disappear completely, the majority of them can be fixed by cross-checking the mappings against other sources. We worked hard to improve this mapping and remove as many discrepancies as we could. We'll post more about this soon.
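For readers who want to spot such cases in their own mapping tables, a minimal Python sketch follows. It only detects Ensembl IDs that map to several Entrez IDs, using the example from this post as input; it does not reproduce the cross-checking MyGene.info performs, and one_to_many is a hypothetical helper.

```python
from collections import defaultdict

# (Ensembl gene ID, Entrez gene ID) pairs; only the example from the text.
pairs = [
    ("ENSMUSG00000071350", "628705"),
    ("ENSMUSG00000071350", "239122"),
]


def one_to_many(pairs):
    """Return {ensembl_id: sorted Entrez IDs} for Ensembl IDs with several Entrez IDs."""
    targets = defaultdict(set)
    for ensembl_id, entrez_id in pairs:
        targets[ensembl_id].add(entrez_id)
    return {ens: sorted(ids) for ens, ids in targets.items() if len(ids) > 1}


print(one_to_many(pairs))
# {'ENSMUSG00000071350': ['239122', '628705']}
```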
Because some "reporter" IDs are integers (e.g. from the Affymetrix HuGene_1-1 array), just like Entrez gene IDs, the "reporter" field must now be specified explicitly in the query to avoid any confusion:
http://mygene.info/v3/query?q=reporter:2845421&fields=reporter
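When building query URLs programmatically, the field prefix has to be part of the q value. This minimal Python sketch uses only the standard library; build_query_url is a hypothetical helper, and only the q and fields parameters come from this post. Passing safe=":" to urlencode keeps the colon unescaped so the output matches the URL above.

```python
from urllib.parse import urlencode, urlparse, parse_qs

BASE_URL = "http://mygene.info/v3/query"


def build_query_url(term, field=None, fields=None):
    """Build a v3 query URL, prefixing the term with "<field>:" when a field is given."""
    q = f"{field}:{term}" if field else str(term)
    params = {"q": q}
    if fields:
        params["fields"] = fields
    # safe=":" keeps the colon readable instead of encoding it as %3A.
    return f"{BASE_URL}?{urlencode(params, safe=':')}"


url = build_query_url(2845421, field="reporter", fields="reporter")
print(url)  # http://mygene.info/v3/query?q=reporter:2845421&fields=reporter
print(parse_qs(urlparse(url).query)["q"])  # ['reporter:2845421']
```

Calling build_query_url(2845421) without field leaves out the reporter: prefix, which, per the change above, is now required when searching reporter IDs.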
With the "dot.field" notation, nested keys are returned as a single dotted key, like ["refseq.rna"], instead of as a nested structure, such as ["refseq"]["rna"]. This behavior can be enabled with dotfield=1, in conjunction with the fields parameter. Results are now returned as a nested structure by default, unless dotfield=1 is explicitly specified.
{
"_id": "1017",
"refseq.rna": [
"NM_001290230",
"NM_001798",
"NM_052827",
"XM_011537732"
]
}
{
"_id": "1017",
"_score": 21.731894,
"refseq": {
"rna": [
"NM_001290230.1",
"NM_001798.4",
"NM_052827.3",
"XM_011537732.1"
]
}
}
Note: this change only affects the annotation endpoint /gene. The query endpoint /query already defaults to a nested structure.
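Both notations carry the same data, so converting between them client-side is straightforward, for instance when porting code that expects dotted keys. This is a minimal Python sketch of the idea, not MyGene.info's implementation: to_dotfield and from_dotfield are hypothetical helpers that flatten nested dictionaries and keep lists (including lists of dictionaries such as "exons") as opaque values, which may differ from what the server returns with dotfield=1.

```python
def to_dotfield(doc, prefix=""):
    """Flatten nested dicts into {"a.b": value}; lists and scalars are kept as values."""
    flat = {}
    for key, value in doc.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(to_dotfield(value, dotted))
        else:
            flat[dotted] = value
    return flat


def from_dotfield(flat):
    """Rebuild the nested structure from dotted keys."""
    nested = {}
    for dotted, value in flat.items():
        *parents, leaf = dotted.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


nested = {
    "_id": "1017",
    "refseq": {"rna": ["NM_001290230.1", "NM_001798.4", "NM_052827.3", "XM_011537732.1"]},
}
flat = to_dotfield(nested)
print(flat)
# {'_id': '1017', 'refseq.rna': ['NM_001290230.1', 'NM_001798.4', 'NM_052827.3', 'XM_011537732.1']}
print(from_dotfield(flat) == nested)  # True
```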
We focus on your needs, so you're more than welcome to give feedback, comment on any of these changes, and request more. Again, stay tuned for more about this new version!