Accessing BioMart with REST API and multi-threading (Python3)

BioMart is an amazing resource of well-curated genomic annotations, until you actually need to download data programmatically…

I gave it a try for a couple of hours using the biomaRt R package only to realise my query wouldn’t be served in our lifetime…

I then moved on to the REST API, served from rest.ensembl.org.

That’s a very decent option if you are not a Perl user who would rather use BioMart’s native Perl API, and it’s robust and pretty fast (relative to biomaRt, at least).

For bulk queries, you can split the work into chunks of data, or into individual IDs, and send one request per piece.
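To make the chunking idea concrete, here is a minimal sketch of a helper that cuts a list of IDs into fixed-size batches. The name `chunked` and the batch size of 2 are my own choices for illustration and are not part of any BioMart or Ensembl library.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Hypothetical usage: process the GO IDs two at a time.
go_ids = ['GO:0006958', 'GO:0031902', 'GO:0050776']
for batch in chunked(go_ids, 2):
    print(batch)
```

Each batch can then be handed to whatever function sends the requests, so a failure only costs you one batch rather than the whole run.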

For example, to retrieve the full annotation for a list of GO IDs in Python 3, you can call the API once per GO ID:

import requests

server = "https://rest.ensembl.org"
ext_prefix = "/ontology/id/"

def get_go_term_by_id(go_id):
    # Build the endpoint path for this GO ID and ask for a JSON response
    ext = ext_prefix + go_id + "?content-type=application/json"
    # The timeout is an added safeguard so a hung connection cannot block forever
    r = requests.get(server + ext, headers={"Content-Type": "application/json"}, timeout=30)

    # Raises requests.HTTPError on any 4xx/5xx response
    r.raise_for_status()

    decoded = r.json()
    print(repr(decoded))

go_ids = ['GO:0006958', 'GO:0031902', 'GO:0050776']
for go_id in go_ids:
    get_go_term_by_id(go_id)

Each individual call is served in less than a second (usually within a few milliseconds), so there is no need to worry about server timeouts.

Besides, you can wrap each call in a try/except block so that if an individual call fails, the loop carries on with the rest as normal.
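As a rough sketch of that try/except idea, the loop below keeps going when one ID fails and remembers which IDs went wrong. The names `fetch_all` and `fake_fetch` are hypothetical. `fake_fetch` is only a stand-in so the example runs without a network connection, and in a real script you would pass a function that calls the REST API and returns the decoded result.

```python
def fetch_all(ids, fetch):
    """Call fetch(item_id) for each ID; collect results and remember failures."""
    results = {}
    failed = []
    for item_id in ids:
        try:
            results[item_id] = fetch(item_id)
        except Exception as err:  # one bad ID should not stop the whole run
            failed.append(item_id)
            print('[Warning] Could not fetch', item_id, '-', err)
    return results, failed


# Stand-in fetcher used only for this demonstration (no network involved).
def fake_fetch(go_id):
    if go_id == 'GO:0000000':
        raise ValueError('unknown ID')
    return 'term for ' + go_id


results, failed = fetch_all(['GO:0006958', 'GO:0000000', 'GO:0050776'], fake_fetch)
print(len(results), 'fetched,', len(failed), 'failed')
```

Keeping the failed IDs in a list makes it easy to retry just those once the run has finished.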

Simple multi-threading with map and pool in Python3

For a real speed-up, you can submit multiple requests at once using parallel threads in Python (the thread-based Pool from multiprocessing.dummy):

import requests
from multiprocessing.dummy import Pool as ThreadPool

# making 50 simultaneous requests
num_threads = 50
pool = ThreadPool(num_threads)

server = "https://rest.ensembl.org"
ext_prefix = "/ontology/id/"

# Filled in by the worker threads: GO ID -> GO term name ("" on failure)
go_id_terms_dict = {}

def get_go_term_by_id(go_id):

    try:
        ext = ext_prefix + go_id + "?content-type=application/json"
        r = requests.get(server + ext, headers={"Content-Type": "application/json"})

        # Raises requests.HTTPError on any 4xx/5xx response
        r.raise_for_status()

        decoded = r.json()
        # Keep the term name only, with any quote characters stripped
        go_term = str(decoded['name'])
        go_term = go_term.replace("'", "")
        go_term = go_term.replace("\"", "")

    except Exception:
        # Network errors, HTTP errors and unexpected JSON all end up here
        go_term = ""
        print('[Warning] Could not fetch GO term for ID:', go_id)

    go_id_terms_dict[go_id] = go_term
    result = go_id + '||' + go_term
    print(result)

# all_human_go_ids: your list with GO IDs
pool.map(get_go_term_by_id, all_human_go_ids)

# Release the worker threads once all requests are done
pool.close()
pool.join()
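A possible variation, sketched under my own assumptions, is to have each worker return its result instead of writing to a shared dictionary, and to let a `with` block clean up the pool. The names `fetch_many` and `fake_lookup` are hypothetical, the term names are placeholders rather than real GO annotations, and the default of 8 threads is arbitrary. Public REST services generally enforce rate limits, so a smaller pool than 50 may be the safer starting point.

```python
from multiprocessing.dummy import Pool as ThreadPool


def fetch_many(ids, fetch_one, num_threads=8):
    """Run fetch_one over ids in a thread pool; return {id: value, or '' on failure}."""
    def safe_fetch(item_id):
        try:
            return item_id, fetch_one(item_id)
        except Exception:
            # A failed lookup is recorded as an empty string, as in the code above
            return item_id, ""

    # pool.map blocks until every ID has been processed
    with ThreadPool(num_threads) as pool:
        return dict(pool.map(safe_fetch, ids))


# Stand-in lookup with placeholder values, so the sketch runs offline.
known_terms = {'GO:0006958': 'placeholder term A', 'GO:0031902': 'placeholder term B'}


def fake_lookup(go_id):
    return known_terms[go_id]  # raises KeyError for unknown IDs


terms = fetch_many(['GO:0006958', 'GO:0031902', 'GO:0000000'], fake_lookup, num_threads=2)
print(terms)
```

In a real run, `fetch_one` would be a function that queries the REST API for one GO ID and returns the term name.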
