The battle for Greece's next top model may be over for this year...😂 However, the battle is still on in almost every machine learning task a data scientist comes across in their daily (and nightly) lives. The long-standing questions in these cases are: Which model is the best? Can I know that in advance? If not, … Continue reading Next top model abstractions
Ever had an encounter with the most... awkward programming language? Sure, awk may have a slightly awkward name and, at times, a slightly awkward syntax, but overall it's far from awkward as a language. In fact, it is one of the most powerful tools any programmer can have to nail simple or more complex … Continue reading Simple stats in an Awk-ward fashion
Happy 2019 everyone!! New year, new hopes and a new journey ahead towards minimising a loss function to reach our goals throughout the year! If only our journey were a multivariate continuous function, things would be so much easier...! Our life may be way more complex than that, but fortunately, there is a field where … Continue reading Happy New Year optimisation with Gradient Descent!
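For readers who want the one-line idea behind gradient descent before reading the full post: repeatedly step against the gradient, x ← x − η∇f(x). The sketch below is a minimal illustration, not the post's own code. It assumes a hand-coded gradient, a fixed learning rate η = 0.1 and a made-up toy loss f(x, y) = (x − 3)² + (y + 1)² whose minimum sits at (3, −1).

```python
def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
    """Plain (batch) gradient descent with a fixed learning rate."""
    x = list(x0)
    for _ in range(n_steps):
        g = grad(x)
        # Move every coordinate a small step against its partial derivative
        x = [xi - learning_rate * gi for xi, gi in zip(x, g)]
    return x


def grad_toy_loss(p):
    """Gradient of the hypothetical toy loss f(x, y) = (x - 3)**2 + (y + 1)**2."""
    x, y = p
    return [2 * (x - 3), 2 * (y + 1)]


x_min = gradient_descent(grad_toy_loss, [0.0, 0.0])
print(x_min)  # very close to [3, -1]
```

With η = 0.1, the distance to the minimum shrinks by a factor of 0.8 at every step on this loss. A learning rate above 1.0 would make the same iteration diverge.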
BioMart is an amazing resource of well-curated genomic annotations, until you need to actually download data programmatically... I gave the biomaRt R package a try for a couple of hours, only to realise my query wouldn't be served in our lifetime... So I moved on to BioMart's REST API. That's a … Continue reading Accessing BioMart with REST API and multi-threading (Python3)
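Here is a rough idea of what "REST API plus multi-threading" can look like. This is a sketch, not the post's code. It assumes Ensembl's `martservice` endpoint, which takes an XML query in a `query` URL parameter. The dataset, attribute and filter names and the chunk-per-chromosome strategy are illustrative choices. `build_query_xml`, `build_query_url`, `fetch_tsv` and `fetch_all` are hypothetical helpers.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen
from xml.sax.saxutils import quoteattr

# Assumed endpoint: Ensembl's BioMart REST service
MARTSERVICE_URL = "http://www.ensembl.org/biomart/martservice"


def build_query_xml(dataset, attributes, filters=None):
    """Build a BioMart XML query (TSV output, no header, unique rows)."""
    filters = filters or {}
    # quoteattr escapes and wraps each value in double quotes
    filter_xml = "".join(
        f"<Filter name={quoteattr(name)} value={quoteattr(str(value))}/>"
        for name, value in filters.items()
    )
    attr_xml = "".join(f"<Attribute name={quoteattr(name)}/>" for name in attributes)
    return (
        '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
        '<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="1">'
        f'<Dataset name={quoteattr(dataset)} interface="default">'
        f"{filter_xml}{attr_xml}</Dataset></Query>"
    )


def build_query_url(xml):
    """URL-encode the XML query into a GET request URL."""
    return MARTSERVICE_URL + "?" + urlencode({"query": xml})


def fetch_tsv(url, timeout=300):
    """Download one query result as text (this one does hit the network)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def fetch_all(urls, fetch=fetch_tsv, max_workers=4):
    """Run many smaller queries concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

Here is one possible way to use it. Build one URL per chromosome with `build_query_url(build_query_xml("hsapiens_gene_ensembl", ["ensembl_gene_id", "external_gene_name"], {"chromosome_name": c}))` for each `c` in your list. Then pass the URLs to `fetch_all`. Splitting one huge query into several small ones is what lets the threads help, since each request spends most of its time waiting on the server.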
In today's post, we'll just be doing some... simple counting with Genotype Counts (GC) at a Cambridge pub. 😎 The general form of GC with a ref allele A and multiple alternative alleles (B, C, D, etc.) is: GC = AA, AB, BB, AC, BC, CC, AD, BD, CD, DD, ... GC fields basically capture the likelihood of two events occurring simultaneously at two spots. … Continue reading Genotype counts & sports games
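The GC order above follows a simple nested-loop rule. For each allele in turn, emit its pairings with every allele up to and including itself. The function below is a small sketch of that rule. `genotype_order` is a hypothetical name, and alleles are passed as single letters (A = ref) for readability.

```python
def genotype_order(alleles):
    """List diploid genotypes in GC order: AA, AB, BB, AC, BC, CC, ..."""
    order = []
    for k, second in enumerate(alleles):
        # Pair the k-th allele with every allele from the ref up to itself
        for first in alleles[: k + 1]:
            order.append(first + second)
    return order


print(genotype_order("ABCD"))
# ['AA', 'AB', 'BB', 'AC', 'BC', 'CC', 'AD', 'BD', 'CD', 'DD']
```

With n alleles (ref included), the list has n(n + 1)/2 entries, e.g. 10 for A to D.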
VCF Playground - Level 2: Following my previous post on the order of GC elements, I'm now going to present an empirical proof of this convention! As a reminder, the inferred order of elements within a GC field is: GC = AA, AB, BB, AC, BC, CC, AD, BD, CD, DD, AE, BE, CE, DE, EE, ... (1). Notation used: reference allele: A; alternative alleles: B, C, D, ... (in order of appearance in the VCF … Continue reading gnomAD: expanding multi-allelic variants in VCF (Part 2)
VCF Playground - Level 1: TL;DR: Order of elements in the GC field of the gnomAD VCF for multi-allelic entries: GC = AA, AB, BB, AC, BC, CC, AD, BD, CD, DD, AE, BE, CE, DE, EE, ... Ref. allele: A; alt. alleles: B, C, D, ... Ever tried to make sense of the infamous VCF format? … Continue reading gnomAD: expanding multi-allelic variants in VCF (Part 1)
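The order above matches the ordering formula the VCF specification gives for diploid genotype fields such as GL: genotype j/k (with j ≤ k, 0 = ref allele A, 1 = B, ...) sits at position k(k + 1)/2 + j. Applying that same formula to GC is an assumption here, supported by the order shown above. The helpers below are hypothetical: `genotype_index` jumps straight to the right GC entry, and `genotype_from_index` goes back from a position to its allele pair.

```python
def genotype_index(j, k):
    """0-based position of genotype j/k in the GC list (0 = ref allele A)."""
    if j > k:
        j, k = k, j  # genotypes are unordered: B/A is the same as A/B
    return k * (k + 1) // 2 + j


def genotype_from_index(i):
    """Inverse mapping: position in the GC list -> (j, k) with j <= k."""
    k = 0
    # Find the block of genotypes ending in allele k that contains position i
    while (k + 1) * (k + 2) // 2 <= i:
        k += 1
    return i - k * (k + 1) // 2, k


print(genotype_index(1, 3))   # 7, i.e. BD in AA, AB, BB, AC, BC, CC, AD, BD, ...
print(genotype_from_index(4)) # (1, 2), i.e. BC
```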