The backend engine responsible for aligning noisy analog biological data with 1.6M digital genetic markers.

The Engineering Deep Dive

Genotypica is the algorithmic bridge between raw hardware signals and genetic truth. The pipeline starts when customers order a sample kit and return a swab for processing. When a biological sample is scanned, the resulting data is a noisy, non-linear analog signal. Because biological reactions vary with temperature and sample purity, these signals are rarely uniform in time, so they have to be aligned first. Once aligned, the engine maps the signals to a large local database of 1.6 million rsID markers. From there the markers are matched to published records pulled from dbSNP and ClinVar to provide accurate genotyping results. Finally, the user logs in to a web portal to access an easy-to-read PDF describing their results.
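To make the alignment step concrete, here is a minimal sketch of one common way to put an irregularly sampled signal onto a uniform time grid: linear interpolation followed by min-max scaling. This is an illustration under assumptions, not Genotypica's actual alignment code; the function names resample_uniform and normalize are hypothetical, and the sketch assumes timestamps are strictly increasing.

```python
from bisect import bisect_right


def resample_uniform(times, values, n_points):
    """Linearly interpolate an irregular signal onto n_points evenly spaced times.

    Assumes `times` is strictly increasing (hypothetical input contract).
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) pairs of equal length")
    if n_points < 2:
        raise ValueError("n_points must be at least 2")

    start, end = times[0], times[-1]
    step = (end - start) / (n_points - 1)
    resampled = []
    for i in range(n_points):
        t = start + i * step
        # Index of the sample to the right of t, clamped to a valid segment.
        j = min(max(bisect_right(times, t), 1), len(times) - 1)
        t0, t1 = times[j - 1], times[j]
        v0, v1 = values[j - 1], values[j]
        fraction = (t - t0) / (t1 - t0)
        resampled.append(v0 + fraction * (v1 - v0))
    return resampled


def normalize(values):
    """Min-max scale a signal to [0, 1]; a flat signal maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]
```

With both signals on the same grid and the same scale, two scans of different duration can be compared point by point. Real traces would likely need smoothing or a warping method on top of this, which the sketch leaves out.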

The Technical Post Mortem

Genotypica was more of a programming challenge than a biological one. The core challenges were signal alignment, large-scale data processing, and simplifying the data for end users. I implemented a calling algorithm that processes the raw data produced by Illumina machines and matches it to known rsIDs. With the rsIDs aligned, the engine could accurately cross-reference results against a large database of diseases that I built. This involved transforming massive non-uniform datasets into highly optimized search structures.
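As an illustration of what an optimized search structure for rsID lookups can look like, the sketch below keeps markers in a sorted array and finds them with binary search. It is an assumption-laden example rather than the structure Genotypica actually uses: the MarkerIndex class, its methods, and the placeholder annotations are all hypothetical.

```python
from bisect import bisect_left


class MarkerIndex:
    """Sorted-array index from rsID to annotation (hypothetical layout)."""

    def __init__(self, records):
        # records: iterable of (rsid, annotation) pairs, e.g. ("rs100", "note").
        pairs = sorted(
            ((self._key(rsid), annotation) for rsid, annotation in records),
            key=lambda pair: pair[0],
        )
        self._keys = [key for key, _ in pairs]
        self._annotations = [annotation for _, annotation in pairs]

    @staticmethod
    def _key(rsid):
        # Store the numeric part so comparisons are integer comparisons.
        if not rsid.lower().startswith("rs"):
            raise ValueError(f"not an rsID: {rsid!r}")
        return int(rsid[2:])

    def lookup(self, rsid):
        """Return the annotation for rsid, or None if it is not indexed."""
        key = self._key(rsid)
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._annotations[i]
        return None


# Placeholder records for illustration only, not real dbSNP or ClinVar data.
index = MarkerIndex([("rs300", "c"), ("rs100", "a"), ("rs200", "b")])
print(index.lookup("rs200"))  # prints "b"
```

Each lookup takes O(log n) comparisons, so a table of 1.6 million markers needs about 21 steps per query. A hash map would be an equally reasonable choice; the sorted array is shown because it is compact and easy to store on disk.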

Engineering Constraints

Solving for the 'Impossible' means navigating rigid physical and computational limits:

  • Normalizing inconsistent graphs of biological samples.
  • Processing massive datasets from dbSNP and ClinVar into searchable formats.
  • Reducing false positive matches in noisy analog data environments.
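One simple way to reduce false positives in noisy data is to refuse to make a call when the evidence is weak or ambiguous. The sketch below assumes each marker yields two non-negative allele intensities (A and B) and returns a no-call for anything outside a narrow band around the expected ratios. It is a simplified illustration, not the calling algorithm described above; call_genotype and its thresholds min_total and margin are hypothetical.

```python
def call_genotype(intensity_a, intensity_b, min_total=0.2, margin=0.15):
    """Call AA, AB, or BB from two allele intensities, or None for a no-call.

    min_total and margin are illustrative thresholds, not tuned values.
    """
    total = intensity_a + intensity_b
    if total < min_total:
        return None  # signal too weak to trust

    b_fraction = intensity_b / total
    # Expected B-allele fraction for each genotype.
    for genotype, center in (("AA", 0.0), ("AB", 0.5), ("BB", 1.0)):
        if abs(b_fraction - center) <= margin:
            return genotype
    return None  # ambiguous ratio: prefer a no-call over a wrong call


print(call_genotype(1.0, 0.05))  # prints "AA"
print(call_genotype(0.7, 0.3))   # prints "None"
```

Tightening margin trades call rate for accuracy: more markers come back as no-calls, but fewer borderline signals are reported as confident matches.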