Personal genome bioinformatics with AI
Turn raw sequencing reads into clean alignments and variant calls, turn those calls or your 23andMe download into personal insights, and use an LLM as your bioinformatics tutor along the way.
Choose your training style
Pick the format that matches the level of support you want.
Self-paced
Start immediately and work through the training on your own schedule.
Human trainer
Join a guided cohort or workshop when live sessions are available.
Guided by an instructor
AI trainer
Practice with an AI trainer tailored to the course topic.
Personalized guidance
Overview
Sequencing your genome is only half the project. The other half is turning FASTQ files and SNP tables into something you can actually reason about: ancestry, pharmacogenomics, trait associations, and how your reads compare to reference data. This course teaches you the standard bioinformatics stack, then uses an LLM to demystify the commands and the outputs.
Who it's for
- Graduates of the vibe genomics course who now have their own reads
- 23andMe or AncestryDNA customers with raw data they have never opened
- Developers curious about genomics tooling and pipelines
- Students learning bioinformatics outside a formal program
What you'll build
- A reproducible pipeline: raw reads → alignment → variant calls → annotated VCF
- A comparison between your personal VCF and a consumer SNP chip file
- A small personal dashboard highlighting pharmacogenomic and trait-relevant variants
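Reproducibility mostly comes down to recording what went in and what ran. A minimal sketch, assuming a small JSON manifest kept next to your outputs (the `manifest.json` layout and the `record_step` helper are hypothetical conventions, not part of any tool): hash every input with SHA-256 and log each stage's exact command line.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks so large FASTQ/BAM files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_step(manifest_path: Path, name: str, command: list[str], inputs: list[Path]) -> dict:
    """Append one pipeline stage to a (hypothetical) JSON manifest with its command and input checksums."""
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {"steps": []}
    entry = {
        "name": name,
        "command": command,
        "inputs": {str(p): sha256_of(p) for p in inputs},
    }
    manifest["steps"].append(entry)
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return entry
```

Call it once per stage, for example `record_step(Path("manifest.json"), "align", ["minimap2", "-ax", "sr", "GRCh38.fa", "reads_1.fq.gz", "reads_2.fq.gz"], [Path("reads_1.fq.gz"), Path("reads_2.fq.gz")])`. If a checksum changes between runs, you know the inputs changed, not the tools.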
Prerequisites
- Comfort with a Unix shell and installing command-line tools
- Either your own sequencing reads or a 23andMe/AncestryDNA raw data file
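- A laptop or small server with enough free disk for the reference genome, its alignment indexes, and your read files (tens of gigabytes for reference data alone, considerably more for whole-genome reads)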
Tools and setup
- Install a standard toolchain: minimap2 or BWA for alignment, samtools for BAM handling, bcftools for variant calling, and SnpEff for annotation
- Download the GRCh38 reference genome and a curated clinical variant database such as ClinVar
- Keep an LLM session open as a tutor for every new command and file format
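The toolchain above chains a handful of commands. A sketch in Python, assuming paired-end short reads, a GRCh38 FASTA you name yourself, and a SnpEff database name you pass in (names such as `GRCh38.105` vary by SnpEff release; list them with `snpEff databases`). `build_commands` and `run` are hypothetical helpers: they only assemble and optionally execute argument lists, so you can print every command and ask your LLM tutor what each flag does before running anything.

```python
import subprocess
from typing import List, Optional, Tuple

Step = Tuple[List[str], Optional[str]]  # (argv, file to capture stdout into, or None)


def build_commands(ref: str, reads_1: str, reads_2: str, sample: str,
                   snpeff_db: str, threads: int = 4) -> List[Step]:
    """Return the steps for one end-to-end run: reads -> BAM -> VCF -> annotated VCF."""
    sam, bam = f"{sample}.sam", f"{sample}.sorted.bam"
    pileup, calls = f"{sample}.pileup.bcf", f"{sample}.calls.vcf.gz"
    return [
        (["samtools", "faidx", ref], None),                                  # index the reference FASTA
        (["minimap2", "-ax", "sr", "-t", str(threads), "-o", sam,
          ref, reads_1, reads_2], None),                                      # short-read alignment to SAM
        (["samtools", "sort", "-@", str(threads), "-o", bam, sam], None),     # coordinate-sorted BAM
        (["samtools", "index", bam], None),                                   # .bai index for random access
        (["bcftools", "mpileup", "-f", ref, "-Ou", "-o", pileup, bam], None), # genotype likelihoods (BCF)
        (["bcftools", "call", "-mv", "-Oz", "-o", calls, pileup], None),      # multiallelic caller, variant sites only
        (["snpEff", snpeff_db, calls], f"{sample}.annotated.vcf"),           # SnpEff writes annotated VCF to stdout
    ]


def run(commands: List[Step], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False, stopping on the first failure."""
    for argv, stdout_file in commands:
        print(" ".join(argv) + (f" > {stdout_file}" if stdout_file else ""))
        if dry_run:
            continue
        if stdout_file:
            with open(stdout_file, "w") as out:
                subprocess.run(argv, stdout=out, check=True)
        else:
            subprocess.run(argv, check=True)
```

Keeping each step as a plain argument list rather than a shell string avoids quoting problems and makes it easy to swap minimap2 for BWA in one place.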
Modules
Module 1: File formats and the standard pipeline
You will learn FASTQ, BAM, VCF, and related formats by actually running the pipeline end to end on a small test dataset.
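FASTQ is the easiest format to take apart by hand. Each record is four lines: an `@` header, the sequence, a `+` separator, and a quality string in which each character encodes a Phred score as its ASCII code minus 33 (the Sanger / Illumina 1.8+ convention). A small parser, assuming that encoding and either plain or gzip-compressed input, makes this concrete:

```python
import gzip
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str             # header text after the leading '@'
    sequence: str
    qualities: List[int]  # Phred scores, one per base


def read_fastq(path: str) -> Iterator[FastqRecord]:
    """Yield records from a 4-line-per-record FASTQ file (.gz handled transparently)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                return  # end of file
            seq = handle.readline().rstrip("\n")
            plus = handle.readline().rstrip("\n")
            qual = handle.readline().rstrip("\n")
            if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
                raise ValueError(f"malformed FASTQ record near {header!r}")
            yield FastqRecord(header[1:], seq, [ord(c) - 33 for c in qual])


def error_probability(phred: int) -> float:
    """A Phred score Q corresponds to an estimated base-call error rate of 10^(-Q/10)."""
    return 10 ** (-phred / 10)
```

So a base with quality character `5` has Phred 20, meaning roughly a 1 in 100 chance the call is wrong.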
Module 2: Variant calling and annotation
You will call variants against a reference, annotate them with functional impact, and filter them by call quality and predicted impact down to the ones worth a closer look.
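Filtering usually combines call quality with predicted impact. A sketch, assuming an uncompressed SnpEff-annotated VCF: SnpEff writes its predictions into the `ANN` INFO field, one comma-separated entry per transcript, where the third `|`-separated subfield is the impact (HIGH, MODERATE, LOW, or MODIFIER) and the fourth is the gene name. The QUAL cutoff of 30 is an illustrative choice, not a recommendation; on large files, `bcftools view -i` or `SnpSift filter` do the same job faster.

```python
from typing import Dict, Iterable, Iterator, Tuple

KEEP_IMPACTS = {"HIGH", "MODERATE"}


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into key -> value; flag fields get an empty string."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def filter_vcf(lines: Iterable[str], min_qual: float = 30.0) -> Iterator[Tuple[str, int, str, str, str]]:
    """Yield (chrom, pos, ref, alt, gene) for passing calls with at least one HIGH/MODERATE annotation."""
    for line in lines:
        if line.startswith("#"):
            continue  # meta-information and column header lines
        chrom, pos, _id, ref, alt, qual, filt, info = line.rstrip("\n").split("\t")[:8]
        if qual == "." or float(qual) < min_qual:
            continue
        if filt not in ("PASS", "."):
            continue  # failed a caller or user-defined filter
        for ann in parse_info(info).get("ANN", "").split(","):
            parts = ann.split("|")
            if len(parts) > 3 and parts[2] in KEEP_IMPACTS:
                yield chrom, int(pos), ref, alt, parts[3]
                break  # one hit per record is enough
```

Everything this drops is still in the full VCF; the filter only decides what deserves your attention first.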
Module 3: Interpret responsibly
You will compare your results to consumer SNP chips, look up pharmacogenomic and trait variants in public databases, and clearly separate "interesting" from "clinically actionable."
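Consumer chip files and your VCF can be joined on rsID, which sidesteps coordinate differences between reference builds. A minimal sketch, assuming a 23andMe-style tab-separated file (`rsid`, `chromosome`, `position`, `genotype`, with `#` comment lines and `--` for no-calls) and a single-sample VCF whose ID column has already been filled with rsIDs (for example with `bcftools annotate -a <dbSNP VCF> -c ID`). The function names are hypothetical.

```python
from typing import Dict, Iterable, List


def read_23andme(lines: Iterable[str]) -> Dict[str, str]:
    """rsid -> genotype string such as 'AG'; no-calls ('--') are dropped."""
    calls = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        rsid, _chrom, _pos, genotype = line.rstrip("\n").split("\t")[:4]
        if genotype != "--":
            calls[rsid] = genotype
    return calls


def vcf_genotypes(lines: Iterable[str]) -> Dict[str, str]:
    """rsid -> allele letters from the first sample's GT field (assumes ID holds rsIDs and a sample column exists)."""
    calls = {}
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        rsid, ref, alts, fmt, sample = cols[2], cols[3], cols[4].split(","), cols[8], cols[9]
        if not rsid.startswith("rs"):
            continue
        gt = dict(zip(fmt.split(":"), sample.split(":"))).get("GT", "")
        indices = gt.replace("|", "/").split("/")  # treat phased and unphased alike
        if "." in indices:
            continue  # missing genotype
        alleles = [ref, *alts]
        calls[rsid] = "".join(alleles[int(i)] for i in indices)
    return calls


def compare(chip: Dict[str, str], seq: Dict[str, str]) -> Dict[str, List[str]]:
    """Bucket rsIDs present in both files into concordant / discordant, ignoring allele order."""
    result: Dict[str, List[str]] = {"concordant": [], "discordant": []}
    for rsid in sorted(chip.keys() & seq.keys()):
        same = sorted(chip[rsid]) == sorted(seq[rsid])
        result["concordant" if same else "discordant"].append(rsid)
    return result
```

Only the intersection is a fair comparison: a site absent from a variants-only VCF is either homozygous reference or not covered. Indels, which chip files often encode with placeholder letters such as `D` and `I`, will not match this simple letter comparison.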
Deliverable
A reproducible bioinformatics pipeline and an annotated personal variant report with a clear boundary between research-grade curiosity and clinical decisions.
Common mistakes
- Treating aggregated consumer reports as the ground truth
- Ignoring reference build differences (GRCh37 vs GRCh38), so the same variant sits at different coordinates in each file and shows up as a phantom discrepancy
- Drawing medical conclusions from single variants with no clinical context
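A cheap guard against the build mistake is to check both headers before comparing anything by position. A heuristic sketch, assuming the build is mentioned somewhere in each file's `#` comment lines (wording varies by vendor and tool, so the regular expressions are guesses, not a standard). It also normalizes chromosome names, since `chr1` versus `1` is another common source of false mismatches. If the builds differ, compare by rsID or lift one file's coordinates over to the other build first.

```python
import re
from typing import Iterable, Optional

# Heuristic patterns; header wording differs between vendors and tool versions.
BUILD_PATTERNS = {
    "GRCh37": re.compile(r"GRCh37|hg19|build\s*37", re.IGNORECASE),
    "GRCh38": re.compile(r"GRCh38|hg38|build\s*38", re.IGNORECASE),
}


def guess_build(header_lines: Iterable[str]) -> Optional[str]:
    """Return the single build named in the '#' header lines, or None if neither or both match."""
    text = "\n".join(line for line in header_lines if line.startswith("#"))
    hits = [build for build, pattern in BUILD_PATTERNS.items() if pattern.search(text)]
    return hits[0] if len(hits) == 1 else None


def normalize_chrom(chrom: str) -> str:
    """Map 'chr1'/'1' and 'chrM'/'MT' onto one naming scheme."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return "MT" if name.upper() in {"M", "MT"} else name


def positions_comparable(chip_header: Iterable[str], vcf_header: Iterable[str]) -> bool:
    """Only trust position-based comparison when both files confidently name the same build."""
    chip_build, vcf_build = guess_build(chip_header), guess_build(vcf_header)
    return chip_build is not None and chip_build == vcf_build
```

Returning `None` when the header is ambiguous is deliberate: an unknown build should block position-based comparison rather than silently allow it.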
Next steps
Layer in polygenic risk scores, ancestry deconvolution, or long-read structural variant detection if you have nanopore or other long-read data.
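In its simplest form, a polygenic score is a weighted sum: for each variant in a published score, count how many copies of the effect allele you carry (0, 1, or 2) and multiply by that variant's effect size. A minimal sketch, assuming a hypothetical weights table mapping rsID to (effect allele, effect size) and genotype strings like those in a chip file. Dedicated scoring tools also handle strand flips, missing variants, and comparison against a reference population; this sketch does none of that.

```python
from typing import Dict, Tuple


def polygenic_score(genotypes: Dict[str, str],
                    weights: Dict[str, Tuple[str, float]]) -> Tuple[float, int]:
    """Sum effect-allele counts times per-variant weights; return (score, variants used).

    Assumes single-letter SNP alleles, so str.count gives the effect-allele dosage.
    """
    score, used = 0.0, 0
    for rsid, (effect_allele, beta) in weights.items():
        genotype = genotypes.get(rsid)
        if genotype is None:
            continue  # missing variants silently lower the score, so report coverage too
        score += beta * genotype.count(effect_allele)
        used += 1
    return score, used
```

Reporting how many of the score's variants were actually found matters as much as the number itself: a score computed from half its variants is not comparable to published distributions.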