Data Science · Intermediate · Course

Personal genome bioinformatics with AI

Turn raw sequencing reads or your 23andMe download into clean alignments, variant calls, and personal insights — with an LLM as your bioinformatics tutor.

210 min · Claude, minimap2, BWA, samtools, bcftools, snpEff, Python · 10xCareer Team

Choose your training style

Pick the format that matches the level of support you want.

Self-paced (available): Start immediately and work through the training on your own schedule. Free.
Human trainer (coming soon): Join a guided cohort or workshop format when live delivery is available. $99, guided by an instructor.
AI trainer (coming soon): Practice with an AI-guided trainer experience tailored to the course topic. Personalized guidance.

Overview

Sequencing your genome is only half the project. The other half is turning FASTQ files and SNP tables into something you can actually reason about: ancestry, pharmacogenomics, trait associations, and how your reads compare to reference data. This course teaches you the standard bioinformatics stack, then uses an LLM to demystify the commands and the outputs.

Who it's for

Graduates of the vibe genomics course who now have their own reads
23andMe or AncestryDNA customers with raw data they have never opened
Developers curious about genomics tooling and pipelines
Students learning bioinformatics outside a formal program

What you'll build

A reproducible pipeline: raw reads → alignment → variant calls → annotated VCF
A comparison between your personal VCF and a consumer SNP chip file
A small personal dashboard highlighting pharmacogenomic and trait-relevant variants
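
The first item above (raw reads → alignment → variant calls → annotated VCF) can be sketched as a small Python helper that builds the shell commands in order. This is a minimal sketch under stated assumptions, not the course's actual pipeline: the file names, the `prefix` argument, the `snpEff` wrapper command, and the database name `GRCh38.99` are all illustrative, and the flags are common defaults, not tuned settings.

```python
import shlex

def build_pipeline(ref, reads1, reads2, prefix="sample",
                   aligner="bwa", snpeff_db="GRCh38.99"):
    """Return shell commands for a paired-end short-read pipeline, in run order.

    Paths, the output prefix, and the snpEff database name are assumptions.
    """
    q = shlex.quote
    bam = f"{prefix}.sorted.bam"
    vcf = f"{prefix}.vcf"
    ann = f"{prefix}.ann.vcf"

    if aligner == "bwa":
        setup = [f"bwa index {q(ref)}"]  # one-time reference index for BWA
        align = f"bwa mem {q(ref)} {q(reads1)} {q(reads2)}"
    elif aligner == "minimap2":
        setup = []
        align = f"minimap2 -ax sr {q(ref)} {q(reads1)} {q(reads2)}"  # short-read preset
    else:
        raise ValueError(f"unknown aligner: {aligner}")

    return setup + [
        # Align, then coordinate-sort the alignments into a BAM file.
        f"{align} | samtools sort -o {q(bam)} -",
        f"samtools index {q(bam)}",
        # Pile up reads against the reference and call variant sites only.
        f"bcftools mpileup -f {q(ref)} {q(bam)} | bcftools call -mv -Ov -o {q(vcf)}",
        # Annotate functional impact (assumes the snpEff database is installed).
        f"snpEff ann {q(snpeff_db)} {q(vcf)} > {q(ann)}",
    ]

if __name__ == "__main__":
    for cmd in build_pipeline("GRCh38.fa", "reads_1.fastq.gz", "reads_2.fastq.gz"):
        print(cmd)
```

Printing the commands before running them keeps the pipeline reviewable; each string could then be passed to `subprocess.run(cmd, shell=True, check=True)` once the paths are confirmed.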

Prerequisites

Comfort with a Unix shell and installing command-line tools
Either your own sequencing reads or a 23andMe/AncestryDNA raw data file
A laptop or small server with enough disk for reference data

Tools and setup

Install a standard toolchain (minimap2 or BWA, samtools, bcftools, snpEff)
Download the GRCh38 reference and a curated clinical variant database
Keep an LLM session open as a tutor for every new command and file format
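
Before starting, it can help to confirm which of the tools listed above are actually on your `PATH`. The check below is a minimal sketch: the executable names are assumptions (some installs expose snpEff only as a `.jar` file, and only one of minimap2 or BWA is needed).

```python
import shutil

# Executable names are assumptions; adjust them to match your install.
REQUIRED_TOOLS = ["minimap2", "bwa", "samtools", "bcftools", "snpEff"]

def find_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its location on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}

def missing_tools(names=REQUIRED_TOOLS):
    """List the tool names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]

if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name:10s} {path or 'NOT FOUND'}")
```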

Modules

Module 1: File formats and the standard pipeline

You will learn FASTQ, BAM, VCF, and related formats by actually running the pipeline end to end on a small test dataset.
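
To make the FASTQ format concrete, here is a minimal reader, assuming the common four-lines-per-record layout and Phred+33 quality encoding. The example read is synthetic, and real files are usually gzip-compressed and far too large to print.

```python
import io

def read_fastq(handle):
    """Yield (read_id, sequence, qualities) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        # Phred+33: quality score = ASCII code of the character minus 33.
        yield header[1:], seq, [ord(c) - 33 for c in qual]

# Synthetic one-read example.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
for read_id, seq, quals in read_fastq(example):
    print(read_id, seq, quals)  # read1 ACGT [40, 40, 20, 0]
```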

Module 2: Variant calling and annotation

You will call variants against a reference, annotate them with functional impact, and filter to the ones worth paying attention to.
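
The filtering step can be sketched in plain Python. This assumes the VCF was annotated by snpEff, which writes an `ANN` INFO field whose third `|`-separated subfield is the impact level. The QUAL threshold of 30 and the choice to keep only HIGH and MODERATE impacts are illustrative assumptions, and the records are synthetic.

```python
def parse_ann_impacts(info):
    """Return the set of snpEff impact levels found in a VCF INFO string."""
    impacts = set()
    for field in info.split(";"):
        if field.startswith("ANN="):
            for annotation in field[4:].split(","):
                parts = annotation.split("|")
                if len(parts) > 2:
                    impacts.add(parts[2])  # third subfield: Annotation_Impact
    return impacts

def filter_vcf_lines(lines, min_qual=30.0, keep_impacts=("HIGH", "MODERATE")):
    """Yield (chrom, pos, ref, alt) for records passing QUAL and impact filters."""
    wanted = set(keep_impacts)
    for line in lines:
        if line.startswith("#"):
            continue  # header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8 or cols[5] == ".":
            continue  # truncated record or missing QUAL
        if float(cols[5]) < min_qual:
            continue
        if parse_ann_impacts(cols[7]) & wanted:
            yield cols[0], int(cols[1]), cols[3], cols[4]

# Synthetic records for illustration only.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t1000\t.\tA\tG\t50\t.\tANN=G|missense_variant|MODERATE|GENE1\n",
    "chr1\t2000\t.\tC\tT\t50\t.\tANN=T|synonymous_variant|LOW|GENE1\n",
    "chr1\t3000\t.\tG\tA\t10\t.\tANN=A|stop_gained|HIGH|GENE2\n",
]
print(list(filter_vcf_lines(example)))  # [('chr1', 1000, 'A', 'G')]
```

For whole-genome files, the same filter is usually faster with `bcftools view` or snpSift; the Python version is here to show what the filter is doing.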

Module 3: Interpret responsibly

You will compare your results to consumer SNP chips, look up pharmacogenomic and trait variants in public databases, and clearly separate "interesting" from "clinically actionable."
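
One way to sketch the chip comparison is to match genotypes by rsID. The code below assumes a 23andMe-style tab-separated file (rsid, chromosome, position, genotype, with `--` for no-calls) and a single-sample VCF whose ID column already carries rsIDs, which typically requires a separate annotation step. The rsIDs and genotypes shown are synthetic placeholders.

```python
def read_chip(lines):
    """Parse 23andMe-style rows into {rsid: sorted genotype string}."""
    calls = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        rsid, _chrom, _pos, genotype = line.rstrip("\n").split("\t")[:4]
        if "-" not in genotype:  # '--' marks a no-call
            calls[rsid] = "".join(sorted(genotype))
    return calls

def read_vcf_genotypes(lines):
    """Map rsID -> sorted allele string for SNV records in a single-sample VCF."""
    calls = {}
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10:
            continue  # no genotype column
        rsid, ref, alt = cols[2], cols[3], cols[4]
        alleles = [ref] + alt.split(",")
        if rsid == "." or any(len(a) != 1 for a in alleles):
            continue  # skip unnamed sites and indels
        gt = cols[9].split(":")[0].replace("|", "/").split("/")
        if "." in gt:
            continue  # missing genotype
        calls[rsid] = "".join(sorted(alleles[int(i)] for i in gt))
    return calls

def compare(chip, vcf):
    """Count shared rsIDs and list those whose genotypes disagree."""
    shared = chip.keys() & vcf.keys()
    mismatches = sorted(r for r in shared if chip[r] != vcf[r])
    return {"shared": len(shared), "mismatches": mismatches}

# Synthetic placeholder data.
chip_lines = [
    "# rsid\tchromosome\tposition\tgenotype\n",
    "rs0000001\t1\t1000\tAG\n",
    "rs0000002\t1\t2000\tCC\n",
    "rs0000003\t1\t3000\t--\n",
]
vcf_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tme\n",
    "chr1\t1100\trs0000001\tA\tG\t60\t.\t.\tGT\t0/1\n",
    "chr1\t2100\trs0000002\tC\tT\t60\t.\t.\tGT\t1/1\n",
]
print(compare(read_chip(chip_lines), read_vcf_genotypes(vcf_lines)))
# {'shared': 2, 'mismatches': ['rs0000002']}
```

Matching on rsID sidesteps coordinate differences between reference builds. A variants-only VCF omits homozygous-reference sites, so a chip SNP missing from the VCF is not by itself a disagreement.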

Deliverable

A reproducible bioinformatics pipeline and an annotated personal variant report with a clear boundary between research-grade curiosity and clinical decisions.

Common mistakes

Treating aggregated consumer reports as the ground truth
Ignoring reference build differences (GRCh37 vs GRCh38) and getting phantom discrepancies
Drawing medical conclusions from single variants with no clinical context
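
The reference build mistake can often be caught early by reading the VCF header. A minimal sketch, assuming the header includes `##contig` lines with a `length` field: chromosome 1 is 249,250,621 bp in GRCh37 and 248,956,422 bp in GRCh38, so its recorded length identifies the build.

```python
# Length of chromosome 1 in each assembly.
CHR1_LENGTHS = {249250621: "GRCh37", 248956422: "GRCh38"}

def guess_build(header_lines):
    """Guess the reference build from a VCF header's chromosome 1 contig line."""
    prefix = "##contig=<"
    for line in header_lines:
        if not line.startswith(prefix):
            continue
        # Turn 'ID=chr1,length=248956422' into a dict of key/value pairs.
        body = line.strip()[len(prefix):-1]
        fields = dict(item.split("=", 1) for item in body.split(",") if "=" in item)
        if fields.get("ID") in ("1", "chr1") and "length" in fields:
            return CHR1_LENGTHS.get(int(fields["length"]), "unknown")
    return "unknown"

header = ["##fileformat=VCFv4.2\n", "##contig=<ID=chr1,length=248956422>\n"]
print(guess_build(header))  # GRCh38
```

Consumer chip files usually state their build in the comment lines at the top of the download, so check there too before comparing positions across files.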

Next steps

Layer in polygenic risk scores, ancestry deconvolution, or long-read structural variant detection from your nanopore data.