Genotype Representation Graphs (GRG)

This notebook demonstrates the current GRG workflow in snputils: generate a small GRGObject, convert it to a SNPObject, run ordinary SNPObject operations, and save/read the graph.

Build a toy GRG

build_synthetic_grg() returns a deterministic GRG with six haploid sample nodes, representing three diploid individuals, and five mutations.

from pathlib import Path

import numpy as np

from snputils import read_grg
from snputils.datasets import build_synthetic_grg

RESULTS_DIR = Path("results/tutorials/grg")
RESULTS_DIR.mkdir(parents=True, exist_ok=True)

grgobj = build_synthetic_grg()
grgobj
GRGObject(shape=(5, 6), n_snps=5, n_haplotypes=6, n_samples=3, filename=None, mutable=True, loaded=True)

Convert GRG to SNPObject

to_snpobject(sum_strands=False) returns phased genotypes with one column per diploid sample and one haplotype axis. sum_strands=True returns 2D diploid dosages.

snp_phased = grgobj.to_snpobject(chrom="22", sum_strands=False)
snp_summed = grgobj.to_snpobject(chrom="22", sum_strands=True)

print(snp_phased)
print("phased shape:", snp_phased.genotypes.shape)
print("summed shape:", snp_summed.genotypes.shape)
print(snp_summed.genotypes)
SNPObject(shape=(5, 3, 2), n_snps=5, n_samples=3, genotypes_shape=(5, 3, 2), calldata_lai_shape=None, calldata_gp_shape=None, has_variant_metadata=True, has_ancestry_map=False)
phased shape: (5, 3, 2)
summed shape: (5, 3)
[[2 2 2]
 [2 1 0]
 [0 1 2]
 [1 0 0]
 [0 0 1]]

Add sample metadata

The converted SNPObject can be annotated and used like any other SNPObject.

snp_phased.samples = np.array(["S1", "S2", "S3"])
snp_phased.sample_fid = np.array(["LEFT", "LEFT", "RIGHT"])

af, called = snp_phased.allele_freq(sample_labels=snp_phased.sample_fid, return_counts=True, as_dataframe=True)
display(af)
display(called)
LEFT RIGHT
0 1.00 1.0
1 0.75 0.0
2 0.25 1.0
3 0.25 0.0
4 0.00 0.5
LEFT RIGHT
0 4 2
1 4 2
2 4 2
3 4 2
4 4 2

Save and reload GRG

GRGObject.save() writes a .grg file. snputils.read_grg() reads it back through the same public API.

grg_path = RESULTS_DIR / "toy.grg"
grgobj.save(str(grg_path))
reloaded = read_grg(grg_path, mutable=True)

print(reloaded)
print(reloaded.to_snpobject(chrom="22", sum_strands=True).genotypes)
GRGObject(shape=(5, 6), n_snps=5, n_haplotypes=6, n_samples=3, filename='/private/home/dbonet/git/snputils/results/tutorials/grg/toy.grg', mutable=True, loaded=True)
[[2 2 2]
 [2 1 0]
 [0 1 2]
 [1 0 0]
 [0 0 1]]