Genotype Representation Graphs (GRG)¶
This notebook demonstrates the current GRG workflow in snputils: generate a small GRGObject, convert it to a SNPObject, run ordinary SNPObject operations, and save/read the graph.
Build a toy GRG¶
build_synthetic_grg() returns a deterministic GRG with six haploid sample nodes, representing three diploid individuals, and five mutations.
from pathlib import Path
import numpy as np
from snputils import read_grg
from snputils.datasets import build_synthetic_grg
RESULTS_DIR = Path("results/tutorials/grg")
RESULTS_DIR.mkdir(parents=True, exist_ok=True)
grgobj = build_synthetic_grg()
grgobj
GRGObject(shape=(5, 6), n_snps=5, n_haplotypes=6, n_samples=3, filename=None, mutable=True, loaded=True)
Convert GRG to SNPObject¶
to_snpobject(sum_strands=False) returns phased genotypes with one column per diploid sample and one haplotype axis. sum_strands=True returns 2D diploid dosages.
snp_phased = grgobj.to_snpobject(chrom="22", sum_strands=False)
snp_summed = grgobj.to_snpobject(chrom="22", sum_strands=True)
print(snp_phased)
print("phased shape:", snp_phased.genotypes.shape)
print("summed shape:", snp_summed.genotypes.shape)
print(snp_summed.genotypes)
SNPObject(shape=(5, 3, 2), n_snps=5, n_samples=3, genotypes_shape=(5, 3, 2), calldata_lai_shape=None, calldata_gp_shape=None, has_variant_metadata=True, has_ancestry_map=False)
phased shape: (5, 3, 2)
summed shape: (5, 3)
[[2 2 2]
[2 1 0]
[0 1 2]
[1 0 0]
[0 0 1]]
Add sample metadata¶
The converted SNPObject can be annotated and used like any other SNPObject.
snp_phased.samples = np.array(["S1", "S2", "S3"])
snp_phased.sample_fid = np.array(["LEFT", "LEFT", "RIGHT"])
af, called = snp_phased.allele_freq(sample_labels=snp_phased.sample_fid, return_counts=True, as_dataframe=True)
display(af)
display(called)
| LEFT | RIGHT | |
|---|---|---|
| 0 | 1.00 | 1.0 |
| 1 | 0.75 | 0.0 |
| 2 | 0.25 | 1.0 |
| 3 | 0.25 | 0.0 |
| 4 | 0.00 | 0.5 |
| LEFT | RIGHT | |
|---|---|---|
| 0 | 4 | 2 |
| 1 | 4 | 2 |
| 2 | 4 | 2 |
| 3 | 4 | 2 |
| 4 | 4 | 2 |
Save and reload GRG¶
GRGObject.save() writes a .grg file. snputils.read_grg() reads it back through the same public API.
grg_path = RESULTS_DIR / "toy.grg"
grgobj.save(str(grg_path))
reloaded = read_grg(grg_path, mutable=True)
print(reloaded)
print(reloaded.to_snpobject(chrom="22", sum_strands=True).genotypes)
GRGObject(shape=(5, 6), n_snps=5, n_haplotypes=6, n_samples=3, filename='/private/home/dbonet/git/snputils/results/tutorials/grg/toy.grg', mutable=True, loaded=True)
[[2 2 2]
[2 1 0]
[0 1 2]
[1 0 0]
[0 0 1]]