Quickstart¶
The top-level package exposes the most common readers, data containers, and analysis helpers. A typical workflow loads genotypes, optional ancestry and phenotypes, runs analyses, and plots results:
import snputils as su
snp = su.read_snp("cohort.vcf.gz") # VCF, BCF, BGEN, BED, PGEN, ...
snp = snp.filter_biallelic_variants()
snp.save("cohort.pgen") # convert VCF -> PLINK2
lai = su.read_lai("local_ancestry.msp") # MSP or FLARE local ancestry
adm = su.read_admixture("admixture") # global ancestry (ADMIXTURE)
labels = su.read_labels("labels.tsv") # sample metadata for plots
pheno = su.read_pheno("phenotypes.tsv", col="trait")
ibd = su.read_ibd("hap.ibd")
af = snp.allele_freq() # per-SNP allele frequencies
pcs = su.PCA(n_components=2).fit_transform(snp)
afr_af = snp.allele_freq(ancestry="AFR", laiobj=lai) # ancestry-specific AF
gwas = su.run_gwas(pheno, snp)
admix = su.run_admixture_mapping(pheno, lai)
su.viz.scatter(pcs, labels, save_path="pca.png", show=False)
su.viz.chromosome_painting(lai, "chr_paintings/")
su.viz.qq_plot(gwas)
su.viz.manhattan_plot(admix)
read_snp dispatches from the file extension and returns a snputils.SNPObject. Explicit readers are available when you need format-specific options:
bed = su.read_bed("cohort.bed")
bcf = su.read_bcf("cohort.bcf")
bgen = su.read_bgen("cohort.bgen") # probabilities in calldata_gp
pgen = su.read_pgen("cohort.pgen")
vcf = su.read_vcf("cohort.vcf.gz")
Sample labels¶
snputils.read_labels() loads a TSV with indID and label columns. The table is accepted anywhere a labels path is expected—for example, pass the returned DataFrame directly to scatter().
Next steps¶
Object internals and filtering — Data Model
Format tables and writer options — File I/O
Analysis and CLI — Analysis
End-to-end notebooks — Tutorials