Tutorial on admixture mapping

This demo runs admixture mapping using the internal snputils.tools.admixture_mapping pipeline: MSP file and phenotype file in, regression results and a compressed TSV out.

import os
import sys
import logging

logging.basicConfig(stream=sys.stdout, level=logging.INFO)

repo_dir = os.path.abspath(os.path.join(os.path.dirname(os.path.abspath("")), ".."))
if repo_dir not in sys.path:
    sys.path.insert(0, repo_dir)

from snputils.tools.admixture_mapping import run_admixture_mapping
from snputils.visualization.admixture_manhattan_plot import manhattan_plot

1. Define paths

Paths to the MSP file (local ancestry inference output), the phenotype file (whitespace-delimited, header with IID and one phenotype column), optional covariates, and the directory where results will be written.

msp_path = "data.msp"
phe_path = "pheno.phe"
covar_path = "covariates.covar"
results_dir = "out"
phe_id = "trait1"

2. Run admixture mapping

The internal tool streams the MSP file, aligns samples with the phenotype file, and runs logistic (binary) or linear (quantitative) regression per ancestry window. Trait type is inferred from the phenotype column. Results are written to a compressed TSV under results_path and returned as a DataFrame when return_results=True.

df_results = run_admixture_mapping(
    phe_path=phe_path,
    msp_path=msp_path,
    results_path=results_dir,
    phe_id=phe_id,
    batch_size=4096,
    keep_hla=True,
    return_results=True,
    quantitative=True,
    verbose=True,
    covar_path=covar_path,
    covar_variance_standardize=True,
)
df_results.head()
Reading MSP file...
  Chunk 0: 3000 windows (total so far: 3,000)
  Done. Processed 3,000 windows.
INFO:snputils.tools.admixture_mapping:Admixture mapping results written to out/trait1_admixmap.tsv.gz
#CHROM POS END ID REF ALT A1 ANCESTRY TEST OBS_CT BETA SE T_STAT P ERRCODE
0 1 1 1000 w1_ANC0 N ANC0 ANC0 ANC0 LINEAR 800 -0.000478 0.035319 -0.013537 0.989202 .
1 1 1001 2000 w2_ANC0 N ANC0 ANC0 ANC0 LINEAR 800 0.029272 0.035176 0.832174 0.405561 .
2 1 2001 3000 w3_ANC0 N ANC0 ANC0 ANC0 LINEAR 800 0.029818 0.034677 0.859870 0.390121 .
3 1 3001 4000 w4_ANC0 N ANC0 ANC0 ANC0 LINEAR 800 0.027226 0.034861 0.780986 0.435044 .
4 1 4001 5000 w5_ANC0 N ANC0 ANC0 ANC0 LINEAR 800 0.027656 0.034745 0.795983 0.426280 .

Output is written to {results_path}/{phe_id}_admixmap.tsv.gz (or the path you passed if it already ends with .tsv.gz). The TSV contains columns #CHROM, POS, P, ANCESTRY, and others, suitable for plotting.

output_tsv = f"{results_dir}/{phe_id}_admixmap.tsv.gz"
output_tsv
'out/trait1_admixmap.tsv.gz'

3. Manhattan plot

Visualize association p-values from the internal admixture mapping output (TSV with #CHROM, POS, P). pd.read_csv supports gzipped TSV.

chromosome_colors = [
    "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b",
    "#e377c2", "#7f7f7f", "#bcbd22", "#17becf", "#aec7e8", "#ffbb78",
    "#98df8a", "#ff9896", "#c5b0d5", "#c49c94", "#f7b6d2", "#c7c7c7",
    "#dbdb8d", "#9edae5", "#ad494a", "#8c6d31",
]

manhattan_plot(
    input_file=output_tsv,
    colors=chromosome_colors,
    significance_threshold=0.05,
    figsize=(12, 8),
    title="Admixture mapping (internal)",
    fontsize={"title": 20, "xlabel": 15, "ylabel": 15, "legend": 15},
    save=True,
    output_filename=f"{results_dir}/manhattan_plot.png",
)
../_images/98148feb1ce63504ddf0dc8cc20c00d93b7024474437c3f746460dc95a8c62ac.png