Visualization
Plotting helpers for embeddings, local ancestry, global admixture proportions, and association summaries. Most functions accept in-memory objects or result files with the column names documented in each signature. Scatter plots expect a fitted PCA, mdPCA, or maasMDS model with X_new_ and samples_.
- snputils.visualization.scatter(dimredobj, labels_file, abbreviation_inside_dots=True, arrows_for_titles=False, dots=True, legend=True, color_palette=None, show=True, save_path=None, *, label_mode=None, style='default', figsize=None, label_colors=None, legend_outside=None, despine=None, axis_xlabel=None, axis_ylabel=None, point_size=None, centroid_size=None, point_alpha=None, savefig_kwargs=None, equal_aspect=None)
Plot a scatter with group centroids and optional label styling.
- Parameters:
dimredobj – Object produced by a dimensionality-reduction step, e.g. maasMDS, mdPCA, or PCA. Must expose X_new_ ((n, 2) embedding) and samples_ (identifiers aligned with embedding rows).
labels_file (str or pandas.DataFrame) – TSV path or in-memory table with columns indID and label.
abbreviation_inside_dots (bool) – If True, show a short acronym inside each centroid marker.
arrows_for_titles (bool) – If True, draw arrows from text labels to centroids.
dots (bool) – If True, draw scatter points; if False, print coordinates and use text markers instead.
legend (bool) – If True, include a legend for group labels.
color_palette (optional) – Colormap or indexable color list; default palette is chosen automatically if None.
show (bool, optional) – If True, call plt.show(); otherwise close the figure after saving. Default True.
save_path (str, optional) – If set, save the figure to this path (plt.savefig). Prefer .pdf or .svg for publication: dense scatter is rasterized at dpi (default 300) while axes and text stay vector. Bitmap formats (.png, …) also default to that dpi. Override via savefig_kwargs.
label_mode (str, optional) – Overrides abbreviation_inside_dots, arrows_for_titles, and legend. "legend": legend plus abbreviations inside centroids. "acronym": abbreviations inside centroids only. "arrow": labels near centroids with adjustText arrows; best for many groups. None keeps the individual boolean flags.
style (str) – "default": legacy appearance. "publication": typography, despine, room for an outside legend, slightly larger markers, MDS-oriented axis labels.
figsize (tuple, optional) – Figure size in inches; chosen from style when None.
label_colors (Mapping, optional) – Map group labels (as in the TSV) to matplotlib color strings; unlisted labels use the palette.
legend_outside (bool, optional) – If True, place the legend outside the axes. Default True when style=="publication".
despine (bool, optional) – Hide top and right spines. Default True when style=="publication".
axis_xlabel (str, optional) – X-axis label; the default depends on style.
axis_ylabel (str, optional) – Y-axis label; the default depends on style.
style.point_size (float, optional) – Override scatter sizes and point alpha.
centroid_size (float, optional) – Override scatter sizes and point alpha.
point_alpha (float, optional) – Override scatter sizes and point alpha.
savefig_kwargs (dict, optional) – Extra keyword arguments for plt.savefig when save_path is set.
equal_aspect (bool, optional) – If True, equal data aspect (typical for MDS/PCA). Default True when style=="publication".
- Returns:
None
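The relationship between the embedding, the sample identifiers, and the labels table can be illustrated with a small standalone sketch. It assumes that a group centroid is the plain mean of its members' coordinates and that samples without a label are skipped; both are assumptions for illustration, not confirmed behavior of scatter. The helper name group_centroids is hypothetical.

```python
from collections import defaultdict


def group_centroids(embedding, samples, labels):
    """Average the 2-D coordinates of the samples in each group.

    embedding: list of (x, y) pairs, one per sample (stand-in for X_new_).
    samples:   sample IDs aligned with the embedding rows (stand-in for samples_).
    labels:    dict mapping indID -> label (the two labels_file columns).
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # label -> [sum_x, sum_y, count]
    for (x, y), sample in zip(embedding, samples):
        if sample not in labels:
            continue  # assumption: unlabeled samples are ignored
        acc = sums[labels[sample]]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


embedding = [(0.0, 0.0), (2.0, 2.0), (10.0, 0.0)]
samples = ["s1", "s2", "s3"]
labels = {"s1": "AFR", "s2": "AFR", "s3": "EUR"}
centroids = group_centroids(embedding, samples, labels)
```

The key point is the alignment: row i of the embedding belongs to samples[i], and the labels table is joined on indID.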
- snputils.visualization.lai.plot_lai(laiobj, colors, sort=True, figsize=None, legend=False, legend_kwargs=None, title=None, fontsize=None, scale=2)
Plot LAI (Local Ancestry Inference) data with customizable options. Each row represents the ancestry of a sample at the window level, distinguishing between maternal and paternal strands. Whitespace is used to separate individual samples.
- Parameters:
laiobj – A LocalAncestryObject containing LAI data.
colors – A dictionary with ancestry-color mapping.
sort – If True, sort samples based on the most frequent ancestry. Samples are displayed with the most predominant ancestry first, followed by the second most predominant, and so on. Defaults to True.
figsize – Figure size. If None, the figure is displayed with a default size of (25, 25). Defaults to None.
legend – If True, display a legend. If sort==True, ancestries in the legend are sorted based on their total frequency in descending order. Defaults to False.
legend_kwargs – Optional keyword arguments passed through to Axes.legend. Defaults keep the legend centered below the x-axis label.
title – Title for the plot. If None, no title is displayed. Defaults to None.
fontsize – Font sizes for various plot elements. If None, default font sizes are used. Defaults to None.
scale – Number of times to duplicate rows for enhanced vertical visibility. Defaults to 2.
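Two of the options above, the frequency-based ancestry ordering and the scale row duplication, can be sketched in plain Python. This is only an illustration under stated assumptions: rows are lists of integer ancestry codes (one row per strand), and ties in frequency are broken by ancestry code. The helpers ancestry_order and duplicate_rows are hypothetical and are not part of snputils.

```python
from collections import Counter


def ancestry_order(rows):
    """Return ancestry codes sorted by total frequency, most frequent first."""
    counts = Counter(code for row in rows for code in row)
    # Assumption: ties are broken by the ancestry code itself.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [code for code, _ in ranked]


def duplicate_rows(rows, scale=2):
    """Repeat every row `scale` times so each strand is drawn taller."""
    return [list(row) for row in rows for _ in range(scale)]


rows = [
    [0, 0, 1],  # strand 1: window-level ancestry codes
    [0, 2, 2],  # strand 2
    [0, 0, 0],  # strand 3
]
order = ancestry_order(rows)          # ancestry 0 is the most frequent
tall = duplicate_rows(rows, scale=2)  # 3 rows become 6
```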
- snputils.visualization.admixture.reorder_admixture(Q_mat)
Reorder Q_mat rows so that rows are grouped by each sample’s dominant ancestry, and columns are sorted by descending average ancestry proportion.
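The reordering described above can be sketched without NumPy as follows. This is an illustrative reimplementation, not the library's code: the return shape (sorted matrix, boundary positions, column order) and the within-group sort by descending dominant proportion are assumptions, and reorder_q is a hypothetical name.

```python
def reorder_q(q_mat):
    """Group rows by dominant ancestry; order columns by descending mean."""
    n_rows, k = len(q_mat), len(q_mat[0])

    # Columns: highest average ancestry proportion first.
    col_means = [sum(row[j] for row in q_mat) / n_rows for j in range(k)]
    col_order = sorted(range(k), key=lambda j: -col_means[j])
    reordered = [[row[j] for j in col_order] for row in q_mat]

    # Rows: bucket each sample under the column where it is largest.
    groups = [[] for _ in range(k)]
    for row in reordered:
        dominant = max(range(k), key=lambda j: row[j])
        groups[dominant].append(row)

    q_sorted, boundaries = [], [0]
    for dominant, members in enumerate(groups):
        # Assumption: within a group, sort by descending dominant proportion.
        members.sort(key=lambda row: -row[dominant])
        q_sorted.extend(members)
        boundaries.append(len(q_sorted))
    return q_sorted, boundaries, col_order


q = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
q_sorted, boundaries, col_order = reorder_q(q)
```

In this sketch the boundary positions mark where one dominant-ancestry group ends and the next begins, which is the kind of information a structure-style bar chart needs to draw group separators.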
- snputils.visualization.admixture.plot_admixture(ax, Q_mat_sorted, boundary_list, col_order=None, colors=None, show_boundaries=True, show_axes_labels=True, show_ticks=True, set_limits=True, minimal=False)
Plot a structure-style bar chart of Q_mat_sorted in the given Axes ax. If colors is not None, it should be a list or array of length K. If col_order is not None, colors are reordered according to col_order.
Optional controls:
- show_boundaries (bool): draw vertical lines at group boundaries. Default True.
- show_axes_labels (bool): set X/Y axis labels. Default True.
- show_ticks (bool): show axis ticks. Default True.
- set_limits (bool): set xlim to [0, n_samples-1] and ylim to [0, 1]. Default True.
- minimal (bool): if True, overrides the flags above to disable boundaries, labels, ticks, and limits, and hides spines. Default False.
- snputils.visualization.manhattan_plot.manhattan_plot(data, colors=None, significance_threshold=0.05, point_size=7.0, line_width=1.0, line_color='r', figsize=None, title=None, fontsize=None, save=None, output_filename=None)
Generate a Manhattan plot from association study results.
Accepts either a file path or an in-memory pandas.DataFrame. The input must contain columns #CHROM, POS, and P (p-values).
- Parameters:
data – Path to a tab-separated results file or an in-memory DataFrame with columns #CHROM, POS, and P. PLINK2-style output files are supported directly.
colors – List of colors to apply per chromosome. The chromosome number modulo len(colors) is used to select the color. Defaults to ["black", "grey"].
significance_threshold – Nominal significance threshold used to derive the Bonferroni-corrected threshold (significance_threshold / n_variants). Default is 0.05.
point_size – Marker area for scatter points (matplotlib s). Default is 7.0.
line_width – Width of the Bonferroni reference line. Default is 1.0.
line_color – Color of the Bonferroni reference line. Default is "r".
figsize – Optional (width, height) tuple passed to matplotlib.pyplot.figure(). Defaults to (12, 6) (2:1 aspect ratio).
title – Plot title. Default is None (no title).
fontsize – Mapping with optional keys 'title', 'xlabel', and 'ylabel' controlling font sizes. Missing keys fall back to sensible defaults (20 for title, 15 for axis labels).
save – If True, saves the figure to output_filename.
output_filename – Destination path for the saved figure (.pdf, .svg, .png, …).
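The two formulas in the parameter list (the Bonferroni-corrected threshold and the modulo-based color choice) are simple enough to check by hand. The sketch below restates them in standalone Python; the helper names are hypothetical, and placing the reference line at -log10 of the corrected threshold is an assumption about the y-axis scale, which is the usual convention for Manhattan plots.

```python
import math


def bonferroni_line(significance_threshold, n_variants):
    """Height of the reference line on a -log10(p) axis."""
    corrected = significance_threshold / n_variants
    return -math.log10(corrected)


def chromosome_color(chrom, colors=("black", "grey")):
    """Pick a color by chromosome number modulo the palette length."""
    return colors[chrom % len(colors)]


# One million tested variants at a nominal 0.05 level gives 5e-8.
y_line = bonferroni_line(0.05, 1_000_000)
palette = [chromosome_color(c) for c in (1, 2, 3)]
```

With the default two-color palette, adjacent chromosomes alternate colors, which keeps chromosome boundaries visible without a legend.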
- snputils.visualization.qq_plot.qq_plot(data, color='black', significance_threshold=0.05, point_size=7.0, line_width=1.0, expected_line_color='red', threshold_line_color='orange', figsize=None, title=None, fontsize=None, save=None, output_filename=None)
Generate a quantile-quantile (QQ) plot of association study p-values.
Plots observed -log10(p) against the expected -log10(p) under the null hypothesis of no association (uniform distribution), together with the identity reference line and a Bonferroni significance threshold.
Accepts either a file path or an in-memory pandas.DataFrame. The input must contain a column P with p-values.
- Parameters:
data – Path to a tab-separated results file or an in-memory DataFrame with a column P. PLINK2-style output files are supported directly.
color – Color for the scatter points. Defaults to "black".
significance_threshold – Nominal significance threshold used to derive the Bonferroni-corrected threshold (significance_threshold / n_variants). Default is 0.05.
point_size – Marker area for scatter points (matplotlib s). Default is 7.0.
line_width – Width of the expected-null and Bonferroni reference lines. Default is 1.0.
expected_line_color – Color of the identity (expected under null) reference line. Default is "red".
threshold_line_color – Color of the Bonferroni threshold line. Default is "orange".
figsize – Optional (width, height) tuple passed to matplotlib.pyplot.figure().
title – Plot title. Default is None (no title).
fontsize – Mapping with optional keys 'title', 'xlabel', and 'ylabel' controlling font sizes. Missing keys fall back to sensible defaults (20 for title, 15 for axis labels).
save – If True, saves the figure to output_filename.
output_filename – Destination path for the saved figure (.pdf, .svg, .png, …).
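How the expected quantiles of a QQ plot arise can be shown with a short standalone sketch. It assumes the common plotting position i / (n + 1) for the i-th smallest of n p-values; qq_plot may use a different convention, and the helper name qq_points is hypothetical.

```python
import math


def qq_points(p_values):
    """Pair expected and observed -log10(p), both sorted ascending."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Under the null, p-values are uniform on (0, 1); the i-th smallest of n
    # is expected near i / (n + 1) (assumed plotting position).
    expected = sorted(-math.log10(i / (n + 1)) for i in range(1, n + 1))
    return list(zip(expected, observed))


# These p-values match their uniform expectations exactly, so every point
# falls on the identity line.
points = qq_points([0.5, 0.25, 0.75])
```

Points that rise above the identity line at the high end indicate p-values smaller than the null predicts, which is the signal a QQ plot is meant to expose.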
- snputils.visualization.admixture_viz.pong_viz(folder_runs, output_dir, k=None, min_k=None, max_k=None, runs=None, run_prefix='train', ind2pop_path=None, pop_names_path=None, color_list_path=None, verbose=False)
Executes Pong visualization with the specified parameters.
- snputils.visualization.admixture_viz.create_filemap(folder, k=None, min_k=None, max_k=None, runs=None, run_prefix='train_demo')
Creates a filemap for training files organized by k values and runs and saves it to a file.
- Parameters:
folder (str) – Base folder path
k (Optional[int]) – Single k value to process. If specified, min_k and max_k are ignored
min_k (Optional[int]) – Minimum k value for range processing
max_k (Optional[int]) – Maximum k value for range processing
runs (List[int]) – List of run numbers
run_prefix (str) – Prefix for the run files (default: ‘train_demo’)
- Returns:
str – Path to saved file
- Raises:
FileMapError – If invalid parameters are provided or if configuration is incorrect
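The precedence rule for k, min_k, and max_k, and the general shape of a pong filemap (three tab-separated columns: run ID, K, and Q-matrix path), can be sketched as below. This is a hedged illustration only: the run ID scheme and the Q-file naming pattern are hypothetical, ValueError stands in for FileMapError, and neither helper is part of snputils.

```python
def resolve_k_values(k=None, min_k=None, max_k=None):
    """Return the list of K values; a single k takes precedence over the range."""
    if k is not None:
        return [k]  # min_k and max_k are ignored, as documented
    if min_k is None or max_k is None:
        raise ValueError("provide k, or both min_k and max_k")
    if min_k > max_k:
        raise ValueError("min_k must not exceed max_k")
    return list(range(min_k, max_k + 1))


def filemap_lines(k_values, runs, run_prefix="train_demo"):
    """Build one tab-separated filemap line per (K, run) pair."""
    lines = []
    for k in k_values:
        for run in runs:
            run_id = f"k{k}r{run}"  # hypothetical run ID scheme
            q_path = f"run{run}/{run_prefix}.{k}.Q"  # hypothetical file layout
            lines.append(f"{run_id}\t{k}\t{q_path}")
    return lines


k_values = resolve_k_values(min_k=2, max_k=4)
lines = filemap_lines([2], runs=[1, 2])
```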
- snputils.visualization.chromosome_painting(source, output_dir, sample_id=None, build='hg38', color_map=None, num_labels=8, fill_empty=True, fill_marker_gaps=False, output_format='png', force=True, verbose=False, show=False, keep_bed_files=False)
Generate chromosome paintings from a local ancestry source.
Accepts a LocalAncestryObject, one or more MSP files, or one or more BED files and dispatches to the appropriate internal pipeline automatically.
Source types
LocalAncestryObject: in-memory LAI data; chromosomes and physical_pos must be populated.
str / pathlib.Path ending with .msp or .msp.tsv: a single MSP file; also accepts a list of such paths spanning multiple chromosomes.
str / pathlib.Path ending with .bed: one pre-formatted BED file; also accepts a list to paint multiple samples at once.
Selecting samples
sample_id=None (default): paint every sample in the source.
sample_id="0001": paint only the sample whose ID is "0001".
sample_id=["0001", "0002"]: paint a subset.
sample_id is not applicable to BED sources (a BED file already represents one sample); it is silently ignored when BED files are provided.
- Parameters:
source – The data source; see description above.
output_dir – Directory where output files will be saved.
sample_id – Sample identifier(s) to paint. None paints all samples. Accepts a single string or a list of strings.
build – Genome build version ('hg37' or 'hg38').
color_map – A TSV filename or a {int: hex_color} dict mapping numeric ancestry codes to hex color strings. Uses the default snputils palette when None.
num_labels – Number of distinct colors to generate when color_map is None.
fill_empty – If True, fill unassigned chromosome regions with a neutral grey color.
fill_marker_gaps – If True, extend painted segments through inter-marker gaps until the next segment on the same chromosome copy. This avoids rendering sparse marker intervals as missing ancestry. Defaults to False.
output_format – Output format, 'png' or 'pdf'.
force – If True, overwrite existing output files.
verbose – If True, emit progress log messages.
show – If True, display each PNG in a matplotlib figure (PNG only).
keep_bed_files – If True, retain intermediate BED files generated from MSP sources.
- Returns:
List[str] – Paths to the generated output files, one per sample.
- Raises:
ValueError – If the source type cannot be determined from the file extension, or if a requested sample_id is not found.
Examples
Paint all samples from a LAI object (the examples assume snputils is imported as su and lai is a LocalAncestryObject):
su.viz.chromosome_painting(lai, "paintings/")
Paint a single sample:
su.viz.chromosome_painting(lai, "paintings/", sample_id="0001")
Paint a subset from MSP files:
su.viz.chromosome_painting(
    ["chr1.msp", "chr2.msp"],
    "paintings/",
    sample_id=["0001", "0002"],
)
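The extension-based dispatch described under Source types can be illustrated with a small standalone sketch. It covers only path inputs (an in-memory LocalAncestryObject would be handled separately), and rejecting a list that mixes MSP and BED paths is an assumption. The helper classify_source is hypothetical and is not the library's internal function.

```python
from pathlib import Path


def classify_source(source):
    """Return "msp" or "bed" for a path or a list of paths."""
    paths = source if isinstance(source, (list, tuple)) else [source]
    kinds = set()
    for path in paths:
        name = Path(path).name
        if name.endswith((".msp", ".msp.tsv")):
            kinds.add("msp")
        elif name.endswith(".bed"):
            kinds.add("bed")
        else:
            raise ValueError(f"cannot determine source type for {name!r}")
    if len(kinds) != 1:
        # Assumption: an empty list or a mix of MSP and BED paths is rejected.
        raise ValueError("expected one or more paths of a single source type")
    return kinds.pop()


kind_single = classify_source("chr1.msp")
kind_list = classify_source(["chr1.msp.tsv", Path("chr2.msp")])
kind_bed = classify_source("sample_0001.bed")
```

An unrecognized extension raises ValueError, mirroring the documented behavior when the source type cannot be determined.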