Phenotypes

Phenotype data containers and readers.

Objects

class snputils.PhenotypeObject(samples, values, phenotype_name='PHENO', quantitative=None)

Bases: object

Generic phenotype container for single-trait analyses.

The object stores sample IDs, normalized phenotype values, the trait type (declared through quantitative or inferred from the values), and case/control convenience attributes for binary traits.
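How the trait type is inferred and how the case/control attributes are derived is not spelled out here. The following is a minimal sketch, assuming PLINK-style coding (1 = control, 2 = case, -9 = missing) and assuming that quantitative=None means "infer from the values". The helpers infer_is_quantitative and case_control_split are hypothetical and are not part of snputils.

```python
from typing import List, Optional, Sequence, Tuple


def infer_is_quantitative(values: Sequence[float], quantitative: Optional[bool] = None) -> bool:
    """Hypothetical helper: True for a quantitative trait, False for a binary one."""
    # A declared type always wins over inference.
    if quantitative is not None:
        return quantitative
    # Assumption: PLINK coding, where -9 marks a missing value.
    observed = {v for v in values if v != -9}
    # Binary only if every observed value is a control (1) or a case (2).
    return not observed.issubset({1, 2})


def case_control_split(sample_ids: Sequence[str], values: Sequence[float]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split sample IDs into (cases, controls) under 2=case, 1=control."""
    cases = [s for s, v in zip(sample_ids, values) if v == 2]
    controls = [s for s, v in zip(sample_ids, values) if v == 1]
    return cases, controls
```

For example, values [1, 2, 2, -9] are classified as binary, while [0.5, 1.7] are treated as quantitative.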

class snputils.MultiPhenotypeObject(phen_df)

Bases: object

A class for multi-phenotype data.

This class serves as a container for phenotype data and supports operations such as filtering samples and accessing phenotype information. It stores the data in a DataFrame whose first column is reserved for the sample identifiers.

Parameters:: phen_df (pd.DataFrame) – A Pandas DataFrame containing phenotype data, with the first column representing sample identifiers.

property phen_df

Retrieve phen_df.

Returns:: pd.DataFrame – A Pandas DataFrame containing phenotype data, with the first column representing sample identifiers.

property n_samples

Retrieve n_samples.

Returns:: int – The total number of samples.

property n_phenotypes

Retrieve n_phenotypes.

Returns:: int – Number of phenotype columns, excluding the sample identifier column.

property shape

Retrieve the shape of the phenotype DataFrame.
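The three count properties can be illustrated with a small standard-library stand-in. MiniPhenTable is a hypothetical class, not snputils code, and it assumes that shape mirrors phen_df.shape, so the sample identifier column is counted in shape but not in n_phenotypes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MiniPhenTable:
    """Hypothetical stand-in for a phenotype table; columns[0] is the sample ID column."""

    columns: List[str]
    rows: List[list]

    @property
    def n_samples(self) -> int:
        # One row per sample.
        return len(self.rows)

    @property
    def n_phenotypes(self) -> int:
        # Every column except the leading sample identifier column.
        return len(self.columns) - 1

    @property
    def shape(self) -> Tuple[int, int]:
        # Assumption: mirrors DataFrame.shape, i.e. (rows, columns) including the ID column.
        return (len(self.rows), len(self.columns))


table = MiniPhenTable(["IID", "height", "bmi"], [["s1", 170, 22.1], ["s2", 180, 25.0]])
```

Here table has 2 samples and 2 phenotypes, and a shape of (2, 3).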

copy()

Create and return a copy of the current MultiPhenotypeObject instance.

Returns:: MultiPhenotypeObject – A copy of the current object, as a new instance.

filter_samples(samples=None, indexes=None, include=True, reorder=False, inplace=False)

Filter samples in the MultiPhenotypeObject based on sample names or indexes.

This method allows you to include or exclude specific samples by their names, indexes, or both. When both samples and indexes are provided, the union of the specified samples is used. Negative indexes are supported and follow NumPy’s indexing conventions. Set reorder=True to match the ordering of the provided samples and/or indexes lists when including.

Parameters:

samples (str or array_like of str, optional) – Names of the samples to include or exclude. Can be a single sample name or a sequence of sample names. Default is None.
indexes (int or array_like of int, optional) – Indexes of the samples to include or exclude. Can be a single index or a sequence of indexes. Negative indexes are supported. Default is None.
include (bool, default=True) – If True, includes only the specified samples. If False, excludes the specified samples.
reorder (bool, default=False) – If True and include=True, orders the retained samples to match the order of the provided samples and/or indexes lists.
inplace (bool, default=False) – If True, modifies the object in place. If False, returns a new MultiPhenotypeObject with the samples filtered.

Returns:

Optional[MultiPhenotypeObject] – Returns a new MultiPhenotypeObject with the specified samples filtered if inplace=False. If inplace=True, modifies the object in place and returns None.
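The selection rules described above (union of names and indexes, NumPy-style negative indexes, include/exclude, optional reordering) can be sketched as a function that returns the row positions to keep. select_positions is a hypothetical helper, not the snputils implementation; raising on unknown names and out-of-range indexes is an assumption.

```python
from typing import List, Optional, Sequence, Union


def select_positions(
    sample_ids: Sequence[str],
    samples: Optional[Union[str, Sequence[str]]] = None,
    indexes: Optional[Union[int, Sequence[int]]] = None,
    include: bool = True,
    reorder: bool = False,
) -> List[int]:
    """Hypothetical helper: row positions kept by a filter_samples-style selection."""
    n = len(sample_ids)
    # Accept a single name or index as well as a sequence.
    if isinstance(samples, str):
        samples = [samples]
    if isinstance(indexes, int):
        indexes = [indexes]

    lookup = {sid: i for i, sid in enumerate(sample_ids)}
    requested: List[int] = []
    for name in samples or []:
        if name not in lookup:
            # Assumption: unknown names are an error.
            raise KeyError(f"unknown sample: {name}")
        requested.append(lookup[name])
    for idx in indexes or []:
        if not -n <= idx < n:
            # Assumption: out-of-range indexes are an error.
            raise IndexError(f"index {idx} out of range for {n} samples")
        requested.append(idx % n)  # NumPy-style: -1 is the last sample

    # Union of names and indexes, keeping first-seen order for reorder=True.
    unique = list(dict.fromkeys(requested))
    chosen = set(unique)
    if not include:
        return [i for i in range(n) if i not in chosen]
    return unique if reorder else sorted(unique)
```

With reorder=False, the retained samples keep their original order. With reorder=True, they follow the order in which they were requested.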

Readers

class snputils.PhenotypeReader(file)

Bases: PhenotypeBaseReader

Reader for single-trait phenotype files (any extension; common: .txt, .phe, .pheno).

Expected format (headered, whitespace-delimited):

Must include IID (optionally preceded by FID)
First phenotype column after IID is used by default
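The expected layout can be sketched as follows, assuming plain-text input with a header row. parse_single_trait is a hypothetical helper, not snputils code. It only locates IID and the phenotype column and returns the raw strings, so value conversion and trait-type handling are left out.

```python
from typing import List, Optional, Tuple


def parse_single_trait(text: str, phenotype_col: Optional[str] = None) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: (IIDs, raw phenotype strings) from a headered, whitespace-delimited table."""
    # Split every non-blank line on runs of whitespace.
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, body = lines[0], lines[1:]
    if "IID" not in header:
        raise ValueError("header must contain an IID column")
    iid_pos = header.index("IID")  # FID, if present, simply precedes IID
    if phenotype_col is None:
        # Default: the first column after IID.
        if iid_pos + 1 >= len(header):
            raise ValueError("no phenotype column after IID")
        pheno_pos = iid_pos + 1
    else:
        pheno_pos = header.index(phenotype_col)
    iids = [row[iid_pos] for row in body]
    values = [row[pheno_pos] for row in body]
    return iids, values


example = "FID IID height bmi\nf1 s1 170 22.1\nf2 s2 165 24.0\n"
```

Calling parse_single_trait(example) picks height, the first column after IID. Passing "bmi" selects that column instead.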

property file

Retrieve file.

Returns:: pathlib.Path – Path to the file containing phenotype data.

read(phenotype_col=None, quantitative=None)

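Read the phenotype file and construct a phenotype object.

This implements the abstract PhenotypeBaseReader.read method, whose contract is to parse the file and construct an instance of snputils.phenotype.genobj.MultiPhenotypeObject or snputils.phenotype.genobj.PhenotypeObject from the read data.

Parameters:

phenotype_col (optional) – Phenotype column to read. If None, the first phenotype column after IID is used.
quantitative (bool, optional) – Declares whether the trait is quantitative. If None, the trait type is inferred from the values.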

class snputils.MultiPhenReader(file)

Bases: PhenotypeBaseReader

Reader for multi-phenotype data from file (.xlsx, .csv, .tsv, .txt, .phe, .pheno, .map, .smap, .phen), constructing a MultiPhenotypeObject.

Parameters:: file (str or pathlib.Path) – Path to the file containing phenotype data. Accepted formats: .xlsx, .csv, .tsv, .txt, .phe, .pheno, .map, .smap, .phen.

property file

Retrieve file.

Returns:: pathlib.Path – Path to the file containing phenotype data. Accepted formats: .xlsx, .csv, .tsv, .txt, .phe, .pheno, .map, .smap, .phen.

read(samples_idx=0, phen_names=None, sep=',', header=0, drop=False)

Read data from file and construct a MultiPhenotypeObject.

Parameters:

samples_idx (int, default=0) – Index of the column containing sample identifiers. Default is 0, assuming the first column contains sample identifiers.
phen_names (list of str, optional) – List of phenotype column names. If provided, these columns will be renamed to the specified names.
sep (str, default=',') – The delimiter for separating values in .csv, .tsv, .txt, .phe, .pheno, or .map files. Use sep=r'\s+' for whitespace-delimited files.
header (int, default=0) – Row index to use as the column names. By default, the first row is used (header=0). Set to None if the file has no header row, for example when column names are supplied explicitly.
drop (bool, default=False) – If True, removes columns not listed in phen_names (except the samples column).

Returns:

MultiPhenotypeObject – A multi-phenotype object instance.
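The effect of sep and samples_idx can be sketched with the standard library. Both helpers are hypothetical. Moving the sample column to the front is an assumption based on MultiPhenotypeObject reserving its first column for sample identifiers. The sketch ignores quoting, header handling, phen_names and drop.

```python
import re
from typing import List


def split_rows(text: str, sep: str = ",") -> List[List[str]]:
    r"""Hypothetical helper: split delimited text into rows; multi-character seps like r"\s+" are regexes."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if len(sep) == 1:
            fields = line.split(sep)
        else:
            # Like pandas.read_csv, treat longer separators as regular expressions.
            fields = re.split(sep, line.strip())
        rows.append(fields)
    return rows


def samples_first(rows: List[List[str]], samples_idx: int = 0) -> List[List[str]]:
    """Hypothetical helper: move the sample ID column (non-negative samples_idx) to position 0."""
    # Assumption: the reader places sample identifiers in the first column.
    return [[r[samples_idx]] + r[:samples_idx] + r[samples_idx + 1:] for r in rows]


rows = split_rows("height  IID\t bmi\n170 s1 22\n", sep=r"\s+")
```

Here the mixed spaces and tab are treated as a single separator, and samples_first(rows, samples_idx=1) moves the IID column to the front.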