Tutorial on the MultiPhenotypeObject Functionalities

import os
import sys
import numpy as np
import pandas as pd

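# Add the parent directory to the import path so the local snputils package can be found
parent_dir = os.path.abspath('../')
if parent_dir not in sys.path:
    sys.path.append(parent_dir)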

from snputils.phenotype.io.read import MultiPhenReader

1. Read a TSV/MAP File into a MultiPhenotypeObject

Load a phenotype file (e.g., a TSV/MAP file) into a MultiPhenotypeObject, which stores phenotype data in a structured DataFrame.

# Path to the phenotype file
path = '../data/samples_pops.tsv'

# Read the file into a MultiPhenotypeObject with specified delimiter, no header, and a phenotype name
phenobj = MultiPhenReader(path).read(sep='\t', header=None, phen_names=['ancestry'])

# Display the DataFrame containing phenotype data
phenobj.phen_df
   samples ancestry
0  HG00096      EUR
1  HG00097      EUR
2  HG00099      AFR
3  HG00100      AFR
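
For intuition, a table of the same shape can be built with plain pandas. The sketch below is only an illustration and is not necessarily how MultiPhenReader works internally. The file contents in tsv_text are an assumption, reconstructed from the output shown above, and an in-memory buffer stands in for the real file.

```python
import io

import pandas as pd

# Assumed file contents: two tab-separated columns and no header row
tsv_text = "HG00096\tEUR\nHG00097\tEUR\nHG00099\tAFR\nHG00100\tAFR\n"

# header=None treats the first row as data; names supplies the column labels
phen_df = pd.read_csv(
    io.StringIO(tsv_text),
    sep="\t",
    header=None,
    names=["samples", "ancestry"],
)
print(phen_df)
```

Because the file has no header row, omitting header=None would cause the first sample to be consumed as column names.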

2. Filter MultiPhenotypeObject by Samples

The filter_samples() method filters the phenotype data by sample name or by sample index. As the examples in this section show, the listed samples are kept unless you pass include=False, in which case they are dropped instead.

2.1. Filter by Sample Names

Include specific samples by their names.

phenobj.filter_samples(samples=['HG00096', 'HG00097']).phen_df
   samples ancestry
0  HG00096      EUR
1  HG00097      EUR
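
The effect of name-based filtering can be sketched in plain pandas with a boolean mask. The helper filter_by_names below is hypothetical and written only for illustration; it is not part of snputils, and the DataFrame is rebuilt by hand from the output shown earlier.

```python
import pandas as pd

# Hand-built copy of the phenotype table shown earlier
phen_df = pd.DataFrame({
    "samples": ["HG00096", "HG00097", "HG00099", "HG00100"],
    "ancestry": ["EUR", "EUR", "AFR", "AFR"],
})


def filter_by_names(df, names, include=True):
    """Hypothetical helper: keep (or drop) rows whose sample name is in names."""
    mask = df["samples"].isin(names)
    if not include:
        mask = ~mask  # invert the mask to drop the listed samples instead
    return df[mask].reset_index(drop=True)


kept = filter_by_names(phen_df, ["HG00096", "HG00097"])
dropped = filter_by_names(phen_df, ["HG00096", "HG00097"], include=False)
print(kept)
print(dropped)
```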

2.2. Filter by Sample Indexes

Exclude specific samples by their indexes in the data by passing include=False. The remaining rows are renumbered from 0 in the result.

filtered_phen_df_exclude = phenobj.filter_samples(indexes=[0, 3], include=False).phen_df
filtered_phen_df_exclude
   samples ancestry
0  HG00097      EUR
1  HG00099      AFR
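
Index-based exclusion can be sketched the same way, this time with a positional mask. The helper filter_by_indexes is hypothetical and assumes the indexes refer to row positions; the actual behavior of filter_samples() may differ in details.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the phenotype table shown earlier
phen_df = pd.DataFrame({
    "samples": ["HG00096", "HG00097", "HG00099", "HG00100"],
    "ancestry": ["EUR", "EUR", "AFR", "AFR"],
})


def filter_by_indexes(df, indexes, include=True):
    """Hypothetical helper: keep (or drop) rows at the given positions."""
    mask = np.zeros(len(df), dtype=bool)
    mask[indexes] = True  # mark the selected row positions
    if not include:
        mask = ~mask  # invert the mask to drop the marked rows instead
    return df[mask].reset_index(drop=True)


remaining = filter_by_indexes(phen_df, [0, 3], include=False)
print(remaining)
```

Resetting the index after filtering matches the renumbered rows in the output above.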