Tutorial on MultiPhenotypeObject Functionality
import os
import sys
import numpy as np
import pandas as pd
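# Add the parent directory to the import path so that snputils can be imported
parent_dir = os.path.abspath('../')
if parent_dir not in sys.path:
    sys.path.append(parent_dir)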
from snputils.phenotype.io.read import MultiPhenReader
1. Read a TSV/MAP File into a MultiPhenotypeObject
Load a phenotype file (e.g., a TSV/MAP file) into a MultiPhenotypeObject, which stores phenotype data in a structured DataFrame.
# Path to the phenotype file
path = '../data/samples_pops.tsv'
# Read the file into a MultiPhenotypeObject with specified delimiter, no header, and a phenotype name
phenobj = MultiPhenReader(path).read(sep='\t', header=None, phen_names=['ancestry'])
# Display the DataFrame containing phenotype data
phenobj.phen_df
| | samples | ancestry |
|---|---|---|
| 0 | HG00096 | EUR |
| 1 | HG00097 | EUR |
| 2 | HG00099 | AFR |
| 3 | HG00100 | AFR |
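To make the loading step concrete, here is a minimal plain-pandas sketch of reading a header-less, tab-separated phenotype table. It is not how `MultiPhenReader` is implemented; the inline `raw_text` string is an illustrative stand-in for the file contents, and the column names are taken from the table above.

```python
import io

import pandas as pd

# Illustrative stand-in for a header-less, tab-separated phenotype file
raw_text = "HG00096\tEUR\nHG00097\tEUR\nHG00099\tAFR\nHG00100\tAFR\n"

# header=None tells pandas the first line is data, not column labels;
# names= supplies the labels for the sample and phenotype columns
phen_df = pd.read_csv(
    io.StringIO(raw_text),
    sep="\t",
    header=None,
    names=["samples", "ancestry"],
)
print(phen_df)
```

In this sketch the phenotype column label plays the role that `phen_names=['ancestry']` plays in the call above.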
2. Filter MultiPhenotypeObject by Samples
The filter_samples() method filters the phenotype data by sample name or by sample index. By default the listed samples are kept; pass include=False to drop them instead.
2.1. Filter by Sample Names
Include specific samples by their names.
phenobj.filter_samples(samples=['HG00096', 'HG00097']).phen_df
| | samples | ancestry |
|---|---|---|
| 0 | HG00096 | EUR |
| 1 | HG00097 | EUR |
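For comparison, name-based filtering can be sketched in plain pandas with a boolean mask. The helper `filter_by_names` below is hypothetical and only illustrates the idea; it is not the snputils implementation, and the DataFrame is rebuilt inline from the example data.

```python
import pandas as pd

# Example data copied from the table in section 1
phen_df = pd.DataFrame(
    {
        "samples": ["HG00096", "HG00097", "HG00099", "HG00100"],
        "ancestry": ["EUR", "EUR", "AFR", "AFR"],
    }
)


def filter_by_names(df, names, include=True):
    """Keep (include=True) or drop (include=False) rows whose sample name is listed."""
    mask = df["samples"].isin(names)
    if not include:
        mask = ~mask
    # Renumber the rows so the result starts at 0
    return df[mask].reset_index(drop=True)


kept = filter_by_names(phen_df, ["HG00096", "HG00097"])
dropped = filter_by_names(phen_df, ["HG00096", "HG00097"], include=False)
print(kept)
print(dropped)
```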
2.2. Filter by Sample Indexes
Exclude samples by their positional indexes. Here the samples at positions 0 and 3 are dropped.
filtered_phen_df_exclude = phenobj.filter_samples(indexes=[0, 3], include=False).phen_df
filtered_phen_df_exclude
| | samples | ancestry |
|---|---|---|
| 0 | HG00097 | EUR |
| 1 | HG00099 | AFR |
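Position-based exclusion can be sketched the same way with a NumPy boolean mask. The helper `filter_by_positions` is hypothetical and is not taken from snputils. It resets the index because the output above shows rows renumbered from 0; that the library does this internally is an assumption.

```python
import numpy as np
import pandas as pd

# Example data copied from the table in section 1
phen_df = pd.DataFrame(
    {
        "samples": ["HG00096", "HG00097", "HG00099", "HG00100"],
        "ancestry": ["EUR", "EUR", "AFR", "AFR"],
    }
)


def filter_by_positions(df, positions, include=True):
    """Keep (include=True) or drop (include=False) rows at the given positions."""
    mask = np.zeros(len(df), dtype=bool)
    mask[positions] = True  # mark the listed row positions
    if not include:
        mask = ~mask
    # Renumber the rows so the result starts at 0
    return df[mask].reset_index(drop=True)


remaining = filter_by_positions(phen_df, [0, 3], include=False)
print(remaining)
```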