Ancestry¶
Objects, readers, and writers for local and global ancestry data.
Objects¶
- class snputils.LocalAncestryObject(haplotypes, lai, samples=None, ancestry_map=None, window_sizes=None, centimorgan_pos=None, chromosomes=None, physical_pos=None)[source]¶
Bases: AncestryObject
A class for window-level Local Ancestry Inference (LAI) data.
- Parameters:
haplotypes (list of str of length n_haplotypes) – A list of unique haplotype identifiers.
lai (array of shape (n_windows, n_haplotypes)) – A 2D array containing local ancestry inference values, where each row represents a genomic window, and each column corresponds to a haplotype phase for each sample.
samples (list of str of length n_samples, optional) – A list of unique sample identifiers.
ancestry_map (dict of str to str, optional) – A dictionary mapping ancestry codes to region names.
window_sizes (array of shape (n_windows,), optional) – An array specifying the number of SNPs in each genomic window.
centimorgan_pos (array of shape (n_windows, 2), optional) – A 2D array containing the start and end centimorgan positions for each window.
chromosomes (array of shape (n_windows,), optional) – An array with chromosome numbers corresponding to each genomic window.
physical_pos (array of shape (n_windows, 2), optional) – A 2D array containing the start and end physical positions for each window.
- property haplotypes¶
Retrieve haplotypes.
- Returns:
list of length n_haplotypes – A list of unique haplotype identifiers.
- property lai¶
Retrieve lai.
- Returns:
array of shape (n_windows, n_haplotypes) – A 2D array containing local ancestry inference values, where each row represents a genomic window, and each column corresponds to a haplotype phase for each sample.
- property samples¶
Retrieve samples.
- Returns:
list of str – A list of unique sample identifiers.
- property ancestry_map¶
Retrieve ancestry_map.
- Returns:
dict of str to str – A dictionary mapping ancestry codes to region names.
- property window_sizes¶
Retrieve window_sizes.
- Returns:
array of shape (n_windows,) – An array specifying the number of SNPs in each genomic window.
- property centimorgan_pos¶
Retrieve centimorgan_pos.
- Returns:
array of shape (n_windows, 2) – A 2D array containing the start and end centimorgan positions for each window.
- property chromosomes¶
Retrieve chromosomes.
- Returns:
array of shape (n_windows,) – An array with chromosome numbers corresponding to each genomic window.
- property physical_pos¶
Retrieve physical_pos.
- Returns:
array of shape (n_windows, 2) – A 2D array containing the start and end physical positions for each window.
- property n_samples¶
Retrieve n_samples.
- Returns:
int – The total number of samples.
- property n_ancestries¶
Retrieve n_ancestries.
- Returns:
int – The total number of unique ancestries.
- property n_haplotypes¶
Retrieve n_haplotypes.
- Returns:
int – The total number of haplotypes.
- property n_windows¶
Retrieve n_windows.
- Returns:
int – The total number of genomic windows.
- property shape¶
Retrieve the shape of the LAI matrix.
- Returns:
tuple – (n_windows, n_haplotypes).
- copy()[source]¶
Create and return a copy of self.
- Returns:
LocalAncestryObject – A new instance of the current object.
- keys()[source]¶
Retrieve a list of public attribute names for self.
- Returns:
list of str – A list of attribute names, with internal name-mangling removed, for easier reference to public attributes in the instance.
- filter_windows(indexes, include=True, inplace=False)[source]¶
Filter genomic windows based on specified indexes.
This method updates the lai attribute to include or exclude the specified genomic windows. Attributes such as window_sizes, centimorgan_pos, chromosomes, and physical_pos will also be updated accordingly if they are not None. The order of genomic windows is preserved.
Negative indexes are supported and follow [NumPy’s indexing conventions](https://numpy.org/doc/stable/user/basics.indexing.html).
- Parameters:
indexes (int or array-like of int) – Index(es) of the windows to include or exclude. Can be a single integer or a sequence of integers. Negative indexes are supported.
include (bool, default=True) – If True, includes only the specified windows. If False, excludes the specified windows. Default is True.
inplace (bool, default=False) – If True, modifies self in place. If False, returns a new LocalAncestryObject with the windows filtered. Default is False.
- Returns:
Optional[LocalAncestryObject] – A new LocalAncestryObject with the specified windows filtered if inplace=False. If inplace=True, modifies self in place and returns None.
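The include/exclude semantics described for filter_windows can be illustrated without the library. The sketch below is a plain-Python stand-in, not the snputils implementation: `select_windows` is a hypothetical helper that returns the window positions that would be kept, assuming negative indexes wrap as in NumPy and the original window order is preserved.

```python
def select_windows(n_windows, indexes, include=True):
    """Return the positions of the windows that survive the filter.

    Hypothetical helper for illustration only; it mimics the documented
    behaviour (negative indexes, order preserved), not the library code.
    """
    if isinstance(indexes, int):
        indexes = [indexes]
    normalized = set()
    for i in indexes:
        if not -n_windows <= i < n_windows:
            raise IndexError(f"window index {i} out of range for {n_windows} windows")
        normalized.add(i % n_windows)  # -1 becomes n_windows - 1
    # Iterating over range(n_windows) keeps the original window order.
    return [w for w in range(n_windows) if (w in normalized) == include]


kept = select_windows(5, [3, 0, -1])            # [0, 3, 4]
dropped = select_windows(5, -1, include=False)  # [0, 1, 2, 3]
```

Note that the order in which the indexes are passed does not affect the result, which matches the statement that window order is preserved.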
- filter_samples(samples=None, indexes=None, include=True, reorder=False, inplace=False)[source]¶
Filter samples based on specified names or indexes.
This method updates the lai, haplotypes, and samples attributes to include or exclude the specified samples. Each sample is associated with two haplotypes, which are included or excluded together. By default, the original order of the samples is preserved. When including, set reorder=True to follow the order of the provided samples and/or indexes lists instead.
If both samples and indexes are provided, any sample matching either a name in samples or an index in indexes will be included or excluded.
Negative indexes are supported and follow [NumPy’s indexing conventions](https://numpy.org/doc/stable/user/basics.indexing.html).
- Parameters:
samples (str or array_like of str, optional) – Name(s) of the samples to include or exclude. Can be a single sample name or a sequence of sample names. Default is None.
indexes (int or array_like of int, optional) – Index(es) of the samples to include or exclude. Can be a single index or a sequence of indexes. Negative indexes are supported. Default is None.
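include (bool, default=True) – If True, includes only the specified samples. If False, excludes the specified samples. Default is True.
reorder (bool, default=False) – If True and include=True, the retained samples follow the order of the provided samples and/or indexes. If False, the original sample order is preserved. Default is False.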
inplace (bool, default=False) – If True, modifies self in place. If False, returns a new LocalAncestryObject with the samples filtered. Default is False.
- Returns:
Optional[LocalAncestryObject] – A new LocalAncestryObject with the specified samples filtered if inplace=False. If inplace=True, modifies self in place and returns None.
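Because each sample contributes two haplotypes, filtering samples amounts to selecting pairs of columns of lai. The following is a minimal sketch of that bookkeeping, not the library's code: `haplotype_columns` is a hypothetical helper, and it assumes that sample i owns columns 2*i and 2*i + 1 and that names and indexes are combined as a union.

```python
def haplotype_columns(all_samples, samples=None, indexes=None, include=True):
    """Return the lai column positions kept after filtering samples.

    Hypothetical helper; assumes sample i owns columns 2*i and 2*i + 1.
    """
    n = len(all_samples)
    selected = set()
    for name in samples or []:
        selected.add(all_samples.index(name))  # ValueError if the name is unknown
    for i in indexes or []:
        if not -n <= i < n:
            raise IndexError(f"sample index {i} out of range for {n} samples")
        selected.add(i % n)  # negative indexes wrap around
    # Original sample order is preserved (the reorder=False case).
    kept = [s for s in range(n) if (s in selected) == include]
    return [col for s in kept for col in (2 * s, 2 * s + 1)]


all_samples = ["A", "B", "C"]
included = haplotype_columns(all_samples, samples=["C"], indexes=[0])    # [0, 1, 4, 5]
excluded = haplotype_columns(all_samples, indexes=[-1], include=False)   # [0, 1, 2, 3]
```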
- convert_to_snp_level(snpobject=None, variants_chrom=None, variants_pos=None, variants_ref=None, variants_alt=None, variants_filter_pass=None, variants_id=None, variants_qual=None, lai_format='3D')[source]¶
Convert self into a snputils.snp.genobj.SNPObject containing SNP-level Local Ancestry Information (LAI), with optional support for SNP data.
If SNP positions (variants_pos) and/or chromosomes (variants_chrom) are not specified, the method generates SNPs uniformly across the start and end positions of each genomic window. Otherwise, the provided SNP coordinates are used to assign ancestry values based on their respective windows.
If a SNPObject is provided, its attributes are used unless explicitly overridden by the function arguments. In that case, the SNPObject is updated with the (optional) new attributes and the computed calldata_lai, then returned.
- Parameters:
snpobject (SNPObject, optional) – An existing SNPObject to extract SNP attributes from.
variants_chrom (array of shape (n_snps,), optional) – An array containing the chromosome for each SNP.
variants_pos (array of shape (n_snps,), optional) – An array containing the chromosomal positions for each SNP.
variants_ref (array of shape (n_snps,), optional) – An array containing the reference allele for each SNP.
variants_alt (array of shape (n_snps,), optional) – An array containing the alternate allele for each SNP.
variants_filter_pass (array of shape (n_snps,), optional) – An array indicating whether each SNP passed control checks.
variants_id (array of shape (n_snps,), optional) – An array containing unique identifiers (IDs) for each SNP.
variants_qual (array of shape (n_snps,), optional) – An array containing the Phred-scaled quality score for each SNP.
lai_format (str, default='3D') – Determines the shape of calldata_lai:
'3D' (default): Shape (n_snps, n_samples, 2).
'2D': Shape (n_snps, n_samples * 2).
- Returns:
SNPObject – A SNPObject containing SNP-level ancestry data and updated SNP attributes.
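The window-to-SNP assignment and the two lai_format layouts can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's algorithm: `snp_level_lai` and `to_3d` are hypothetical helpers, the windows are taken to be sorted, non-overlapping, and on a single chromosome, window boundaries are treated as inclusive, and the two haplotypes of each sample are assumed to sit in adjacent columns.

```python
from bisect import bisect_right


def snp_level_lai(window_starts, window_ends, lai, snp_positions):
    """Give each SNP the ancestry row of the window that contains it.

    Returns one row per SNP (the '2D' layout), or None for a SNP that
    falls outside every window (an assumption made for this sketch).
    """
    rows = []
    for pos in snp_positions:
        w = bisect_right(window_starts, pos) - 1  # last window starting at or before pos
        if w >= 0 and pos <= window_ends[w]:
            rows.append(list(lai[w]))
        else:
            rows.append(None)
    return rows


def to_3d(rows_2d):
    """Regroup (n_snps, n_samples * 2) rows into (n_snps, n_samples, 2)."""
    return [[[row[h], row[h + 1]] for h in range(0, len(row), 2)] for row in rows_2d]


window_starts = [100, 201]
window_ends = [200, 300]
lai = [[0, 1, 1, 1],   # window 0: two samples, four haplotypes
       [0, 0, 1, 0]]   # window 1
rows_2d = snp_level_lai(window_starts, window_ends, lai, [150, 201, 300])
rows_3d = to_3d(rows_2d)
```

Here `rows_2d` has three rows of four values, and `rows_3d` regroups each row into two per-sample pairs.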
- save(file)[source]¶
Save the data stored in self to a specified file. If the file already exists, it will be overwritten.
The format of the saved file is determined by the file extension provided in the file argument.
Supported formats:
.msp: Text-based MSP format.
.msp.tsv: Text-based MSP format with TSV extension.
.pkl: Pickle format for saving self in serialized form.
- Parameters:
file (str or pathlib.Path) – Path to the file where the data will be saved. The extension of the file determines the save format. Supported extensions: .msp, .msp.tsv, .pkl.
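Extension-based dispatch of this kind can be sketched as follows. `detect_save_format` is a hypothetical helper, not part of snputils, and raising ValueError for unsupported extensions is an assumption of the sketch. The file name is matched by its ending because `pathlib.Path.suffix` would report only `.tsv` for a `.msp.tsv` file.

```python
from pathlib import Path


def detect_save_format(file):
    """Map a file name to 'msp' or 'pickle' based on its extension."""
    name = Path(file).name
    if name.endswith((".msp", ".msp.tsv")):
        return "msp"
    if name.endswith(".pkl"):
        return "pickle"
    # Assumption: unsupported extensions are rejected.
    raise ValueError(f"unsupported extension for {name!r}; use .msp, .msp.tsv or .pkl")


fmt_tsv = detect_save_format("results/chr1.msp.tsv")  # "msp"
fmt_pkl = detect_save_format("results/chr1.pkl")      # "pickle"
```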
- save_msp(file)[source]¶
Save the data stored in self to a .msp file. If the file already exists, it will be overwritten.
- Parameters:
file (str or pathlib.Path) – Path to the file where the data will be saved. It should end with .msp or .msp.tsv. If the provided path does not have one of these extensions, the .msp extension will be appended.
- save_pickle(file)[source]¶
Save self in serialized form to a .pkl file. If the file already exists, it will be overwritten.
- Parameters:
file (str or pathlib.Path) – Path to the file where the data will be saved. It should end with .pkl. If the provided path does not have this extension, it will be appended.
- class snputils.GlobalAncestryObject(Q, P=None, samples=None, snps=None, ancestries=None)[source]¶
Bases: AncestryObject
A class for Global Ancestry Inference (GAI) data.
- Parameters:
Q (array of shape (n_samples, n_ancestries)) – A 2D array containing per-sample ancestry proportions. Each row corresponds to a sample, and each column corresponds to an ancestry.
P (array of shape (n_snps, n_ancestries), optional) – A 2D array containing per-ancestry SNP frequencies. Each row corresponds to a SNP, and each column corresponds to an ancestry.
samples (sequence of length n_samples, optional) – A sequence containing unique identifiers for each sample. If None, sample identifiers are assigned as integers from 0 to n_samples - 1.
snps (sequence of length n_snps, optional) – A sequence containing identifiers for each SNP. If None, SNPs are assigned as integers from 0 to n_snps - 1.
ancestries (sequence of length n_samples, optional) – A sequence containing ancestry labels for each sample.
- property Q¶
Retrieve Q.
- Returns:
array of shape (n_samples, n_ancestries) – A 2D array containing per-sample ancestry proportions. Each row corresponds to a sample, and each column corresponds to an ancestry.
- property P¶
Retrieve P.
- Returns:
array of shape (n_snps, n_ancestries) – A 2D array containing per-ancestry SNP frequencies. Each row corresponds to a SNP, and each column corresponds to an ancestry.
- property F¶
Alias for P.
- Returns:
array of shape (n_snps, n_ancestries) – A 2D array containing per-ancestry SNP frequencies. Each row corresponds to a SNP, and each column corresponds to an ancestry.
- property samples¶
Retrieve samples.
- Returns:
array of shape (n_samples,) – An array containing unique identifiers for each sample. If None, sample identifiers are assigned as integers from 0 to n_samples - 1.
- property snps¶
Retrieve snps.
- Returns:
array of shape (n_snps,) – An array containing identifiers for each SNP. If None, SNPs are assigned as integers from 0 to n_snps - 1.
- property ancestries¶
Retrieve ancestries.
- Returns:
array of shape (n_samples,) – An array containing ancestry labels for each sample.
- property n_samples¶
Retrieve n_samples.
- Returns:
int – The total number of samples.
- property n_snps¶
Retrieve n_snps.
- Returns:
int – The total number of SNPs.
- property n_ancestries¶
Retrieve n_ancestries.
- Returns:
int – The total number of unique ancestries.
- property shape¶
Retrieve the shape of the primary Q matrix.
- Returns:
tuple – (n_samples, n_ancestries).
- copy()[source]¶
Create and return a copy of self.
- Returns:
GlobalAncestryObject – A new instance of the current object.
- keys()[source]¶
Retrieve a list of public attribute names for self.
- Returns:
list of str – A list of attribute names, with internal name-mangling removed, for easier reference to public attributes in the instance.
- save(file)[source]¶
Save the data stored in self to a specified file or set of files.
The format of the saved file(s) is determined by the file extension provided in the file argument. If the extension is .pkl, the object is serialized as a pickle file. Otherwise, the file is treated as a prefix for saving ADMIXTURE files.
Supported formats:
.pkl: Pickle format for saving self in serialized form.
Any other extension or no extension: Treated as a prefix for ADMIXTURE files.
- Parameters:
file (str or pathlib.Path) – Path to the file where the data will be saved. If the extension is .pkl, the object is serialized. Otherwise, it is treated as a prefix for ADMIXTURE files.
- save_admixture(file_prefix)[source]¶
Save the data stored in self into multiple ADMIXTURE files. If the files already exist, they will be overwritten.
Output files:
<file_prefix>.K.Q: Q matrix file. The file uses space (' ') as the delimiter.
<file_prefix>.K.P: P matrix file. The file uses space (' ') as the delimiter.
<file_prefix>.sample_ids.txt: Sample IDs file (if sample IDs are available).
<file_prefix>.snp_ids.txt: SNP IDs file (if SNP IDs are available).
<file_prefix>.map: Ancestry file (if ancestries information is available).
where K is the total number of ancestries.
- Parameters:
file_prefix (str or pathlib.Path) – The base prefix for output file names, including directory path but excluding file extensions. The prefix is used to generate specific file names for each output, with file-specific suffixes appended as described above (e.g., file_prefix.n_ancestries.Q for the Q matrix file).
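The naming scheme can be made concrete with a small sketch. `admixture_output_paths` is a hypothetical helper written for illustration; it only builds the file names listed above from a prefix and the number of ancestries, and does not write anything.

```python
from pathlib import Path


def admixture_output_paths(file_prefix, n_ancestries):
    """Build the documented output file names for a given prefix."""
    prefix = str(file_prefix)
    return {
        "Q": Path(f"{prefix}.{n_ancestries}.Q"),
        "P": Path(f"{prefix}.{n_ancestries}.P"),
        "samples": Path(f"{prefix}.sample_ids.txt"),
        "snps": Path(f"{prefix}.snp_ids.txt"),
        "ancestries": Path(f"{prefix}.map"),
    }


paths = admixture_output_paths("results/run1", 3)
q_name = paths["Q"].name  # "run1.3.Q"
```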
- save_pickle(file)[source]¶
Save self in serialized form to a .pkl file. If the file already exists, it will be overwritten.
- Parameters:
file (str or pathlib.Path) – Path to the file where the data will be saved. It should end with .pkl. If the provided path does not have this extension, it will be appended.
Readers¶
- class snputils.MSPReader(file)[source]¶
Bases: LAIBaseReader
A reader class for parsing Local Ancestry Inference (LAI) data from an .msp or .msp.tsv file and constructing a snputils.ancestry.genobj.LocalAncestryObject.
- Parameters:
file (str or pathlib.Path) – Path to the file to be read. It should end with .msp or .msp.tsv.
- property file¶
Retrieve file.
- Returns:
pathlib.Path – Path to the file to be read. It should end with .msp or .msp.tsv.
- read()[source]¶
Read data from the provided .msp or .msp.tsv file and construct a snputils.ancestry.genobj.LocalAncestryObject.
Expected MSP content:
The .msp file should contain local ancestry assignments for each haplotype across genomic windows. Each row should correspond to a genomic window and include the following columns:
#chm: Chromosome numbers corresponding to each genomic window.
spos: Start physical position for each window.
epos: End physical position for each window.
sgpos: Start centimorgan position for each window.
egpos: End centimorgan position for each window.
n snps: Number of SNPs in each genomic window.
SampleID.0: Local ancestry for the first haplotype of the sample for each window.
SampleID.1: Local ancestry for the second haplotype of the sample for each window.
- Returns:
LocalAncestryObject – A LocalAncestryObject instance.
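To show how the columns above map onto the fields of a LocalAncestryObject, here is a minimal parsing sketch in plain Python. It is not the reader's implementation: `parse_msp` is a hypothetical helper, the sample text is invented for illustration, and the sketch assumes a tab-delimited file whose leading comment line (listing ancestry codes) precedes the `#chm` header row.

```python
MSP_TEXT = (
    "#Subpopulation order/codes: AFR=0\tEUR=1\n"
    "#chm\tspos\tepos\tsgpos\tegpos\tn snps\tS1.0\tS1.1\tS2.0\tS2.1\n"
    "1\t1000\t2000\t0.00\t0.05\t120\t0\t1\t1\t1\n"
    "1\t2001\t3500\t0.05\t0.11\t98\t0\t0\t1\t0\n"
)


def parse_msp(text):
    """Split MSP text into window metadata and the window-by-haplotype matrix."""
    lines = text.splitlines()
    header_idx = next(i for i, line in enumerate(lines) if line.startswith("#chm"))
    haplotypes = lines[header_idx].split("\t")[6:]  # columns after 'n snps'
    rows = [line.split("\t") for line in lines[header_idx + 1:] if line.strip()]
    return {
        "haplotypes": haplotypes,
        # Sample IDs are the haplotype names without the .0/.1 suffix.
        "samples": list(dict.fromkeys(h.rsplit(".", 1)[0] for h in haplotypes)),
        "chromosomes": [r[0] for r in rows],
        "physical_pos": [(int(r[1]), int(r[2])) for r in rows],
        "centimorgan_pos": [(float(r[3]), float(r[4])) for r in rows],
        "window_sizes": [int(r[5]) for r in rows],
        "lai": [[int(v) for v in r[6:]] for r in rows],
    }


parsed = parse_msp(MSP_TEXT)
```

The resulting `lai` has one row per window and one column per haplotype, matching the (n_windows, n_haplotypes) shape documented for LocalAncestryObject.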
- class snputils.AdmixtureReader(Q_file, P_file=None, sample_file=None, snp_file=None, ancestry_file=None)[source]¶
Bases: WideBaseReader
A reader class for parsing ADMIXTURE files and constructing a snputils.ancestry.genobj.GlobalAncestryObject.
- Parameters:
Q_file (str or pathlib.Path) – Path to the file containing the Q matrix (per-sample ancestry proportions). It should end with .Q or .txt. The file should use space (' ') as the delimiter.
P_file (str or pathlib.Path, optional) – Path to the file containing the P/F matrix (per-ancestry SNP frequencies). It should end with .P or .txt. The file should use space (' ') as the delimiter. If None, P is not loaded.
sample_file (str or pathlib.Path, optional) – Path to the single-column file containing sample identifiers. It should end with .fam or .txt. If None, sample identifiers are not loaded.
snp_file (str or pathlib.Path, optional) – Path to the single-column file containing SNP identifiers. It should end with .bim or .txt. If None, SNP identifiers are not loaded.
ancestry_file (str or pathlib.Path, optional) – Path to the single-column file containing ancestry labels for each sample. It should end with .map or .txt. If None, ancestries are not loaded.
- property Q_file¶
Retrieve Q_file.
- Returns:
pathlib.Path – Path to the file containing the Q matrix (per-sample ancestry proportions). It should end with .Q or .txt. The file should use space (' ') as the delimiter.
- property P_file¶
Retrieve P_file.
- Returns:
pathlib.Path or None – Path to the file containing the P/F matrix (per-ancestry SNP frequencies). It should end with .P or .txt. The file should use space (' ') as the delimiter. If None, P is not loaded.
- property sample_file¶
Retrieve sample_file.
- Returns:
pathlib.Path or None – Path to the single-column file containing sample identifiers. It should end with .fam or .txt. If None, sample identifiers are not loaded.
- property snp_file¶
Retrieve snp_file.
- Returns:
pathlib.Path or None – Path to the single-column file containing SNP identifiers. It should end with .bim or .txt. If None, SNP identifiers are not loaded.
- property ancestry_file¶
Retrieve ancestry_file.
- Returns:
pathlib.Path or None – Path to the single-column file containing ancestry labels for each sample. It should end with .map or .txt. If None, ancestries are not loaded.
- read()[source]¶
Read data from the provided ADMIXTURE files and construct a snputils.ancestry.genobj.GlobalAncestryObject instance.
Expected ADMIXTURE files content:
Q_file: A text file containing the Q matrix with per-sample ancestry proportions. Each row corresponds to a sample, and each column corresponds to an ancestry.
P_file: A text file containing the P matrix with per-ancestry SNP frequencies. Each row corresponds to a SNP, and each column corresponds to an ancestry.
Optional files (if provided):
sample_file: A single-column text file containing sample identifiers in order.
snp_file: A single-column text file containing SNP identifiers in order.
ancestry_file: A single-column text file containing ancestry labels for each sample.
- Returns:
GlobalAncestryObject – A GlobalAncestryObject instance.
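The layout of a Q file can be illustrated with a short parsing sketch. `parse_matrix` is a hypothetical helper and the sample values are invented; the sketch assumes a space-delimited text file with one row per sample and, as is typical of ADMIXTURE output, proportions that sum to 1 within each row.

```python
def parse_matrix(text):
    """Parse a space-delimited numeric matrix, one row per non-empty line."""
    return [[float(v) for v in line.split()] for line in text.splitlines() if line.strip()]


Q_TEXT = "0.90 0.10\n0.25 0.75\n0.50 0.50\n"

Q = parse_matrix(Q_TEXT)
n_samples, n_ancestries = len(Q), len(Q[0])  # shape of Q: (3, 2)
row_sums = [sum(row) for row in Q]           # each close to 1.0
```

The same helper would apply to a P file, where rows correspond to SNPs instead of samples.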
Read Functions¶
- snputils.read_lai(file, **kwargs)[source]¶
Automatically detect the local ancestry data file format from the file’s extension and read it into a snputils.ancestry.genobj.LocalAncestryObject.
Supported formats:
.msp: Text-based MSP format.
.msp.tsv: Text-based MSP format with TSV extension.
- Parameters:
file (str or pathlib.Path) – Path to the file to be read. It should end with .msp or .msp.tsv.
**kwargs – Additional arguments passed to the reader method.
- snputils.read_msp(file)[source]¶
Read data from an .msp or .msp.tsv file and construct a snputils.ancestry.genobj.LocalAncestryObject.
- Parameters:
file (str or pathlib.Path) – Path to the file to be read. It should end with .msp or .msp.tsv.
- Returns:
LocalAncestryObject – A LocalAncestryObject instance.
- snputils.read_adm(Q_file, P_file=None, sample_file=None, snp_file=None, ancestry_file=None)¶
Read ADMIXTURE files into a snputils.ancestry.genobj.GlobalAncestryObject.
- Parameters:
Q_file (str or pathlib.Path) – Path to the file containing the Q matrix (per-sample ancestry proportions). It should end with .Q or .txt. The file should use space (' ') as the delimiter.
P_file (str or pathlib.Path, optional) – Path to the file containing the P/F matrix (per-ancestry SNP frequencies). It should end with .P or .txt. The file should use space (' ') as the delimiter. If None, P is not loaded.
sample_file (str or pathlib.Path, optional) – Path to the single-column file containing sample identifiers. It should end with .fam or .txt. If None, sample identifiers are not loaded.
snp_file (str or pathlib.Path, optional) – Path to the single-column file containing SNP identifiers. It should end with .bim or .txt. If None, SNP identifiers are not loaded.
ancestry_file (str or pathlib.Path, optional) – Path to the single-column file containing ancestry labels for each sample. It should end with .map or .txt. If None, ancestries are not loaded.
- Returns:
GlobalAncestryObject – A GlobalAncestryObject instance.
- snputils.read_admixture(Q_file, P_file=None, sample_file=None, snp_file=None, ancestry_file=None)[source]¶
Read ADMIXTURE files into a snputils.ancestry.genobj.GlobalAncestryObject.
- Parameters:
Q_file (str or pathlib.Path) – Path to the file containing the Q matrix (per-sample ancestry proportions). It should end with .Q or .txt. The file should use space (' ') as the delimiter.
P_file (str or pathlib.Path, optional) – Path to the file containing the P/F matrix (per-ancestry SNP frequencies). It should end with .P or .txt. The file should use space (' ') as the delimiter. If None, P is not loaded.
sample_file (str or pathlib.Path, optional) – Path to the single-column file containing sample identifiers. It should end with .fam or .txt. If None, sample identifiers are not loaded.
snp_file (str or pathlib.Path, optional) – Path to the single-column file containing SNP identifiers. It should end with .bim or .txt. If None, SNP identifiers are not loaded.
ancestry_file (str or pathlib.Path, optional) – Path to the single-column file containing ancestry labels for each sample. It should end with .map or .txt. If None, ancestries are not loaded.
- Returns:
GlobalAncestryObject – A GlobalAncestryObject instance.
Writers¶
- class snputils.MSPWriter(laiobj, file)[source]¶
Bases: LAIBaseWriter
A writer class for exporting local ancestry data from a snputils.ancestry.genobj.LocalAncestryObject into an .msp or .msp.tsv file.
- Parameters:
laiobj (LocalAncestryObject) – A LocalAncestryObject instance.
file (str or pathlib.Path) – Path to the file where the data will be saved. It should end with .msp or .msp.tsv. If the provided path does not have one of these extensions, the .msp extension will be appended.
- property laiobj¶
Retrieve laiobj.
- Returns:
LocalAncestryObject – A LocalAncestryObject instance.
- property file¶
Retrieve file.
- Returns:
pathlib.Path – Path to the file where the data will be saved. It should end with .msp or .msp.tsv. If the provided path does not have one of these extensions, the .msp extension will be appended.
- write()[source]¶
Write the data contained in the laiobj instance to the specified output file. If the file already exists, it will be overwritten.
Output MSP content:
The output .msp file will contain local ancestry assignments for each haplotype across genomic windows. Each row corresponds to a genomic window and includes the following columns:
#chm: Chromosome numbers corresponding to each genomic window.
spos: Start physical position for each window.
epos: End physical position for each window.
sgpos: Start centimorgan position for each window.
egpos: End centimorgan position for each window.
n snps: Number of SNPs in each genomic window.
SampleID.0: Local ancestry for the first haplotype of the sample for each window.
SampleID.1: Local ancestry for the second haplotype of the sample for each window.
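The row layout listed above can be sketched as a small formatter. This is an illustration, not the writer's implementation: `format_msp` is a hypothetical helper, the tab delimiter is an assumption, and any leading comment line with ancestry codes that a real file may carry is omitted.

```python
def format_msp(chromosomes, physical_pos, centimorgan_pos, window_sizes, haplotypes, lai):
    """Render window-level ancestry as MSP-style text, one line per window."""
    header = ["#chm", "spos", "epos", "sgpos", "egpos", "n snps", *haplotypes]
    lines = ["\t".join(header)]
    for w, row in enumerate(lai):
        fields = [chromosomes[w], *physical_pos[w], *centimorgan_pos[w], window_sizes[w], *row]
        lines.append("\t".join(str(f) for f in fields))
    return "\n".join(lines) + "\n"


msp_text = format_msp(
    chromosomes=["1"],
    physical_pos=[(1000, 2000)],
    centimorgan_pos=[(0.0, 0.05)],
    window_sizes=[120],
    haplotypes=["S1.0", "S1.1"],
    lai=[[0, 1]],
)
```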
- class snputils.AdmixtureWriter(wideobj, file_prefix)[source]¶
Bases: WideBaseWriter
A writer class for exporting global ancestry data from a snputils.ancestry.genobj.GlobalAncestryObject into multiple ADMIXTURE files.
- Parameters:
wideobj (GlobalAncestryObject) – A GlobalAncestryObject instance.
file_prefix (str or pathlib.Path) – Prefix for output file names, including directory path but excluding file extensions. The prefix is used to generate specific file names for each output, with file-specific suffixes appended (e.g., file_prefix.n_ancestries.Q for the Q matrix file).
- property wideobj¶
Retrieve wideobj.
- Returns:
GlobalAncestryObject – A GlobalAncestryObject instance.
- property file_prefix¶
Retrieve file_prefix.
- Returns:
pathlib.Path – Prefix for output file names, including directory path but excluding file extensions. The prefix is used to generate specific file names for each output, with file-specific suffixes appended (e.g., file_prefix.n_ancestries.Q for the Q matrix file).
- property Q_file¶
Retrieve Q_file.
- Returns:
pathlib.Path – Path to the .Q file that will store the Q matrix (per-sample ancestry proportions).
- property P_file¶
Retrieve P_file.
- Returns:
pathlib.Path – Path to the .P file that will store the P/F matrix (per-ancestry SNP frequencies).
- property sample_file¶
Retrieve sample_file.
- Returns:
pathlib.Path or None – Path to the .txt file that will store sample identifiers. If None, sample identifiers are not saved.
- property snp_file¶
Retrieve snp_file.
- Returns:
pathlib.Path or None – Path to the .txt file that will store SNP identifiers. If None, SNP identifiers are not saved.
- property ancestry_file¶
Retrieve ancestry_file.
- Returns:
pathlib.Path or None – Path to the .map file that will store ancestry labels for each sample. If None, ancestries are not saved.
- write()[source]¶
Write the data contained in the wideobj instance into the multiple ADMIXTURE files with the specified file_prefix. If the files already exist, they will be overwritten.
Output files:
<file_prefix>.K.Q: Q matrix file. The file uses space (' ') as the delimiter.
<file_prefix>.K.P: P matrix file. The file uses space (' ') as the delimiter.
<file_prefix>.sample_ids.txt: Sample IDs file (if sample IDs are available).
<file_prefix>.snp_ids.txt: SNP IDs file (if SNP IDs are available).
<file_prefix>.map: Ancestry file (if ancestries information is available).
where K is the total number of ancestries.
- class snputils.AdmixtureMappingVCFWriter(laiobj, file, ancestry_map=None)[source]¶
Bases: object
A writer class for converting and writing local ancestry data into ancestry-specific VCF/BCF files for ADMIXTURE mapping.
- Parameters:
laiobj (LocalAncestryObject) – A LocalAncestryObject instance.
file (str or pathlib.Path) – Path to the file where the data will be saved. It should end with .vcf or .bcf. If the provided path does not have one of these extensions, the .vcf extension will be appended.
ancestry_map (dict of str to str, optional) – A dictionary mapping ancestry codes to region names. If not explicitly provided, it will default to the ancestry_map from laiobj.
- property laiobj¶
Retrieve laiobj.
- Returns:
LocalAncestryObject – A LocalAncestryObject instance.
- property file¶
Retrieve file.
- Returns:
pathlib.Path – Path to the file where the data will be saved. It should end with .vcf or .bcf. If the provided path does not have one of these extensions, the .vcf extension will be appended.
- property ancestry_map¶
Retrieve ancestry_map.
- Returns:
dict of str to str – A dictionary mapping ancestry codes to region names. If not explicitly provided, it will default to the ancestry_map from laiobj.
- write()[source]¶
Write one VCF or BCF file for each ancestry type defined in the ancestry map. If a file already exists, it will be overwritten.
Output VCF/BCF content:
For each ancestry, this method converts LAI data to SNP alleles and writes it in a VCF-compatible format. SNPs are encoded as follows:
1: Indicates positions that match the specified ancestry.
0: Indicates positions that do not match the specified ancestry.
The VCF/BCF files will contain the following fields:
CHROM: Chromosome for each variant.
POS: Chromosomal positions for each variant.
ID: Unique identifier for each variant.
REF: Reference allele for each variant.
ALT: Alternate allele for each variant.
QUAL: Phred-scaled quality score for each variant.
FILTER: Status indicating whether each SNP passed control checks.
INFO: When physical positions are available, contains END=<end_pos> for the segment end; otherwise ‘.’.
FORMAT: Genotype format. Set to ‘GT’, representing the genotype as phased alleles.
<SampleID>: One column per sample, containing the genotype data (1|0, 0|1, etc.).
Output files:
A separate VCF file is written for each ancestry type, with filenames formatted as: <filename>_<ancestry>.vcf (e.g., output_African.vcf).
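The 1/0 encoding and the per-ancestry file naming can be sketched as follows. Both helpers are hypothetical and written only for illustration; the sketch assumes integer-like ancestry codes, an invented ancestry_map, and adjacent columns for the two haplotypes of each sample.

```python
from pathlib import Path


def ancestry_genotypes(lai_row, ancestry_code):
    """Encode one window as phased genotypes: 1 where a haplotype matches the ancestry."""
    calls = [1 if int(a) == int(ancestry_code) else 0 for a in lai_row]
    return [f"{calls[h]}|{calls[h + 1]}" for h in range(0, len(calls), 2)]


def ancestry_vcf_path(file, ancestry_name):
    """Insert the ancestry name before the extension: output.vcf -> output_African.vcf."""
    path = Path(file)
    return path.with_name(f"{path.stem}_{ancestry_name}{path.suffix}")


ancestry_map = {"0": "African", "1": "European"}  # illustrative codes and names
lai_row = [0, 1, 1, 1]                            # one window, two samples

outputs = {
    ancestry_vcf_path("output.vcf", name).name: ancestry_genotypes(lai_row, code)
    for code, name in ancestry_map.items()
}
```

For this window, the African file would carry `1|0` and `0|0` for the two samples, and the European file `0|1` and `1|1`.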