Appendix D | Bayesian Robust Linear Model with Mahalanobis (BRLMM) Distance Classifier Algorithm
in the case of the SNP on the left, which is shifted well away from the ideal
heterozygote contrast value of zero.
To start the process, you must seed it with initial genotype
estimates from which to build the generic prior. The existing DM
approach is an excellent candidate: it is used with a highly stringent
confidence threshold of 0.17 to determine the initial genotype calls.
Note that using DM calls as a starting point still introduces an
indirect reliance on the MM probes. However, it has been demonstrated
that sufficiently good initial estimates can be obtained without MM
probes. It is therefore feasible to make new chip designs with at
least half the number of probes. With these initial calls
in hand, a random sample of 10,000 SNPs is scanned to identify SNPs
that have at least two initial DM calls for each of the three genotypes
(the minimum required to estimate a variance for each genotype). Note
that this implies an absolute minimum of six samples (two per genotype)
run together, although in practice it is generally better to have more
(discussed in more detail in the Discussion section below). Using a
random sample of SNPs makes processing faster and more memory-efficient,
because only a small subset of the probe intensities needs to be loaded
and analyzed. The sample is formally a simple random sample from all
SNPs on the chip, but it is drawn deterministically, so that reanalyzing
the same data at a different time or on a different operating system
yields the same results. This step typically yields ~5,000 SNPs
(depending on sample size and genetic diversity), which are then used
to derive the generic SNP prior.
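The selection step described above can be sketched in Python as follows. This is a minimal illustration, not the actual implementation: the function name `select_prior_snps`, the integer genotype codes (0 = AA, 1 = AB, 2 = BB, `None` = no call), and the use of a fixed-seed `random.Random` generator as the "deterministic" sampler are all assumptions made for the example.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence

# Hypothetical genotype codes for illustration: 0 = AA, 1 = AB, 2 = BB; None = no call.
GENOTYPES = (0, 1, 2)


def select_prior_snps(
    initial_calls: Dict[str, Sequence[Optional[int]]],
    sample_size: int = 10_000,
    min_calls: int = 2,
    seed: int = 0,
) -> List[str]:
    """Return sampled SNP IDs with at least `min_calls` initial calls per genotype.

    `initial_calls` maps a SNP ID to its per-sample initial calls. A private,
    fixed-seed generator over a sorted ID list makes the draw repeatable
    (for a given Python version), independent of dict insertion order.
    """
    snp_ids = sorted(initial_calls)
    rng = random.Random(seed)
    # Simple random sample without replacement, capped at the number of SNPs.
    sampled = rng.sample(snp_ids, min(sample_size, len(snp_ids)))

    selected = []
    for snp in sampled:
        counts = Counter(c for c in initial_calls[snp] if c is not None)
        # Two calls per genotype is the minimum for a variance estimate.
        if all(counts[g] >= min_calls for g in GENOTYPES):
            selected.append(snp)
    return selected


# A SNP can only qualify if at least len(GENOTYPES) * min_calls = 6 samples were called.
calls = {
    "SNP_1": [0, 0, 1, 1, 2, 2],
    "SNP_2": [0, 0, 0, 1, 1, None],  # no BB calls: excluded
    "SNP_3": [2, 1, 0, 2, 1, 0],
}
print(sorted(select_prior_snps(calls, sample_size=3)))  # ['SNP_1', 'SNP_3']
```

Sorting the IDs before sampling means the result depends only on the data and the seed, not on the order in which SNPs happened to be loaded, which is one simple way to get the run-to-run reproducibility the text describes.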
Having estimated the generic prior, the next step is to combine, for
each SNP, the prior with whatever DM initial estimates are available
for that particular SNP, yielding a posterior estimate of the cluster
centers and variances. To set up notation, the following SNP-specific
quantities are defined:
• Observed Data For The Given SNP:
v = The 6-dimensional vector of cluster center coordinates (two
coordinates for each of the three genotypes), estimated as the average
transformed intensity value within each genotype. Some or all of these
entries may be null if there are no DM initial estimates for one or
more of the three genotypes.
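As a rough sketch of how the observed vector v could be assembled for one SNP, the Python below averages each sample's two transformed coordinates within each initial genotype call and leaves null (`None`) entries for genotypes with no calls. The function name `cluster_center_vector`, the genotype codes (0 = AA, 1 = AB, 2 = BB, `None` = no call), and the entry ordering [AA, AB, BB] are assumptions for illustration; the transformation that produces the coordinates is taken as already applied.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple


def cluster_center_vector(
    points: Sequence[Tuple[float, float]],
    calls: Sequence[Optional[int]],
) -> List[Optional[float]]:
    """Build the 6-dimensional vector v of cluster centers for one SNP.

    `points` holds each sample's two transformed coordinates; `calls` holds its
    initial genotype call (0 = AA, 1 = AB, 2 = BB, None = no call). The result is
    ordered [AA_1, AA_2, AB_1, AB_2, BB_1, BB_2], with None for any genotype
    that received no initial calls.
    """
    v: List[Optional[float]] = []
    for g in (0, 1, 2):
        members = [p for p, c in zip(points, calls) if c == g]
        if members:
            # Cluster center = average transformed coordinates within the genotype.
            v.extend([mean(p[0] for p in members), mean(p[1] for p in members)])
        else:
            v.extend([None, None])
    return v


points = [(-0.75, 10.0), (-0.25, 10.5), (0.25, 11.0), (0.5, 9.5)]
calls = [0, 0, 1, None]
print(cluster_center_vector(points, calls))
# [-0.5, 10.25, 0.25, 11.0, None, None]
```

In this example no sample received a BB call, so the last two entries of v are null, which is exactly the situation the prior is meant to cover when the posterior is formed.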