Efficient and error-tolerant schemes for non-adaptive complex group testing and its application in complex disease genetics
The goal of combinatorial group testing is to efficiently identify up to d defective items in a large population of n items, where d ≪ n. Defective items satisfy certain properties, while the remaining items in the population do not. To identify defective items efficiently, subsets of items are pooled and each pool is tested. In this work, we consider complex group testing (CmplxGT), in which the set of defective items consists of subsets of positive items (called positive complexes). CmplxGT is classified into two categories: classical CmplxGT (CCmplxGT) and generalized CmplxGT (GCmplxGT). In CCmplxGT, the outcome of a test on a subset of items is positive if the subset contains at least one positive complex, and negative otherwise. In GCmplxGT, the outcome of a test on a subset of items is positive if the subset contains a certain number of items of some positive complex, and negative otherwise. For CCmplxGT, we present a scheme that efficiently identifies all positive complexes in time t × poly(d, log n) in the presence of erroneous outcomes, where t is a predefined parameter. As d ≪ n, this is significantly better than the currently best time of poly(t) × O(n log n). Moreover, in specific cases, the number of tests in our proposed scheme is smaller than in previous work. For GCmplxGT, we present a scheme that efficiently identifies all positive complexes. These schemes are directly applicable in various areas such as complex disease genetics, molecular biology, and learning a hidden graph.
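The two test models described above can be illustrated with a minimal sketch. This only shows how a single test outcome is determined, not the identification schemes proposed in the paper. The function names, the use of integer item labels, and the reading of "a certain number of items" as "at least `threshold` items" are assumptions made here for illustration.

```python
from typing import FrozenSet, Iterable, Set


def ccmplxgt_outcome(pool: Set[int], complexes: Iterable[FrozenSet[int]]) -> bool:
    """Classical model: positive iff the pool contains a whole positive complex."""
    return any(c <= pool for c in complexes)


def gcmplxgt_outcome(
    pool: Set[int], complexes: Iterable[FrozenSet[int]], threshold: int
) -> bool:
    """Generalized model (assumed reading): positive iff the pool holds at
    least `threshold` items of some positive complex."""
    return any(len(c & pool) >= threshold for c in complexes)


# Hypothetical instance: two positive complexes among items labeled by integers.
complexes = [frozenset({1, 2}), frozenset({3, 4, 5})]

print(ccmplxgt_outcome({1, 2, 7}, complexes))     # pool contains complex {1, 2}
print(ccmplxgt_outcome({3, 4, 7}, complexes))     # no complex fully contained
print(gcmplxgt_outcome({3, 4, 7}, complexes, 2))  # two items of {3, 4, 5} present
```

In this sketch the classical model is the special case of the generalized one in which the threshold equals the full size of a complex.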