Principal component analysis in Bayes spaces for sparsely sampled density functions
This paper presents a novel approach to functional principal component analysis (FPCA) in Bayes spaces for the setting where densities are the objects of analysis but only a few individual samples from each density are observed. We use the observed data directly to account for all sources of uncertainty, instead of relying on prior estimation of the underlying densities in a two-step approach, which can be inaccurate when the number of samples per density is small or heterogeneous. To account for the constrained nature of densities, we base our approach on Bayes spaces, which extend the Aitchison geometry for compositional data to density functions. For modeling, we exploit the isometric isomorphism, given by the centered log-ratio transformation, between the Bayes space and 𝕃_0^2, the subspace of 𝕃^2 defined by an integration-to-zero constraint. As only discrete draws from each density are observed, we treat the underlying functional densities as latent variables within a maximum likelihood framework and employ a Monte Carlo Expectation Maximization (MCEM) algorithm for model estimation. The resulting estimates are useful for exploratory analyses of density data, for dimension reduction in subsequent analyses, and for improved preprocessing of sparsely sampled density data compared to existing methods. The proposed method is applied to analyze the distribution of maximum daily temperatures in Berlin during the summer months over the last 70 years, as well as the distribution of rental prices in the districts of Munich.
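To make the centered log-ratio (clr) transformation concrete, the following is a minimal numerical sketch, not the authors' implementation. It assumes a strictly positive density evaluated on a grid over a bounded interval, and the helper names (trapezoid, clr, clr_inverse) and the example density are hypothetical choices made for illustration. The clr transform subtracts the mean log-density, so the result integrates to zero, and the inverse exponentiates and renormalizes.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal-rule approximation of the integral of y over the grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def clr(density, grid):
    """Centered log-ratio transform of a strictly positive density on a grid.

    Returns log(f) minus its mean over the interval, so the result
    integrates to zero (up to the accuracy of the quadrature rule).
    """
    log_f = np.log(density)
    interval_length = grid[-1] - grid[0]
    mean_log_f = trapezoid(log_f, grid) / interval_length
    return log_f - mean_log_f


def clr_inverse(g, grid):
    """Map an integrate-to-zero function back to a density on the same grid."""
    unnormalized = np.exp(g)
    return unnormalized / trapezoid(unnormalized, grid)


# Illustrative example (assumed, not from the paper): a bell-shaped
# density on [0, 1], normalized numerically on the grid.
grid = np.linspace(0.0, 1.0, 201)
density = np.exp(-((grid - 0.4) ** 2) / 0.05)
density = density / trapezoid(density, grid)

g = clr(density, grid)
recovered = clr_inverse(g, grid)
```

In this sketch, g lies (numerically) in the integrate-to-zero subspace, where ordinary 𝕃^2 tools such as principal component analysis can be applied, and clr_inverse maps the results back to densities.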