Mapping the Similarities of Spectra: Global and Locally-biased Approaches to SDSS Galaxy Data
We apply a novel spectral graph technique, that of locally-biased semi-supervised eigenvectors, to study the diversity of galaxies. This technique permits us to characterize empirically the natural variations in observed spectra data, and we illustrate how this approach can be used in an exploratory manner to highlight both large-scale global as well as small-scale local structure in Sloan Digital Sky Survey (SDSS) data. We use this method in a way that simultaneously takes into account the measurements of spectral lines as well as the continuum shape. Unlike Principal Component Analysis, this method does not assume that the Euclidean distance between galaxy spectra is a good global measure of similarity between all spectra, but instead it only assumes that local difference information between similar spectra is reliable. Moreover, unlike other nonlinear dimensionality methods, this method can be used to characterize very finely both small-scale local as well as large-scale global properties of realistic noisy data. The power of the method is demonstrated on the SDSS Main Galaxy Sample by illustrating that the derived embeddings of spectra carry an unprecedented amount of information. By using a straightforward global or unsupervised variant, we observe that the main features correlate strongly with star formation rate and that they clearly separate active galactic nuclei. Computed parameters of the method can be used to describe line strengths and their interdependencies. By using a locally-biased or semi-supervised variant, we are able to focus on typical variations around specific objects of astronomical interest. We present several examples illustrating that this approach can enable new discoveries in the data as well as a detailed understanding of very fine local structure that would otherwise be overwhelmed by large-scale noise and global trends in the data.
READ FULL TEXT