Quantifying the Roles of Visual, Linguistic, and Visual-Linguistic Complexity in Verb Acquisition
Children typically learn the meanings of nouns earlier than the meanings of verbs. However, it is unclear whether this asymmetry is a result of complexity in the visual structure of categories in the world to which language refers, the structure of language itself, or the interplay between the two sources of information. We quantitatively test these three hypotheses regarding early verb learning by employing visual and linguistic representations of words sourced from large-scale pre-trained artificial neural networks. Examining the structure of both visual and linguistic embedding spaces, we find, first, that the representation of verbs is generally more variable and less discriminable within domain than the representation of nouns. Second, we find that if only one learning instance per category is available, visual and linguistic representations are less well aligned in the verb system than in the noun system. However, in parallel with the course of human language development, if multiple learning instances per category are available, visual and linguistic representations become almost as well aligned in the verb system as in the noun system. Third, we compare the relative contributions of factors that may predict learning difficulty for individual words. A regression analysis reveals that visual variability is the strongest factor that internally drives verb learning, followed by visual-linguistic alignment and linguistic variability. Based on these results, we conclude that verb acquisition is influenced by all three sources of complexity, but that the variability of visual structure poses the most significant challenge for verb learning.
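The abstract does not state which metrics were used, so the following is only a minimal sketch of how the two structural quantities it describes could be computed. It assumes within-category variability is measured as the mean cosine distance of instance embeddings to their category centroid, and that visual-linguistic alignment is measured as the correlation between the pairwise similarity structures of the two spaces. The function names and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np


def within_category_variability(embeddings):
    """Mean cosine distance of each instance to its category centroid.

    Assumed metric (not confirmed by the source): higher values mean the
    instances of one word are more spread out in the embedding space.
    """
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    centroid = x.mean(axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    return float(np.mean(1.0 - x @ centroid))


def similarity_structure(prototypes):
    """Upper triangle of the pairwise cosine-similarity matrix of categories."""
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = p @ p.T
    rows, cols = np.triu_indices(len(p), k=1)
    return sim[rows, cols]


def alignment(visual_prototypes, language_prototypes):
    """Pearson correlation between the two spaces' similarity structures.

    Row i of both arrays must describe the same word. The two spaces may
    have different dimensionality, since only within-space similarities
    are compared.
    """
    v = similarity_structure(visual_prototypes)
    l = similarity_structure(language_prototypes)
    return float(np.corrcoef(v, l)[0, 1])


# Synthetic illustration only: 6 hypothetical categories, 20 instances each.
rng = np.random.default_rng(0)
visual_centers = rng.normal(size=(6, 16))
language_centers = rng.normal(size=(6, 8))
instances = visual_centers[:, None, :] + 0.5 * rng.normal(size=(6, 20, 16))

# Prototype from a single instance versus from the mean of all instances.
single_instance = instances[:, 0, :]
averaged = instances.mean(axis=1)

print("variability of category 0:", within_category_variability(instances[0]))
print("alignment, one instance:", alignment(single_instance, language_centers))
print("alignment, averaged:", alignment(averaged, language_centers))
```

Under these assumptions, the per-word variability and alignment scores could then serve as predictors in a regression on a measure of learning difficulty, which is the kind of comparison the abstract reports.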