Understanding the World Through Code

Funded through the NSF Expeditions in Computing Program

Cognitive and Behavioral Science

Human conceptual representations have a complex, abstract structure that allows us to deal efficiently with extremely diverse situations. Concepts are often named by single lexical items, which provides a convenient window on mental life. Take, for instance, the concepts 'tree', 'forest', and 'friend'. These are everyday concepts, yet they are hard to capture with classifiers (even deep ones) or other standard machine learning approaches. Instead, they form fuzzy, interrelated systems grounded in our knowledge of physical objects, social agents, and so on.

Modeling meaning through programming languages.

Functions in a programming language are a good model of human concepts because they admit complex abstraction and can represent systems of related meanings. In particular, programs provide a natural means of capturing three key aspects of human concepts: compositional abstraction, graded reasoning under uncertainty, and causal relations. Human concept learning can then be viewed as (probabilistic) program induction. Existing demonstrations of this approach have been hampered by a lack of efficient, scalable techniques for program induction. We propose to address this by building high-level languages for common-sense domains and using new program learning tools to induce concepts within these domains. The first part of this task amounts to constructing a "standard library" of core knowledge; the second amounts to concept learning by probabilistic program induction.
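To make "concept learning by probabilistic program induction" concrete, here is a minimal sketch under stated assumptions. The toy domain (integers 1 to 100, in the spirit of the classic "number game"), the tiny primitive library (`even`, `odd`, `multiple_of(k)`, `less_than(k)`), the `and` combinator, the 2^-size prior, and the "size principle" likelihood are all illustrative choices, not the project's actual languages or learning tools. Induction here is brute-force enumeration, which is precisely the kind of technique that does not scale.

```python
import math
from itertools import product

DOMAIN = range(1, 101)  # toy domain (assumption): the integers 1..100

def primitives():
    """Hypothetical 'standard library': (name, size, predicate) triples."""
    prims = [("even", 1, lambda x: x % 2 == 0),
             ("odd", 1, lambda x: x % 2 == 1)]
    for k in range(3, 11):
        prims.append((f"multiple_of({k})", 2, lambda x, k=k: x % k == 0))
    for k in (10, 20, 50):
        prims.append((f"less_than({k})", 2, lambda x, k=k: x < k))
    return prims

def enumerate_programs():
    """All primitives plus every pairwise conjunction and(p, q)."""
    prims = primitives()
    programs = list(prims)
    for (n1, s1, f1), (n2, s2, f2) in product(prims, prims):
        if n1 < n2:  # skip self-pairs and duplicate orderings
            programs.append((f"and({n1}, {n2})", s1 + s2 + 1,
                             lambda x, f1=f1, f2=f2: f1(x) and f2(x)))
    return programs

def log_posterior(program, examples):
    _, size, pred = program
    extension = [x for x in DOMAIN if pred(x)]
    if not extension or not all(pred(x) for x in examples):
        return -math.inf  # program is inconsistent with the data
    log_prior = -size * math.log(2)  # shorter programs are a priori more likely
    # Size principle: each example is drawn uniformly from the concept's extension.
    log_lik = -len(examples) * math.log(len(extension))
    return log_prior + log_lik

def infer(examples, top=3):
    """Return the names of the highest-scoring consistent programs."""
    scored = [(log_posterior(p, examples), p[0]) for p in enumerate_programs()]
    scored = [s for s in scored if s[0] > -math.inf]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top]]

print(infer([10, 30, 60, 90])[0])  # multiple_of(10)
```

Here `even` and `multiple_of(5)` also fit the examples, but their larger extensions make the observed numbers a less likely sample, so `multiple_of(10)` wins. The enumeration step is the part that would need to be replaced by more efficient program learning tools to scale beyond toy libraries.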

Understanding behavior.

Behavior is arguably the most complex phenotype one could analyze in the life and cognitive sciences. For instance, what concise theory explains courtship in fruit flies? Such questions are pervasive in the life sciences, where fine-grained behavioral data from model organisms (e.g., fruit flies or macaques) are being collected at an unprecedented scale. The study of dynamic behavior such as courtship is an ideal testbed for our research agenda. On the one hand, scientists want to discover short programs that provide interpretable explanations of courtship. On the other hand, the raw data are high-dimensional in both time and space and contain significant variability in the behavior of interest, so deep learning is needed to process the raw data and instantiate modules within the program.
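The idea of a short, interpretable program whose modules are instantiated by deep learning can be sketched as follows. Everything here is hypothetical: the per-frame features (`distance_mm`, `facing_deg`), the thresholds, and the names `is_close`, `is_facing`, and `courtship_bouts` are illustrative assumptions. In the setting described above, the two predicates would be neural networks trained on raw video; fixed thresholds stand in for them so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Frame:
    distance_mm: float  # hypothetical feature: male-female distance
    facing_deg: float   # hypothetical feature: angle from male heading to female

Module = Callable[[Frame], bool]

# Stand-ins for learned modules (assumption): in practice these would be
# neural networks mapping raw data to a yes/no judgment per frame.
def is_close(frame: Frame) -> bool:
    return frame.distance_mm < 5.0

def is_facing(frame: Frame) -> bool:
    return abs(frame.facing_deg) < 30.0

def courtship_bouts(frames: List[Frame], close: Module, facing: Module,
                    min_len: int = 3) -> List[Tuple[int, int]]:
    """Interpretable program: a bout is at least min_len consecutive frames in
    which both modules fire. Returns (start, end) indices, end exclusive."""
    bouts: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, f in enumerate(frames):
        active = close(f) and facing(f)
        if active and start is None:
            start = i
        elif not active and start is not None:
            if i - start >= min_len:
                bouts.append((start, i))
            start = None
    # Close a bout that is still open at the end of the recording.
    if start is not None and len(frames) - start >= min_len:
        bouts.append((start, len(frames)))
    return bouts

frames = ([Frame(10, 0)] * 2 + [Frame(3, 10)] * 4
          + [Frame(3, 90)] + [Frame(2, 5)] * 2)
print(courtship_bouts(frames, is_close, is_facing))  # [(2, 6)]
```

The 4-frame run of close, facing frames is reported as one bout, while the trailing 2-frame run is shorter than `min_len` and ignored. Because the program takes its modules as arguments, a learned classifier could replace either stand-in without changing the program's interpretable structure.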