Understanding the World Through Code

Funded through the NSF Expeditions in Computing Program

Cognitive and Behavioral science

Human conceptual representations have a complex and abstract structure that allows us to deal efficiently with extremely diverse situations. Concepts are often named by single lexical items, providing a convenient window on mental life. For instance, take the concepts of 'tree', 'forest', and 'friend'. These are everyday concepts, yet they are hard to capture with classifiers (even deep ones) or other standard machine learning approaches. Instead, they are fuzzy, interrelated systems grounded in our knowledge of physical things, social agents, and so on.

Modeling cognition through programming languages.

Functions in a programming language are a good model of human concepts because they support complex abstraction and can represent systems of related meanings. In particular, programs provide a natural means to capture three key aspects of human concepts: compositional abstraction, graded reasoning under uncertainty, and causal relations. Human concept learning can then be viewed as (probabilistic) program induction. Existing demonstrations of this approach have been hampered by a lack of efficient and scalable techniques for program induction. We are working to address this by building high-level languages for common-sense domains and using new program learning tools to induce concepts within these domains. The first part of this task amounts to constructing a "standard library" for core knowledge, and the second to concept learning by probabilistic program induction. Below are some of our ongoing efforts as part of this thrust.
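To make "concept learning as probabilistic program induction" concrete, here is a minimal sketch in the style of the classic "number game". The DSL (five primitive predicates over the integers 1 to 100, plus their pairwise conjunctions), the description-length prior, and the size-principle likelihood are all illustrative assumptions, not the languages or learners developed in this project.

```python
from itertools import combinations

DOMAIN = range(1, 101)  # hypothetical universe: the integers 1..100

# Hypothetical DSL: primitive predicates, each with a description length of 1 token.
PRIMITIVES = {
    "even": lambda n: n % 2 == 0,
    "odd": lambda n: n % 2 == 1,
    "mult3": lambda n: n % 3 == 0,
    "mult10": lambda n: n % 10 == 0,
    "gt50": lambda n: n > 50,
}


def enumerate_programs():
    """Yield (name, predicate, length) for every primitive and every pairwise conjunction."""
    for name, pred in PRIMITIVES.items():
        yield name, pred, 1
    for a, b in combinations(sorted(PRIMITIVES), 2):
        fa, fb = PRIMITIVES[a], PRIMITIVES[b]
        # "and" costs one extra token on top of its two arguments.
        yield f"and({a},{b})", (lambda n, fa=fa, fb=fb: fa(n) and fb(n)), 3


def posterior(examples):
    """Normalized posterior over programs, given positive examples of a concept."""
    scores = {}
    for name, pred, length in enumerate_programs():
        extension = [n for n in DOMAIN if pred(n)]
        if not extension or not all(pred(x) for x in examples):
            continue  # program is empty or inconsistent with the data
        prior = 2.0 ** -length  # shorter programs are a priori more likely
        likelihood = (1.0 / len(extension)) ** len(examples)  # size principle
        scores[name] = prior * likelihood
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}


post = posterior([10, 20, 30])
for name, p in sorted(post.items(), key=lambda kv: -kv[1]):
    print(f"{name:18s} {p:.3f}")
```

With the examples 10, 20, 30, both `even` and `mult10` are consistent, but the much smaller extension of `mult10` makes the observations far more probable under it, so it receives most of the posterior mass; the longer program `and(even,mult10)` has the same extension but is penalized by the prior.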

Understanding causal reasoning in Autumn.

Humans are often able to learn detailed causal relationships in their environment from very limited interaction with it. For example, children can figure out the causal mechanism underlying a new toy or video game in just a few minutes, a feat woefully out of reach of modern AI systems. In this project, we are studying the construction of causal theories (explanations of which stimuli cause which changes in an environment) using program synthesis. By representing a causal model as a program, we take advantage of two key features that standard ML models do not possess: the compact, interpretable form of programs and the availability of data-efficient algorithms for learning them. We have developed a simple domain to explore this hypothesis, consisting of Atari-style, time-varying grid worlds. We express causal mechanisms in these worlds using Autumn, a functional reactive programming language we designed to capture these dynamics concisely. Our objective is to synthesize the underlying Autumn program given a short sequence of observed grid frames and user actions. We are working on a full conference submission on this work; an early version was presented at the Advances in Programming Languages and Neurosymbolic Systems workshop at NeurIPS '21 and at the Causal Inference & Machine Learning workshop, also at NeurIPS Ria21aiplans-ours. This project is being led by graduate student Ria Das and former postdoc Zenna Tavares in collaboration with Josh Tenenbaum and Armando Solar-Lezama.
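The toy sketch below illustrates the idea of recovering a causal program from observed transitions. It does not use Autumn syntax or the project's synthesis algorithm: the rule space (a per-step drift plus an optional response to arrow keys), the 8x8 grid, and all names are hypothetical, and the search is plain enumeration in order of program size.

```python
from itertools import product

GRID = 8  # hypothetical 8x8 grid; positions are (x, y) with y increasing downward
ARROWS = {"left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1)}
DRIFTS = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]  # none, fall, rise, drift right/left
SCHEMES = ["none", "arrows", "reversed_arrows"]  # how the object reacts to arrow keys


def clamp(v):
    """Keep a coordinate inside the grid (objects stop at the walls)."""
    return max(0, min(GRID - 1, v))


def make_program(drift, scheme):
    """Build a step function (position, action) -> next position for one candidate rule set."""
    def step(pos, action):
        dx, dy = drift
        if scheme != "none" and action in ARROWS:
            ax, ay = ARROWS[action]
            sign = 1 if scheme == "arrows" else -1
            dx, dy = dx + sign * ax, dy + sign * ay
        return (clamp(pos[0] + dx), clamp(pos[1] + dy))
    return step


def size(drift, scheme):
    """Program size: the number of non-trivial rules."""
    return int(drift != (0, 0)) + int(scheme != "none")


def synthesize(trace):
    """Return the smallest (drift, scheme) that reproduces every (state, action, next_state)."""
    for drift, scheme in sorted(product(DRIFTS, SCHEMES), key=lambda c: size(*c)):
        step = make_program(drift, scheme)
        if all(step(s, a) == s2 for s, a, s2 in trace):
            return drift, scheme
    return None  # no program in this tiny space explains the observations


# Observed: the object falls one cell per step and also follows the arrow keys.
trace = [((3, 0), "noop", (3, 1)), ((3, 1), "left", (2, 2)), ((2, 2), "right", (3, 3))]
print(synthesize(trace))
```

Preferring the smallest consistent program is a simple stand-in for the Occam-style bias that makes a short trace sufficient: here no single rule explains all three transitions, so the search settles on a downward drift combined with ordinary arrow-key control.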

Understanding language morpho-phonology with neurosymbolic methods.

In a paper published in Nature Communications Ellis22Linguistics-ours, we explore how combining Bayesian inference with program synthesis can learn models of language morpho-phonology from relatively small sets of words in a language. The approach builds on representations inspired by linguistic theory and on cognitive models of learning and discovery. Across 70 datasets from 58 diverse languages, our system synthesizes human-interpretable models for core aspects of each language’s morpho-phonology, sometimes approaching models posited by human linguists. You can read more about this work in a recent MIT news article.

Understanding behavior.

Behavior is arguably the most complex phenotype one could analyze in the life and cognitive sciences. For instance, what is a concise theory that explains courtship in fruit flies? Such questions are pervasive in the life sciences, where fine-grained behavioral data from model organisms (e.g., fruit flies or macaques) are being collected at an unprecedented scale. The study of dynamic behavior such as courtship is an ideal testbed for our research agenda. On the one hand, scientists want to discover short programs that provide an interpretable explanation of courtship. On the other hand, the raw data is high-dimensional in both time and space and contains significant variability in the behavior of interest, which requires deep learning to process the raw data and to instantiate modules within the program.

Task programming

Specialized domain knowledge is often necessary to accurately annotate training sets for in-depth analysis, but can be burdensome and time-consuming to acquire from domain experts. This issue arises prominently in automated behavior analysis, in which agent movements or actions of interest are detected from video tracking data.

To reduce annotation effort, we developed TREBA, a method based on multi-task self-supervised learning that learns annotation-sample-efficient trajectory embeddings for behavior analysis. The tasks in our method can be efficiently engineered by domain experts through a process we call “task programming” sun2021task-ours, which uses programs to explicitly encode structured knowledge from domain experts. Domain experts can thus reduce their total effort by trading data annotation time for the construction of a small number of programmed tasks.
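As an illustration of what a programmed task might look like, the sketch below encodes two simple pieces of expert knowledge (the distance between two animals and the speed of one of them) as functions of tracking data, and evaluates them frame by frame to produce targets that an embedding model could be trained to predict. The trajectory format, the `resident`/`intruder` agent names, and the squared-error loss are assumptions for illustration; TREBA's actual tasks and training objective are described in the paper.

```python
import math

# Hypothetical trajectory format: one dict per frame, mapping each agent to an (x, y) centroid.
# Real tracking data typically has several keypoints per animal.


def distance(traj, t):
    """Task program: distance between the two animals at frame t."""
    (x1, y1), (x2, y2) = traj[t]["resident"], traj[t]["intruder"]
    return math.hypot(x2 - x1, y2 - y1)


def resident_speed(traj, t):
    """Task program: how far the resident moved since the previous frame."""
    if t == 0:
        return 0.0
    (x0, y0), (x1, y1) = traj[t - 1]["resident"], traj[t]["resident"]
    return math.hypot(x1 - x0, y1 - y0)


TASK_PROGRAMS = {"distance": distance, "speed": resident_speed}


def programmed_targets(traj):
    """Run every task program on every frame to obtain self-supervision targets."""
    return {name: [prog(traj, t) for t in range(len(traj))] for name, prog in TASK_PROGRAMS.items()}


def task_loss(predictions, targets):
    """Mean squared error between attributes decoded from an embedding and the program outputs."""
    errors = [
        (p - y) ** 2
        for name, ys in targets.items()
        for p, y in zip(predictions[name], ys)
    ]
    return sum(errors) / len(errors)


traj = [{"resident": (0, 0), "intruder": (3, 4)}, {"resident": (3, 4), "intruder": (3, 4)}]
targets = programmed_targets(traj)
print(targets)
```

Each task program is a few lines that an expert can write once, whereas the frame-level labels it produces would otherwise have to be annotated by hand; this is the trade-off the paragraph describes.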

We evaluated this trade-off using data from behavioral neuroscience, in which specialized domain knowledge is used to identify behaviors. We present experimental results on three datasets across two domains: mice and fruit flies. Using embeddings from TREBA, we reduce annotation burden by up to a factor of 10 without compromising accuracy compared to state-of-the-art features. Our results thus suggest that task programming and self-supervision can be an effective way to reduce annotation effort for domain experts.

Neurosymbolic reasoning for mathematical domains.

The goal of this project is to develop models that can perform mathematical reasoning in educational domains, as a foundation both for building educational tools and for gaining insight into how humans think about and develop mathematics.

Contrastive policy learning

In a first project, we developed a machine learning agent, ConPoLe (published at NeurIPS 2021) poesia2021contrastive-ours, which learns to solve problems in several symbolic reasoning domains (e.g., solving simple equations, simplifying fractions, and solving the classic Rubik's Cube puzzle). This reinforcement learning agent progressively learns to apply basic rules (“axioms”) to reach the goal state (“solution”) from an arbitrary initial state (“problem”), receiving only a binary reward when a problem is solved. A key novelty of the paper was formulating reinforcement learning as contrastive learning, as a way to cope with these sparse rewards. The agent works in a general setting that makes it easy to specify a range of educational mathematical domains inspired by the Common Core standards. Moreover, we found that the representations ConPoLe learned for the problems captured the structure of sections of the Khan Academy exercises for equations, even though that curriculum was never present during training.
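The sketch below shows a generic InfoNCE-style contrastive objective of the kind the paragraph refers to: given an embedding of the current state, the successor that lies on a found solution path (a positive) should score higher than the other candidate successors (negatives). The dot-product scorer, the list-based embeddings, and the function names are assumptions; the encoder that would produce the embeddings is not shown, and ConPoLe's exact formulation is in the paper.

```python
import math


def score(u, v):
    """Dot-product compatibility between two embeddings (an assumed scoring function)."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(state, candidates, positive_index):
    """Negative log-probability of the positive successor under a softmax over candidate scores."""
    scores = [score(state, c) for c in candidates]
    m = max(scores)  # subtract the max before exponentiating, for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[positive_index]


def greedy_policy(state, candidates):
    """Pick the index of the successor whose embedding scores highest."""
    scores = [score(state, c) for c in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])


# Toy 2-D embeddings; in practice these would come from a learned encoder.
state = [1.0, 0.0]
candidates = [[2.0, 0.0], [0.0, 2.0]]  # index 0 lies on a known solution path
print(info_nce_loss(state, candidates, 0), greedy_policy(state, candidates))
```

Because the loss only needs to know which successor led to a solved problem, a binary success signal is enough to produce training examples, which is what makes the contrastive framing attractive under sparse rewards.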

Since ConPoLe learns to solve problems from low-level mathematical axioms, its solutions tend to be much longer than human solutions, which typically skip over many axiomatic steps. For example, whereas a human can go from 2x + 1 = 5 to 2x = 4 in one step, the agent has to apply 5 axioms in succession (subtraction on both sides, associativity, evaluation of 1 - 1, identity of addition, and another evaluation, of 5 - 1). This observation raised a natural question: how can we build agents that produce succinct solutions that humans find readable? This question led to a paper that we will present at CogSci 2022 poesia2022left-ours, in which we developed a method for simplifying formal mathematical solutions. We used a skill discovery algorithm from the reinforcement learning literature to divide ConPoLe's axiom-level steps into larger segments, each corresponding to a latent high-level "skill". We then obtain a simplified solution by keeping only the last step of each skill. In a human evaluation, the improvement in readability of the resulting solutions was statistically significant. Comparing against simpler simplification methods also gave us insight into what humans might consider most relevant in a mathematical solution, which is important when interacting with students in an educational setting.
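The simplification step can be sketched on the 2x + 1 = 5 example. The skill labels below are made up by hand, standing in for the output of a skill-discovery algorithm (not shown); given one label per axiom step, the sketch keeps the initial problem plus the state reached at the end of each run of identically labeled steps.

```python
# The five axiom steps from 2x + 1 = 5 to 2x = 4; states[0] is the problem.
states = [
    "2x + 1 = 5",
    "(2x + 1) - 1 = 5 - 1",  # subtract 1 from both sides
    "2x + (1 - 1) = 5 - 1",  # associativity
    "2x + 0 = 5 - 1",        # evaluate 1 - 1
    "2x = 5 - 1",            # additive identity
    "2x = 4",                # evaluate 5 - 1
]


def segment_ends(skill_labels):
    """1-based step numbers at which a run of identically labeled steps ends."""
    ends = []
    for step, label in enumerate(skill_labels, start=1):
        # skill_labels[step] is the label of the *next* step (0-based indexing).
        if step == len(skill_labels) or skill_labels[step] != label:
            ends.append(step)
    return ends


def simplify(states, skill_labels):
    """Keep the initial problem plus the state reached at the end of each skill segment."""
    return [states[0]] + [states[i] for i in segment_ends(skill_labels)]


# Hypothetical labels, one per axiom step, standing in for a skill-discovery algorithm's output.
print(simplify(states, ["isolate"] * 5))
print(simplify(states, ["isolate"] * 4 + ["evaluate"]))
```

If all five steps are attributed to a single skill, the simplified solution collapses to the one-step human version; splitting them into two skills keeps the intermediate state 2x = 5 - 1.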

We're actively working on two complementary fronts that go beyond these efforts. In recent months, in a collaboration between Stanford and MIT, we have been working on learning symbolic abstractions from mathematical solutions. The abstraction techniques employed in methods like DreamCoder EllisWNSMHCST21-ours give us a way to automatically learn mathematical "lemmas" from ConPoLe solutions. Rewriting solutions with learned lemmas can make them shorter and more readable. Furthermore, using the lemmas during learning can enable the agent to solve harder problems, akin to how human mathematics develops from simpler to more complex results. In a parallel effort, we're generalizing our learning methods to work on top of a universal language for mathematical reasoning, where, as in ConPoLe, we aim to learn in a completely unsupervised fashion.
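As a rough illustration of lemma discovery, the sketch below finds the most frequent contiguous sequence of axioms across several solutions and rewrites each solution to use a single new "lemma" token in its place. This frequency heuristic, similar to one merge step of byte-pair encoding, is a deliberate simplification and not DreamCoder's abstraction algorithm, which refactors lambda-calculus programs; the axiom names and the lemma token are hypothetical.

```python
from collections import Counter


def most_common_ngram(solutions, n):
    """Most frequent length-n contiguous run of axioms across all solutions, with its count."""
    counts = Counter(
        tuple(sol[i:i + n]) for sol in solutions for i in range(len(sol) - n + 1)
    )
    return counts.most_common(1)[0] if counts else None


def rewrite(solution, pattern, lemma):
    """Replace each non-overlapping occurrence of pattern (scanning left to right) with lemma."""
    out, i, k = [], 0, len(pattern)
    while i < len(solution):
        if tuple(solution[i:i + k]) == pattern:
            out.append(lemma)
            i += k
        else:
            out.append(solution[i])
            i += 1
    return out


# Hypothetical axiom-level solutions, one axiom name per step.
solutions = [
    ["sub_both", "assoc", "eval", "add_id", "eval"],
    ["sub_both", "assoc", "eval", "add_id", "div_both", "eval"],
    ["add_both", "assoc", "eval", "add_id", "eval"],
]
pattern, count = most_common_ngram(solutions, 3)
print(pattern, count)
print([rewrite(s, pattern, "lemma_1") for s in solutions])
```

The recurring run `assoc, eval, add_id` (cancelling a term) becomes one step in every solution, which shortens them in the way the paragraph describes; a learner could then treat `lemma_1` as a single action.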

Interpreting Expert Annotation Differences in Animal Behavior

In this project, we propose a new method that uses program synthesis to generate interpretable models of expert annotations of animal behavior annotation-ours. This is an important problem because in behavioral neuroscience, hand-annotated data is needed to make sense of neural recordings, yet behavior labels can vary among annotators due to factors such as subjective differences, intra-rater variability, and differing levels of expertise. For a given set of behavior annotations, our model learns both the relevant trajectory features and their corresponding temporal filters. We evaluate our method on a dataset from behavioral neuroscience and demonstrate that compared to baseline classifiers, our method is more accurate at capturing behavior annotations. Furthermore, the shape of a program’s temporal filters can be interpreted as how an annotator attends to a given feature over time. This work was presented at the CVPR 2021 CV4Animals workshop.
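A minimal sketch of the kind of program described here: each trajectory feature is passed through a causal temporal filter (a weighted sum over the current and previous frames), the filtered features are summed, and the result is thresholded to produce a per-frame behavior label. In the actual method the features and filter weights are learned; here the feature name, weights, and threshold are hand-picked assumptions.

```python
def apply_filter(signal, weights):
    """Causal temporal filter: out[t] = sum_k weights[k] * signal[t - k], treating t - k < 0 as 0."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(weights):
            if t - k >= 0:
                acc += w * signal[t - k]
        out.append(acc)
    return out


def program_labels(features, filters, threshold):
    """Label frame t as the behavior when the sum of the filtered features exceeds threshold."""
    filtered = [apply_filter(features[name], weights) for name, weights in filters.items()]
    return [sum(values) > threshold for values in zip(*filtered)]


# Hypothetical per-frame feature (1 = animals in close proximity) and a hand-set filter
# whose weights decay with lag: the current frame counts most, two frames back least.
features = {"proximity": [0, 0, 1, 1, 1, 0, 0]}
filters = {"proximity": [0.5, 0.3, 0.2]}
print(program_labels(features, filters, threshold=0.6))
```

Reading the filter weights directly gives the interpretation mentioned above: a filter that decays with lag, like this one, corresponds to an annotator who relies mostly on the current frame but still takes the last few frames into account, so the label switches on one frame after proximity begins and off as soon as it ends.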