Understanding the World Through Code

Funded through the NSF Expeditions in Computing Program

Neurosymbolic Programming for Science talk series

The Neurosymbolic Programming for Science talk series explores the intersection of neural networks and program synthesis in scientific research. It showcases advancements in merging deep learning with symbolic reasoning, encourages collaboration among experts, discusses real-world applications, addresses integration challenges, and contemplates the future of neurosymbolic programming in science. The series features insights from leading experts and offers interactive sessions for attendees to engage and collaborate. Please contact Omar Costilla-Reyes for information on how to join the mailing list for the talks.

08/2023 Advanced Reinforcement Learning in industry: Thomas Walsh, Researcher at Sony AI, will be presenting a talk on their work on Sophy, an advanced reinforcement learning agent that can outrace the best champion drivers in Gran Turismo, a cutting-edge driving simulator video game.

Sophy - Advanced Reinforcement Learning in Gran Turismo

Many potential applications of artificial intelligence involve making real-time decisions in physical systems while interacting with humans. Automobile racing represents an extreme example of these conditions; drivers must execute complex tactical manoeuvres to pass or block opponents while operating their vehicles at their traction limits. Racing simulations, such as the PlayStation game Gran Turismo, faithfully reproduce the non-linear control challenges of real race cars while also encapsulating the complex multi-agent interactions. Here we describe how we trained agents for Gran Turismo that can compete with the world’s best e-sports drivers. We combine state-of-the-art, model-free, deep reinforcement learning algorithms with mixed-scenario training to learn an integrated control policy that combines exceptional speed with impressive tactics. In addition, we construct a reward function that enables the agent to be competitive while adhering to racing’s important, but under-specified, sportsmanship rules. We demonstrate the capabilities of our agent, Gran Turismo Sophy, by winning a head-to-head competition against four of the world’s best Gran Turismo drivers. By describing how we trained championship-level racers, we demonstrate the possibilities and challenges of using these techniques to control complex dynamical systems in domains where agents must respect imprecisely defined human norms.
When: Friday August 19th, 4–5 PM ET
Watch: YouTube
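One idea the abstract highlights is a reward function that balances raw speed against under-specified sportsmanship rules. A minimal sketch of such a mixed-objective reward might look like the following; the signals and weights here are hypothetical placeholders for illustration, not Sony AI's actual reward design.

```python
# Hypothetical sketch of a mixed-objective racing reward. The signal
# names and weights are illustrative assumptions, not GT Sophy's
# actual reward function.
def racing_reward(progress_m, collided_with_opponent, off_course,
                  w_progress=1.0, w_collision=5.0, w_off_course=2.0):
    """Combine course progress with penalty terms that stand in for
    racing's under-specified sportsmanship norms."""
    reward = w_progress * progress_m          # reward forward progress
    if collided_with_opponent:
        reward -= w_collision                 # penalize unsporting contact
    if off_course:
        reward -= w_off_course                # penalize leaving the track
    return reward
```

Tuning the penalty weights is where the difficulty lies in practice: too small and the agent drives aggressively, too large and it becomes overly timid.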
08/2023 Bayesian symbolic regression in physics: Roger Guimera from the SEES Lab will discuss the learnability of closed-form mathematical models

Bayesian symbolic regression and the learnability of closed-form mathematical models

Symbolic regression aims to obtain closed-form mathematical models from data. The two main challenges of traditional symbolic regression, and especially of approaches based on genetic algorithms, are the need to balance model complexity and goodness of fit, and the need to explore the vast space of closed-form models rigorously. In this talk, we will discuss a novel Bayesian approach to symbolic regression, which helps to address these challenges. With regards to the complexity-fit balance, the Bayesian approach amounts to choosing (or sampling) models based on their description length. With regards to the exploration of the space of models, we propose a Markov chain Monte Carlo approach with asymptotic guarantees of performance. We will illustrate the approach by showing how it has already helped to shed light on a number of real scientific problems of interest. Finally, we will discuss how observational noise in the data induces a transition between a learnable phase in which the generating model can be discovered, when the noise is low, and an unlearnable phase, when the noise is high, in which no method can possibly discover the model that truly generated the data.
When: Friday August 5th, 4–5 PM ET
Watch: YouTube
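The two ingredients of the abstract above, scoring models by description length and sampling the model space with Markov chain Monte Carlo, can be sketched on a toy problem. The candidate set, complexity scores, and sampler below are illustrative simplifications, not the speaker's actual Bayesian machine-scientist implementation.

```python
import math
import random

# Toy description-length model selection with a Metropolis sampler
# over a tiny, hand-enumerated space of closed-form models.
xs = [0.5, 1.0, 1.5, 2.0, 2.5]
ys = [x ** 2 for x in xs]  # noiseless data generated by x^2

candidates = {
    # name: (model, complexity in expression-tree nodes)
    "x":   (lambda x: x,      1),
    "2*x": (lambda x: 2 * x,  3),
    "x^2": (lambda x: x * x,  3),
    "x^3": (lambda x: x ** 3, 3),
}

def description_length(name):
    """Fit term (Gaussian log-likelihood of residuals) plus a
    complexity cost per node; lower is better."""
    f, k = candidates[name]
    n = len(xs)
    sse = sum((f(x) - y) ** 2 for x, y in zip(xs, ys))
    fit = 0.5 * n * math.log(sse / n + 1e-12)
    return fit + k * math.log(2)

def metropolis(steps=200, seed=0):
    """Sample models with probability proportional to exp(-DL)."""
    rng = random.Random(seed)
    names = list(candidates)
    current = names[0]
    for _ in range(steps):
        proposal = rng.choice(names)
        # accept with probability min(1, exp(DL(current) - DL(proposal)))
        if math.log(rng.random() + 1e-12) < (
            description_length(current) - description_length(proposal)
        ):
            current = proposal
    return current

best = min(candidates, key=description_length)
```

With noiseless data the sampler concentrates on the generating model; in the talk's framing, raising the observational noise is what eventually makes that model unrecoverable by any method.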
07/2023 Biology and computer science: Jacob Lemieux and Fritz Obermeyer from the Broad Institute will present a state-of-the-art probabilistic model for predicting COVID-19 lineage fitness

Analysis of 6.4 million SARS-CoV-2 genomes identifies mutations associated with fitness

Repeated emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants with increased fitness underscores the value of rapid detection and characterization of new lineages. We have developed PyR0, a hierarchical Bayesian multinomial logistic regression model that infers relative prevalence of all viral lineages across geographic regions, detects lineages increasing in prevalence, and identifies mutations relevant to fitness. Applying PyR0 to all publicly available SARS-CoV-2 genomes, we identify numerous substitutions that increase fitness, including previously identified spike mutations and many nonspike mutations within the nucleocapsid and nonstructural proteins. PyR0 forecasts growth of new lineages from their mutational profile, ranks the fitness of lineages as new sequences become available, and prioritizes mutations of biological and public health concern for functional characterization.
When: Friday July 22nd, 4–5 PM ET
Watch: YouTube
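The core modeling idea in the abstract, multinomial logistic regression over lineage prevalence, can be sketched in a few lines: each lineage gets an initial log-prevalence and a relative growth rate, and observed prevalence at time t is a softmax over the resulting logits. The lineage names and parameters below are made up for illustration; PyR0 itself infers these hierarchically from mutation profiles across many geographic regions.

```python
import math

# Minimal multinomial logistic growth sketch: prevalence of lineage l
# at time t is softmax(alpha_l + beta_l * t). Parameters are
# hypothetical, not fitted values from PyR0.
lineages = {
    # name: (alpha = initial log-prevalence, beta = growth rate per day)
    "A": (0.0,   0.00),
    "B": (-2.0,  0.10),   # rare today, but the fastest-growing
    "C": (-1.0, -0.05),
}

def prevalence(t):
    """Relative prevalence of each lineage at time t (sums to 1)."""
    logits = {l: a + b * t for l, (a, b) in lineages.items()}
    z = sum(math.exp(v) for v in logits.values())
    return {l: math.exp(v) / z for l, v in logits.items()}

p0 = prevalence(0)     # prevalence today
p60 = prevalence(60)   # projected prevalence two months out
fastest = max(lineages, key=lambda l: lineages[l][1])
```

The fitness signal is the growth-rate coefficient: a lineage with a small share today but a large beta is the one the model flags as a rising concern.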
06/2023 Computational cognitive science talk: Mark Ho, Faculty Fellow / Assistant Professor at NYU's Center for Data Science, will be presenting a talk on the construction of mental representations in human planning.

Construction of mental representations in human planning

One of the most striking features of human intelligence is our capacity to rapidly and flexibly plan. Planning enables us to solve myriad everyday problems---e.g., planning how to complete a list of errands---but planning is also very computationally demanding. How do people plan despite having limited cognitive resources? My talk will cover recent work on the relationship between task representations and efficient planning in human decision-making. In particular, I will discuss a computational account of value-guided construal, which proposes that people form simplified, ad hoc representations of tasks in order to plan and act. By investigating the general computational principles underlying the formation of task construals, this approach provides a new perspective on interactions between meta-cognition, structured representations, and goal-directed behavior.
When: Friday June 24th, 4–5 PM ET
Watch: YouTube
04/2023 Differentiable Programming: He Zhu, Assistant Professor, Computer Science Department, Rutgers University, will be presenting a talk on differentiable programming via differentiable search of program structures.

Differentiable Programming via Differentiable Search of Program Structures

Deep learning has led to encouraging successes in many challenging tasks. However, a deep neural model lacks interpretability due to the difficulty of identifying how the model's control logic relates to its network structure. Differentiable programs have recently attracted much interest due to their interpretability, compositionality, and amenability to efficient differentiable training. However, synthesizing differentiable programs requires optimizing over a combinatorial, non-differentiable, and rapidly exploding space of program structures. Even with good heuristics, program synthesis by enumerating discrete program structures does not scale in general. We propose to encode program structure search as learning the probability distribution of high-quality structures induced by a context-free grammar. In a continuous relaxation of the search space defined by the grammar rules, our algorithm learns the discrete structure of a differentiable program using efficient gradient methods. Experimental results over application domains including classification, reinforcement learning, and recommendation systems demonstrate that our algorithm excels at discovering optimal differentiable programs that are highly interpretable.
When: Friday April 8th, 4–5 PM ET
Watch: YouTube
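The continuous-relaxation idea in the abstract can be sketched on a one-rule grammar: give each production a learnable logit, let the relaxed "program" output a softmax-weighted mix of both branches, train the logits by gradient descent, then discretize. Everything below (the grammar, data, and plain finite-difference gradients) is an illustrative assumption, not the speaker's actual algorithm.

```python
import math

# Toy relaxation of a discrete structure choice for the grammar
# E -> x | x*x, with one learnable logit per production.
xs = [0.5, 1.0, 1.5, 2.0]
ys = [x * x for x in xs]           # target behaviour: x*x

def softmax2(a, b):
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb), eb / (ea + eb)

def loss(theta):
    """MSE of the softmax-weighted mix of both grammar branches."""
    w_lin, w_sq = softmax2(theta[0], theta[1])
    return sum((w_lin * x + w_sq * x * x - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)

theta = [0.0, 0.0]
eps, lr = 1e-5, 0.5
for _ in range(300):               # finite-difference gradient descent
    grad = []
    for i in range(2):
        bumped = list(theta)
        bumped[i] += eps
        grad.append((loss(bumped) - loss(theta)) / eps)
    theta = [t - lr * g for t, g in zip(theta, grad)]

w_lin, w_sq = softmax2(theta[0], theta[1])
chosen = "x*x" if w_sq > w_lin else "x"   # discretize the learned structure
```

The payoff of the relaxation is that the structure choice becomes a continuous parameter trainable alongside any numeric program parameters, avoiding enumeration of discrete structures.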
03/2023 Code generation for competitive programming: Yujia Li, researcher at DeepMind, will be presenting a talk on his work on Competition-Level Code Generation with AlphaCode.

Competition-Level Code Generation with AlphaCode

Programming is a powerful and ubiquitous problem-solving tool. Developing systems that can assist programmers or even generate programs independently could make programming more productive and accessible, yet so far incorporating innovations in AI has proven challenging. Recent large-scale language models have demonstrated an impressive ability to generate code, and are now able to complete simple programming tasks. However, these models still perform poorly when evaluated on more complex, unseen problems that require problem-solving skills beyond simply translating instructions into code. For example, competitive programming problems which require an understanding of algorithms and complex natural language remain extremely challenging. To address this gap, we introduce AlphaCode, a system for code generation that can create novel solutions to these problems that require deeper reasoning. In simulated evaluations on recent programming competitions on the Codeforces platform, AlphaCode achieved on average a ranking of top 54.3% in competitions with more than 5,000 participants. We found that three key components were critical to achieve good and reliable performance: (1) an extensive and clean competitive programming dataset for training and evaluation, (2) large and efficient-to-sample transformer-based architectures, and (3) large-scale model sampling to explore the search space, followed by filtering based on program behavior to a small set of submissions.
When: Friday March 11th, 4–5 PM ET
Watch: YouTube
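The third ingredient the abstract names, large-scale sampling followed by filtering on program behavior, can be sketched as follows. The "model" here is just a random sampler over a few hand-written snippets, a stand-in for AlphaCode's transformer; the problem, tests, and submission budget are hypothetical.

```python
import random

# Sample-then-filter sketch: draw many candidate programs, run each on
# the problem's example tests, and keep only a few that behave
# correctly. All names and data are illustrative assumptions.
CANDIDATES = [
    "def solve(a, b): return a + b",
    "def solve(a, b): return a - b",
    "def solve(a, b): return a * b",
    "def solve(a, b): return max(a, b)",
]

EXAMPLE_TESTS = [((2, 3), 5), ((0, 7), 7)]  # (inputs, expected output)

def sample(model, n, seed=0):
    """Stand-in for model sampling: draw n candidate sources."""
    rng = random.Random(seed)
    return [rng.choice(model) for _ in range(n)]

def passes(src):
    """Execute a candidate and check it against the example tests."""
    scope = {}
    exec(src, scope)
    return all(scope["solve"](*args) == out for args, out in EXAMPLE_TESTS)

def filter_submissions(samples, k=3):
    survivors = [s for s in samples if passes(s)]
    # deduplicate and cap the number of final submissions
    return list(dict.fromkeys(survivors))[:k]

subs = filter_submissions(sample(CANDIDATES, 50))
```

Filtering by execution is what makes massive sampling pay off: most samples are wrong, but behavioral checks against the example tests cheaply whittle thousands of candidates down to a handful of plausible submissions.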