Understanding the World Through Code

Funded through the NSF Expeditions in Computing Program


You can subscribe to our public mailing list to receive announcements about upcoming events.
07/2022: Neurosym webinar series Iddo Drori, Associate Professor of Computer Science joining the faculty of Boston University and adjunct at Columbia University, will be talking about learning to learn courses.

Learning to learn courses

We present datasets for learning to solve university-level mathematics course problem sets (mathQ), MIT machine learning final exams (mlfinalsQ), and university-level STEM course problem sets (stemQ), as well as for undergoing the MIT and Harvard admissions process and passing the EECS and SEAS undergraduate curricula (mitharvardQ). We present methods based on new large-scale language models, program synthesis, and few-shot learning that perform at a human level. We are now working on solving questions that involve images and questions that require mathematical proofs, and on building graphs of questions and courses. This is joint work with colleagues and students at MIT, Harvard University, Cornell University, Columbia University, and the University of Waterloo, published in the Proceedings of the National Academy of Sciences (PNAS), the International Conference on Artificial Intelligence in Education (AIED), and ACML (best student paper).

Bio: Iddo Drori is an Associate Professor of Computer Science joining the faculty of Boston University in CS, an adjunct at Columbia University in CS, and affiliated with MIT. He was a lecturer at MIT in EECS, a visiting Associate Professor at Cornell University in ORIE, and a postdoctoral research fellow at Stanford University in Statistics. He also holds an MBA in Organizational Behavior and Entrepreneurship and has a decade of industry research and leadership experience. He is the author of a new textbook, 'The Science of Deep Learning', published by Cambridge University Press. His main research is in machine learning, AI, and computer vision, with over 70 publications and over 5,000 citations. He has won multiple competitions at computer vision conferences and received multiple best paper awards in machine learning.
When: Tuesday July 27, 2022 04:00 PM Eastern Time (US and Canada)
Watch: Video coming soon.
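05/2022: Neurosym webinar series Jeevana Inala, senior researcher in the Deep Learning team at Microsoft Research, will be talking about neurosymbolic techniques for improving code generation with large language models.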

From Probabilistic Logics to Neuro-Symbolic Artificial Intelligence

Large language models (LLMs) have shown an impressive ability to write code. In this talk, I will present some of our work on further improving their programming ability using neurosymbolic techniques. In particular, I will talk about how we can use program execution to (i) augment training by learning from semantically equivalent programs (a key insight here is that learning can also benefit from partially correct programs, which we can identify using program tracing), and (ii) filter the generated code at inference time using a fault-aware neural code ranker model (a key insight here is that a model trained to predict how code fails, rather than just whether it fails, can have a better understanding of the code and the task). We apply these methods to multiple LLMs, including Codex, GPT-J, and GPT-Neo, and show better performance on multiple coding and math reasoning datasets.

Bio: Jeevana Inala is a senior researcher in the Deep Learning team at Microsoft Research. Previously, she completed her PhD at MIT with Prof. Armando Solar-Lezama. She works on program synthesis, AI for code, and neurosymbolic learning.
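The inference-time filtering step in this talk relies on knowing how a candidate program fails, not just whether it fails. A minimal sketch of that execution signal, assuming candidates are Python source strings that define a function named solve and that tests are (arguments, expected result) pairs (both hypothetical conventions for illustration): each candidate gets a coarse outcome label, the kind of label a fault-aware ranker could be trained to predict from the code alone. This is not the ranker from the talk; here the labels come from actually running the code.

```python
from enum import Enum


class Outcome(Enum):
    """Coarse execution outcomes (a hypothetical label set, ordered worst to best)."""
    COMPILE_ERROR = 0
    RUNTIME_ERROR = 1
    WRONG_ANSWER = 2
    PASSED = 3


def label_candidate(source, tests):
    """Run a candidate that should define `solve` and report how it fails, if at all."""
    try:
        code = compile(source, "<candidate>", "exec")
    except SyntaxError:
        return Outcome.COMPILE_ERROR
    namespace = {}
    try:
        exec(code, namespace)  # only ever do this inside a sandbox
        solve = namespace["solve"]  # a missing `solve` counts as a runtime error
        for args, expected in tests:
            if solve(*args) != expected:
                return Outcome.WRONG_ANSWER
    except Exception:
        return Outcome.RUNTIME_ERROR
    return Outcome.PASSED


def rank(candidates, tests):
    """Order candidates from most to least promising by their execution outcome."""
    return sorted(candidates, key=lambda src: label_candidate(src, tests).value, reverse=True)


tests = [((2, 3), 5), ((0, 0), 0)]
candidates = [
    "def solve(a, b): return a - b",  # wrong answer
    "def solve(a, b) return a + b",   # syntax error (missing colon)
    "def solve(a, b): return a + b",  # passes
    "def solve(a, b): return a / 0",  # runtime error
]
best = rank(candidates, tests)[0]
```

A practical harness would also need per-test timeouts and process isolation, since generated candidates may loop forever or touch the file system.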
When: Tuesday May 31, 2022 04:00 PM Eastern Time (US and Canada)
Watch: Recorded Talk
04/2022: Summer School Registration Now Open! The first summer school on Neurosymbolic Programming will take place on July 11-13, 2022. Apply here to attend the summer school; the application deadline is May 13. Limited funds for travel grants are available, with priority given to graduate students.
03/2022: Neurosym webinar series Pietro Perona — leader of the Computation and Neural Systems program at the California Institute of Technology — will be talking about emergence of number sense.

A sense for number and quantity as an emergent property of a manipulating agent

The ability to understand and manipulate numbers and quantities emerges during childhood, but the mechanism through which this ability is developed is still poorly understood. In particular, it is not known whether acquiring such a number sense is possible without supervision from a teacher.

To explore this question, we propose a model in which spontaneous and undirected manipulation of small objects trains perception to predict the resulting scene changes. We find that, from this task, a representation emerges that supports understanding numbers and quantity. Emergent properties include distinct categories for zero and the first few natural numbers, a notion of order, and a signal that correlates with numerical quantity. As a result, our model acquires the ability to estimate the number of objects in the scene, as well as subitization, i.e. the ability to recognize at a glance the exact number of objects in small scenes. We conclude that important aspects of a facility with numbers and quantities may be learned without explicit teacher supervision.

Joint work with Neehar Kondapaneni

Bio: Pietro Perona received a Ph.D. in electrical engineering and computer science from the University of California, Berkeley, in 1990. In 1990, he was a postdoctoral fellow at the International Computer Science Institute at Berkeley. From 1990 to 1991, he was a postdoctoral fellow at the Massachusetts Institute of Technology in the Laboratory for Information and Decision Systems. In the fall of 1991, Perona joined the California Institute of Technology as an assistant professor. He became a full professor in 1996 and the Allen E. Puckett Professor of Electrical Engineering and Computation and Neural Systems in 2006. From 1999 to 2005, Perona was the director of the National Science Foundation Center for Neuromorphic Systems Engineering. Since 2005, he has led the Computation and Neural Systems program at the California Institute of Technology.

Perona’s research focuses on the computational aspects of vision and learning. He is known for the anisotropic diffusion equation, a partial differential equation that filters image noise while enhancing region boundaries. He is currently interested in visual recognition and in visual analysis of behavior. In the early 2000s, Perona pioneered the study of visual categorization. Currently, in collaboration with colleagues Michael Dickinson and David Anderson, he applies machine vision to measuring and analyzing the behavior of laboratory animals.
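For readers unfamiliar with it, the anisotropic (Perona-Malik) diffusion equation evolves an image I as ∂I/∂t = div(c(|∇I|) ∇I), where the conductance c shrinks as the gradient grows, so smoothing happens inside regions but not across strong boundaries. A minimal 1D sketch with an explicit time step and the exponential conductance c(s) = exp(-(s/K)^2), one of the standard choices; the signal, K, step size, and iteration count are illustrative assumptions.

```python
import math


def perona_malik_1d(signal, steps=50, kappa=1.0, lam=0.25):
    """Explicit Perona-Malik diffusion of a 1D signal with zero-flux boundaries.

    kappa sets which differences count as edges; lam is the time step
    (lam <= 0.5 keeps the linear two-neighbour version of this scheme stable).
    """
    def conductance(grad):
        # Close to 1 for small differences (smooth them), close to 0 for large ones (keep the edge).
        return math.exp(-(grad / kappa) ** 2)

    u = [float(v) for v in signal]
    n = len(u)
    for _ in range(steps):
        new = u[:]
        for i in range(n):
            east = u[i + 1] - u[i] if i + 1 < n else 0.0
            west = u[i - 1] - u[i] if i > 0 else 0.0
            new[i] = u[i] + lam * (conductance(abs(east)) * east + conductance(abs(west)) * west)
        u = new
    return u


# A noisy step edge: small wiggles on both plateaus, one large jump in the middle.
noisy_step = [0, 0.1, 0, 0.1, 0, 10, 10.1, 10, 10.1, 10]
smoothed = perona_malik_1d(noisy_step)
```

The wiggles on each plateau are smoothed away while the jump between plateaus survives almost unchanged; with the conductance fixed at 1 (plain linear diffusion), the jump would be blurred as well.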

Perona is the recipient of the 2013 Longuet-Higgins Prize and the 2010 Koenderink Prize for fundamental contributions in computer vision. He received the 2003 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) best paper award, as well as a 1996 NSF Presidential Young Investigator Award.

When: Tuesday March 22, 2022 04:00 PM Eastern Time (US and Canada)
Watch: Recorded Talk
02/2022: Neurosym webinar series Luc De Raedt — professor at the Department of Computer Science, KU Leuven, and director of Leuven.AI — will be talking about probabilistic logics for neurosymbolic AI.

From Probabilistic Logics to Neuro-Symbolic Artificial Intelligence

A central challenge for contemporary AI is to integrate learning and reasoning. The integration of learning and reasoning has already been studied for decades in the fields of statistical relational artificial intelligence (StarAI) and probabilistic programming. StarAI has focussed on unifying logic and probability, the two key frameworks for reasoning, and has extended probabilistic logics with machine learning principles. I will argue that StarAI and probabilistic logics form an ideal basis for developing neuro-symbolic artificial intelligence techniques. Thus, neuro-symbolic computation = StarAI + neural networks. I will draw many parallels between these two fields and illustrate them using the deep probabilistic logic programming language DeepProbLog.
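The probabilistic-logic semantics that DeepProbLog builds on can be illustrated with the digit-addition example commonly used to introduce it: a neural predicate digit(Img, N) outputs a distribution over 0-9 for each image, and the rule addition(X, Y, Z) :- digit(X, N1), digit(Y, N2), Z is N1 + N2 makes the probability of a sum the total probability of its proofs. The sketch below hard-codes two hypothetical classifier outputs in place of a trained network and enumerates proofs directly; DeepProbLog itself relies on knowledge compilation rather than brute-force enumeration.

```python
from itertools import product


def addition_probability(p_x, p_y, z):
    """P(addition(X, Y, z)) given digit distributions p_x and p_y for images X and Y.

    Each proof picks one digit per image. Because each image's digit is a
    mutually exclusive choice, the probabilities of distinct proofs simply add.
    """
    return sum(
        p_x[n1] * p_y[n2]
        for n1, n2 in product(range(10), repeat=2)
        if n1 + n2 == z
    )


# Hypothetical softmax outputs of a digit classifier for two images.
p_x = [0.0] * 10
p_x[3], p_x[8] = 0.9, 0.1  # probably a 3, possibly an 8
p_y = [0.0] * 10
p_y[5] = 1.0  # confidently a 5

p_sum_8 = addition_probability(p_x, p_y, 8)
p_sum_13 = addition_probability(p_x, p_y, 13)
```

Because the result is a polynomial in the classifier's outputs, it is differentiable with respect to them, which is what allows the digit classifier to be trained from labels on the sum alone.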

Bio: Luc De Raedt is a full professor at the Department of Computer Science, KU Leuven, and director of Leuven.AI, the newly founded KU Leuven Institute for AI. He is a guest professor at Örebro University in the Wallenberg AI, Autonomous Systems and Software Program. He received his PhD in Computer Science from KU Leuven (1991), and was full professor (C4) and Chair of Machine Learning at the Albert-Ludwigs-University Freiburg, Germany (1999-2006). His research interests are in artificial intelligence, machine learning, and data mining, as well as their applications. He is well known for his contributions in the areas of learning and reasoning, in particular for his work on probabilistic and inductive programming. He co-chaired important conferences such as ECMLPKDD 2001 and ICML 2005 (the European and International Conferences on Machine Learning) and ECAI 2012, and will chair IJCAI 2022 (the European and international AI conferences). He is on the editorial boards of Artificial Intelligence, Machine Learning, and the Journal of Machine Learning Research. He is a EurAI and AAAI fellow and an IJCAI Trustee, and received an ERC Advanced Grant in 2015.

When: Tuesday February 22, 2022 04:00 PM Eastern Time (US and Canada)
Watch: Recorded Talk
01/2022: Neurosym webinar series Yu Feng — Assistant Professor at UC Santa Barbara — will be talking about machine learning for smart contract verification.

Hardening Solidity Smart Contracts with Refinement Types, for Free

As smart contracts gain adoption in financial transactions, it becomes increasingly important to ensure that they are free of bugs and security vulnerabilities. We present SolType, a refinement type system for Solidity that can be used to prevent arithmetic overflows and underflows in smart contracts. SolType allows developers to add refinement type annotations and uses them to prove the safety of arithmetic operations. Specifically, SolType incorporates a rich vocabulary of refinement terms that allow expressing relationships between integer values and aggregate properties of complex data structures. To reduce manual annotations, SolType also incorporates a type inference engine that uses constrained Horn clause (CHC) solvers and can automatically infer non-trivial contract invariants.
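To make the idea of "refinement annotations that prove arithmetic safe" concrete, here is a deliberately simplified sketch, assuming refinements are plain intervals {v | lo <= v <= hi} over uint256 values. The Refined type and the safe_add/safe_sub checks are hypothetical illustrations, not SolType's syntax or algorithm: an operation type-checks only if no values allowed by the refinements can overflow or underflow.

```python
from dataclasses import dataclass

UINT256_MAX = 2**256 - 1


@dataclass(frozen=True)
class Refined:
    """A uint256 value refined by {v | lo <= v <= hi} (hypothetical interval-only refinement)."""
    lo: int
    hi: int

    def __post_init__(self):
        if not 0 <= self.lo <= self.hi <= UINT256_MAX:
            raise ValueError("refinement outside the uint256 range")


def safe_add(a: Refined, b: Refined) -> Refined:
    """Accept a + b only if no values allowed by the refinements can overflow."""
    if a.hi + b.hi > UINT256_MAX:
        raise TypeError(f"possible overflow: {a.hi} + {b.hi} > 2**256 - 1")
    return Refined(a.lo + b.lo, a.hi + b.hi)


def safe_sub(a: Refined, b: Refined) -> Refined:
    """Accept a - b only if the result can never drop below zero."""
    if a.lo < b.hi:
        raise TypeError(f"possible underflow: {a.lo} - {b.hi} < 0")
    return Refined(a.lo - b.hi, a.hi - b.lo)


# A bounded supply plus a bounded mint amount type-checks ...
new_supply = safe_add(Refined(0, 10**30), Refined(0, 10**6))
# ... while adding 1 to an unconstrained uint256 is rejected.
try:
    safe_add(Refined(0, UINT256_MAX), Refined(1, 1))
    rejected = False
except TypeError:
    rejected = True
```

Intervals cannot express relational facts such as "the transfer amount is at most the sender's balance" or "the balances sum to the total supply", which is why a richer refinement vocabulary, relating integers to aggregate properties of data structures, matters for real contracts.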

Even with the type inference engine, there are still many cases in which we cannot infer the invariants, which is a fundamental problem in smart contract verification. Inspired by how human experts construct contract invariants, we propose Venti, the first learning framework for contract invariants. We show how to formulate learning contract invariants as a Markov decision process (MDP) and use reinforcement learning on this MDP to synthesize the invariants. By training with reinforcement learning, Venti captures rich program features and avoids the need for ground-truth solutions as supervision. Compared to previous learning tasks for invariant generation, it addresses unique challenges, for example by using a compact graph representation that dramatically reduces the size of the smart contract while preserving its core semantics.

Bio: Yu Feng is an assistant professor at UC Santa Barbara whose research areas include program analysis, verification, and synthesis. He also conducts research at the intersection of security, data science, visualization, and formal methods. He has received best paper awards at PLDI'18, ASE'20, and CHI'21, as well as a Google Faculty Research Award in 2021.

When: Tuesday January 25, 2022 04:00 PM Eastern Time (US and Canada)
Watch: Recorded Talk
10/2021: Neurosym webinar series Xinyun Chen — Ph.D. candidate at UC Berkeley — will be talking about neural program synthesis.

Neural Program Synthesis for Language Understanding in the Wild

Deep neural networks have achieved remarkable success in natural language processing, especially with the advancement of pre-training techniques. Moreover, recent work shows that, when trained on a large-scale code corpus, these language models, including Tabnine and Codex, can sometimes generate moderately complicated code from text descriptions. In this talk, I will discuss my research on deep learning for program synthesis with two central goals: (1) developing program synthesizers that learn to infer user intent for real-world deployment; and (2) improving the reasoning and generalization capabilities of existing language models via symbolic representations.

First, I will discuss my SpreadsheetCoder work, where we aim to predict spreadsheet formulas only from the user-written tabular data, without requiring any explicit specification. The SpreadsheetCoder model was recently integrated into Google Sheets and could potentially benefit hundreds of millions of users. In the second part of my talk, I will go beyond program synthesis applications and discuss my work on neural-symbolic techniques for language understanding. Despite the tremendous achievements of pre-trained language models, large-scale training does not automatically yield the capability for complex reasoning beyond text pattern matching. By integrating a symbolic reasoning module that synthesizes and executes programs for the task of interest, our neural-symbolic models demonstrate superior compositional reasoning ability, including numerical reasoning and compositional generalization.

Bio: Xinyun Chen is a Ph.D. candidate at UC Berkeley, working with Prof. Dawn Song. Her research lies at the intersection of deep learning, programming languages, and security. Her recent research focuses on neural program synthesis and adversarial machine learning. She received the Facebook Fellowship in 2020, and was selected for Rising Stars in EECS in 2020 and 2021.

When: Tuesday October 26, 2021 04:00 PM Eastern Time (US and Canada)
Watch: Recorded Talk
9/2021: Neurosym webinar series Petar Veličković — Senior Research Scientist at DeepMind — will be talking about neural algorithmic reasoning.

Neuralising a Computer Scientist: The Story So Far

Neural networks that are able to reliably execute algorithmic computation may hold transformative potential for both machine learning and theoretical computer science. On the one hand, they could enable the kind of extrapolative generalisation scarcely seen with deep learning models. On the other, they may allow for running classical algorithms on inputs previously considered inaccessible to them.

Both of these promises are shepherded by the neural algorithmic reasoning blueprint, which I have recently proposed in a position paper alongside Charles Blundell. On paper, this is a remarkably elegant pipeline for reasoning on natural inputs which carefully leverages the tried-and-tested power of deep neural networks as feature extractors. In practice, how far did we actually take it?

In this talk, I will present three concrete steps we've recently taken towards viably deploying the blueprint at scale:
A dataset of algorithmic reasoning tasks, to be used as a bootstrapping basis;
Using algorithmic reasoners to positively modulate self-supervised representations;
Data-efficient implicit planning using algorithmic reasoners.

along with some thoughts on where we could go next.

Bio: Petar Veličković is a Senior Research Scientist at DeepMind. He holds a PhD in Computer Science from the University of Cambridge (Trinity College), obtained under the supervision of Pietro Liò. His research concerns geometric deep learning: devising neural network architectures that respect the invariances and symmetries in data (a topic he has co-written a proto-book about). Within this area, Petar focuses on graph representation learning and its applications in algorithmic reasoning and computational biology. He has published relevant research in these areas at both machine learning venues (NeurIPS, ICLR, ICML-W) and biomedical venues and journals (Bioinformatics, PLOS One, JCB, PervasiveHealth). In particular, he is the first author of Graph Attention Networks (a popular convolutional layer for graphs) and Deep Graph Infomax (a scalable local/global unsupervised learning pipeline for graphs, featured in ZDNet). Further, his research has been used to substantially improve the travel-time predictions in Google Maps (covered by outlets including CNBC, Engadget, VentureBeat, CNET, the Verge, and ZDNet).

When: Tuesday Sept 28, 2021 04:00 PM Eastern Time (US and Canada)
Watch: Recorded Talk
09/2021: Our second annual meeting will be held September 13-14, 2021 at the CSAIL Stata Center. See the schedule.
7/2021: Neurosym webinar series Hima Lakkaraju — Assistant Professor at Harvard University — will be talking about interpretable machine learning.

Towards Reliable and Robust Model Explanations

As machine learning black boxes are increasingly deployed in domains such as healthcare and criminal justice, there is growing emphasis on building tools and techniques for explaining these black boxes in an interpretable manner. Domain experts leverage such explanations to diagnose systematic errors and underlying biases of black boxes. In this talk, I will present some of our recent research that sheds light on the vulnerabilities of popular post hoc explanation techniques such as LIME and SHAP, and also introduce novel methods to address some of these vulnerabilities. More specifically, I will first demonstrate that these methods are brittle, unstable, and vulnerable to a variety of adversarial attacks. Then, I will discuss two solutions to address some of these vulnerabilities: (i) a Bayesian framework that captures the uncertainty associated with post hoc explanations and in turn allows us to generate explanations with user-specified levels of confidence, and (ii) a framework based on adversarial training that is designed to make post hoc explanations more stable and robust to shifts in the underlying data. I will conclude the talk by discussing our recent theoretical results, which shed light on the equivalence and robustness of state-of-the-art explanation methods.
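SHAP is grounded in Shapley values from cooperative game theory: the attribution of feature i averages its marginal contribution f(S ∪ {i}) - f(S) over all coalitions S of the other features, with the standard combinatorial weights. A minimal sketch of the exact computation (exponential in the number of features, so only for tiny inputs, and not the SHAP library's API), assuming features outside a coalition are replaced by a fixed baseline. The baseline choice is an assumption here, and different baselines can yield different attributions, one reason such explanations deserve scrutiny.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley attributions for model f at input x.

    Features outside a coalition take their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Probability that exactly this coalition precedes i in a random ordering.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def model(z):
    """A toy model with one additive term and one interaction term."""
    return 2 * z[0] + 3 * z[1] * z[2]


phi = shapley_values(model, [1, 1, 1], [0, 0, 0])
```

Here the additive term is credited entirely to feature 0 and the interaction term is split evenly between features 1 and 2, and the attributions sum to f(x) - f(baseline) (the efficiency property).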

Bio: Hima Lakkaraju is an Assistant Professor at Harvard University focusing on explainability, fairness, and robustness of machine learning models. She has also been working with various domain experts in criminal justice and healthcare to understand the real-world implications of explainable and fair ML. Hima has recently been named one of the 35 innovators under 35 by MIT Tech Review, and has received best paper awards at the SIAM International Conference on Data Mining (SDM) and INFORMS. She has given invited workshop talks at ICML, NeurIPS, AAAI, and CVPR, and her research has also been covered by various popular media outlets including the New York Times, MIT Tech Review, TIME, and Forbes. For more information, please visit: https://himalakkaraju.github.io/

When: Tuesday Jul 27, 2021 04:00 PM Eastern Time (US and Canada)
Watch: Recorded Talk
4/2021: Neurosym webinar series Percy Liang — Associate Professor of Computer Science at Stanford University — will be talking about machine learning for program repair.

Learning to Fix Programs

Programmers spend a huge amount of time fixing broken code. Our goal is to train neural models that can do this automatically. I will first present DrRepair, a system that learns to edit programs based on error messages. We leverage a large number of valid programs by artificially perturbing (and thus breaking) them. DrRepair obtains strong results on two tasks: fixing errors made by students and pseudocode-to-code translation. We then present a new framework, Break-It-Fix-It (BIFI), which additionally leverages unlabeled broken code to learn a model that perturbs code to generate more realistic broken code. We show that this results in further improvements over DrRepair. Taken together, our work suggests that one can learn a lot from just unlabeled programs and a compiler, with no further manual annotation.
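The data-generation idea, starting from valid programs and perturbing them into (broken, fixed) training pairs, can be sketched in a few lines. This assumes Python sources, two hypothetical corruption rules (delete one ')' or one ':'), and the built-in compile() as the checker; DrRepair's own perturbations and target languages differ, and BIFI goes further by learning the breaker instead of hand-writing it.

```python
import random
from typing import Optional


def compiles(source: str) -> bool:
    """Python's own parser plays the role of the compiler."""
    try:
        compile(source, "<program>", "exec")
        return True
    except SyntaxError:
        return False


def corrupt(source: str, rng: random.Random) -> Optional[str]:
    """Delete one randomly chosen ')' or ':' (hypothetical perturbation rules)."""
    tokens = [")", ":"]
    rng.shuffle(tokens)
    for token in tokens:
        positions = [i for i, ch in enumerate(source) if ch == token]
        if positions:
            i = rng.choice(positions)
            return source[:i] + source[i + 1:]
    return None


def make_repair_pairs(programs, rng, attempts=5):
    """Turn valid programs into (broken, fixed) training pairs."""
    pairs = []
    for fixed in programs:
        if not compiles(fixed):
            continue  # start only from programs that already compile
        for _ in range(attempts):
            broken = corrupt(fixed, rng)
            # Keep the pair only if the perturbation really breaks the program
            # (deleting a ')' inside a string literal, for example, would not).
            if broken is not None and not compiles(broken):
                pairs.append((broken, fixed))
                break
    return pairs


programs = [
    "def clamp(x):\n    return max(x, 0)\n",
    "def broken(:\n    pass\n",  # already invalid, so it is skipped
]
pairs = make_repair_pairs(programs, random.Random(0))
```

A repair model would then be trained to map each broken program back to its fixed version, with the compiler's error message as additional input in DrRepair's setting.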

Bio: Percy Liang is an Associate Professor of Computer Science at Stanford University (B.S. from MIT, 2004; Ph.D. from UC Berkeley, 2011). His research spans many topics in machine learning and natural language processing, including robustness, interpretability, semantics, and reasoning. He is also a strong proponent of reproducibility through the creation of CodaLab Worksheets. His awards include the Presidential Early Career Award for Scientists and Engineers (2019), IJCAI Computers and Thought Award (2016), an NSF CAREER Award (2016), a Sloan Research Fellowship (2015), a Microsoft Research Faculty Fellowship (2014), and multiple paper awards at ACL, EMNLP, ICML, and COLT.

When: Tuesday, April 27 2021, 4-5pm EST
Watch: Recorded Talk
3/2021: Neurosym webinar series Jacob Andreas — X Consortium Assistant Professor at MIT in EECS and CSAIL — will be talking about symbolic representation and reasoning in DNNs.

Implicit Symbolic Representation and Reasoning in Deep Neural Networks

Standard neural network architectures can *in principle* implement symbol processing operations like logical deduction and simulation of complex automata. But do current neural models, trained on standard tasks like image recognition and language understanding, learn to perform symbol manipulation *in practice*? I'll survey two recent findings about implicit symbolic behavior in deep networks. First, I will describe a procedure for automatically labeling neurons with compositional logical descriptions of their behavior. These descriptions surface interpretable learned abstractions in models for vision and language, reveal implicit logical "definitions" of visual and linguistic categories, and enable the design of simple adversarial attacks that exploit errors in definitions. Second, I'll describe ongoing work showing that neural models for language generation perform implicit simulation of entities and relations described by text. Representations in these language models can be (linearly) translated into logical representations of world state, and can be directly edited to produce predictable changes in generated output. Together, these results suggest that highly structured representations and behaviors can emerge even in relatively unstructured models trained on natural tasks. Symbolic models of computation can play a key role in helping us understand these models.
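The neuron-labeling procedure can be illustrated with a simplified sketch in the spirit of compositional explanations: represent the inputs on which a neuron fires, and the inputs annotated with each concept, as sets of input IDs, then search logical formulas over concepts (AND, OR, AND NOT) for the one whose extension best matches the neuron by intersection-over-union (IoU). The exhaustive search over formulas of at most two concepts and the toy annotations are assumptions for illustration; longer formulas would call for a beam search, and real concept annotations are typically pixel- or token-level masks.

```python
from itertools import permutations


def iou(a, b):
    """Intersection-over-union of two sets of input IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_formula(neuron, concepts):
    """Best atomic or two-concept formula for a neuron, scored by IoU."""
    candidates = dict(concepts)  # atomic formulas
    for (p, a), (q, b) in permutations(concepts.items(), 2):
        candidates[f"({p} AND {q})"] = a & b
        candidates[f"({p} OR {q})"] = a | b
        candidates[f"({p} AND NOT {q})"] = a - b
    formula = max(candidates, key=lambda name: iou(neuron, candidates[name]))
    return formula, iou(neuron, candidates[formula])


# Toy data (hypothetical): inputs 0-9, concept annotations, and the inputs
# on which one neuron's activation exceeds its threshold.
concepts = {"water": {0, 1, 2, 3}, "blue": {0, 1, 4, 5}, "river": {2, 3, 6}}
neuron = {0, 1, 2, 3, 4, 5}
formula, score = best_formula(neuron, concepts)
```

No single concept explains this neuron well (IoU 4/6 for both "water" and "blue"), but the disjunction matches it exactly, the kind of implicit "definition" the abstract describes.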

Bio: Jacob Andreas is the X Consortium Assistant Professor at MIT in EECS and CSAIL. He did his PhD work at Berkeley, where he was a member of the Berkeley NLP Group and the Berkeley AI Research Lab. He has also spent time with the Cambridge NLIP Group, and the Center for Computational Learning Systems and NLP Group at Columbia.

When: Tuesday, March 23 2021, 4-5pm EST
Watch: Recorded Talk
2/2021: Neurosym webinar series Mayur Naik — Professor of Computer and Information Science at the University of Pennsylvania — will be talking about differentiable reasoning.

Scallop: End-to-end Differentiable Reasoning at Scale

Approaches to systematically combine symbolic reasoning with deep learning have demonstrated remarkable promise in terms of accuracy and generalizability. However, the complexity of exact probabilistic reasoning renders these methods inefficient for real-world, data-intensive machine learning applications. I will present Scallop, a scalable differentiable probabilistic Datalog engine equipped with a top-k approximate inference algorithm. The algorithm significantly reduces the amount of computation needed for inference and learning tasks without affecting their principal outcomes. To evaluate Scallop, we have crafted a challenging dataset, VQAR, comprising 4 million Visual Question Answering (VQA) instances that necessitate reasoning about real-world images with external common-sense knowledge. Scallop not only scales to these instances but also outperforms state-of-the-art neural-based approaches by 12.44%.
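The top-k idea can be sketched on a tiny probabilistic graph: each edge is an independent probabilistic fact, a proof of path(a, d) is a set of edges forming a path, and the exact query probability is the probability that at least one proof holds. Keeping only the k most probable proofs and computing the probability of their disjunction gives a cheaper lower bound. The graph, the brute-force path enumeration, and the inclusion-exclusion step are assumptions for illustration; Scallop tracks top-k proofs inside Datalog evaluation rather than enumerating paths up front.

```python
from itertools import combinations
from math import prod


def proofs(edges, src, dst, visited=()):
    """Yield each simple path src -> dst as a frozenset of edges (one proof)."""
    if src == dst:
        yield frozenset()
        return
    seen = visited + (src,)
    for u, v in edges:
        if u == src and v not in seen:
            for rest in proofs(edges, v, dst, seen):
                yield rest | {(u, v)}


def proof_probability(proof, prob):
    return prod(prob[e] for e in proof)


def disjunction_probability(proof_list, prob):
    """P(at least one proof holds), by inclusion-exclusion over independent edges."""
    total = 0.0
    for r in range(1, len(proof_list) + 1):
        for combo in combinations(proof_list, r):
            total += (-1) ** (r + 1) * proof_probability(frozenset().union(*combo), prob)
    return total


def top_k_probability(prob, src, dst, k):
    """Approximate P(path(src, dst)) from the k most probable proofs (a lower bound)."""
    ranked = sorted(proofs(list(prob), src, dst),
                    key=lambda p: proof_probability(p, prob), reverse=True)
    return disjunction_probability(ranked[:k], prob)


# Hypothetical probabilistic edges: edge(u, v) holds with the given probability.
prob = {("a", "b"): 0.9, ("b", "d"): 0.8, ("a", "c"): 0.5, ("c", "d"): 0.5, ("a", "d"): 0.1}
approximations = [top_k_probability(prob, "a", "d", k) for k in (1, 2, 3)]
```

With k = 3 all proofs are kept and the result is exact (0.811); k = 1 and k = 2 give 0.72 and 0.79, trading a little accuracy for far less work when the number of proofs is large.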

Bio: Mayur Naik is a Professor of Computer and Information Science at the University of Pennsylvania. His research spans the area of programming languages, with a current emphasis on developing scalable techniques to reason about programs by combining machine learning and formal methods. He is also interested in foundations and applications of neuro-symbolic approaches that synergistically combine deep learning and symbolic reasoning. He received a Ph.D. in Computer Science from Stanford University in 2008. Previously, he was a researcher at Intel Labs, Berkeley from 2008 to 2011, and an assistant professor in the College of Computing at Georgia Tech from 2011 to 2016.

When: Tuesday, February 23 2021, 4-5pm EST
Watch: Recorded Talk
1/2021: Neurosym webinar series Jiajun Wu — Assistant Professor of Computer Science at Stanford University — will be talking about some of his work on neurosymbolic approaches to computer vision.

Understanding the Visual World Through Code

Much of our visual world is highly regular: objects are often symmetric and have repetitive parts; indoor scenes such as corridors often consist of objects organized in a repetitive layout. How can we infer and represent such regular structures from raw visual data, and later exploit them for better scene recognition, synthesis, and editing? In this talk, I will present our recent work on developing neuro-symbolic methods for scene understanding. Here, symbolic programs and neural nets play complementary roles: symbolic programs are more data-efficient to train and generalize better to new scenarios, as they robustly capture high-level structure; deep nets effectively extract complex, low-level patterns from cluttered visual data. I will demonstrate the power of such hybrid models in three different domains: 2D image editing, 3D shape modeling, and human motion understanding.
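A toy illustration of how a program can capture a repetitive layout: given the hypothetical x-coordinates of objects detected along a corridor (with one detection missing), infer a one-loop program "for i in range(n): place(x0 + i * dx)" and re-render it, which fills in the gap. The RepeatProgram class and the least-squares fit are assumptions for illustration only, not the method presented in the talk.

```python
from dataclasses import dataclass


@dataclass
class RepeatProgram:
    """A hypothetical one-loop layout program: place(x0 + i * dx) for i in range(n)."""
    x0: float
    dx: float
    n: int

    def render(self):
        return [self.x0 + i * self.dx for i in range(self.n)]


def infer_program(xs):
    """Fit a RepeatProgram to at least two distinct detections that may have gaps."""
    xs = sorted(xs)
    # Assume at least one pair of neighbouring objects was detected.
    step = min(b - a for a, b in zip(xs, xs[1:]))
    idx = [round((x - xs[0]) / step) for x in xs]  # loop index of each detection
    # Least-squares line x = x0 + dx * i over the assigned indices.
    m = len(xs)
    mean_i = sum(idx) / m
    mean_x = sum(xs) / m
    cov = sum((i - mean_i) * (x - mean_x) for i, x in zip(idx, xs))
    var = sum((i - mean_i) ** 2 for i in idx)
    dx = cov / var
    x0 = mean_x - dx * mean_i
    return RepeatProgram(x0, dx, max(idx) + 1)


detections = [1.0, 3.0, 7.0, 9.0]  # the object at 5.0 was missed
program = infer_program(detections)
```

Rendering the inferred program restores the missing object, and changing n or dx edits the whole layout at once, which is the kind of structure-aware editing that raw pixels do not support.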

Bio: Jiajun Wu is an Assistant Professor of Computer Science at Stanford University, working on computer vision, machine learning, and computational cognitive science. Before joining Stanford, he was a Visiting Faculty Researcher at Google Research. He received his PhD in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology. Wu's research has been recognized through the ACM Doctoral Dissertation Award Honorable Mention, the AAAI/ACM SIGAI Doctoral Dissertation Award, the MIT George M. Sprowls PhD Thesis Award in Artificial Intelligence and Decision-Making, the 2020 Samsung AI Researcher of the Year, the IROS Best Paper Award on Cognitive Robotics, and fellowships from Facebook, Nvidia, Samsung, and Adobe.

When: Tuesday January 26, 2021, 4-5pm EST
Watch: Recorded Talk
12/2020: Neurosym webinar series Justin Gottschlich — Principal Scientist and the Director & Founder of Machine Programming Research at Intel Labs — will be talking about Machine Programming.

A Glance into Machine Programming @ Intel Labs

As defined by "The Three Pillars of Machine Programming", machine programming (MP) is concerned with the automation of software development. The three pillars partition MP into the following conceptual components: (i) intention, (ii) invention, and (iii) adaptation, with data being a foundational element that is generally necessary for all pillars. While the goal of MP is complete software automation – something that is likely decades away – we believe there are many seminal research opportunities waiting to be explored today across the three pillars.
In this talk, we will provide a glance into the new Pioneering Machine Programming Research effort at Intel Labs and how it has been established around the three pillars across the entire company. We will also discuss Intel Labs’ general charter for MP, as well as a few early research systems that we have built and are using today to improve the quality and rate at which we develop software (and hardware) in production systems.

Bio: Justin Gottschlich is a Principal Scientist and the Director & Founder of Machine Programming Research at Intel Labs. He also has an academic appointment as an Adjunct Assistant Professor at the University of Pennsylvania. Justin is the Principal Investigator of the joint Intel/NSF CAPA research center, which focuses on simplifying the software programmability challenge for heterogeneous hardware. He co-founded the ACM SIGPLAN Machine Programming Symposium (previously Machine Learning and Programming Languages) and currently serves as its Steering Committee Chair. He is currently serving on two technical advisory boards: the 2020 NSF Expeditions “Understanding the World Through Code” and a new MP startup fully funded by Intel, which is currently in stealth.
Justin has a deep desire to build bridges with thought leaders across industry and academia to research disruptive technology as a community. Recently, he has been focused on machine programming, which is principally about automating software development. Justin currently has active collaborations with Amazon, Brown University, Georgia Tech, Google AI, Hebrew University, IBM Research, Microsoft Research, MIT, Penn, Stanford, UC-Berkeley, UCLA, and University of Wisconsin. He received his PhD in Computer Engineering from the University of Colorado-Boulder in 2011. Justin has 30+ peer-reviewed publications and 35+ issued patents, with 100+ patents pending.

When: Tuesday December 1, 2020, 4-5PM EST.
Watch: Recorded Talk
10/2020: Neurosym webinar series Abhinav Verma — PhD student at UT Austin — will talk about his recent work on reinforcement learning algorithms.

Programmatic Reinforcement Learning

We study reinforcement learning algorithms that generate policies that can be represented in expressive high-level domain-specific languages (DSLs). This work aims to simultaneously address four fundamental drawbacks of Deep Reinforcement Learning (Deep-RL), where the policy is represented by a neural network: its lack of interpretability, verifiability, reliability, and domain awareness. We formalize a new learning paradigm and provide empirical and theoretical evidence that we can generate policies in expressive DSLs that do not suffer from these shortcomings of Deep-RL. To overcome the challenges of policy search in the non-differentiable space of programs, we introduce a meta-algorithm based on mirror descent, program synthesis, and imitation learning. This approach leverages neurosymbolic learning, using synthesized symbolic programs to regularize Deep-RL and using the gradients available to Deep-RL to improve the quality of synthesized programs. Overall, this approach establishes a synergistic relationship between Deep-RL and program synthesis.
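A drastically simplified version of the "imitate a neural policy with a program" step: assume a hypothetical braking task whose state is (distance, speed), a stand-in oracle function playing the role of the trained neural policy, and a tiny DSL whose programs have the form "if s[feature] > threshold then A else B". Enumerating DSL programs and keeping the one that agrees most with the oracle on sampled states is a basic form of imitation-based program synthesis; the meta-algorithm described above, which combines mirror descent, program synthesis, and imitation learning, is considerably richer.

```python
import random
from itertools import product


def oracle(state):
    """Stand-in for a trained neural policy (hypothetical): brake when close or fast."""
    distance, speed = state
    return "brake" if distance < 2.0 + 0.5 * speed else "accelerate"


def make_policy(feature, threshold, then_action, else_action):
    """A DSL program: if s[feature] > threshold then then_action else else_action."""
    def policy(state):
        return then_action if state[feature] > threshold else else_action
    policy.source = f"if s[{feature}] > {threshold} then {then_action} else {else_action}"
    return policy


def synthesize(states, thresholds, actions=("brake", "accelerate")):
    """Enumerate DSL programs and return the one that best imitates the oracle."""
    best, best_agree = None, -1
    for feature, t, a, b in product(range(2), thresholds, actions, actions):
        if a == b:
            continue
        prog = make_policy(feature, t, a, b)
        agree = sum(prog(s) == oracle(s) for s in states)
        if agree > best_agree:
            best, best_agree = prog, agree
    return best, best_agree / len(states)


rng = random.Random(0)
states = [(rng.uniform(0, 10), rng.uniform(0, 4)) for _ in range(200)]
policy, accuracy = synthesize(states, [0.5 * i for i in range(21)])
```

The synthesized program is a single readable rule on distance that can be inspected or verified directly, at the cost of some disagreement with the oracle on states where speed matters; closing that gap is where the interplay between the neural and programmatic policies comes in.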

Bio: Abhinav Verma is a PhD student at UT Austin where he is supervised by Swarat Chaudhuri. His research lies at the intersection of machine learning and program synthesis, with a focus on programmatically interpretable learning. He is a recipient of the 2020 JP Morgan AI Research PhD Fellowship.

When: Tuesday October 27, 2020, 4-5PM EST.
Watch: Recorded Talk
10/2020: We are having our official kickoff meeting. Some of the talks will be streamed online; see the schedule for the recordings.
9/2020: Neurosym webinar series. In the first talk in the series, Kevin Ellis — research scientist at Common Sense Machines, and soon to be a faculty member in the Computer Science Department at Cornell — will talk about his recent work on growing domain-specific languages.

Growing domain-specific languages alongside neural program synthesizers via wake-sleep program learning

Two challenges in engineering program synthesis systems are: (1) crafting specialized yet expressive domain specific languages, and (2) designing search algorithms that can tractably explore the space of expressions in this domain specific language. We take a step toward the joint learning of domain specific languages, and the search algorithms that perform synthesis in that language. We propose an algorithm which starts with a relatively minimal domain specific language, and then enriches that language by compressing out common syntactic patterns into a library of reusable domain specific code. In tandem, the system trains a neural network to guide search over expressions in the growing language. From a machine learning perspective, this system implements a wake-sleep algorithm similar to the Helmholtz machine. We apply this algorithm to AI and program synthesis problems, with the goal of understanding how domain specific languages and neural program synthesizers can mutually bootstrap one another.

Related paper
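The compression step described above, pulling common syntactic patterns out into a library of reusable code, can be sketched as finding the subtree that saves the most size across a corpus of programs and replacing it with a new named primitive. This assumes programs are nested tuples (s-expressions), uses a simple node-count saving score, and only extracts closed subtrees; the system in the talk is considerably more general (for example, it can abstract over differing subterms) and pairs the growing library with a neural search guide.

```python
from collections import Counter


def subtrees(expr):
    """Yield every compound subtree of an s-expression written as nested tuples."""
    if isinstance(expr, tuple):
        yield expr
        for child in expr[1:]:
            yield from subtrees(child)


def size(expr):
    """Number of nodes, counting the operator of each compound expression."""
    return 1 + sum(size(c) for c in expr[1:]) if isinstance(expr, tuple) else 1


def rewrite(expr, pattern, name):
    """Replace every occurrence of pattern with the new primitive name."""
    if expr == pattern:
        return name
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(rewrite(c, pattern, name) for c in expr[1:])
    return expr


def compress(corpus, name="f0"):
    """Add the subtree that saves the most nodes across the corpus as a new primitive."""
    counts = Counter(t for prog in corpus for t in subtrees(prog))

    def saving(t):
        # Each use shrinks from size(t) nodes to 1; defining it once costs size(t).
        return counts[t] * (size(t) - 1) - size(t)

    best = max(counts, key=saving)
    if saving(best) <= 0:
        return corpus, None
    return [rewrite(p, best, name) for p in corpus], (name, best)


corpus = [
    ("map", ("mul", "x", 2), "xs"),
    ("filter", ("gt", ("mul", "x", 2), 5), "xs"),
    ("fold", "add", ("mul", "x", 2), "xs"),
]
new_corpus, primitive = compress(corpus)
```

Running compress repeatedly on its own output would keep growing the library, and programs written with the new primitives become shorter and therefore easier for a search procedure to find.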

Bio: Kevin Ellis is a research scientist at Common Sense Machines, and recently finished a PhD at MIT under Armando Solar-Lezama and Josh Tenenbaum. He works on program synthesis and artificial intelligence. He will join Cornell as an assistant professor in the computer science department in fall 2021.

When: Tuesday September 29, 2020, 4-5PM EST.
Watch: Recorded Talk
7/2020: Meet us at Tapia 2020. If you are attending the (virtual) conference, come talk to us to learn more about the project and opportunities for undergraduate summer research.