Talks and Abstracts
Alyssa Goodman, Robert Wheeler Wilson Professor of Applied Astronomy, Harvard University
"The Future of Data in Science"
Once upon a time, I was an astrophysicist. Then, about fifteen years ago, I started using medical imaging tools on high-dimensional astronomical data sets, and caring more than was normal about sharing and re-using data and software, and making online documents richer, and I became something for which there was no name. Today, I get called a “Data Scientist.” What I really am is a scientist who worries more about the nature and utility of data than some of my peers. I especially worry about how to best “see” data in innovative visualizations, and how to share, combine, and repurpose data and presentations of it (including as journal articles) most efficiently and effectively. In this talk, I will present a vision for the future of scholarly communication based on my work helping create “Seamless Astronomy” research environments that include highly interactive “papers” of the future. I will also muse about how the same new, modular, and repurpose-able technologies used in research are changing the future of education as well. For reference, key current examples I will touch upon are: the “Paper of the Future” on Authorea; the glue visualization environment; the WorldWide Telescope Universe Information system; “10 Questions to Ask When Creating a Visualization”; and PredictionX.
Yael Grushka-Cockayne, Visiting Associate Professor of Business Administration, Harvard Business School
"A Better Way to Forecast the Future"
I will present some recent developments concerning the use of probability forecasts and their combination in decision-making. I will also highlight some important challenges influencing the “goodness” of combined probability forecasts such as miscalibration, dependence among forecasters, and selecting an appropriate evaluation measure. The overall vision is that increased exposure to and improved visualizations of probability forecasts will enhance the public’s understanding of probabilities and how they can contribute to better decisions. Through several important applications from the domains of meteorology, economics, and political science, I will illustrate state-of-the-art usage of probability forecasts: how they are combined, evaluated, and communicated to stakeholders. I will expand on a model we developed for generating probability forecasts of passenger flows at Heathrow Airport.
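As a rough illustration of combining and evaluating probability forecasts, the sketch below averages two forecasters' probabilities (a simple linear pool) and scores each with the Brier score. Both the pooling rule and the scoring rule are common choices assumed here for illustration; the abstract does not say which methods the talk uses, and the forecasts and outcomes are made-up numbers.

```python
def linear_pool(forecasts, weights=None):
    """Combine probability forecasts for one binary event by weighted averaging."""
    if weights is None:
        # Assumption: equal weights when none are given.
        weights = [1.0 / len(forecasts)] * len(forecasts)
    return sum(w * p for w, p in zip(weights, forecasts))


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    Lower is better; 0 is a perfect score.
    """
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Hypothetical forecasts from two forecasters for four binary events.
forecaster_a = [0.9, 0.2, 0.7, 0.4]
forecaster_b = [0.6, 0.1, 0.9, 0.3]
outcomes = [1, 0, 1, 0]  # what actually happened (made up)

# Pool the two forecasts event by event.
pooled = [linear_pool([a, b]) for a, b in zip(forecaster_a, forecaster_b)]

score_a = brier_score(forecaster_a, outcomes)
score_b = brier_score(forecaster_b, outcomes)
score_pooled = brier_score(pooled, outcomes)
```

On this toy data the pooled forecast scores better than either forecaster alone, but that is a property of these particular numbers, not a general guarantee; miscalibration and dependence among forecasters can change the picture.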
Stefanie Jegelka, Assistant Professor of Electrical Engineering and Computer Science, MIT
"What Can Neural Networks Represent?"
In recent years, neural networks have achieved impressive empirical results on a wide range of tasks. Yet, the theoretical understanding of their properties is still very much in progress. In this talk, I will address the question of what a neural network can represent. Classical results study this question for shallow and very wide networks, whereas recent results take into account deeper networks that are more common in practice. We focus on two types of networks: (1) the ResNet architecture, and (2) neural networks for graphs. For ResNet, we show how narrow the network can be, provided it can be deep, while still being able to represent any ("reasonable") function, which reveals a distinction from other architectures. For graph neural networks, we study which graphs they can distinguish, and which properties of the network affect this discriminative power. Our theoretical results are reflected empirically.
Cynthia Rudin, Associate Professor of Computer Science and Electrical and Computer Engineering, Duke University
"Secrecy, Criminal Justice, and Variable Importance"
The US justice system often uses a combination of (biased) human decision makers and complicated black-box proprietary algorithms for high-stakes decisions that deeply affect individuals. All of this is still happening, despite the fact that for several years, we have known that interpretable machine learning models are just as accurate as any complicated machine learning methods for predicting criminal recidivism. It is much easier to debate the fairness of an interpretable model than a proprietary model. The most popular proprietary model, COMPAS, was accused by the ProPublica group in 2016 of being racially biased, but the true story is much more complicated: their analysis relies on a flawed definition of variable importance that was used to identify the race variable as being important.
In this talk, I will start by introducing a very general form of variable importance, called model class reliance. Model class reliance measures how important a variable is to any sufficiently accurate predictive model within a class. I will use this and other data-centered tools to provide our own investigation of whether COMPAS depends on race, and what else it depends on. Through this analysis, we find another problem with using complicated proprietary models, which is that they often seem to be miscomputed. An easy fix to all of this is to use interpretable (transparent) models instead of complicated or proprietary models in criminal justice.
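To make the idea of model class reliance concrete, here is a minimal sketch under stated assumptions. The abstract gives no formula, so the code assumes one common formulation: a model's reliance on a variable is the ratio of its loss after that variable is switched between observations to its original loss, and model class reliance is the range of that ratio over all models in the class whose loss is within a tolerance of the best. The toy data, the grid of linear models, and the tolerance eps are all hypothetical.

```python
from itertools import product


def mse(model, X, y):
    """Mean squared error of a model on the data."""
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def switched_mse(model, X, y, j):
    """Loss when feature j of each row is replaced by feature j of every other row."""
    n = len(X)
    total = 0.0
    for i in range(n):
        for k in range(n):
            if i == k:
                continue
            x = list(X[i])
            x[j] = X[k][j]  # break the link between feature j and the outcome
            total += (model(x) - y[i]) ** 2
    return total / (n * (n - 1))


def model_reliance(model, X, y, j):
    """Ratio of switched loss to original loss; 1.0 means no reliance on feature j."""
    return switched_mse(model, X, y, j) / mse(model, X, y)


def model_class_reliance(models, X, y, j, eps):
    """Range of reliance on feature j over models within eps of the best loss."""
    losses = [mse(m, X, y) for m in models]
    best = min(losses)
    good = [m for m, loss in zip(models, losses) if loss <= best + eps]
    reliances = [model_reliance(m, X, y, j) for m in good]
    return min(reliances), max(reliances)


# Hypothetical data: two perfectly correlated features and a noisy outcome.
X = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
y = [0.1, 0.9, 2.1, 2.9]

# Hypothetical model class: linear models a*x0 + b*x1 on a small weight grid.
grid = [0.0, 0.5, 1.0]
models = [lambda x, a=a, b=b: a * x[0] + b * x[1] for a, b in product(grid, grid)]

lo, hi = model_class_reliance(models, X, y, j=0, eps=0.01)
```

In this toy setup several models are equally accurate, yet one ignores feature 0 entirely (reliance near 1) while another depends on it heavily. A single fitted model would report only one point in that range, which is why a class-level measure may be more informative when variables are correlated.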