Basis for machine learning systems decisions

But neural nets are black boxes. After training, a network may be very good at classifying data, but even its creators will have no idea why. With visual data, it’s sometimes possible to automate experiments that determine which visual features a neural net is responding to. But text-processing systems tend to be more opaque.

At the Association for Computational Linguistics’ Conference on Empirical Methods in Natural Language Processing, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) will present a new way to train neural networks so that they provide not only predictions and classifications but rationales for their decisions.

“In real-world applications, sometimes people really want to know why the model makes the predictions it does,” says Tao Lei, an MIT graduate student in electrical engineering and computer science and first author on the new paper. “One major reason that doctors don’t trust machine-learning methods is that there’s no evidence.”

“It’s not only the medical domain,” adds Regina Barzilay, the Delta Electronics Professor of Electrical Engineering and Computer Science and Lei’s thesis advisor. “It’s in any domain where the cost of making the wrong prediction is very high. You need to justify why you did it.”

“There’s a broader aspect to this work, as well,” says Tommi Jaakkola, an MIT professor of electrical engineering and computer science and the third coauthor on the paper. “You may not want to just verify that the model is making the prediction in the right way; you might also want to exert some influence in terms of the types of predictions that it should make. How does a layperson communicate with a complex model that’s trained with algorithms that they know nothing about? They might be able to tell you about the rationale for a particular prediction. In that sense it opens up a different way of communicating with the model.”

Virtual brains

Neural networks are so called because they mimic — approximately — the structure of the brain. They are composed of a large number of processing nodes that, like individual neurons, are capable of only very simple computations but are connected to each other in dense networks.

In a process referred to as “deep learning,” training data is fed to a network’s input nodes, which modify it and feed it to other nodes, which modify it and feed it to still other nodes, and so on. The values stored in the network’s output nodes are then correlated with the classification category that the network is trying to learn — such as the objects in an image, or the topic of an essay.

Over the course of the network’s training, the operations performed by the individual nodes are continuously modified to yield consistently good results across the whole set of training examples. By the end of the process, the computer scientists who programmed the network often have no idea what the nodes’ settings are. Even if they do, it can be very hard to translate that low-level information back into an intelligible description of the system’s decision-making process.

In the new paper, Lei, Barzilay, and Jaakkola specifically address neural nets trained on textual data. To enable interpretation of a neural net’s decisions, the CSAIL researchers divide the net into two modules. The first module extracts segments of text from the training data, and the segments are scored according to their length and their coherence: The shorter the segment, and the more of it that is drawn from strings of consecutive words, the higher its score.

The segments selected by the first module are then passed to the second module, which performs the prediction or classification task. The modules are trained together, and the goal of training is to maximize both the score of the extracted segments and the accuracy of prediction or classification.