When Pyotr Stolyarsky died in 1944, he was considered Russia’s greatest violin teacher. He counted among his pupils a coterie of stars, including David Oistrakh and Nathan Milstein, and a school for gifted musicians in his native Odessa was named after him in 1933. But Stolyarsky couldn’t play the violin anywhere near as well as his best students. What he could do was whisper metaphors into their ears. He might lean over and explain how his mother cooked Sabbath dinner. His advice gave no specific information on what angle the bow should describe, or how to move the fingers on the fingerboard to create vibrato. Instead, it distilled his experience of the music into metaphors his students could understand.
When Vladimir Vapnik teaches his computers to recognize handwriting, he does something similar. While there’s no whispering involved, Vapnik does harness the power of “privileged information.” Passed from teacher to student, parent to child, or colleague to colleague, privileged information encodes knowledge derived from experience. That is what Vapnik was after when he asked Natalia Pavlovich, a professor of Russian poetry, to write poems describing the numbers 5 and 8, for consumption by his learning algorithms. The result sounded like nothing any programmer would write. One of her poems on the number 5 read,
He is running. He is flying. He is looking ahead. He is swift. He is throwing a spear ahead. He is dangerous. It is slanted to the right. Good snaked-ness. The snake is attacking. It is going to jump and bite. It is free and absolutely open to anything. It shows itself, no kidding.
All told, Pavlovich wrote 100 such poems, each on a different example of a handwritten 5 or 8, as shown in the figure to the right. Some had excellent penmanship, others were squiggles. One 5 was, “a regular nice creature. Strong, optimistic and good,” while another seemed “ready to rush forward and attack somebody.” Pavlovich then graded each of the 5s and 8s on 21 different attributes derived from her poems. For example, one handwritten example could have an “aggressiveness” rating of 2 out of 2, while another could show “stability” to a strength of 2 out of 3.
So instructed, Vapnik’s computer was able to recognize handwritten numbers with far less training than is conventionally required. A learning process that might have required 100,000 samples might now require only 300. The speedup was also independent of the style of the poetry used. When Pavlovich wrote a second set of poems based on Yin-Yang opposites, it worked about equally well. Vapnik is not even certain the teacher has to be right, though consistency seems to count.
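For readers who think in code, here is a minimal sketch of the general idea: a teacher's ratings are present while the machine trains, and absent when it is later asked to classify. It is not Vapnik's algorithm. It swaps in a much simpler technique, a nearest-centroid classifier in which each training example is weighted by the teacher's ratings. The names `TrainingExample`, `teacher_weight`, `fit_centroids`, and `predict` are hypothetical, as are the rescaling of ratings to the range 0 to 1 and the rule that turns ratings into weights.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class TrainingExample:
    pixels: Sequence[float]    # ordinary features, also available at test time
    label: int                 # the digit, 5 or 8
    ratings: Dict[str, float]  # teacher's ratings, training only (assumed rescaled to 0..1)


def teacher_weight(ratings: Dict[str, float]) -> float:
    """Hypothetical rule: map the mean rating (0..1) to a weight in 0.5..1.5."""
    if not ratings:
        return 1.0
    return 0.5 + sum(ratings.values()) / len(ratings)


def fit_centroids(examples: Sequence[TrainingExample]) -> Dict[int, List[float]]:
    """Compute one teacher-weighted mean of the pixel vectors per label."""
    sums: Dict[int, List[float]] = {}
    totals: Dict[int, float] = {}
    for ex in examples:
        w = teacher_weight(ex.ratings)
        acc = sums.setdefault(ex.label, [0.0] * len(ex.pixels))
        for i, value in enumerate(ex.pixels):
            acc[i] += w * value
        totals[ex.label] = totals.get(ex.label, 0.0) + w
    return {label: [v / totals[label] for v in acc] for label, acc in sums.items()}


def predict(centroids: Dict[int, List[float]], pixels: Sequence[float]) -> int:
    """Classify from pixels alone; no ratings are needed or used here."""
    def squared_distance(center: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(center, pixels))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Toy two-"pixel" data, invented for illustration only.
examples = [
    TrainingExample(pixels=[0.0, 1.0], label=5, ratings={"clarity": 1.0}),
    TrainingExample(pixels=[1.0, 1.0], label=5, ratings={"clarity": 0.0}),
    TrainingExample(pixels=[1.0, 0.0], label=8, ratings={"clarity": 1.0}),
]
centroids = fit_centroids(examples)
```

The point of the sketch is the asymmetry: `fit_centroids` sees the ratings, `predict` never does. Whatever the teacher knows has to be absorbed during training.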
Vapnik is one of a growing body of artificial intelligence (AI) researchers discovering something that teachers have long known—or at least, believed—to be true: There is a special, valuable communication that occurs between teacher and student, which goes beyond what can be found in any textbook or raw data stream. By bringing the tools of computation and machine intuition to the table, AI researchers are giving us a more complete picture of how we learn. They are also broadening the study of education to include quantitative, numerical models of the learning process itself. “The thing that AI brings to the table is that it forces us to get into the details of how everything works,” says John Laird, a computer scientist at the University of Michigan. If there was any doubt that good teachers are important, machine learning is helping put it to rest.
The teacher-student code has its roots in the sheer complexity of the real world, a complexity that has long bedeviled AI research. Is that flat surface a table? A chair? The floor? What if it’s partly in shadow, or partly obscured? After years searching for simple ways to answer these questions, the AI community is finding that the complexity of the real world is, in some ways, irreducible.
Take, for example, the problem of predicting the distribution of trees in a forest using only map data, such as elevation, slope, sunlight, and shade. Jock Blackard and Denis Dean did the original calculations in 1999, and left behind an enormous public database for other mathematicians to use. According to Vapnik, when computers were trained using 15,000 examples, their predictions were 85 to 87 percent accurate. Pretty good. But when they were fed more than 500,000 examples, they developed more complex rules for tree distribution and boosted the accuracy of their predictions to more than 98 percent.
“This means that a good decision rule is not a simple one, it cannot be described by a very few parameters,” Vapnik said. In fact, he argues that using many weak predictors will always be more accurate than using a few strong ones.1 One approach to capturing complexity is to feed hundreds of thousands, or even millions, of points to a computer, which is called brute force learning. It works well enough, and is the driving engine behind most Big Data commercial enterprises, in which machines are set loose on terabytes of data in order to understand everything from scientific problems to consumer behavior. In fact, Vapnik developed one of the key technologies used by Big Data, called Support Vector Machines. But brute force methods are also slow, inefficient, and useless when data is not plentiful, such as when studying biopsy images for cancer.
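The intuition that many weak predictors can add up to a strong one can be illustrated with a classic back-of-the-envelope calculation. The sketch below is not Vapnik's argument. It assumes, unrealistically, that every predictor is right with the same probability and that their errors are independent, and it simply computes how often a majority vote would be correct. The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_voters: int, p_correct: float) -> float:
    """Probability that a strict majority of n independent voters is correct.

    Assumes each voter is right with probability p_correct, independently
    of the others, and that n_voters is odd so ties cannot occur.
    """
    if n_voters < 1 or n_voters % 2 == 0:
        raise ValueError("n_voters must be a positive odd number")
    needed = n_voters // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )


single = majority_vote_accuracy(1, 0.6)    # one weak predictor on its own
trio = majority_vote_accuracy(3, 0.6)      # three of them voting
crowd = majority_vote_accuracy(101, 0.6)   # a large committee of them
```

Under these assumptions, accuracy climbs as voters are added, provided each one is right more often than not. Real predictors share data and make correlated mistakes, so the gain in practice is smaller than this idealized calculation suggests.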
Vapnik describes privileged information as a second kind of language with which to instruct computers. Where the language of brute force learning consists of technical measurements, such as shapes, colors, forces, and the amount you spent on groceries, privileged information relies on metaphor. And metaphor can make the difference between smart science and brute force science.
To see privileged information at work, we need look no further than the human (or robot) body. The body is special because it has particular ways of interacting with its environment. A room with chairs in it is understood differently by a human with legs than by a robot without them. The thousands of points of raw data describing the room collapse into a few simple ideas when subject to the constraints and demands of a physical body. If a teacher knows what it’s like to have a body, he, she, or it can pass these simple ideas to a student as privileged information, creating an efficient description of a complex environment.
AI researchers are quickly learning the importance of the body, and its pivotal role in constructing and interpreting privileged information. In his lab, Laird is teaching a robot to manipulate foam blocks using language commands. The first time the robot tried to pick up an object, its own arm got in the way of its cameras, and its vision system lost track of what it was trying to pick up. It needed to be taught what its body was, and how it worked.
This may sound like the growing pains of a developmental robot, but in fact humans also need to learn how their bodies interact with the environment, and eventually start to rely on this information. Linda Smith, a psychologist and brain scientist at Indiana University, Bloomington, told me that people focus their attention like a spotlight in space. Objects in the same spotlight are linked together, or bound, in working (or short-term) memory. Smith found that children between 16 and 24 months of age could bind objects and names together in their working memory, but only if the objects were not moved around. Otherwise, the visual “noise” kept memories from forming.
What happens, though, if the child moves, rather than the object? Tony Morse, a researcher at Plymouth University in the United Kingdom, used a 53-motor robot called iCub, which can crawl and walk and which learned much like Smith’s children. Unexpectedly, it was unable to maintain the association between an object and its name when it changed its posture and looked back at an object for a second time. The robot was using its body posture to locate itself in space, reducing the complexity of its environment by physical signposting.
Motivated by this result from robotics, Smith looked for the same behavior in children—and found it. “It was a novel prediction and it was absolutely correct,” she said. “It led me to a whole line of research about the role of body movement in disrupting children’s memory.” It turns out that, when your first grade teacher told you to sit down and pay attention, she was drawing on a deep understanding of how memories are formed. It is not something that is obvious, or something that a child would guess to be true, or that an AI researcher would program into a machine. It was privileged information.
The effectiveness of the first grade teacher’s instructions was obscured by the fact that the object of the instruction (to learn) had little to do with the content of the instruction (to sit). But this is often the case. Rick Mantz is now Chairman of the James Kimple Center for Alternative Education at South Brunswick High School, in South Brunswick, N.J., but he originally came to the town to coach the football team. He turned around the program immediately, after years of mostly losing records. How? “It was a character issue,” Mantz told me. “We would monitor kids academically and make sure they behaved in class.” Mantz had found something akin to Pavlovich’s poetry and the first grade teacher’s injunction. In this case, it was a connection between performance on the football field and classroom attitude. “The kids that screw up in the classroom or cafeteria will be the same ones who jump offsides on fourth down in a game,” Mantz explained to me.
Mantz could provide me with a lucid explanation of the privileged information he taught. But that is not always possible. Patrice Michaels, who directs vocal studies at The University of Chicago, spoke to me about a subtle aspect of singing that many master clinicians had believed was unteachable, but is now a regular part of singing instruction. It’s called singing “vertically into the harmony,” and has to do with the singer being attuned to the harmonic intention of the composer. “There is a difference in how you sing a G# note that leads to an A, and a G# that leads to an F. It’s the same pitch, but depends on where you are going—how you aim it,” she said. Michaels had a difficult time explaining the difference.
The potential payoffs of exploiting privileged information present an attractive target for AI researchers. Nearly 30 years ago, George Reeke and Nobel Prize winner Gerald Edelman showed that AI systems that traced letters with robotic arms had an easier time recognizing diverse styles of handwriting and letters than visual-only systems. Today, Giorgio Metta, who built the iCub robot at the Italian Institute of Technology, is counting on iCub’s physicality to help it learn. “In many cases, humans use their own knowledge about actions to recognize those actions in others,” he told me. “If a robot knows how to grasp, it has better chance of recognizing grasping actions of a person,” he said. Metta is going one step further, by teaching iCub to follow social cues, like directing its attention to an object when someone looks at it. These cues become another channel through which to collect privileged information about a complex world.
Speaking before a group of philosophers at Carnegie Mellon University in 2012, Vapnik asked whether we can ever understand a complex world using only technical models like the ones now used in machine learning. “Machine learning science is not only about computers,” he said, “but about humans, and the unity of logic, emotion, and culture.” What about Vapnik’s own teachers? He relayed another anecdote, this one about his music teacher. “Most of the time, I thought the teacher was talking nonsense,” he said, using one of his favorite synonyms for privileged information. “But I always understood what he wanted.”
Alan Brown is a freelance writer focusing on the intersection of science and engineering.
This article was originally published in our “Secret Codes” issue in October 2013.