MIT researchers have devised a method for assessing how robust machine-learning models known as neural networks are for various tasks, by detecting when the models make mistakes they shouldn’t.
Convolutional neural networks (CNNs) are designed to process and classify images for computer vision and many other tasks. But slight modifications that are imperceptible to the human eye — say, a few darker pixels within an image — may cause a CNN to produce a drastically different classification. Such modifications are known as “adversarial examples.” Studying the effects of adversarial examples on neural networks can help researchers determine how their models could be vulnerable to unexpected inputs in the real world.
For example, driverless cars can use CNNs to process visual input and produce an appropriate response. If the car approaches a stop sign, it would recognize the sign and stop. But a 2018 paper found that placing a certain black-and-white sticker on the stop sign could, in fact, fool a driverless car’s CNN to misclassify the sign, which could potentially cause it to not stop at all. READ MORE ON: MIT NEW