Scientists have created an algorithm to identify objects in photographs with single-pixel accuracy and without the need for human supervision.

The algorithm, called STEGO, is a joint effort of MIT’s CSAIL, Microsoft, and Cornell University. The researchers hope to have solved one of the most difficult tasks in computer vision: assigning a label to every pixel in the world, without any human supervision.

Computer vision is the area of artificial intelligence that allows computers to extract meaningful information from digital images.

STEGO learns to perform “semantic segmentation,” which means assigning a class label to every pixel within an image. This skill is vital for computer-vision systems today because real-world images are often cluttered with overlapping objects.

Normally, humans create training data by drawing boxes around objects in an image. You might draw a box around a cat in a field of green grass and label everything inside it “cat.”

A semantic segmentation technique, by contrast, identifies every pixel of the cat’s body without mixing in any grass. It’s like using Photoshop’s Object Selection tool instead of the Rectangular Marquee.
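The difference between the two labeling styles can be sketched with a toy array (hypothetical pixel layout, not from the paper): a bounding box sweeps in every pixel inside the rectangle, while a segmentation mask keeps only the pixels that actually belong to the object.

```python
import numpy as np

# Toy 6x6 image: 0 = grass, 1 = cat (a made-up layout for illustration).
labels = np.zeros((6, 6), dtype=int)
labels[2:5, 1:4] = 1   # a rough cat-shaped blob
labels[2, 3] = 0       # one grass pixel that sits inside the box

# Bounding-box labeling: every pixel inside the rectangle is called "cat",
# including the stray grass pixel.
box = labels[2:5, 1:4]
box_cat_pixels = box.size                  # 9 pixels labeled "cat"

# Semantic segmentation: only true cat pixels receive the "cat" label.
seg_cat_pixels = int((labels == 1).sum())  # 8 pixels labeled "cat"

print(box_cat_pixels, seg_cat_pixels)  # → 9 8
```

The one-pixel gap between the two counts is exactly the “grass inside the box” problem that per-pixel labels avoid.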

The problem with this human approach is that it requires thousands of labeled images to train an algorithm. A single 256×256 image is composed of 65,536 individual pixels, so labeling every pixel in 100,000 images quickly becomes impractical.
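The arithmetic behind that claim is simple to check (the 100,000-image figure is the article’s illustrative number, not a dataset size from the paper):

```python
# Back-of-the-envelope cost of exhaustive per-pixel labeling.
pixels_per_image = 256 * 256                  # 65,536 pixels in one image
num_images = 100_000                          # illustrative dataset size
total_pixels = pixels_per_image * num_images  # pixels a human would label

print(f"{pixels_per_image:,} pixels per image")  # → 65,536 pixels per image
print(f"{total_pixels:,} pixels in total")       # → 6,553,600,000 pixels in total
```

Over six billion individual labels is the scale that makes unsupervised approaches like STEGO attractive.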


Seeing The World

Emerging technologies such as self-driving cars and medical diagnostics, however, require machines that can read the world around them. A camera alone is not enough: the machine must also understand what it is seeing.

Mark Hamilton, the lead author of the paper on STEGO, suggested the technology could be used for scanning “emerging domains,” where humans don’t yet know what the right objects are.

Speaking to MIT News, he said, “In these kinds of situations where you want a method that operates at the limits of science, it’s not possible to rely on humans to solve it before machines do.”

STEGO was trained on a range of visual domains, from interiors to high-altitude aerial shots. The new system doubled the performance of previous semantic segmentation systems and closely matched what humans deemed the objects to be.

“When applied to driverless-vehicle datasets, STEGO successfully segmented roads, people, and street signs with much higher resolution and granularity than previous systems. On images taken from space, the system broke down every square foot of the Earth’s surface into roads, vegetation, and buildings,” says the MIT CSAIL team.

Room For Improvement

STEGO still had trouble distinguishing between foods like grits and pasta. Odd images also caused confusion, such as one showing a banana sitting on a telephone receiver; the receiver was labeled “foodstuff” instead of “raw materials.”

Andrea Vedaldi of Oxford University said that although the machine still struggles to tell what is a banana from what is not, the algorithm is a “benchmark of progress in image understanding.”

“This research is perhaps the most direct and powerful demonstration of unsupervised segmentation’s progress.”