Computer vision could be much faster and better if we skip the concept of still frames and instead analyze the data stream directly from the camera. At least, that’s the theory that Ubicept, the newest brainchild spinning out of the MIT Media Lab, is working from.
Most computer vision applications work the same way: a camera takes an image (or a quick sequence of images in the case of video). These still frames are transmitted to a computer, which then performs the analysis to determine what is in the image. Seems simple enough.
But there’s a problem: that paradigm assumes that creating still frames is a good idea. To people who are used to seeing photos and videos, that may seem logical. Computers don’t care, however, and Ubicept believes that ignoring the idea of frames altogether makes computer vision much better and more reliable.
The company itself is a collaboration between its co-founders. Sebastian Bauer is the company’s CEO and a postdoctoral fellow at the University of Wisconsin working on lidar systems. Tristan Swedish is now the CTO of Ubicept. Prior to that, he was a research assistant and master’s and Ph.D. student at the MIT Media Lab for eight years.
“There are 45 billion cameras in the world, and most of them are creating images and videos that are never really looked at by a human,” Bauer said. “These cameras are mostly for perception, for systems to make decisions based on that perception. Think of autonomous driving, for example, where pedestrian recognition is one such system. All these studies that are coming out show that pedestrian detection works well in bright sunlight but particularly badly in low light. Other examples are in industry: cameras for sorting, inspection and quality assurance. All these cameras are used for automatic decision making. In a well-lit room or in daylight, they work well. But problems arise in low light, especially with fast motion.”
The company’s solution is to bypass the “still frame” as the source of truth for computer vision and instead measure the individual photons that hit the imaging sensor directly. That can be done with a single-photon avalanche diode array (or SPAD array, among friends). This raw data stream can then be fed into a field-programmable gate array (FPGA, a type of highly specialized processor) and further analyzed by computer vision algorithms.
The newly formed company showed off the technology at CES in Las Vegas in January, and it has some pretty cool plans for the future of computing.
“Our vision is to have our technology in at least 10% of cameras in the next five years and in at least 50% of cameras in the next 10 years,” Bauer predicted. “When you’re detecting each individual photon at high speed, you’re doing the best that nature allows you to do. And you’ll see the benefits, like the high-quality videos on our website, which blow everything else out of the water.”
TechCrunch saw the technology at a recent demonstration in Boston and wanted to know how the technology works and what the implications are for computer vision and AI applications.
A new way of seeing
Digital cameras generally work by “counting” the number of photons that hit each of the sensor’s pixels over a given period of time, capturing a single-frame exposure. At the end of that period, the photons counted at each pixel are added up, and you have a still photograph. If nothing in the image moves, that works great, but the “nothing moves” condition is a huge caveat, especially in computer vision. When you try to use cameras to make decisions, everything is moving all the time.
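To make the difference concrete, here is a minimal toy sketch in Python. It is not Ubicept’s code: the one-dimensional sensor, the tick-based timing and every name in it are assumptions made purely for illustration. A bright dot sweeps across eight pixels, and the same stream of photon events is summed once over the whole exposure and once over a short time window.

```python
# Toy model, not Ubicept's implementation: all names and numbers are hypothetical.
# A bright dot sweeps across a 1-D sensor of 8 pixels during one exposure.

NUM_PIXELS = 8
EXPOSURE_TICKS = 80  # length of one conventional exposure, in arbitrary time ticks


def photon_stream():
    """Return (tick, pixel) events: one photon per tick at the dot's position."""
    events = []
    for tick in range(EXPOSURE_TICKS):
        pixel = tick * NUM_PIXELS // EXPOSURE_TICKS  # dot moves left to right
        events.append((tick, pixel))
    return events


def integrate(events, start, stop):
    """Sum photon counts per pixel for events with start <= tick < stop."""
    frame = [0] * NUM_PIXELS
    for tick, pixel in events:
        if start <= tick < stop:
            frame[pixel] += 1
    return frame


events = photon_stream()

# Conventional frame: everything in the exposure is added up, so the dot smears.
long_exposure = integrate(events, 0, EXPOSURE_TICKS)

# Short window cut from the same stream: the dot stays in a single pixel.
short_window = integrate(events, 40, 50)

print(long_exposure)  # [10, 10, 10, 10, 10, 10, 10, 10]
print(short_window)   # [0, 0, 0, 0, 10, 0, 0, 0]
```

In the full exposure the dot is spread evenly over every pixel, which is motion blur in miniature. Because each event keeps its own timestamp, the same data can be sliced into whatever window suits the task, and the dot’s position at that moment is recovered.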
Of course, with the raw data, the company can still combine the photon stream into frames, creating beautifully sharp video with no motion blur. Perhaps more interestingly, dispensing with the idea of frames means the Ubicept team can take the raw data and analyze it directly. Here’s an example video showing the dramatic difference that can make in practice: