What is computer vision?

Jul. 15, 2021

Computer vision is a subfield of computer science, artificial intelligence, and machine learning that provides computers with an “understanding” of digital images and videos. The ultimate goal of computer vision is to enable machines to “interpret” visual data in the same way that humans do—detecting objects, people, faces, events, actions, even time.

Subfields of computer vision include:

  • Image recognition: Identifying the most important high-level contents of an image.
  • Object recognition: Recognizing and identifying the most prominent objects in an image. The difference between image recognition and object recognition is subtle but important. For example, given an image of a soccer game, a computer vision model trained for image recognition might return simply “soccer game,” whereas a model for object recognition would return the labels and locations of the different objects in the image (the players, ball, goals, etc.).
  • Action recognition: Identifying when a person is performing a given action (e.g. running, sleeping, falling, etc.).
  • Facial recognition: Identifying individuals by examining various facial features (e.g. the distance and location of the eyes, nose, mouth, and cheekbones).
  • Image segmentation: Partitioning images into multiple segments, i.e. contiguous sets of pixels. Image segmentation models assign a category to each pixel in the image, grouping them together into objects, people, backgrounds, etc.
  • Event detection: Identifying when a given event has occurred.
  • Motion tracking: Following the motion of a person or object across multiple frames in a video.
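The image segmentation idea above can be made concrete with a toy example. Here a tiny 4×4 “image” is represented as a mask of per-pixel class ids, which can then be grouped and counted per category (the class ids and values below are invented purely for illustration):

```python
import numpy as np

# Toy 4x4 segmentation mask: each pixel holds a class id.
# Class ids are illustrative: 0 = background, 1 = person, 2 = ball.
mask = np.array([
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 2],
    [0, 0, 0, 0],
])

# Group the pixels by category and count them.
ids, counts = np.unique(mask, return_counts=True)
pixel_counts = dict(zip(ids.tolist(), counts.tolist()))
print(pixel_counts)  # {0: 11, 1: 4, 2: 1}
```

A real segmentation model would produce such a mask at full image resolution, assigning one class id to every pixel.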

In recent years, computer vision has made great strides in both accuracy and speed, thanks to techniques such as deep learning and neural networks.

Deep learning uses very large artificial neural networks: AI models composed of many layers of interlocking “neurons” that receive data, perform simple calculations, and send the results to other neurons. In particular, convolutional neural networks (CNNs) are highly effective at working with the grid-like, pixelated structure of digital images and videos.
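As a rough sketch of why that grid structure matters: the core CNN operation slides a small filter across the pixel grid, responding to local patterns such as edges. The image and kernel below are invented for illustration; a real CNN learns its filter values from labeled data rather than having them hand-written:

```python
import numpy as np

# A tiny grayscale "image": dark on the left, bright on the right.
image = np.array([
    [0, 0, 0, 10, 10],
    [0, 0, 0, 10, 10],
    [0, 0, 0, 10, 10],
    [0, 0, 0, 10, 10],
    [0, 0, 0, 10, 10],
], dtype=float)

# A hand-written 3x3 filter that responds to vertical edges.
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

def slide_filter(img, k):
    """Valid-mode 2D cross-correlation, the basic operation in a CNN layer."""
    kh, kw = k.shape
    out_h, out_w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the filter against one 3x3 patch and sum the result.
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

feature_map = slide_filter(image, kernel)
print(feature_map)  # every row is [ 0. 30. 30.]: strongest response at the edge
```

Stacking many such learned filters, layer after layer, is what lets a CNN work its way up from edges to textures to whole objects.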

For maximum effectiveness, training computer vision models also requires vast quantities of labeled data.

The applications of computer vision are nearly limitless: from powering self-driving cars and detecting wildfires to monitoring industrial manufacturing equipment and diagnosing medical issues. Businesses of all sizes and industries can benefit from integrating computer vision into their workflows. Achieved with the right partner, computer vision solutions can help boost workers’ productivity and efficiency, cut costs, and improve the speed and accuracy of business processes and workflows.

Watch the YouTube Video: What is Computer Vision?

Ema Roloff:

Hello, everybody. I have Jeff with me today, and we’re going to be talking about kind of the introduction to computer vision. But before we get into the questions that I’ve got for you, can you give everybody an introduction?

Jeffrey Goldsmith:

Sure, Ema. Thanks for having me. So my name is Jeff Goldsmith. I’m the head of marketing at Chooch AI, and I’ve been around sort of watching digital transformation happen for many years, actually. I wrote for Wired Magazine way back in the beginning, in ’94, wrote a piece called “This Is Your Brain on Tetris.” It got a lot of play. But then I also wrote a piece about chess and computers and AI. This was a pretty major piece about Deep Blue and Kasparov and all that business. And then, over the years, I’ve worked at many different startups and digital agencies and so forth and so on. I met the founders of Chooch about five years ago, and then I started working with them about two years ago, a little more than two years ago, and it’s been a very interesting journey because AI is about to get very real in our lives.

Ema Roloff:

It’s interesting to hear how long you’ve been having these conversations. I would like to pretend that I’ve been involved in the industry that long, but I would say I’m newer to the game. But I appreciate you joining me, and I want to get us started today by just kind of giving an introduction to the idea of computer vision and sharing, kind of, your definition with everybody joining us today.

Jeffrey Goldsmith:

Sure, sure. So computer vision is a kind of AI, and AI doesn’t know anything in the beginning. There’s no intelligence there, it’s artificial. And by that, we mean you train AI, and in the case of computer vision, you train AI with images and video. So that makes the AI know something, and platforms like Chooch, where I work, make that happen very efficiently. That’s why the advent of computer vision platforms has happened, right? And then what happens is you take that AI model that you’ve trained and you put that on, let’s say, a computer. It could be an edge device. It could be in the cloud. And then images or video hit that computer and hit the AI model.

Jeffrey Goldsmith:

So let’s say a camera is looking at something, like an empty hallway, and a person walks into that hallway. The AI can be trained to detect a person. So then you can get an alert that says, “Oh, there’s a person in the hallway.” So that’s what AI models do, and they don’t just detect objects like people. You can also detect actions, you can count things, you can track unknown objects, you can track known objects. And this works everywhere from microscopes, the microscopic level, we call it “microscopy,” to satellites to geospatial imagery and everywhere in between. So that’s essentially what computer vision is: a subset of AI that you train with images, and then you deploy those AI models so that they can be used live on computers.

Ema Roloff:

Thank you. I think that’s always so tangible; getting the definition so that we’re all moving forward speaking the same language is so important. You gave a few examples there, but sometimes when we think about computer vision, we think about it as an emerging technology. But there are a lot of organizations that are using it today. Could you dive into some of those other use cases that are being leveraged by organizations now?

Jeffrey Goldsmith:

Sure. And if you look back, I mean really computer vision, it was called “machine vision” at one point. It actually emerged early on, like decades ago. And then the first time I remember it, there was something called an “electric eye.” You walked in front of a door and the door opened, right? It detected something was there. And then this is all the beginning of detecting things with computers and with cameras. The range of use cases is very, very wide. It’s somewhat infinite. We’ve seen everything from wildfire detection from drones and aircraft to looking at shipping with satellites and other features on the ground in satellite imagery. That’s one level we’ve seen it at. I mentioned satellites to microscopes earlier, but that’s a good place to start. At the microscopic level, we’re doing things like detecting defects in industrial materials that are microscopic, powders that are used in processes that need to be consistent. And we’re also doing things like counting cells for drug discovery.

Jeffrey Goldsmith:

This is amazing stuff. It allows drug researchers to more rapidly develop compounds, and you can also type cells for pathology. And in the healthcare space, you can do obvious things like detect cancer formations in imaging. So that’s more on the scientific area of computer vision, but it’s also being used for workplace safety.

Jeffrey Goldsmith:

For example, when workers don’t wear gloves or hardhats or safety vests, that’s when they get injured. So detecting when people are not complying with workplace safety rules, that’s a super important part of computer vision. It allows us to really save lives, lower injuries, and so forth and so on with workplace safety detection, and that’s applied across industries, that’s applied in construction sites, warehouses, in manufacturing, across the board, right? We have an AI model that you can actually … we’ve developed a demo video where you can see that in action on our website or on YouTube. Then in retail, right? Retailers want to know what consumers are doing anonymously, right? Where are they gathering in stores? What products are they touching and not buying? This kind of stuff is very important, consumer analytics in retail, and also for retail, stockout detection. When are there items not on shelves? And keeping shelves stocked is important because when consumers go to stores, they expect to find what they’re looking for. And if it’s not there, that’s a bad customer experience. So that’s in the retail space.

Jeffrey Goldsmith:

For example, in oil and gas, flare detection is a very important aspect of that business, and we have also seen remote monitoring of locations like when cars arrive or people arrive at a location that’s out in the middle of nowhere. We can detect that and for example, read a license plate or tell us what kind of vehicle that is or is that UPS arriving to deliver something to that location? There’s also a concept called “defect detection.” So in the consumer packaged goods, for example, bottling, you want to make sure that bottle caps are on. You want to make sure in manufacturing, that paint is consistent, that biomedical devices are consistently manufactured to certain specifications.

Ema Roloff:

So thank you for going through all of those use cases. A lot of times, we talk through this type of technology in a broad spectrum, talk about it as AI. And some of those more minute, kind of focused use cases really do a great job of kind of painting the picture of how computer vision fits into the landscape of digital transformation. More specifically, as we look at organizations taking advantage of this technology, what types of benefits do they typically see?

Jeffrey Goldsmith:

Yeah. So it’s a really interesting question. It really is kind of a paradigm shift, so it’s kind of hard to get your head around what the benefits are here. But if you look at the analogy of the industrial revolution, prior to that, shoes were handmade, right? And so then factories were created where people could manufacture shoes, and the benefit of that was that everyone could have shoes, and now everyone in the world has shoes. Prior to the invention of factories that could mass-produce shoes, fewer people had shoes. It’s just the way it was. And with computer vision, you can now … the workplace safety example is really great. Now, you can ensure that more people receive the benefit of workplace safety rules, right? And you can ensure that there are fewer defects in products and in processes. Work can happen more efficiently. Costs can be lowered.

Jeffrey Goldsmith:

In that case of cell counting for drug discovery, instead of people counting cells somewhat inaccurately and taking a lot of time, researchers can set up more experiments and go faster with drug discovery. So it saves time, lowers costs, and this will save lives. It will allow people to do more productive tasks. The whole piece about people fearing they’re going to lose their jobs because of AI is kind of a fallacy. New sorts of jobs will be invented, and people will be able to do different tasks and more of what they already do. Security guards, for example, oftentimes they don’t know why something happened. They’ll go out investigating events, but they won’t know what exactly happened. They’ll know there’s no issue, but they won’t know what tripped off the alarm. But with cameras and computer vision, you can detect things really quickly. You can know that all the doors are closed. You can know what’s happening at the fence line and that sort of thing. So it’s just going to help us be more efficient at our jobs.

Ema Roloff:

So now that we’ve talked a little bit about this idea of where we are, I want to talk about where we’re going and what computer vision can bring to this idea of digital transformation in the next couple of years.

Jeffrey Goldsmith:

Yeah. So it’s pretty amazing what’s happening. So we’ve partnered with some of the biggest companies in the world to bring this to market, and everyone sees the economic benefit of computer vision. And because of the confluence of GPUs, graphics processing units, the same chips that mine Bitcoin and that run graphics, they’re being used for AI. And because of that, and because of, sort of, a software shift called “deep learning,” because of that confluence, it allows us to generate these AI models very quickly and for any kind of visual use case you can imagine. So it’s really infinite where this can go: anywhere from your house, knowing where you lost your keys, to how many boxes are in a warehouse and knowing that instantly, right? We’ve developed a way to capture very large images and count things incredibly quickly, even if there are thousands of objects.

Jeffrey Goldsmith:

And this happens, really, in seconds. So any visual task you can think of, it’s going to be doable with computer vision, and that’s going to happen at the industrial scale, it’s going to happen in your cars. About two years ago, a fellow came up to us at a conference and asked the CTO of Chooch AI, “Can you detect pedestrians in headlamps?” Can the headlamp of a car … he was at a headlamp manufacturer. Can that headlamp detect pedestrians? And the CTO of Chooch looked at me and said, “Yeah, we can do that.” Right? So that’s the scale we’re talking about, about embedding computer vision in all kinds of places, in our lives, and having that benefit us.

Ema Roloff:

So the idea that my house could know where I’ve misplaced things would be pretty critical for me [crosstalk 00:13:14]. I thought I lost my wallet a couple of months ago and lo and behold, it wasn’t in one of my coats. I had put it in my husband’s coat pocket and couldn’t find it for weeks and had to replace everything. So imagine how nice that would have been for me to have that kind of thing.

Jeffrey Goldsmith:

It’s an interesting example, thinking about your house could know where your things are. It’s kind of a dramatic example, but I think that that helps people get where this can go. Your car could help you be safer by noticing when you’re dozing or when you’re distracted, right? And give you an alert, the same way your seatbelt tells you, “Hey, you got to be safe.” Right? If you’re not wearing it, it gives you an alert. Distracted driving, dozing, this can actually save lives and reduce accidents on the road. And it’s something that we’re looking at here at Chooch AI. This can happen anywhere: from your car to your house to your factory to your job, microscopic, satellite.

Ema Roloff:

Well, thank you so much for joining me today, Jeff. And I want to encourage everybody to make sure that if you’re not already following him, make your way over and follow him here on LinkedIn. Check out Chooch, spend some time learning a little bit more about computer vision. As you heard, it’s clearly going to impact the way that we live our lives here in the very near future. But thank you again for joining me and have a wonderful day, everyone.

Jeffrey Goldsmith:

Thanks, Ema. Really appreciate it.

Ema Roloff:

If you’re looking for expert tips on how to get started with your transformation or looking to hone in on your approach, make sure that you subscribe to our channel to catch our weekly digital transformation talk series, where we interview experts from around the world on how to make it happen.
