NVIDIA’s Presentation at Infinite Vision


Feb. 8, 2022

How Will AI, Computer Vision and Avatars Transform Our Lives? Get the vision from Adam Scraba of NVIDIA


Adam Scraba:

Hi, I’m Adam Scraba. I lead marketing for NVIDIA Metropolis, which is a platform built to solve some of the world’s most important operational efficiency and safety problems. What we’ve been doing for the last six to seven years is using AI and sensors to instrument physical space and increasingly automate really important processes across a number of verticals. What I want to share with you today is effectively the journey that we’ve been on and that we see happening in the marketplace, where all of our really important spaces and our transactions are going to be impacted in a huge way, in a hugely positive way, by AI avatars and computer vision. And so what I’d like to do is take you on a little bit of a journey.

Adam Scraba:

Some of us love the journey to an airport and to new places, and some of us maybe hate it, but I want to take you through a fairly common experience and share with you, just as an example, how impactful and how numerous the spaces and transactions that we deal with on a daily basis really are.

Adam Scraba:

Whether we fly somewhere, whether we go to the grocery store, whether we buy something, whether we go through a drive-through. So first off, in almost any one of our transactions, we travel somewhere. We’re going to take a vehicle. We might take an Uber. There are traffic problems. There’s parking that we have to deal with. Curbside management becomes a major issue, whether you’re at an airport or in a city. And so it takes us a while to actually get to the place that we’re going. And there’s already a number of spaces that we navigate and a number of transactions that we’re negotiating that we might not actually think about too much. Then we get to the airport, and you need to check in, you need to get your luggage to the check-in counter, and you need to verify your identity.

Adam Scraba:

You might need to use a washroom before you get there, because you’re going to be in line for an hour through security. Often, we need to present some identification, perhaps a passport, get a boarding pass, and get some tags for our luggage. And so you can imagine, now we’ve been at the airport for 30 minutes and we’ve gone through perhaps another 50 transactions. We finally get through security. Now we need to find where the gate is. You need to navigate through the airport just to find your gate. You might want to buy a book. You might want to get some food. You’re well into your journey, you’re still trying to find your gate, and again, there might be a washroom break involved.

Adam Scraba:

And then you finally get to the gate where your flight is about to take off, and unbeknownst to you, there’s actually a huge orchestration of events happening behind the scenes just to get the aircraft ready for you to take off: there’s refueling, there’s getting the previous passengers off, there’s refilling the snacks. There could be deicing happening. There are mechanical checks happening. There’s this huge orchestration of things happening that you don’t even know about, and your luggage is making its way to the aircraft. And finally, you get on board, you again probably show some identification, you make your way to your seat, and now you’re on your way to your destination.

Adam Scraba:

And so what I’ve just described to you is a pretty typical thing that we all do, and it could be an airport, or it could be literally your trip to the corner store. There’s a ton of transactions that happen. And whether we like it or not, we do a lot online, but we still live in a physical world. Our lives are still very physical, and there are a lot of spaces and a lot of transactions in our world that are physical. Where we’re headed, and frankly it’s been a personal journey for me over the last six years, working with teams, partnering with people, and standing up complete AI development efforts around the world, is increasingly leveraging AI and robotics to automate space.

Adam Scraba:

And where we’re headed is that really all of our spaces, all of our most important spaces, are actually going to become robots. Now, these robots may literally look like little robots, and they might be moving around in our space, but the spaces themselves will also become robots. Our buildings are going to become robots. Our intersections are already kind of robots, but they’re going to be even smarter robots over time. Our restaurants are going to be robots. The factories and warehouses will be robots. And so increasingly, automation is going to take over a lot of our physical space. And so the work that we do on the team that I lead at NVIDIA is really using a lot of these technologies to make this happen. And a big part of navigating and negotiating these transactions that are going to be automated is how we communicate with our world and how we communicate with these spaces.

Adam Scraba:

Now, we’re literally going to be communicating with machines. And so what I want to share with you is some work that we’ve been doing over the last few years to bring together an incredibly complex set of processes, AI and different AI modalities. And what you’re going to see here is some really interesting stuff. It’s some magical stuff where we’re bringing together conversational AI. I’m going to be showing you some computer vision. I’m going to be showing you recommender engines. I’m going to be showing you really interesting and delightful rendering. That all comes together to build what is effectively going to be our interface with the machines of the future.

Adam Scraba:

So this first demo that I’m going to show you is an example of a demo we recently did and shared with the world, featuring the CEO of our company, NVIDIA, Jensen Huang. We call it Toy Jensen. It’s a miniature avatar of Jensen in a Q&A session. And all of what you’re seeing is completely AI driven. It’s all real time, and it wasn’t actually preconfigured. This is all effectively a live demonstration.
Automated Voice 1:
Hi. Hello. Hello toy Jensen.
Automated Voice 2 (Jensen):
Hi, please ask me anything.
Automated Voice 1:
What are the greatest threats of climate change?
Automated Voice 2 (Jensen):
The greatest threats of climate change are the effects on human health, food security, water supplies, and the environment.
Automated Voice 1:
How do astronomers look for exoplanets?
Automated Voice 2 (Jensen):
Most exoplanets are found by the transit method. That was the case for the TRAPPIST-1 planets.
Automated Voice 1:
Tell me more about that method.
Automated Voice 2 (Jensen):
The transit method is when a planet passes in front of its star. The amount of light blocked tells astronomers the size of the planet. The duration of the transit tells astronomers the orbital period of the planet.
Automated Voice 1:
What telescopes use that method?

Adam Scraba:

I’m going to pause here, because this goes on for a little while, but I want to point out a couple of things. What’s interesting about what you saw, and it might look cute or silly, is that there’s some really interesting magic going on. So first off, you probably noticed some natural language conversation. Someone is having a conversation with a piece of software, and the software needs to understand literally the audio coming out of their mouth. It needs to put it into words. It needs to infer what the meaning of the question is. And the questions should be natural, just like you’d speak to anyone. And so this piece of software needs to understand a lot of stuff. And then it needs to dig into an incredibly huge knowledge base, a database of information, and respond back.

Adam Scraba:

And so what we’re showing here is speech AI. We’re showing natural language understanding. There is a 530-billion-parameter natural language understanding model that’s being used to understand what’s being spoken, and the avatar itself needs to be rendered in a lifelike way. There are definitely some delays, but it’s pretty real time. And then what happens is this answer needs to bubble up from this piece of software and then be articulated. And it’s articulated through a voice, through a natural animation of this character. And it’s actually in Jensen’s voice. The voice that you’re hearing isn’t a prerecorded version of Jensen explaining something. This is actually a piece of AI software that’s been trained on Jensen’s voice to sound like him, and this avatar could be saying anything.
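The pipeline described here, audio to text, text to an answer drawn from a knowledge base, answer back to synthesized speech, can be sketched as a simple chain of stages. This is an illustrative toy, not NVIDIA’s implementation: every function below is a stub standing in for a real speech-recognition, language, or voice-synthesis model.

```python
# Toy sketch of the avatar's question-answering turn:
# perceive (ASR) -> reason (NLU over a knowledge base) -> act (TTS).
# All stages are stubs; real systems use large trained models.

def transcribe(audio: bytes) -> str:
    """Stub ASR: turn raw audio into text (a real system would run a speech model)."""
    return "what are the greatest threats of climate change"

def answer(question: str, knowledge_base: dict) -> str:
    """Stub NLU: look the question up in a toy knowledge base
    (the demo used a 530-billion-parameter language model instead)."""
    return knowledge_base.get(question, "I'm not sure.")

def synthesize(text: str) -> bytes:
    """Stub TTS: a real system renders the text in a voice cloned from training audio."""
    return text.encode("utf-8")

def avatar_turn(audio: bytes, knowledge_base: dict) -> bytes:
    question = transcribe(audio)              # 1. audio -> words
    reply = answer(question, knowledge_base)  # 2. words -> answer
    return synthesize(reply)                  # 3. answer -> speech

kb = {
    "what are the greatest threats of climate change":
        "Effects on human health, food security, water supplies, and the environment."
}
print(avatar_turn(b"...", kb).decode("utf-8"))
```

The point of the sketch is the chaining: each AI modality hands its output to the next, and the whole loop has to run fast enough to feel conversational.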

Adam Scraba:

So it’s pretty magical. Now, perhaps a more obviously robotic example is this next one that I’m going to share with you, which is also pretty magical, because it hints at the future of what our transactions are going to be like. This is an example of effectively re-envisioning what ordering food might look like. Again, this combines rendering, this combines speech AI and computer vision, and it brings it all together in what is effectively going to be the future of us ordering things, which is pretty incredible.
Automated Voice 3:
Hi, welcome to Shannon’s Cafe. How can I help you?
Automated Speaker 4:
Can I have a cheeseburger with the side of onion ring, please?
Automated Voice 3:
Would you be interested in our popular toppings for cheeseburger, like bacon and fried onions?
Automated Speaker 4:
Yeah, let’s do it.
Automated Voice 3:
Done. Added cheeseburger with bacon and fried onions. What else?
Automated Speaker 5:
What protein options do you have?
Automated Voice 3:
We have multiple options. I recommend the double protein burger. Would you like that?
Automated Speaker 5:
Do you have any vegetarian options?
Automated Voice 3:
I can get you a black bean patty instead. Would you like that?
Automated Speaker 5:
Sure.
Automated Voice 3:
Okay. I have added the double protein burger with black bean patty to your cart. What else?
Automated Speaker 5:
Do you have any sides?
Automated Voice 3:
We have these three vegetarian items, French fries, onion rings, and fried pickles. What would you like?
Automated Speaker 4:
I’ll have the French fries instead of the onion rings, please.
Automated Voice 3:
Sure. Would you like some garlic on it?
Automated Speaker 4:
Yes, please.
Automated Voice 3:
Okay. I have added French fries with garlic to your cart. What else?
Automated Speaker 4:
What salads do you-

Adam Scraba:

Sorry. So what I just showed you there, as I mentioned, brings together a number of modalities of AI in one application. And so this is incredible. And this is really, as I mentioned, the future of our spaces: the future of our transactions will be a combination of simulation, AI, understanding, and computer vision, all brought together. And so, while this does indicate what things are going to look like in the future, we’ve been working for the last six years with a number of partners to deploy solutions around the world. And as I mentioned, roadways, traffic congestion, and public safety have arguably been the first incarnation of turning our spaces into robots.

Adam Scraba:

And if you really think about it, a robot, frankly, does three things: it perceives, there’s a sort of reasoning that happens, and then it takes action. And that’s it. Every robot really does the same thing. And for the first few years, we’ve really been giving our spaces perception, through things like computer vision, video, and lidar. We’re now doing things like using, as you’re seeing here, digital twins and simulations to simulate what our roadways might look like. And then we implement them. And then we use computer vision to measure the effectiveness and the efficacy of what we’ve just done. Have we made our spaces safer? Have we made them more efficient? And so this is happening around the world. This is production class. This is being rolled out in a big way in every major company, in every major city around the world.

Adam Scraba:

And so we’re helping pay for infrastructure. We’re automating tollways with computer vision and making them more seamless, which is a really important thing for getting our infrastructure paid for. We’re working with companies like airlines, for whom their biggest asset is, frankly, their fleet of aircraft. And so can we turn around an aircraft at a gate seconds or minutes faster, and help save time, reduce delays, reduce people waiting around in the airport, and ultimately give profit back to the airlines? We’re also using computer vision to increase the world’s capacity to do things that you might not expect, like recycling materials and making an impact on the environment. And this is, as you can see, an incredibly dirty, dangerous, and frankly ugly, challenging AI problem, but we’re doing it in a big way. And the next example that I’m going to share with you is combining computer vision with robotics and helping robots and humans coexist in very challenging spaces.

Adam Scraba:

So we’ve been at it for a while. We’ve built an incredible set of partnerships. We’ve built an incredible ecosystem. The world’s increasingly rolling out AI in a big way. It’s production class. It’s hyper valuable. It’s important, but I think we’re still at the very early stage. And as I mentioned, I think in summary we’re going to be increasingly adding more and more levels of automation. We’re going to be communicating in a more delightful way with the robots of our future. And the future is incredibly bright. So with that, I’ll end on that note. Thank you very much.