SAS’s Presentation at Infinite Vision

Feb. 8, 2022

Computer Vision will have a huge impact on how AI “understands” the world. Hear from Mark Wolff, Chief Health and Life Sciences Analytics Strategist at SAS.

Watch all the videos from Infinite Vision 2022

Mark Wolff:

Hello, my name is Mark Wolff and I’m an industry consultant at SAS Institute, working at the global headquarters in Cary, North Carolina. I’m also a visiting fellow at the University of Miami Institute for Data Science and Computing. The title of this presentation alludes to both the promise and the challenges of computer vision. I very much believe that computer vision, unlike many other sensing modalities, has the potential to deliver on some of the more aspirational predictions, given the current progress being made in artificial intelligence.

Mark Wolff:

But it may also play a critical role in addressing the challenges of what some refer to as artificial general intelligence, where human-like sensory perception capabilities will be critical to the goal of an intelligent agent being able to understand or learn intellectual tasks that a human can perform. One of the more significant challenges that computer vision faces is evolving from a singular sensing modality to one that transcends image analysis (formally, the detailed examination of the elements or structure of something): a technological system that moves beyond analysis and develops attributes of understanding and derivation of meaning.

Mark Wolff:

So let’s begin with some history and some definitions, and examine how we can evolve computer vision to move beyond current limitations and move towards, as some say, infinite vision. Let’s begin with a bit of art history and philosophy. The Dutch painter Piet Mondrian was one of the most influential painters in history and a pioneer of the abstract art movement. Throughout his career Mondrian evolved from highly representational art, often focusing on landscapes, to highly abstract and geometric paintings. In a sense, as his art evolved, Mondrian reduced images to their absolute minimum.

Mark Wolff:

That is, the three primary colors, the three primary values and the two primary directions: red, blue, yellow; black, white, gray; and horizontal and vertical. A question that one may ask is to what extent human understanding of each of these images, as you see before you, can compare to a pure computer vision analysis of each work. More importantly, what role do context and external data, in other words metadata and tagging, play in supporting human understanding of each image? And interestingly, what are the limits of human understanding, even within the context of understanding what Piet Mondrian was working towards?

Mark Wolff:

A human and a computer may well each correctly identify the first painting as a tree, but at what point does the human ability to correctly identify each image deviate from a computer vision approach? How important is contextual data, both from a human and a machine perspective? And what role do structural or innate, that is, hardwired, elements of the nervous system play in giving humans an advantage over computers in understanding a Mondrian painting, for example?

Mark Wolff:

Let’s begin at an important point in history. Many are familiar with the history of the term artificial intelligence. It was first used in 1955 in a proposal drafted by John McCarthy, a computer and cognitive scientist and professor at Dartmouth College, along with several colleagues, when the group proposed a summer research project on the subject of artificial intelligence. And it’s interesting to note that the group of four was represented by two from academia and two from industry. It was clear, very early on, that the question of intelligence was not a purely academic problem but rather one that involved industry.

Mark Wolff:

This proposal for funding this, quote, summer research project, unquote, had four key points. First of all, it was only a two-month project of 10 individuals. And the proposition was that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.

Mark Wolff:

Secondly, make machines use language, form abstractions and concepts, and solve problems now reserved for humans. Today, in addition to language, we would add images. And what is remarkable is that they stated that a carefully selected group of scientists working for two months could make significant progress in this area. That was the mid-fifties, 1956 to be precise. I think they were going to need more than two months.

Mark Wolff:

Another pioneer in the endeavor to understand intelligence was Ross Ashby, an English psychiatrist and a pioneer of what was referred to at the time as cybernetics and the study of complex systems. At the 1961 Western Joint Computer Conference, interestingly hosted by the Institute of Radio Engineers, Ashby made the following statement related to machine and human intelligence. He said, quote, machines can be made as intelligent as we please, but both they and man are bounded by the fact that their intelligence cannot exceed their powers of receiving and processing information, unquote. Ashby’s definition of intelligence was a function of both the amount, the volume, of data that can be acquired.

Mark Wolff:

And the ability to process those data. In essence, Ashby defined intelligence as an I/O, or input-output, problem. Clearly his view was that machines could be made intelligent. As an aside, if you’re curious as to why radio engineers were hosting a meeting related to computers and cybernetics, just look up the origins of the term receiver operating characteristic curve, or ROC for short. I’m sure you’ll find that an interesting bit of history.

Mark Wolff:

So, let’s get back to our understanding of intelligence and the role that vision plays. The Oxford English Dictionary defines intelligence very succinctly: quote, the ability to acquire and apply knowledge and skills. This definition on the surface does not seem to differentiate between humans and machines. And in a sense, it’s very much like Ross Ashby’s definition. That is, collect data, acquire it, and apply it, process it. But let’s look at the definition in more detail.

Mark Wolff:

The root of the word intelligence is from the Latin intellegentia, from intellegere, to understand. That begs the question: what does it mean to understand, and how does understanding relate to intelligence? Now, let’s move forward and try to understand intelligence and computer vision together.

Mark Wolff:

I think we all understand and have experienced CAPTCHA. We’re all familiar with it. A CAPTCHA program, as we know, is one that protects websites against bots by generating and grading tests that humans can pass but current computer programs cannot. Humans can read distorted text such as the one shown here in this slide, but computers cannot. Not yet anyway. But they are making progress. So how many of you actually know what CAPTCHA is an acronym for? Well, it stands for Completely Automated Public Turing test to tell Computers and Humans Apart. That’s right. CAPTCHA is a Turing test.

Mark Wolff:

So let’s try to understand how CAPTCHA works and, in doing so, get an insight into the nature of human intelligence. Indeed, many researchers have published papers recently addressing exactly the mechanism of CAPTCHA and how that relates to our own internal intelligence architecture in the central nervous system and the brain, and how we process information. Again, this relates to Ross Ashby’s definition of amount of data versus processing power.

Mark Wolff:

So, in order to have intelligence or, as we’ve defined it, understanding, particularly as related to CAPTCHA, we need to consider two critical factors. One is that of invariance and the other is that of selectivity. For a system to identify an object, it needs these two qualities. Now, invariance is the ability of a system to respond similarly to different views of the same object. Selectivity is when a system’s components produce different responses to potentially quite similar objects, such as different faces for example, even when presented from a similar viewpoint.

Mark Wolff:

What this means is that when I look at a room full of people, I can be selective and identify individual, unique faces, and I can be invariant, recognizing that they’re all faces in the room. But what is very interesting, and what is perhaps critical, is that it is straightforward to make a detector, a system to visualize something, that is either invariant but not selective or selective but not invariant.
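As a rough illustration of that distinction (a sketch that is not from the presentation; the toy one-dimensional “images” and the names `shift`, `ink_count` and `exact_match` are assumptions made purely for this example), the two kinds of detector might look like this:

```python
# Toy 1-D "images": lists of 0/1 pixels. All names here are illustrative only.

def shift(image, k):
    """Circularly shift a 1-D image k positions to the right."""
    k %= len(image)
    return image[-k:] + image[:-k] if k else list(image)

def ink_count(image):
    """Invariant but not selective: counts lit pixels and ignores position."""
    return sum(image)

def exact_match(image, template):
    """Selective but not invariant: responds only to a pixel-exact copy."""
    return image == template

pattern_a = [1, 1, 0, 1, 0, 0, 0, 0]
pattern_b = [1, 0, 1, 1, 0, 0, 0, 0]  # similar to pattern_a, but a different object
moved_a = shift(pattern_a, 2)         # the same object seen in a different position

# ink_count responds identically to pattern_a wherever it sits (invariant) ...
same_under_shift = ink_count(pattern_a) == ink_count(moved_a)
# ... but it cannot tell pattern_a from pattern_b (not selective).
confuses_a_and_b = ink_count(pattern_a) == ink_count(pattern_b)

# exact_match separates pattern_a from pattern_b (selective) ...
separates_a_and_b = (exact_match(pattern_a, pattern_a)
                     and not exact_match(pattern_b, pattern_a))
# ... but no longer recognizes pattern_a once it has moved (not invariant).
misses_moved_a = not exact_match(moved_a, pattern_a)

print(same_under_shift, confuses_a_and_b, separates_a_and_b, misses_moved_a)
```

All four flags come out `True`: each toy detector has one of the two properties and lacks the other.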

Mark Wolff:

Each is relatively straightforward to design. What is difficult, and why CAPTCHA still works, is to have both events occurring at the same time. That is, to be invariant and selective contemporaneously. Humans can do that and, as such, disambiguate a CAPTCHA puzzle. Machines cannot yet do that. Now, one word about understanding. What is important here is that understanding is defined as perceived or intended meaning. That is, to infer something from information received.

Mark Wolff:

To be very pedantic, to understand is to not fully know. And therefore we do not know what a CAPTCHA says, but we understand what a CAPTCHA says. And as computers have become better and better, primarily through brute force methods of understanding, no, of trying to know what is in an image, we have had a sort of arms race. As you can see here, CAPTCHAs are becoming more and more complex, to the point now where, in some cases, humans are complaining that they can’t solve the CAPTCHA.

Mark Wolff:

Now with that, how then does that relate to our earlier slide of the paintings of Piet Mondrian? Is that, in a sense, a similar situation to CAPTCHA? We have an image, as you can see here, the first image, easily recognizable. And we have transmuted that image into something that is completely reductionist. Now, it may have a philosophical or artistic goal: how to reduce or deconstruct an image to its absolute minimum elements.

Mark Wolff:

And at what point do the human and the computer deviate in their ability to follow the progression of that image’s deconstruction? And at what point do even humans lose an understanding of meaning? Now, I know these seem like very philosophical or very theoretical concepts, but they’re very important because what we’re trying to understand here is at what point computer vision can understand and, in understanding, at what point it can derive meaning.

Mark Wolff:

So as long as we’re being pedantic, we can bring in two more concepts: the concept of common sense and the concept of imagination. Common sense being judgment that is independent of knowledge or training, and imagination being the forming of concepts that are not actually present to the senses. So what that means is that we are able, in essence, to create information where there is none. And so to that extent we are identifying an object without any tagging, without any metadata, without any context. So what then is the limit of transferring those capabilities that our brain and nervous system have to a machine?

Mark Wolff:

So let’s discuss that in a little bit. So CAPTCHA then, trying to summarize what we’ve started here, is not about knowing an image. We don’t know what those words are and the computer doesn’t know what those words are. We understand. And therefore the critical question here is what it means for an image analysis system to understand. Now, human intelligence and machine intelligence, unlike what Ashby said, are not just about computational power and memory.

Mark Wolff:

We, the human brain, are composed of many, many systems and subsystems that have gone through billions of years of iteration in response to survival in various environments. So human intelligence, defined as understanding, is more than computational power and data volume. It’s also about structural elements of the brain and the nervous system, those that have come through the iteration of evolution. These systems are critical to understanding human intelligence and, in particular, vision as being the primary sensory modality.

Mark Wolff:

For example, babies are pre-wired to perceive the world. They have a hard-coded understanding of certain environmental elements that aid in their protection. And the findings of research in this space show that innate connectivity in the brain, the structural and functional elements of the brain, precedes, that is, comes before, the emergence of domain-specific function. That is how we learn and how we bring in a domain-specific function for a particular behavior.

Mark Wolff:

Now, this presents a very new view on the origins of knowledge. Is knowledge acquired, or to some extent is knowledge pre-wired? At the moment it is both. So the question for computer vision and computing is, do we need to then model how the human central nervous system evolved, and do we need to think not only about the software but also the hardware? Should the hardware, in essence, recapitulate the structural elements that facilitate, quote unquote, understanding and knowledge in the human brain?

Mark Wolff:

So how do we then build imaging hardware that is human-like? How do we develop techniques of auto and deep tagging? How do we introduce rich metadata around an image? How do we introduce connectivity, the internet of things, sensory data beyond visual modalities? How do we introduce analytics occurring in the event stream, so that as data are moving into a system we process them in that stream? And how do we take advantage of moving, streaming sensor data, enriching visual data, to create learning and adaptive models?

Mark Wolff:

That, in a sense, describes how the brain works. And that, in a sense, is potentially the path forward: that we look to the brain to develop how we will build these infinite vision systems. So the internet of things is more than just low-latency, high-bandwidth connectivity. It’s about connecting people, machines, environments. It’s about forming a completely new technological structure that moves beyond our current paradigm.

Mark Wolff:

And that structure will naturally, almost through the evolution of technology, imitate the human nervous system. And I’ll talk to that in a moment. So, very quickly, to finish up here. It is interesting that IoT is really a key driver in the current development and advances in AI because of the amount of data that it produces for data-hungry AI. More interestingly, IoT won’t work without AI: because of the data volumes and the dimensionality, we need AI to manage all those data and all that connectivity.

Mark Wolff:

And finally, that paradox is driving something quite remarkable. And it’s addressing the issue I stated earlier: that the architecture and the design of systems that not only recognize images but can understand images and infer meaning really is being built, in a sense, by recapitulating the human nervous system. We have, for example, an almost one-to-one relationship between constructs of the human brain and central nervous system and current technological constructs of edge analytics, sensor-based telemetry, cloud computing and fog computing. And what is remarkable about this situation?

Mark Wolff:

What is remarkable about this sort of one-to-one relationship between technology and biology is that, whether by design or by accident, the iteration of technology is actually following the evolutionary path of the human nervous system. And why wouldn’t it? It’s quite an efficient and functional set of technologies that are biologic. Why shouldn’t electronic technologies find a parsimonious path towards performing similar functions to biology by following biology’s lead? So with that, I thank you, and hopefully, if you’re a human, you can read that.