What is Action Recognition?


Aug. 24, 2021

In computer vision, action recognition is the task of identifying when a person in an image or video is performing a given action. AI models can be trained to recognize a variety of actions, from running and sleeping to drinking, falling, or riding a bike.

Learning to detect and distinguish between different actions is an essential task for both humans and computer vision models. Human actions appear everywhere, from home surveillance footage to films and TV shows. This subfield of computer vision is challenging because it depends on contextual information: the same motion can represent different actions (e.g. turning a key may mean locking a door or starting a car).

AI models for action recognition are trained on large datasets of images or videos. The task of action recognition can be further subdivided into two subtasks:

  • Action classification seeks to assign the correct label (e.g. “cooking,” “writing,” etc.) to a given image or video.
  • Action localization, given a particular action and a video as input, seeks to identify where in the frame and at what point in the video the action is being performed.
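
To make the difference between the two subtasks concrete, here is a minimal Python sketch. It assumes a model has already produced a score between 0 and 1 for each action on each frame; the function names, the 0.5 threshold, and the sample scores are all hypothetical and chosen only for illustration. It covers the timestamp part of localization only, not the position of the action within the frame.

```python
from typing import Dict, List, Tuple


def classify_clip(frame_scores: Dict[str, List[float]]) -> str:
    """Action classification: return the label with the highest mean per-frame score."""
    return max(
        frame_scores,
        key=lambda label: sum(frame_scores[label]) / len(frame_scores[label]),
    )


def localize_action(
    scores: List[float], fps: float, threshold: float = 0.5
) -> List[Tuple[float, float]]:
    """Temporal localization: return (start_sec, end_sec) for each run of
    consecutive frames whose score is at or above the threshold."""
    segments = []
    start = None  # index of the first frame in the current run, if any
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            segments.append((start / fps, i / fps))
            start = None
    if start is not None:  # the run extends to the end of the clip
        segments.append((start / fps, len(scores) / fps))
    return segments


# Hypothetical per-frame scores for a six-frame clip sampled at 2 frames per second.
frame_scores = {
    "cooking": [0.1, 0.2, 0.8, 0.9, 0.9, 0.2],
    "writing": [0.3, 0.2, 0.1, 0.1, 0.0, 0.1],
}

label = classify_clip(frame_scores)                       # "cooking"
segments = localize_action(frame_scores[label], fps=2.0)  # [(1.0, 2.5)]
print(label, segments)
```

Classification answers "what is happening in this clip?" with a single label, while localization answers "when does it happen?" with one or more time ranges. Real systems typically learn both from data rather than relying on a fixed threshold.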

How can action recognition be used for real-world applications? The possible use cases of action recognition in computer vision include:

  • Safety and security AI: Using surveillance cameras, action recognition can help identify and isolate footage of people who are performing suspicious actions, especially in unusual contexts, such as late at night or in restricted areas. The system can then send alerts to the appropriate individuals for further investigation.
  • Healthcare AI: When in-home caregivers for the elderly are away or unavailable, fall detection systems (with home cameras in key locations) can help detect falls that may result in serious injury. Check out this short video for a demonstration of a fall detection system in action.
  • Media AI: Action recognition systems can help sort through large quantities of multimedia (images, videos, etc.) to identify relevant content. For example, users could run a search for images of “Vladimir Putin shaking hands” to find the right illustration for a news article.

Need more computer vision definitions? Check out our computer vision glossary for an overview of everything you need to know. Is there a use case of action recognition that could benefit your business?