
Konica Minolta’s Two Papers on AI Technology Related to Human Behavior Accepted by CVPR 2023, a Leading Conference in Computer Vision
“Fast and High-accuracy Human Behavior Recognition” and “Zero-shot Abnormal Behavior Recognition”

April 4, 2023

Tokyo (April 4, 2023) – Konica Minolta, Inc. (Konica Minolta) announced that two of its papers on AI technology related to human behavior have been accepted by CVPR 2023, one of the leading conferences in the field of computer vision.

The Conference on Computer Vision and Pattern Recognition (CVPR) is an annual international conference that will be held in Vancouver, Canada, in June this year. Google Scholar, a search engine that specializes in scholarly literature, ranks CVPR highly in terms of academic impact, based on the number of papers published and the number of citations. For the upcoming conference, 2,360 of the 9,155 submitted papers (about 26%) have been accepted. Konica Minolta believes that both of its papers were accepted by this influential international conference because the company’s AI technology related to human behavior was evaluated as valuable research.

Details of Research

Konica Minolta developed FORXAI Imaging AI pose estimation technology, a proprietary technology that uses AI to estimate human poses by automatically detecting humans in an image and quickly tracking key points on their bodies, such as the eyes, arms, and legs. Combined with algorithms such as behavior recognition, the technology is used in various industries, including manufacturing, retail, and nursing care. Built on Konica Minolta’s long experience of developing office and medical equipment with fast image processing, it offers high image processing speed and high recognition accuracy and, because it runs on edge computing, low power consumption and cost.

“Fast and high-accuracy human behavior recognition,” the first theme, increased the speed of the entire process, from the concurrent estimation of human poses and object contours through to the recognition of behavior, to about 1,900 fps*, which is 211 times faster than the conventional approach. The speed was gained by significantly reducing the amount of data to be processed: only the human poses and object contours in a video are captured, as a point cloud. The accuracy of behavior recognition was improved by estimating the types of objects from their contours and combining that information with pose movement. The technology is expected to be widely used.
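
As a rough illustration of what these two figures imply, the short Python sketch below derives the throughput of the conventional approach and the average time per frame. It assumes that "211 times faster" means the reported throughput is 211 times the conventional one; the release does not state the conventional figure, so the derived values are estimates only, and the function names are made up for this example.

```python
# Back-of-the-envelope arithmetic using only the two figures quoted in the release.
reported_fps = 1900.0  # "about 1,900 fps"
speedup = 211.0        # "211 times faster" (assumed to be a throughput ratio)


def implied_baseline_fps(fps: float, factor: float) -> float:
    """Throughput of the conventional approach implied by the two figures."""
    return fps / factor


def per_frame_ms(fps: float) -> float:
    """Average processing time per frame, in milliseconds."""
    return 1000.0 / fps


baseline_fps = implied_baseline_fps(reported_fps, speedup)
print(f"implied conventional throughput: {baseline_fps:.1f} fps")
print(f"time per frame at reported speed: {per_frame_ms(reported_fps):.2f} ms")
print(f"time per frame at implied baseline: {per_frame_ms(baseline_fps):.0f} ms")
```

Under that assumption, the conventional approach would process roughly 9 frames per second, that is, a little over a tenth of a second per frame, compared with about half a millisecond per frame at the reported speed.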

The Kinetics dataset used in this image is licensed by Google Inc. under a Creative Commons Attribution 4.0 International License.

“Zero-shot abnormal behavior recognition,” the second theme, is a “zero-shot learning” technology that uses no videos labeled as abnormal scenes when training a model to recognize abnormal behavior. It attained the same accuracy as some supervised learning techniques with a training time of only 15 seconds. To determine whether behavior is abnormal, pose estimation technology is combined with cutting-edge natural language processing technology, which expresses behaviors as sentences and compares them, to increase the estimation accuracy.
Recognizing abnormal human behavior is expected to help prevent accidents and crimes. Previously, training data was difficult to create and training took a long time. Konica Minolta intends to solve these difficulties with a new approach based on zero-shot learning.
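
The general idea of comparing behavior with sentences can be sketched in a few lines of Python. This is a generic illustration, not the method described in the paper: it assumes that an observed action and each sentence describing an abnormal behavior have already been mapped to vectors in a shared space, and it scores the action by its highest cosine similarity to any of those sentences. The vectors, sentences, and function names are all made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def anomaly_score(action_feature, prompt_embeddings):
    """Highest similarity between the action and any abnormal-behavior sentence."""
    return max(
        cosine_similarity(action_feature, embedding)
        for embedding in prompt_embeddings.values()
    )


# Toy 3-dimensional embeddings (hypothetical values). In practice they would come
# from a text encoder and a pose-based action encoder, which are not shown here.
abnormal_prompts = {
    "a person punching another person": [0.9, 0.1, 0.0],
    "a person kicking another person": [0.8, 0.0, 0.2],
}
walking_feature = [0.0, 0.2, 0.9]     # stands in for an ordinary action
hitting_feature = [0.85, 0.1, 0.05]   # stands in for an abnormal action

for name, feature in [("walking", walking_feature), ("hitting", hitting_feature)]:
    print(name, round(anomaly_score(feature, abnormal_prompts), 3))
```

Because the sentences are written by hand, no labeled videos of abnormal scenes are needed for this scoring step, which is the property that the term "zero-shot" refers to.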

Image source: M. S. Ryoo and J. K. Aggarwal, UT-Interaction Dataset, ICPR Contest on Semantic Description of Human Activities (SDHA), in ICPR Workshops, 2010.

Overview of the Research Themes Accepted

Titles:
Unified Keypoint-based Action Recognition Framework via Structured Keypoint Pooling, CVPR (2023).
arXiv URL:
Prompt-Guided Zero-Shot Anomaly Action Recognition using Pretrained Deep Skeleton Features, CVPR (2023).
arXiv URL:
Researchers:
Taiki Sekii, Ryo Hachiuma, and Fumiaki Sato
FORXAI Business Operations, AI Technology Development Division, Konica Minolta, Inc.

【 FORXAI Imaging IoT Platform 】

FORXAI is an imaging IoT platform developed to accelerate digital transformation (DX) by visualizing and solving issues at workplaces and manufacturing sites. It combines Konica Minolta’s proprietary technologies with various IoT and AI technologies owned by partner companies. FORXAI enables partner companies to bring their assets together and create high-quality solutions, thereby improving working environments around the world and realizing a safe and secure society.

* FPS: Frames per second, a unit indicating the number of frames displayed per second in a video.
