Computer vision is an interdisciplinary field that enables computers to "understand" digital images and videos, automating tasks that the human visual system can perform. Its core purpose is to transform visual input into meaningful descriptions of the world that can feed into other reasoning processes and guide appropriate action.
Computer vision originated in the late 1960s in early artificial intelligence research, where it was intended to mimic human sight as a step toward intelligent robots. In 1966, it was believed that this could be achieved in a single summer project by attaching a camera to a computer and having it "describe what it saw." What distinguished computer vision from the existing field of digital image processing was its aim to extract three-dimensional structure from images in order to achieve full scene understanding.
Studies in the 1970s laid the foundations for many computer vision algorithms still in use, including edge extraction and the representation of objects. The 1980s brought more rigorous mathematical analysis, introducing concepts such as scale-space and active contour models ("snakes"). Today, computer vision draws on geometry, physics, statistics, and machine learning to perform tasks such as object detection, video tracking, and 3D scene modeling.