Artificial vision in the common world

Artificial vision, also known as computer vision, is a notion that most people don’t grasp immediately. It is a branch of artificial intelligence (AI) that aims to reproduce how the human vision system works and adapt it to machines. You have surely seen examples in science fiction movies, e.g. the robots in Terminator or the NS-5 robots from the movie I, Robot. Those machines use artificial vision to operate in their environment. However, computer vision isn’t science fiction: it exists in the real world, albeit to a lesser extent. You often use it in your daily life without even knowing it. How do you think your smartphone unlocks itself when you hold it in front of your face? How does your front camera locate your face to add a filter to it?
Figure 1: NS-5 robot from the movie I, Robot
Before describing and illustrating artificial vision, let’s start with the human vision system. First of all, light emanates from a source. When it reaches an object, part of the light is absorbed and the rest is reflected. The reflected light rays reach the observer’s eyes. In the eye, the cornea converges the light through the aqueous humor onto the eye lens, which in turn focuses it through the vitreous humor onto the retina. The eye is thus a dioptric assembly that conveys light to a sensor, the retina. The retina is made of several cell layers, including cones (used for daytime and color vision) and rods (used for night and grayscale vision). These sensor cells capture a lot of information, which is carried through the optic nerve to different areas of the cerebral cortex. Those areas use the signals to extract useful information and analyze the scene.
Figure 2: Drawing of an eye’s side view and the light rays’ path
In order to see, you need organs that:
- Focus/guide light to a sensor: the cornea and the eye lens
- Work as a sensor onto which the image of the observed scene is projected: the retina
- Work as cables to carry the information coming from the sensor: the optic nerve
- Are able to analyze images in order to extract information and decide what to do next: the visual cortex and the encephalon
Figure 3: Drawing of an eye’s side view and its composition
The eye fulfills several functions, as explained above. The optic nerve carries the signals captured by the eye to the brain. The visual cortex analyzes the received signals and extracts the useful information, then classifies those elements so that we can assess our environment (object recognition, shapes, faces, colors, etc.). All of this analysis is done unconsciously by the brain, and it is also the result of lifelong learning.

Now let’s move on to computer vision. Like the human vision system, computer vision uses several devices to capture, transmit, process and analyze information. In the machine realm, we can control specific criteria of the setup, such as the sensor, the lighting and the lens. However, common vision systems are not as adaptable as the human eye and brain.
Firstly, we have to pick a sensor/camera based on several criteria:
- The application, which determines the sensor type (linear or matrix), its size and its number of pixels.
- The smallest element you want to see.
- The type of power supply you want to use.
Figure 4: Photo of a vision camera
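To make the "smallest element you want to see" criterion more concrete, here is a minimal sketch of how the required number of pixels along one axis could be estimated. It assumes a common rule of thumb that is not part of the original text: the smallest element should cover at least a few pixels. The function name and the default of 3 pixels per element are illustrative assumptions.

```python
import math

def required_pixels(field_of_view_mm, smallest_element_mm, pixels_per_element=3):
    """Estimate the pixel count needed along one axis of the sensor.

    Assumption: the smallest element must span `pixels_per_element` pixels
    to be detected reliably (rule of thumb, tune it for your application).
    """
    if field_of_view_mm <= 0 or smallest_element_mm <= 0:
        raise ValueError("dimensions must be positive")
    return math.ceil(field_of_view_mm / smallest_element_mm * pixels_per_element)

# Hypothetical case: a 200 mm wide scene and 0.5 mm defects.
print(required_pixels(200, 0.5))
```

The result only gives an order of magnitude. In practice the optics and the lighting also limit what the sensor can actually resolve.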
Secondly, you choose the light source, which helps to enhance the element you want to detect. The type of source isn’t all you have to pick. It is also very important to select how the lights are positioned, e.g. their angle, distance and number. This will help you maximize the contrast between the background and your subject. The most common positions are:
Low-angle light, which highlights surface defects.
Figure 5: diagram of the low angle configuration
Front light, which shows appearance and shapes.
Figure 6: diagram of the front light configuration
Figure 7: diagram of the back light configuration
Figure 8: diagram of the back light configuration
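One possible way to compare lighting configurations is to measure the contrast between the subject and the background in a test image. The sketch below uses the Michelson contrast, a metric chosen here for illustration only (the original text does not name one). The gray levels are hypothetical values.

```python
def michelson_contrast(subject_gray, background_gray):
    """Contrast in [0, 1] between two mean gray levels (0-255)."""
    total = subject_gray + background_gray
    if total == 0:
        return 0.0  # both regions are black: no contrast
    return abs(subject_gray - background_gray) / total

# Hypothetical mean gray levels (subject, background) for two lighting setups.
setups = {"front light": (120, 90), "back light": (20, 240)}

# Keep the setup that separates the subject from the background best.
best = max(setups, key=lambda name: michelson_contrast(*setups[name]))
print(best)
```

With these made-up numbers the back light wins because the subject appears as a dark silhouette on a bright background. Real measurements on your own parts may favor another setup.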
Thirdly, you can choose a lens that meets certain criteria, such as:
- Sensor size
- Image circle of the lens (the area it fully illuminates)
- Working distance (distance between inspected objects and sensor)
- Coating on lenses
- Wavelength of the light source
- Etc.
The lens focuses the incoming light onto the camera sensor.
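As a rough illustration of how sensor size and working distance interact, the focal length can be estimated with the thin-lens equation. This is a simplified sketch: it assumes an ideal thin lens and measures the working distance from the object to the lens, which differs slightly from the object-to-sensor definition used in the list above. The function name and the numbers are hypothetical.

```python
def focal_length_mm(sensor_size_mm, field_of_view_mm, working_distance_mm):
    """Estimate the focal length with the thin-lens model.

    Magnification m = sensor size / field of view.
    From 1/f = 1/d_object + 1/d_image and d_image = m * d_object:
        f = m * d_object / (1 + m)
    """
    if min(sensor_size_mm, field_of_view_mm, working_distance_mm) <= 0:
        raise ValueError("all dimensions must be positive")
    magnification = sensor_size_mm / field_of_view_mm
    return magnification * working_distance_mm / (1 + magnification)

# Hypothetical setup: 8.8 mm wide sensor, 200 mm wide scene, lens 500 mm away.
print(round(focal_length_mm(8.8, 200, 500), 1))
```

The closest catalog focal length is then usually picked, and the working distance is adjusted to recover the desired field of view.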
Figure 10: Image of a digital sensor
Images are transferred from the sensor to the processing unit through electrical cables, optical fibers or a wireless link. This processing unit might be a computer, an electronic board such as an FPGA, a server or any other kind of image processing board. It does the same job as the visual cortex: it first performs a pre-processing step that enhances the visibility of the useful information. Then it moves on to the detection and processing phases to extract the features of the different elements in the image. Finally, the extracted features are analyzed to make a decision.
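The chain described above (pre-processing, detection, feature extraction, decision) can be sketched in a few lines of plain Python. This is a toy illustration, not a production pipeline: the techniques (contrast stretching, fixed thresholding, area measurement), the function names, the thresholds and the tiny image are all assumptions made for the example.

```python
def preprocess(image):
    """Contrast stretching: rescale gray levels to the full 0-255 range."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0 for _ in row] for row in image]  # uniform image
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in image]

def detect(image, threshold=128):
    """Segmentation: mark pixels brighter than the threshold as object (1)."""
    return [[1 if p > threshold else 0 for p in row] for row in image]

def extract_features(mask):
    """Feature extraction: object area in pixels and bounding-box size."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return {"area": 0, "height": 0, "width": 0}
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return {
        "area": len(coords),
        "height": max(rows) - min(rows) + 1,
        "width": max(cols) - min(cols) + 1,
    }

def decide(features, min_area=4):
    """Decision: accept the part if the detected object is large enough."""
    return "accept" if features["area"] >= min_area else "reject"

# Hypothetical low-contrast 4x5 image containing a slightly brighter 2x3 object.
image = [
    [100, 100, 100, 100, 100],
    [100, 130, 130, 130, 100],
    [100, 130, 130, 130, 100],
    [100, 100, 100, 100, 100],
]
features = extract_features(detect(preprocess(image)))
print(features, decide(features))
```

Note how the object is invisible to the fixed threshold before pre-processing (130 is barely above 128 and close to the background) but clearly separated after contrast stretching.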
Figure 11: Screenshot of Terminator, with identification, extracted from « the real augmented reality terminator vision »
In the image above, the edges of the face are outlined, which can be the result of pre-processing the image: the system detects the edges of the objects to observe.
Figure 12: Outlined face extracted from the previous image
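Edge detection of this kind can be illustrated with the Sobel operator, a classic gradient filter. The original text does not say which detector produced the outlined face, so Sobel is used here purely as an example, written in plain Python on a hypothetical test image.

```python
def sobel_magnitude(image):
    """Return the gradient magnitude of a 2D gray image (borders left at 0)."""
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to vertical edges
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to horizontal edges
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    p = image[r + i - 1][c + j - 1]
                    gx += kx[i][j] * p
                    gy += ky[i][j] * p
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out

# Hypothetical 5x6 image: dark left half, bright right half.
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edges = sobel_magnitude(image)
print([round(v) for v in edges[2]])  # → [0, 0, 1020, 1020, 0, 0]
```

The strong responses sit exactly where the gray level jumps from dark to bright, which is what outlines an object in the image.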
The analysis and extraction steps can give those kinds of results:
- Distance between eyes = 5 cm
- Mouth surface = 3 cm²
- Nose length = 4 cm
- Nose position = 15 cm
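Measurements like these could be derived from pixel positions once the system is calibrated. The sketch below assumes that the landmark positions are already known and that a single scale factor converts pixels to centimeters. The coordinates and the scale factor are made-up values.

```python
import math

def distance_cm(point_a, point_b, cm_per_pixel):
    """Euclidean distance between two (x, y) pixel positions, in centimeters."""
    return math.dist(point_a, point_b) * cm_per_pixel

# Hypothetical eye positions (in pixels) and calibration factor.
left_eye, right_eye = (120, 80), (220, 80)
eye_distance = distance_cm(left_eye, right_eye, cm_per_pixel=0.05)
print(eye_distance)
```

A single scale factor only holds when the face is roughly parallel to the sensor and at a known distance. Otherwise a proper camera calibration is needed.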
After the data collection (from extracted features) and the processing steps, a decision is made (a classification or a specific action). In this image, the decision is shown as:
- Gender recognition: male / female
- Facial expression recognition: aggressive / happy / sad / angry / scared
Figure 13: Extraction from figure 11
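A decision step of this kind can be as simple as comparing the extracted features with known reference values. The following sketch uses a nearest-prototype classifier for facial expressions. The two features, the prototype values and the labels are invented for the example and do not come from any real face analysis system.

```python
import math

# Hypothetical prototypes: (mouth curvature, eyebrow angle) per expression.
PROTOTYPES = {
    "happy": (0.8, 0.1),
    "sad": (-0.6, -0.3),
    "angry": (-0.4, 0.9),
}

def classify(features):
    """Return the label of the prototype closest to the feature vector."""
    return min(PROTOTYPES, key=lambda label: math.dist(features, PROTOTYPES[label]))

print(classify((0.7, 0.0)))  # → happy
```

Real systems typically learn such reference values from many labeled images rather than fixing them by hand.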
Figure 14: Diagram of the acquisition and processing chain of artificial vision
In order to use computer vision efficiently, you need to know what you want to detect. Then you choose the best-fitting components for your vision system, as well as the positions and angles between the elements, to increase the visibility of the features you want to see. After acquiring the image, the system preprocesses (filters) and analyzes it (segmentation, feature extraction). The elements in the image are then classified in order to enable a decision.

For us human beings, this process happens as follows: “I see something, I acquire information: two eyes, four legs, height 70 to 80 cm at the shoulder, a mouth that opens and closes quickly, hair, etc.” This is the feature extraction step. The brain classifies the elements and recognizes a dog. “Do I need to run or should I come closer?” This is the decision step.

Now you have an overview of what computer vision is. It is used in many more fields than entertainment apps. In fact, a lot of industries use it to perform quality control, to develop self-driving cars, to monitor public transport and so on. It is also used in the military to build weapons such as the Samsung SGR-A1 automatic machine gun. The field of computer vision still holds a lot of secrets. We will dig into that later.