
What is Computer Vision and How does it Work?

  • Ayush Singh Rawat
  • Apr 15, 2021

You might have come across Google Lens, Snapchat filters or the game Pokémon Go, and wondered what makes these applications so intuitive and smart. The answer is computer vision.


Computer vision allows computers to interpret visual input, such as photos and live camera feeds, and to react accordingly. Searching with Google Lens, which lets you photograph a product to find more details about it, is a classic example of computer vision at work.



Introduction to Computer Vision


Computer vision is the field of computer science that focuses on replicating parts of the complex human visual system, enabling computers to identify and process objects in images and videos in a way similar to how humans do.


Computer Vision, often abbreviated as CV, is defined as a field of study that seeks to develop techniques to help computers “see” and understand the content of digital images such as photographs and videos.


Until recently, computer vision worked only in a limited capacity. Thanks to advances in artificial intelligence and deep learning, the field has taken great leaps in recent years and has even surpassed humans in some tasks related to detecting and labeling objects.


It is a multidisciplinary field that can broadly be considered a subfield of artificial intelligence and machine learning. It may use specialized methods as well as general-purpose machine learning algorithms.



Evolution of Computer Vision


Early experiments in computer vision took place in the 1950s, using some of the first neural networks to detect the edges of an object and to sort simple objects into categories such as circles and squares.


In the 1970s, the first commercial use of computer vision interpreted typed or handwritten text using optical character recognition. This development was used to interpret written text for the blind.


As the internet matured in the 1990s, making large sets of images available online for analysis, facial recognition programs flourished. These growing data sets helped make it possible for machines to identify specific people in photos and videos.


The effects of these advances on the computer vision field have been astounding. Accuracy rates for object identification and classification have gone from 50 percent to 99 percent in less than a decade, and today's systems are more accurate than humans at quickly detecting and reacting to visual inputs.



Working of Computer Vision


Deep learning has profoundly transformed computer vision, as it has other branches of artificial intelligence, to such an extent that for many tasks its use is now taken for granted.


In particular, Convolutional Neural Networks (CNNs) have achieved results well beyond the previous state of the art obtained with traditional computer vision methods.
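The basic building block of a CNN is the convolution: a small grid of weights (a kernel) slides over the image and produces a "feature map" that responds strongly wherever the local pattern matches. The sketch below is a plain-Python illustration, not a real CNN library; the kernel is hand-written to respond to vertical edges, whereas in an actual CNN these weights are learned during training.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution (strictly cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


def relu(feature_map):
    """Clip negative responses to zero, the usual activation after a convolution."""
    return [[max(0, v) for v in row] for row in feature_map]


# A tiny 4x5 grayscale "image": dark on the left, bright on the right.
image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]
# Hand-written vertical-edge detector; a trained CNN would learn its own weights.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
print(relu(conv2d(image, kernel)))  # [[0, 27, 27], [0, 27, 27]]
```

The zero column in the output corresponds to the uniform dark region, while the non-zero values mark where the dark-to-bright edge falls under the kernel. A CNN stacks many such layers so that later layers respond to increasingly complex patterns.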


These four steps outline a general approach to building a computer vision model using CNNs:


  1. Create a dataset of annotated images or use an existing one. Annotations can be the image category (for a classification problem); pairs of bounding boxes and classes (for an object detection problem); or a pixel-wise segmentation of each object of interest present in an image (for an instance segmentation problem).

  2. Extract, from each image, features relevant to the task at hand. This is a key point in modeling the problem. For example, the features used to recognize faces, which are based on facial criteria, are obviously not the same as those used to recognize human organs.

  3. Train a deep learning model based on the isolated features. Training means feeding the machine many images from which it learns, based on those features, how to solve the task at hand.

  4. Evaluate the model using images that were not used in the training phase. By doing so, the accuracy of the trained model can be tested.


This approach is very simple, but it serves the purpose well. Such an approach, known as supervised machine learning, requires a dataset that encompasses the phenomenon to be learned.
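To make the four steps concrete, here is a minimal sketch that walks through them end to end using only the Python standard library. It is a deliberate simplification: the "images" are hypothetical synthetic 4x4 stripe patterns, the features are hand-crafted gradient sums, and a nearest-centroid classifier stands in for the CNN, so that the structure of the pipeline (annotate, extract features, train, evaluate on held-out data) stays visible.

```python
import math
import random


def make_image(label, size=4, noise=0.2, rng=random):
    """Hypothetical toy 'image': vertical or horizontal stripes plus small noise."""
    img = []
    for i in range(size):
        row = []
        for j in range(size):
            stripe = j % 2 if label == "vertical" else i % 2
            row.append(stripe + rng.uniform(0, noise))
        img.append(row)
    return img


def extract_features(img):
    """Step 2: total horizontal and total vertical intensity change."""
    size = len(img)
    horiz = sum(abs(img[i][j + 1] - img[i][j]) for i in range(size) for j in range(size - 1))
    vert = sum(abs(img[i + 1][j] - img[i][j]) for i in range(size - 1) for j in range(size))
    return (horiz, vert)


def train(samples):
    """Step 3: the 'model' is simply the mean feature vector of each label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, v in enumerate(features):
            acc[k] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(model, features):
    """Assign the label whose centroid is closest to the feature vector."""
    return min(model, key=lambda label: math.dist(model[label], features))


def evaluate(model, samples):
    """Step 4: accuracy on images that were not used for training."""
    correct = sum(predict(model, f) == label for f, label in samples)
    return correct / len(samples)


rng = random.Random(0)
# Step 1: an annotated dataset of (image, label) pairs, split into train and test.
dataset = [(make_image(lbl, rng=rng), lbl) for lbl in ["vertical", "horizontal"] * 20]
train_set, test_set = dataset[:30], dataset[30:]

model = train([(extract_features(img), lbl) for img, lbl in train_set])
accuracy = evaluate(model, [(extract_features(img), lbl) for img, lbl in test_set])
print(f"test accuracy: {accuracy:.2f}")  # test accuracy: 1.00
```

The toy classes are trivially separable, so perfect accuracy here says nothing about real-world performance. The point is the separation between training data and the held-out test set used in step 4.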





6 Practical Examples of Computer Vision


Computer vision comprises a range of tasks that can be combined to power highly useful features in applications. Image and video recognition are two of the most common tasks in computer vision; they help determine which objects are present in an image.


  1. Image classification


What broad category of object is in this photograph? Image classification is one of the best-known tasks in computer vision. It assigns a given image to one of a set of predefined categories.


For example, suppose we want to classify images according to whether or not they contain a tourist attraction, and that a classifier has been built for this purpose. Now consider the image below.


The Eiffel Tower, Image Source

The algorithm behind the classifier will recognize that this image is similar to the group of images that show tourist attractions. This does not necessarily mean that it has recognized the Eiffel Tower, but rather that it has seen photos of the tower before and has been told that those images contain a tourist attraction.
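A trained classifier typically outputs one raw score per predefined category; a softmax turns those scores into probabilities, and the highest one gives the predicted category. In the sketch below the two scores are hypothetical numbers standing in for what a model might output for the Eiffel Tower photo; no real model is involved.

```python
import math


def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, categories):
    """Return the most probable category and its probability."""
    probs = softmax(scores)
    best = max(range(len(categories)), key=lambda k: probs[k])
    return categories[best], probs[best]


categories = ["tourist attraction", "not a tourist attraction"]
# Hypothetical raw scores a trained classifier might output for the photo.
label, confidence = classify([2.0, -1.0], categories)
print(label, round(confidence, 3))  # tourist attraction 0.953
```

Note that the output is only a category and a confidence: nothing in it says *where* the tower is, which is what localization adds.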



  2. Localization


Suppose now that we want to know not only which tourist attraction appears in an image, but also exactly where it is. The aim of localization is to find the location of a single object in an image. For example, the Eiffel Tower has been localized below.

The Eiffel Tower enclosed by a bounding box

Source: Tryolabs.com

Localization is usually expressed as a bounding box that encloses the object in the image.

One application is the automatic cropping of objects in a set of images. Combined with classification, it makes it easy to build a dataset of (cropped) images of famous tourist attractions.
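Once a localization model has produced a bounding box, cropping is just slicing the pixel grid. The sketch below assumes an image stored as a list of rows and a box given as `(x_min, y_min, x_max, y_max)` in pixel coordinates with exclusive maxima; that box convention is an assumption for illustration, since real models and libraries use a variety of formats.

```python
def crop(image, box):
    """Return the pixels inside box = (x_min, y_min, x_max, y_max), maxima exclusive."""
    x_min, y_min, x_max, y_max = box
    height, width = len(image), len(image[0])
    # Clamp to the image bounds so a slightly-off predicted box does not fail.
    x_min, y_min = max(0, x_min), max(0, y_min)
    x_max, y_max = min(width, x_max), min(height, y_max)
    return [row[x_min:x_max] for row in image[y_min:y_max]]


# A 4x5 "image" whose pixel at (row r, column c) has value 10 * r + c.
image = [[10 * r + c for c in range(5)] for r in range(4)]
box = (1, 1, 3, 3)  # hypothetical box predicted by a localization model
print(crop(image, box))  # [[11, 12], [21, 22]]
```

Running this over a folder of photos, with the box coming from the localizer and the label from the classifier, is essentially the dataset-building workflow described above.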



  3. Object detection


Which objects are in this photo, how many are there, and where are they? If we imagine localization and classification being repeated for every object of interest in the image, we arrive at object detection. In this case, the number of objects the image contains is not known in advance.


Therefore, the purpose of object detection is to find a variable number of objects in an image and classify each of them.

Object detection results

Source: Tryolabs.com

  • In this particularly dense image, we see how computer vision identifies a large number of different objects: cars, people, bicycles.

  • The problem could be complicated even for a human. Some objects are only partially visible, either because they are partly outside the frame or because they overlap each other. Also, the size of similar objects varies greatly.

  • One of the most straightforward applications of object detection is counting. The real-life applications are quite diverse, from counting different types of harvested fruit to counting people at events such as public demonstrations or cricket matches.
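Counting with a detector usually involves two clean-up steps: discarding low-confidence detections and removing duplicate boxes for the same object, commonly done with intersection over union (IoU) and greedy non-maximum suppression (NMS). The sketch below shows those steps on a hand-written, hypothetical list of detections; the dictionary format and both thresholds are assumptions, not the output format of any particular detector.

```python
from collections import Counter


def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop same-class boxes that overlap it heavily."""
    kept = []
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        if all(det["label"] != k["label"] or iou(det["box"], k["box"]) < iou_threshold
               for k in kept):
            kept.append(det)
    return kept


def count_objects(detections, score_threshold=0.5):
    """Count objects per class after filtering weak and duplicate detections."""
    confident = [d for d in detections if d["score"] >= score_threshold]
    return Counter(d["label"] for d in non_max_suppression(confident))


detections = [
    {"label": "car", "box": (0, 0, 10, 10), "score": 0.9},
    {"label": "car", "box": (1, 1, 11, 11), "score": 0.8},  # duplicate of the first car
    {"label": "car", "box": (20, 0, 30, 10), "score": 0.7},
    {"label": "person", "box": (40, 0, 45, 12), "score": 0.95},
    {"label": "bicycle", "box": (50, 0, 60, 8), "score": 0.3},  # too uncertain
]
print(count_objects(detections))
```

The two overlapping car boxes have an IoU of about 0.68, so the weaker one is suppressed, and the low-scoring bicycle is filtered out, leaving two cars and one person.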



  4. Object identification


Is a given object in the image, and if so, where? Object identification is slightly different from object detection, although similar techniques are often used to achieve both. In this case, given a specific object, the goal is to find instances of that object in images.


It is not about classifying an image, as we saw previously, but about determining whether the object appears in an image or not and, if it does, specifying the location(s) where it appears.


An example might be searching for images that contain the logo of a particular company. Another example is monitoring real-time images from security cameras to identify a specific person's face.
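A classic, if simplistic, way to look for a specific known object such as a logo is template matching: slide a small reference image over the photo and report the positions where the pixels match closely, here measured by the sum of squared differences (SSD). This is named plainly as a substitute technique: modern systems compare learned features rather than raw pixels, and the tiny "logo" and "photo" below are hypothetical.

```python
def find_template(image, template, max_ssd=0.0):
    """Slide template over image; return top-left (row, col) positions with SSD <= max_ssd."""
    th, tw = len(template), len(template[0])
    matches = []
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            # Sum of squared differences between the template and this image patch.
            ssd = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(th)
                for j in range(tw)
            )
            if ssd <= max_ssd:
                matches.append((r, c))
    return matches


logo = [[1, 2],
        [3, 4]]
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 3, 4, 0, 1],
    [0, 0, 0, 0, 3],
]
print(find_template(image, logo))  # [(1, 1)]
```

An empty result means the object was not found; a non-empty one answers both questions the task poses: whether the object appears and where.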



  5. Instance segmentation


Which pixels belong to each object in the image? Instance segmentation can be seen as the next step after object detection. In this case, the goal is not only to find objects in an image but also to create a mask, as accurate as possible, for each detected object.

Instance segmentation results

Source: Tryolabs.com


You can see in the image above how the instance segmentation algorithm finds masks for the four Beatles and some cars (although the result is incomplete, especially where Lennon is concerned).


Such results would be very expensive to produce if the work were done manually, but the technology makes them easy to achieve. In France, the law prohibits exposing children in the media without the explicit consent of their parents.


Using instance segmentation techniques, it is possible to blur out young children's faces on television or in film when they are interviewed or captured outdoors, as may be the case during student strikes.
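A segmentation mask marks exactly which pixels belong to an object, so anonymization can blur those pixels and nothing else. The sketch below assumes a grayscale image and a boolean mask of the same shape (hypothetical values standing in for a segmentation model's output) and applies a simple box blur only where the mask is set.

```python
def blur_masked(image, mask, radius=1):
    """Replace each masked pixel with the mean of its (2*radius+1)^2 neighborhood."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]  # copy, so unmasked pixels stay untouched
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            # Neighborhood is read from the original image, clipped at the borders.
            neighbors = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(h, r + radius + 1))
                for cc in range(max(0, c - radius), min(w, c + radius + 1))
            ]
            out[r][c] = sum(neighbors) / len(neighbors)
    return out


image = [
    [0, 0, 0],
    [0, 9, 0],
    [0, 0, 0],
]
# Hypothetical mask from a segmentation model: only the center pixel is the "face".
mask = [
    [False, False, False],
    [False, True, False],
    [False, False, False],
]
print(blur_masked(image, mask))  # [[0, 0, 0], [0, 1.0, 0], [0, 0, 0]]
```

Using a per-pixel mask rather than the detection's bounding box means the background around the face is left sharp.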


  6. Object tracking


The purpose of object tracking is to follow an object that is moving over time, using consecutive video frames as the input. This capability is essential for robots tasked with everything from scoring goals to stopping a ball, as in the case of goalkeeper robots. It is equally essential for autonomous vehicles, enabling high-level spatial reasoning and path planning.


Similarly, it is useful in various human-tracking systems, from those that try to understand customer behavior, for example in retail, to those that continuously monitor cricket or basketball players during a game.


  • A relatively straightforward way to perform object tracking is to apply object detection to each image in a video sequence and then compare the instances of each object to determine how they moved.

  • The drawback of this approach is that performing object detection on every individual image is usually expensive. An alternative is to detect the object being tracked only once (as a rule, the first time it appears) and then follow its movements without explicitly recognizing it in the following images.

  • Finally, an object tracking method does not necessarily need to be able to detect objects; it can simply rely on motion criteria, without being aware of what the object being tracked is.
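The first approach in the list above, detecting objects in every frame and then linking the detections, can be sketched with a greedy nearest-centroid match: each new box joins the closest existing track if it is near enough, otherwise it starts a new track. The per-frame boxes below are hypothetical detector output and `max_distance` is an assumed tuning parameter; practical trackers typically add motion models (such as Kalman filters) and optimal assignment (such as the Hungarian algorithm) instead of this greedy rule.

```python
import math


def centroid(box):
    """Center point of a box (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def track(frames, max_distance=10.0):
    """frames: list of per-frame lists of boxes. Returns {track_id: [centroids]}."""
    tracks = {}
    next_id = 0
    for boxes in frames:
        unmatched = set(tracks)  # existing tracks not yet claimed in this frame
        for box in boxes:
            point = centroid(box)
            best = min(unmatched, key=lambda t: math.dist(tracks[t][-1], point), default=None)
            if best is not None and math.dist(tracks[best][-1], point) <= max_distance:
                tracks[best].append(point)  # continue an existing track
                unmatched.discard(best)
            else:
                tracks[next_id] = [point]  # a new object entered the scene
                next_id += 1
    return tracks


frames = [
    [(0, 0, 4, 4), (50, 50, 54, 54)],
    [(3, 0, 7, 4), (52, 50, 56, 54)],
    [(6, 0, 10, 4)],  # the second object was not detected in this frame
]
for track_id, path in track(frames).items():
    print(track_id, path)
```

Each track's list of centroids is its trajectory, which is exactly the "how they moved" information the text describes.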



Advantages of Computer Vision


Both the public and private sectors benefit immensely from the judicious use of computer vision and its techniques. Some of the advantages are listed below:


  • Improved search methods


Traditional methods of online product search have relied heavily on tags and keywords. If you are looking for a t-shirt, keywords such as "t-shirt", "black" and "cotton" are used to narrow the search and give customers better results.


This method was reliable but not very efficient. Computer vision was therefore introduced to this sector and quickly delivered results, helping people find accurate matches for the products they were searching for.


Computer vision does not rely on traditional tags; instead, it compares the actual visual characteristics of an image. This allows people to search with a photo to find similar products.
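Searching by photo means turning each image into a numeric description of its visual characteristics and ranking catalog items by similarity to the query. The sketch below uses a coarse color histogram and cosine similarity, a classic and deliberately simple substitute for the learned CNN embeddings that real visual search engines typically use. The catalog names and pixel lists are hypothetical.

```python
import math


def color_histogram(pixels, bins=4):
    """Normalized histogram of quantized (r, g, b) pixels, each channel in 0..255."""
    hist = [0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        index = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[index] += 1
    total = len(pixels)
    return [count / total for count in hist]


def cosine_similarity(u, v):
    """1.0 for identical directions, 0.0 for vectors with no overlap."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(query_pixels, catalog, top_k=1):
    """Return the names of the top_k catalog items most similar to the query photo."""
    query = color_histogram(query_pixels)
    scored = [(cosine_similarity(query, color_histogram(px)), name)
              for name, px in catalog.items()]
    return [name for _, name in sorted(scored, reverse=True)[:top_k]]


# Hypothetical product photos, each reduced to a flat list of RGB pixels.
catalog = {
    "black t-shirt": [(10, 10, 10)] * 90 + [(250, 250, 250)] * 10,
    "red dress": [(200, 20, 30)] * 100,
    "white sneakers": [(240, 240, 240)] * 100,
}
query = [(15, 12, 8)] * 80 + [(245, 245, 245)] * 20  # a user's photo of a dark t-shirt
print(search(query, catalog))  # ['black t-shirt']
```

No keyword is involved at any point: the match comes entirely from the pixels, which is the shift from tag-based search that the paragraph above describes.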



  • Intuitive customer experience


Apps like Snapchat and features like Animoji have taken the user experience up a notch. The main focus is on how entertaining, easy and engaging these experiences are. This has been possible only because of facial mapping and augmentation features, which in turn depend on advanced computer vision.



  • Product discovery


Imagine simply searching with a photo and having the whole world help you find the item you are looking for. Computer vision lets the real world connect with you, helping with everything from buying something found through a simple Google listing to finding a cafe nearby.



  • Payment hassles resolved


This has been a big win for computer vision, as it has made payment easy and stress-free: no more worrying about bills, and no more stress if you forget your wallet at home. Such features enhance the in-store customer experience and subconsciously create positive feedback.



  • Introduction of augmented reality


Augmented reality blends the real world with useful features of the internet to improve the user experience and save time and resources. For example, Google Translate uses AR to translate text seen through the camera in real time.


Many more avenues are being explored, and more potentially life-changing features are still to come.





In short, computer vision not only lets computers interpret visual information in a way loosely modeled on human vision, but also allows humans to rely on computers to make safe decisions on their own.




In the coming years, computer vision is only going to get bigger and smarter, and new technologies will likely make it more intuitive and more engaging in our daily routines.
