Google offers many APIs, each with applications in different fields, that make mobile application development, website development, and similar work easier. One such API is the YouTube Data API v3. Its features include:
Searching for videos,
Retrieving information from YouTube about a channel or a particular video, e.g. likes/dislikes, comments, etc.
Starting a YouTube video directly from an application.
Beyond the features described above, the YouTube API has other applications. Here you're going to use it in the field of data science with Python.
Google Account
Python 3
Anaconda or Google Colab: To work on your local machine, install Anaconda and start Jupyter Notebook there. Alternatively, use Google Colab, which runs in the cloud and provides a GPU, to save installation time and memory on your local machine.
Log in to Google; if you don't have an account, create one and then log in.
Visit the Google developer dashboard and click “Select a project” at the top of the page to create a new project, as in the image below:
Google developer dashboard
Now click “New Project”, as shown below, and proceed.
Select a project window
Once done, you will be automatically redirected to the Google APIs dashboard.
The next step is to activate the YouTube API. To do that, navigate to APIs & Services in the side panel.
Then click Enable APIs and Services at the top of the page.
Search for YouTube and then select YouTube Data API v3.
Select YouTube Data API v3 page
After that, enable the API by clicking Enable, as shown in the figure below.
API library that Enables the API
Now click APIs & Services again and select Credentials. Click Create Credentials at the top of the page and select API key.
Credentials Page
After a moment, a pop-up with the message “API key created” appears and shows your API key as an alphanumeric string. Copy it and keep it safe for later use.
API key created
Google Colab provides the same features as a Jupyter notebook on a local machine. The data you work on, such as datasets, images, and videos, is stored in Google Drive in the same way it would be stored on a local machine.
It also provides a free GPU with about 12 GB of RAM; at the time of writing it supported only Python 2.7 and Python 3.6.
Running the code below in a cell mounts Google Drive in Google Colab once you enter the authorization code obtained from the URL shown in the output.
Code illustration: 1
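A minimal sketch of the mount step, assuming the notebook runs in Google Colab, where the `google.colab` package is available. The function name `mount_drive` and the mount point `/content/drive` are illustrative choices, not taken from the original code.

```python
def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive in a Colab runtime and return the mount point."""
    # google.colab exists only inside Colab, so it is imported lazily here.
    from google.colab import drive

    # Colab asks for authorization before the mount completes.
    drive.mount(mount_point)
    return mount_point
```

In a Colab cell, calling `mount_drive()` should trigger the authorization prompt described above.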
After this step, you can access any folder or file in Google Drive from Google Colab. Now create a new folder in Google Drive for this project and run the cell below.
Code illustration: 2
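One way to set up the project folder from code is sketched here. The helper name `enter_project_dir` and the folder name `youtube_analysis` are hypothetical; the original cell may simply change directory into a folder created by hand.

```python
import os


def enter_project_dir(base_dir, folder_name="youtube_analysis"):
    """Create the project folder under base_dir if needed and switch to it."""
    project_dir = os.path.join(base_dir, folder_name)
    # exist_ok=True makes the call safe to repeat when the cell is re-run.
    os.makedirs(project_dir, exist_ok=True)
    os.chdir(project_dir)
    return project_dir
```

With Drive mounted in Colab, `base_dir` would be the Drive root inside the mount point (assumed here to be `/content/drive/MyDrive`); on a local machine any writable directory works.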
In Google Colab the required libraries are preinstalled, but in Jupyter Notebook or any other editor you have to install them with the pip command.
You will use the Google API Python client library (the google-api-python-client package) to access YouTube data and the pandas library to perform exploratory data analysis on the extracted data.
For Windows:
To install these on Windows, run the commands below in the command prompt.
Code illustration: 3
For Anaconda:
Code illustration: 4
Now it's time to start coding to get insights from YouTube data. The first step is to import the required libraries mentioned above, i.e. pandas and the Google client library.
Code illustration: 5
As explained above, you generated the YouTube API key from the Google console page. Now it's time to use that key, so set some parameters for the later steps, including the YouTube API key and the API version.
Code illustration: 6
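A minimal sketch of the imports and parameters, assuming the `google-api-python-client` package is installed. The constant names and the helper `make_youtube_client` are illustrative, and `YOUR_API_KEY` is a placeholder for the key copied from the console.

```python
API_KEY = "YOUR_API_KEY"        # placeholder: paste your own key here
API_SERVICE_NAME = "youtube"
API_VERSION = "v3"


def make_youtube_client(api_key=API_KEY):
    """Return a YouTube Data API v3 client that authenticates with an API key."""
    # Imported inside the function so the cell still loads if the package
    # is not installed yet; the import error appears only when called.
    from googleapiclient.discovery import build

    return build(API_SERVICE_NAME, API_VERSION, developerKey=api_key)
```

The later sketches take the object returned by `make_youtube_client()` as their `youtube` argument.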
“YouTube Data API” provides many functions for retrieving data about particular YouTube channels, videos, playlists, and more, and it exposes many resources for this purpose.
The API also supports inserting, updating, or deleting content on YouTube, but those operations require authorized credentials rather than a plain API key. For now, you're only going to retrieve data, for which the API key alone is sufficient.
Some of the resources used in the following steps are:
Search: Finds information about a channel when the “channel name” is provided as the search parameter. The channel it returns is then used to retrieve statistics and uploaded videos.
Channel: Contains information about a YouTube channel, including its total subscribers, total uploaded videos, total views, and other information.
Run the cell below to perform a YouTube search through the API and save the results in a list.
Use the snippet part for the search, as it contains the basic information about each channel.
Code illustration: 7
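The search call can be sketched as follows. It assumes `youtube` is a client built with the Google API Python client; the function name `search_channels` and the default of five results are illustrative choices.

```python
def search_channels(youtube, channel_name, max_results=5):
    """Return the search results (a list of dicts) for channels matching channel_name."""
    request = youtube.search().list(
        q=channel_name,          # free-text query, here the channel name
        part="snippet",          # basic details: title, description, dates
        type="channel",          # restrict results to channels
        maxResults=max_results,
    )
    # Nothing is sent until execute() is called; it returns a dictionary.
    response = request.execute()
    return response.get("items", [])
```

Each element of the returned list is a dictionary with an `id` and a `snippet` entry.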
The output of the cell above shows the details of all the channels matching the provided name. Once the request is executed, the response is a dictionary, and the search results are stored as a list inside it.
The following code retrieves the basic information for all of these channels:
Code illustration: 8
As the output shows, four other channels are associated with the given one. For each channel, information such as the published date and time, channel ID, channel title, description, and thumbnails has been retrieved; all of it is stored in the snippet.
Next, find the channel ID of the first channel in the list above:
Code illustration: 9
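Reading the snippet fields and the channel ID can be sketched on a made-up search result. The values in `sample_results` are placeholders, and real results carry more fields (thumbnails, for example).

```python
# Made-up data shaped like the "items" list of a channel search response.
sample_results = [
    {
        "id": {"kind": "youtube#channel", "channelId": "UC_SAMPLE_ID"},
        "snippet": {
            "title": "Sample Channel",
            "description": "Placeholder description",
            "publishedAt": "2015-01-01T00:00:00Z",
        },
    }
]


def first_channel_id(results):
    """Return the channel ID of the first search result."""
    return results[0]["id"]["channelId"]


# Basic information lives under "snippet" for every result.
for item in sample_results:
    print(item["snippet"]["title"], item["snippet"]["publishedAt"])

channel_id = first_channel_id(sample_results)
print(channel_id)
```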
Now use the channel ID to find the channel's statistics, such as its total subscribers, views, and uploaded videos.
For this step, use the channel resource of the YouTube Data API v3, passing the channel ID as a parameter.
Code illustration: 10
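A sketch of the statistics request, assuming `youtube` is an API client as before. The function name is illustrative, and converting the counts to integers is an added convenience because the API returns them as strings.

```python
def get_channel_statistics(youtube, channel_id):
    """Return the numeric channel statistics as a dict of ints."""
    response = youtube.channels().list(
        part="statistics",
        id=channel_id,
    ).execute()
    stats = response["items"][0]["statistics"]
    # Keep only values that look like whole numbers (this skips booleans
    # such as hiddenSubscriberCount) and convert them to int.
    return {key: int(value) for key, value in stats.items() if str(value).isdigit()}
```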
In this step, you use the content details of the channel resource to retrieve information about the channel's content, including the uploads playlist, i.e. the ID of the playlist through which you can find every video uploaded since the channel was created.
Code illustration: 11
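The lookup of the uploads playlist can be sketched like this, again assuming a `youtube` client object; the function name is hypothetical.

```python
def get_uploads_playlist_id(youtube, channel_id):
    """Return the ID of the playlist that holds all uploads of a channel."""
    response = youtube.channels().list(
        part="contentDetails",
        id=channel_id,
    ).execute()
    # The uploads playlist is listed under contentDetails.relatedPlaylists.
    return response["items"][0]["contentDetails"]["relatedPlaylists"]["uploads"]
```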
The playlist items resource is used to retrieve the details of the uploads playlist, including all the uploaded videos. It has a limitation: a single request returns at most 50 videos. To get all the videos in a single run, use the next page token parameter, which retrieves the details of the following page.
The code below extracts all the videos with their details and saves them in a list:
Code illustration: 12
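A sketch of the paging loop, assuming a `youtube` client object; the function and variable names are illustrative. Note that each request must be executed with `.execute()` before its result is indexed. Indexing the request method itself raises a "'method' object is not subscriptable" error.

```python
def get_all_playlist_items(youtube, playlist_id):
    """Collect every item of a playlist by following nextPageToken."""
    all_videos = []
    next_page_token = None  # None requests the first page
    while True:
        response = youtube.playlistItems().list(
            playlistId=playlist_id,
            part="snippet",
            maxResults=50,              # the maximum allowed per request
            pageToken=next_page_token,
        ).execute()
        all_videos += response["items"]
        # The last page carries no nextPageToken, which ends the loop.
        next_page_token = response.get("nextPageToken")
        if next_page_token is None:
            break
    return all_videos
```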
The cell below shows the data retrieved from the playlist, which contains the video ID and title.
Code illustration: 13
Next, retrieve the video IDs. The following cell then presents the statistics of all the videos, including total likes/dislikes, comments, and views.
Code illustration: 14
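Both steps can be sketched as follows, assuming playlist items shaped like the API's `snippet` part and a `youtube` client object. The function names and the batching helper are illustrative; the batch size of 50 matches the per-request limit on video IDs.

```python
def get_video_ids(playlist_items):
    """Extract the video ID of every playlist item."""
    return [item["snippet"]["resourceId"]["videoId"] for item in playlist_items]


def get_video_statistics(youtube, video_ids, batch_size=50):
    """Fetch the statistics of all videos, batch_size IDs per request."""
    stats = []
    for start in range(0, len(video_ids), batch_size):
        batch = video_ids[start:start + batch_size]
        response = youtube.videos().list(
            id=",".join(batch),     # comma-separated list of video IDs
            part="statistics",
        ).execute()
        stats += response["items"]
    return stats
```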
Now retrieve the content details of all the videos, store them in lists, and then save them to disk as a CSV file.
Code illustration: 15
Code illustration: 16
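A sketch of collecting the per-video details into a pandas DataFrame. The column names and the helper `build_video_table` are assumptions made for illustration. Counts are read with `.get(..., 0)` because a key can be missing from the response: YouTube made dislike counts private in late 2021, and likes or comments can be hidden or disabled on individual videos.

```python
import pandas as pd


def build_video_table(playlist_items, video_stats):
    """Combine titles and statistics into one DataFrame, one row per video."""
    titles = {
        item["snippet"]["resourceId"]["videoId"]: item["snippet"]["title"]
        for item in playlist_items
    }
    rows = []
    for video in video_stats:
        counts = video.get("statistics", {})
        rows.append({
            "video_id": video["id"],
            "title": titles.get(video["id"], ""),
            "url": "https://www.youtube.com/watch?v=" + video["id"],
            # Missing counts default to 0 instead of raising KeyError.
            "likes": int(counts.get("likeCount", 0)),
            "dislikes": int(counts.get("dislikeCount", 0)),
            "comments": int(counts.get("commentCount", 0)),
            "views": int(counts.get("viewCount", 0)),
        })
    return pd.DataFrame(rows)
```

Saving is then a single call such as `build_video_table(items, stats).to_csv("videos.csv", index=False)`, where the file name is a placeholder.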
So far, you have extracted the information for all the videos and saved it in one CSV file using pandas. Now it's time to get some more insights into the dataset.
For that you can use the Python library pandas, which provides various functions for exploring the dataset, such as:
Total number of videos,
Counts of unique values, using the value_counts() method,
Most liked, most disliked, and most commented video,
Most viewed video,
Video with the maximum number of comments, likes, dislikes,
Maximum number of likes, dislikes, comments.
First, read the CSV file from disk:
Code illustration: 17
Counts of unique values:
Code illustration: 18
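Reading the file and counting unique values can be sketched on made-up data. The three rows and the column names below are placeholders standing in for the saved CSV; with a real file, pass its path to `pd.read_csv` instead of the in-memory buffer.

```python
import io

import pandas as pd

# Made-up rows standing in for the CSV file saved earlier.
sample_csv = io.StringIO(
    "title,likes,dislikes,comments,views\n"
    "Video A,10,1,3,100\n"
    "Video B,25,0,3,400\n"
    "Video C,10,2,7,250\n"
)
df = pd.read_csv(sample_csv)

# Total number of videos is the number of rows.
total_videos = len(df)

# value_counts() maps each distinct like count to how many videos have it.
like_counts = df["likes"].value_counts()
print(total_videos)
print(like_counts)
```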
In the same way, unique values of comments and dislikes can be found.
Code illustration: 19
Code illustration: 20
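A summary of the numeric columns can be sketched with `describe()`, shown here on a made-up DataFrame whose column names are assumptions.

```python
import pandas as pd

# Made-up statistics for three videos.
df = pd.DataFrame({
    "likes": [10, 25, 10],
    "dislikes": [1, 0, 2],
    "comments": [3, 3, 7],
    "views": [100, 400, 250],
})

# describe() reports count, mean, std, min, quartiles and max per column.
summary = df.describe()
print(summary.loc[["count", "max"]])
```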
The summary in the cell above shows the maximum number of likes, dislikes, comments, and views across the videos.
Using “value_counts” you can also get the number of videos corresponding to each count of likes, dislikes, comments, and views.
Now it's time to find the videos with the maximum number of likes, dislikes, comments, and views. First, find the index of the most liked, most commented, most disliked, and most viewed video:
Code illustration: 21
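Finding those indices can be sketched with `idxmax()`, which returns the index label of the first row holding the column's maximum. The DataFrame is made-up sample data with assumed column names.

```python
import pandas as pd

df = pd.DataFrame({
    "title": ["Video A", "Video B", "Video C"],
    "likes": [10, 25, 10],
    "dislikes": [1, 0, 2],
    "comments": [3, 3, 7],
    "views": [100, 400, 250],
})

# One index per metric; ties resolve to the first matching row.
most_liked_idx = df["likes"].idxmax()
most_disliked_idx = df["dislikes"].idxmax()
most_commented_idx = df["comments"].idxmax()
most_viewed_idx = df["views"].idxmax()
print(most_liked_idx, most_disliked_idx, most_commented_idx, most_viewed_idx)
```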
After this, you can use the index to access the video information, i.e. video title, video ID, URL, likes, comments, etc.
The most liked video is:
Code illustration: 22
The most viewed video is:
Code illustration: 23
The most commented video is:
Code illustration: 24
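The three lookups above follow one pattern, sketched here with a hypothetical helper `top_video` on made-up data: find the index of the maximum, then select that row with `loc`.

```python
import pandas as pd


def top_video(df, column):
    """Return the row (as a Series) of the video with the highest value in column."""
    return df.loc[df[column].idxmax()]


df = pd.DataFrame({
    "title": ["Video A", "Video B", "Video C"],
    "likes": [10, 25, 10],
    "comments": [3, 3, 7],
    "views": [100, 400, 250],
})

for metric in ("likes", "views", "comments"):
    row = top_video(df, metric)
    print(metric, row["title"], row[metric])
```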
For the complete code, see this GitHub repository. Further insights can also be extracted from this data, e.g. sentiment analysis of the comments on the videos.
The YouTube Data API can be used from Python to extract information about a YouTube channel. This information includes the details of each video uploaded to the channel, i.e. channel ID, number of videos, uploads playlist ID, the maximum number of likes, comments, and views, the channel's total subscribers, and the published date and time of the channel and its videos.
All these steps follow the generation of a YouTube API key from the Google console. The data extracted from YouTube, saved in CSV format, is then analyzed to answer questions such as “which video is the most liked video”, “video with maximum comments”, and “video that is most viewed”.
Latest Comments
chiau8526g
Nov 13, 2020 Hi Ripul,
chiau8526g
Nov 13, 2020 Thanks your sharing that I can connect youtube API by myself. But I have a problem on Code illustration: 15, colab showed
chiau8526g
Nov 13, 2020 " File "<ipython-input-41-c5c8e3f0769c>", line 8 disliked.append(int((stats[i])['statistics']['dislikeCount']) ^ SyntaxError: invalid syntax"
chiau8526g
Nov 13, 2020 And I don't know why? Could you help me? Thank you very much!
Ripul Agrawal
Dec 08, 2020 Hey Chiau, I am sorry for the delay. But I have dropped you a mail for the same around 10 days back. So let me know if you are still struck with that error. Thanks !
ParthRangarajan
Dec 16, 2020 Hey please can you help with this too. I am getting the same error.
Ripul Agrawal
Dec 18, 2020 reach me out at ripulagrawal98@gmail.com
Ripul Agrawal
Dec 09, 2020 For any queries, please reach out to me at ripulagrawal98@gmail.com
utkarsh.singh
Dec 30, 2020 Your code doesn't work for any channels containing more than 20k videos. 20k is the max it can return.
ahnafzahin06
Oct 19, 2021 I have a problem with "Code illustration: 12". When I am trying to do this on google colab it is showing "TypeError: 'method' object is not subscriptable" on "allVideos += res['items'] " line. How can I fix it?
yakisatama
Mar 02, 2022 Hi.. All image on this crashed. Can you fix that?