In the first part of the story, I collected statistical data from about 3000 YouTube channels and got some interesting insights. In this part, I will go a bit deeper, from the generic “channel” to the individual “video” level. I will show how to collect data about YouTube videos and what kind of insights we can get.
Methodology
To collect data about YouTube videos, we need to perform several steps:
- Get credentials for the YouTube Data API. It’s free, and the default quota of 10,000 units per day is enough for our task.
- Find several YouTube channels that we want to analyze.
- Write some Python code to get the latest videos and their stats for a selected channel. Historical YouTube analytics are available only to channel owners; for everyone else, the API returns only the stats as they are at the current moment. But we can run the code periodically and accumulate the results. In my case, I collected data for three weeks using Apache Airflow and a Raspberry Pi.
- Perform the data analysis. I will be using Pandas, Matplotlib, and Seaborn for that.
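Because the API only returns the current numbers, collecting data over time comes down to appending a timestamped snapshot on every scheduled run. Below is a minimal sketch of that idea, assuming a plain CSV file as storage. The function name `append_snapshot`, the column names, and the file layout are hypothetical and chosen only for illustration; they are not necessarily what was used for this article.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable, Mapping, Optional

# Hypothetical column layout: one row per video per run.
FIELDS = ["timestamp", "video_id", "view_count", "like_count", "comment_count"]


def append_snapshot(
    csv_path: Path,
    videos: Iterable[Mapping[str, object]],
    now: Optional[datetime] = None,
) -> int:
    """Append one timestamped row per video and return the number of rows written."""
    now = now or datetime.now(timezone.utc)
    write_header = not csv_path.exists()
    rows = 0
    with csv_path.open("a", newline="", encoding="utf-8") as f:
        # Keys that are not listed in FIELDS are silently ignored.
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        for video in videos:
            writer.writerow({"timestamp": now.isoformat(), **video})
            rows += 1
    return rows
```

A scheduler (Airflow, cron, or anything else) would call this function once per run with the freshly fetched stats. After a few weeks, the file holds a time series for each video that can be loaded directly into Pandas.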
Getting the YouTube API credentials and configuring Apache Airflow are described in my previous articles, and I recommend that readers pause this one and read those first.
And now, let’s get started.
1. Getting the data
To get information about YouTube videos, I will use the python-youtube library. Surprisingly, there is no ready-to-use method to get the list of videos from a specific channel, so we need to implement it on our own.
First, we need to call the get_channel_info method, which, as its name suggests, returns the basic information about the channel.
from pyyoutube import Api

def get_channel_info(api: Api, channel_id: str)…
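To illustrate the general idea of listing a channel's videos, here is a separate minimal sketch. It relies on the fact that every channel has an "uploads" playlist, whose ID is returned in the `contentDetails` part of the channel info. The helper names `get_uploads_playlist_id` and `get_latest_video_ids` are hypothetical, and `api` is assumed to be a `pyyoutube.Api` instance whose `get_channel_info` and `get_playlist_items` methods are called with `return_json=True` to get plain dictionaries. It is also an assumption that the uploads playlist lists the newest videos first, which is usually but not necessarily the case.

```python
from typing import Any, List


def get_uploads_playlist_id(api: Any, channel_id: str) -> str:
    """Return the ID of the channel's "uploads" playlist."""
    # Assumption: `api` is a pyyoutube.Api instance; return_json=True
    # makes it return the raw API response as a dictionary.
    response = api.get_channel_info(
        channel_id=channel_id, parts="contentDetails", return_json=True
    )
    items = response.get("items", [])
    if not items:
        raise ValueError(f"Channel not found: {channel_id}")
    return items[0]["contentDetails"]["relatedPlaylists"]["uploads"]


def get_latest_video_ids(api: Any, channel_id: str, count: int = 10) -> List[str]:
    """Return up to `count` video IDs from the channel's uploads playlist."""
    playlist_id = get_uploads_playlist_id(api, channel_id)
    response = api.get_playlist_items(
        playlist_id=playlist_id,
        parts="contentDetails",
        count=count,
        return_json=True,
    )
    return [item["contentDetails"]["videoId"] for item in response.get("items", [])]
```

With a real key, the call would look like `get_latest_video_ids(Api(api_key="..."), channel_id, count=10)`. The returned IDs can then be passed to a video-details request to read the view, like, and comment counters.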