Welcome toVigges Developer Community-Open, Learning,Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
167 views
in Technique[技术] by (71.8m points)

pandas - youtube_dl video descriptions

I have a df containing a set of videoIDs from YT:

import pandas as pd

data = {'Order':  ['1', '2', '3'],
        'VideoID': ['jxwHmAoKte4', 'LsXM502SpiU','1I3f27iQ4pM']
        }

df = pd.DataFrame (data, columns = ['Order','VideoID'])

print (df) 

and want to download the video descriptions and save them in the same df in an extra column.

I tried to use youtube_dl in Jupyter this way:

import youtube_dl

def all_descriptions(URL):
    videoID=df['VideoId']
    URL = 'https://www.youtube.com/watch?v=' + videoID
    ydl_opts = {
    'forcedescription':True,
    'skip_download': True,
    'youtube-skip-dash-manifest': True,
    'no_warnings': True,
    'ignoreerrors': True
    }
   
    try:
        youtube_dl.YoutubeDL(ydl_opts).download(URL)
        return webpage

except:
    pass

df['descriptions']=all_descriptions(URL)

I see the output of the code as text, but in df only "None" as text of the column.

Obviously I can't transport the output of the function to df in the proper way.

Can you suggest how to get it right?

Thank you in advance for help.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

@perl I modify the df to include two URLs that are causing two types of error:

import pandas as pd

data = {'Order':  ['1', '2', '3', '4', '5'],
        'VideoId': ['jxwHmAoKte4', 'LsXM502SpiU','1I3f27iQ4pM', 'MGQOX2rK5s', 'wNayw_E7lIA']
        }

df = pd.DataFrame (data, columns = ['Order','VideoId'])

print (df)

Then I test it in the way you suggested, including my definition of ydl_opts:

videoID=df['VideoId']
URL = 'https://www.youtube.com/watch?v=' + videoID
ydl_opts = {
    'forcedescription':True,
    'skip_download': True,
    'youtube-skip-dash-manifest': True,
    'no_warnings': True,
    'ignoreerrors': True
    }
    
df['description'] = [
    youtube_dl.YoutubeDL(ydl_opts).extract_info(
        u, download=False)['description'] for u in URL]

df

Reaching to the first error I get the output:

TypeError: 'NoneType' object is not subscriptable

After that I replace 'forcedescription' in my code with 'extract_info':

def all_descriptions(URL):
    videoID=df['VideoId']
    URL = 'https://www.youtube.com/watch?v=' + videoID
    ydl_opts = {
    'forcedescription':True,
    'skip_download': True,
    'youtube-skip-dash-manifest': True,
    'no_warnings': True,
    'ignoreerrors': True
    }
   
    try:
        youtube_dl.YoutubeDL(ydl_opts).download(URL)
        return webpage
    
    except:
        pass

It skips all errors, but as the result there is nothing in the 'description'-column.

Any sugggestions?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to Vigges Developer Community for programmer and developer-Open, Learning and Share
...