Scraping the Douban Movie Top 250 with bs4 and csv

    Tech, 2024-07-25

    import csv  # standard-library CSV writer

    import requests  # HTTP client
    from bs4 import BeautifulSoup  # HTML parser

    url = 'https://movie.douban.com/top250'
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'}

    # The with-block closes the file even if a request or parse step fails.
    with open('doubantop250.csv', 'w', newline='', encoding='utf-8') as csv_file:
        writer = csv.writer(csv_file)
        # 250 movies at 25 per page: start = 0, 25, ..., 225.
        for i in range(10):
            params = {'start': str(i * 25), 'filter': ''}
            res = requests.get(url, headers=headers, params=params)
            bs_movie = BeautifulSoup(res.text, 'html.parser')
            list_movies = bs_movie.find('ol', class_='grid_view').find_all('li')
            for movie in list_movies:
                # Link to the movie's detail page.
                ans = movie.find('div', class_='hd')
                ans_linker = ans.find('a')['href']
                print(ans_linker)
                writer.writerow([ans_linker])
                # First element with class 'title' (the primary title).
                ans_name = movie.find(class_='title')
                print(ans_name.text)
                writer.writerow([ans_name.text])
                # Rating.
                ans_bd = movie.find('div', class_='bd')
                ans_star = ans_bd.find(class_='rating_num').text
                print(ans_star)
                writer.writerow([ans_star])
                # One-line quote; find() returns None when a movie has none,
                # so the chained call raises AttributeError.
                try:
                    ans_quote = ans_bd.find(class_='quote').find(class_='inq').text
                    writer.writerow([ans_quote])
                except AttributeError:
                    # Placeholder text meaning "this movie has no summary".
                    writer.writerow(['此电影无概述'])
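The script above writes each field on its own CSV row, so one movie spans four rows. If a one-row-per-movie layout is preferred, something like the following sketch may help. The function name `write_movies`, the column names, and the in-memory buffer are assumptions made for illustration and are not part of the original script; only the standard `csv` module is used.

```python
import csv
import io

# Hypothetical column names for illustration; the original script writes no header row.
FIELDNAMES = ['link', 'title', 'rating', 'quote']


def write_movies(movies, fileobj):
    """Write one CSV row per movie, preceded by a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    for movie in movies:
        writer.writerow(movie)


# Sample record copied from the results shown in this post.
sample = [{
    'link': 'https://movie.douban.com/subject/1292052/',
    'title': '肖申克的救赎',
    'rating': '9.7',
    'quote': '希望让人自由。',
}]

# An in-memory buffer stands in for the real output file here.
buffer = io.StringIO()
write_movies(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In the scraper itself, the four values collected inside the inner loop could be gathered into one dict and passed to such a function instead of calling `writer.writerow` four times.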

    Partial results:

    https://movie.douban.com/subject/1292052/
    肖申克的救赎
    9.7
    希望让人自由。
    https://movie.douban.com/subject/1291546/
    霸王别姬
    9.6
    风华绝代。
    https://movie.douban.com/subject/1292720/
    阿甘正传
    9.5
    一部美国近现代史。
    https://movie.douban.com/subject/1295644/
    这个杀手不太冷
    9.4
    怪蜀黍和小萝莉不得不说的故事。
    https://movie.douban.com/subject/1292063/
    美丽人生
    9.5
    最美的谎言。
    https://movie.douban.com/subject/1292722/
    泰坦尼克号
    9.4
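Because the file holds one value per row, reading it back means regrouping every four consecutive rows (link, title, rating, quote) into one record. Here is a minimal sketch, assuming the file always contains complete groups of four; that should hold for the script in this post, since it writes a placeholder when a quote is missing. The helper name `group_rows` and the in-memory sample are hypothetical.

```python
import csv
import io


def group_rows(fileobj, size=4):
    """Regroup single-value CSV rows into tuples of `size` values."""
    # Skip empty rows, then take the only column of each remaining row.
    values = [row[0] for row in csv.reader(fileobj) if row]
    return [tuple(values[i:i + size]) for i in range(0, len(values), size)]


# Sample data copied from the results above, in the one-value-per-row layout.
flat = io.StringIO(
    'https://movie.douban.com/subject/1292052/\r\n'
    '肖申克的救赎\r\n'
    '9.7\r\n'
    '希望让人自由。\r\n'
    'https://movie.douban.com/subject/1291546/\r\n'
    '霸王别姬\r\n'
    '9.6\r\n'
    '风华绝代。\r\n'
)

records = group_rows(flat)
for link, title, rating, quote in records:
    print(title, rating, quote, link)
```

With the real file, `flat` would be replaced by `open('doubantop250.csv', newline='', encoding='utf-8')`.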