Weather site: https://www.tianqi.com/
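15-day forecast for Haidian District, Beijing: https://www.tianqi.com/haidian/15/
15-day forecast for Chaoyang District, Beijing: https://www.tianqi.com/chaoyang/15/
30-day forecast for Chaoyang District, Beijing: https://www.tianqi.com/chaoyang/30/
The forecast URLs follow the pattern https://www.tianqi.com/<city path>/<number of days>/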
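The URL pattern in the examples above can be captured in a small helper. This is only a sketch: the names BASE_URL and build_forecast_url are hypothetical and do not appear in the original post, and the restriction to 15 and 30 days simply reflects the only two page types listed here.

```python
BASE_URL = "https://www.tianqi.com"  # site root taken from the URLs above


def build_forecast_url(city_slug: str, days: int) -> str:
    """Return the forecast page URL for a city slug such as 'haidian'.

    Hypothetical helper: only the 15- and 30-day pages are shown in this
    post, so other values are rejected rather than guessed at.
    """
    if days not in (15, 30):
        raise ValueError("only the 15- and 30-day pages appear in this post")
    # Accept both 'haidian' and '/haidian/' (the form used in the page links).
    return f"{BASE_URL}/{city_slug.strip('/')}/{days}/"


print(build_forecast_url("haidian", 15))
print(build_forecast_url("/chaoyang/", 30))
```

Accepting the slug with or without slashes keeps the helper usable with the '/haidian/' style paths that the region page provides.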
Click 【天气】 (Weather) -> 【全国】 (Nationwide) to reach the list of administrative regions: https://www.tianqi.com/chinacity.html
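<div class="citybox">
<h2>: contains the province (provinces)
<span>: the cities under that province (citys), each in an <a> tag
Extracting the information under <div class="citybox"> yields every administrative region in the country.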
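The structure described above can be tried out offline on a small fragment. The sketch below is not the post's own method: it uses only the standard library's html.parser instead of the BeautifulSoup-style calls in the functions later in this post, the SAMPLE fragment is a hypothetical reduction of the real page, and the class name CityBoxParser is made up for illustration. It also assumes the input is already the citybox fragment, so it does not check the enclosing <div>.

```python
from html.parser import HTMLParser

# Hypothetical fragment modelled on the structure described above.
SAMPLE = """
<div class="citybox">
  <h2><a href="/beijing/">北京</a></h2>
  <span><a href="/haidian/">海淀</a><a href="/chaoyang/">朝阳</a></span>
  <h2><a href="/chongqing/">重庆</a></h2>
  <span><a href="/yuzhongqu/">渝中</a></span>
</div>
"""


class CityBoxParser(HTMLParser):
    """Collect (province, province_url, city, city_url) tuples."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._section = None   # 'h2' or 'span' while inside one
        self._href = None      # href of the <a> currently open
        self._province = None  # (name, url) from the most recent <h2>

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "span"):
            self._section = tag
        elif tag == "a" and self._section:
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag in ("h2", "span"):
            self._section = None
        elif tag == "a":
            self._href = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._href is None:
            return
        if self._section == "h2":
            # Link text inside <h2> is the province name.
            self._province = (text, self._href)
        elif self._section == "span" and self._province:
            # Link text inside <span> is a city of the current province.
            self.records.append(self._province + (text, self._href))


parser = CityBoxParser()
parser.feed(SAMPLE)
for record in parser.records:
    print(",".join(record))
```

Each printed line has the same comma-separated shape as the region strings used in the rest of this post.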
15-day forecast for Yuzhong District, Chongqing: https://www.tianqi.com/yuzhongqu/15/
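<div class="inleft">: the outer container
<ul class="weaul">: the weather data for the next 15 days
<li>: the weather data for a single day
Extracting the information under <div class="inleft"> yields each city's weather for the next 15 days.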
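<div class="citybox">
<h2>: contains the province (provinces)
<span>: the cities under that province (citys), each in an <a> tag

get_admArea(url) function
url: the URL of the administrative-region page
Returns provinces_citys, a list[string] whose items look like '北京,/beijing/,海淀,/haidian/'

    def get_admArea(url):
        '''
        Purpose: get the administrative regions from url
        parameters:
            url: address of the page to request
        return:
            provinces_citys: list[string], e.g. '北京,/beijing/,海淀,/haidian/'
        '''
        # Provinces and their cities
        provinces_citys = []
        # Request the page. get_html_soup is the page-fetching helper used in
        # this post; it is not defined in this section and is assumed to
        # return a BeautifulSoup object.
        soup = get_html_soup(url)
        cityBox = soup.find('div', class_='citybox')
        province_list = cityBox.find_all('h2')
        city_list = cityBox.find_all('span')
        # print(len(province_list))  # 31
        # Extract the data: the i-th <span> holds the cities of the i-th <h2>
        for province, cities in zip(province_list, city_list):
            # Province
            province_name = province.a.get_text()  # text of the <a> inside <h2>
            province_url = province.a['href']      # href of the <a> inside <h2>
            # Cities
            for city in cities.find_all('a'):
                city_name = city.get_text()
                city_url = city['href']
                # Store as one comma-separated string
                # (named record so the built-in str is not shadowed)
                record = province_name + ',' + province_url + ',' + city_name + ',' + city_url
                provinces_citys.append(record)
        # Return the provinces and their cities
        return provinces_citys

<div class="inleft">: the outer container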
<ul class="weaul">: the weather data for the next 15 days
<li>: the weather data for a single day

get_15_weather(provinces_city, year) function:
provinces_city: administrative-region string (format: '北京,/beijing/,海淀,/haidian/')
year: the year as a str, e.g. '2020'
Returns city_weather_list, a list of weather strings such as '北京,/beijing/,海淀,/haidian/,20200702,小雨,20,27,优,东风,2'

    import re

    def get_15_weather(provinces_city, year):
        '''
        Purpose: get the weather data for the next 15 days
        parameters:
            provinces_city: administrative-region string
                (format: '北京,/beijing/,海淀,/haidian/')
            year: the year as a str, e.g. '2020'
        return:
            city_weather_list: list of strings such as
                '北京,/beijing/,海淀,/haidian/,20200702,小雨,20,27,优,东风,2',
                or None when the page has no forecast block
        '''
        year_now = year  # year as a str
        city_weather_list = []  # weather records for this city
        tianqi_url = 'https://www.tianqi.com'  # site root
        # Split the region record into ['北京', '/beijing/', '海淀', '/haidian/']
        content_list = provinces_city.split(',')
        # Build the 15-day forecast URL, e.g. https://www.tianqi.com/haidian/15/
        city_tianqi_15_url = tianqi_url + content_list[3] + '15/'
        # get_html_soup is the page-fetching helper (not defined in this section)
        soup = get_html_soup(city_tianqi_15_url)
        div_temp = soup.find('div', class_='inleft')
        if div_temp is None:
            return None
        weather_list = div_temp.find('ul', class_='weaul').find_all('li')
        # print(weather_list)
        for li in weather_list:
            divs = li.find_all('div')
            # Date: digits of the first <div>, e.g. ['07', '01'] -> '0701'
            time_m_d = ''.join(re.findall(r'\d+', divs[0].get_text()))
            time_y_m_d = year_now + time_m_d
            # Year rollover: entries after December 31 belong to the next year
            if time_m_d == '1231':
                year_now = str(int(year_now) + 1)
            # Weather conditions and temperatures, e.g. '多云转小雨21~35℃'
            str_temp = divs[2].get_text()
            weather_station = ''.join(re.findall(r'[\u4e00-\u9fa5]+', str_temp))  # '多云转小雨'
            # Lowest and highest temperature; -? keeps the sign of sub-zero values
            temperature = re.findall(r'-?\d+', str_temp)
            temperature_min = temperature[0]
            temperature_max = temperature[1]
            # Air quality
            air_quality = divs[3].span.get_text()
            # Wind direction and wind scale
            str_temp = divs[4].get_text()
            wind = ''.join(re.findall(r'^[\u4e00-\u9fa5]+', str_temp))
            wind_scale = ''.join(re.findall(r'\d+', str_temp))
            weather = ','.join([provinces_city, time_y_m_d, weather_station,
                                temperature_min, temperature_max,
                                air_quality, wind, wind_scale])
            city_weather_list.append(weather)
        return city_weather_list
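The text-extraction steps inside get_15_weather can be checked without fetching any page. The sketch below isolates them as three small functions; the names parse_weather_cell, parse_wind_cell and attach_years are hypothetical, and the wind text '东风2级' is an assumed example of what the page cell contains (the post only shows the extracted result '东风' and '2'). The temperature pattern allows a leading minus sign so that sub-zero values keep their sign.

```python
import re


def parse_weather_cell(text):
    """Split text such as '多云转小雨21~35℃' into (condition, low, high)."""
    # Chinese characters form the condition; '℃' and '~' fall outside the range.
    condition = "".join(re.findall(r"[\u4e00-\u9fa5]+", text))
    low, high = re.findall(r"-?\d+", text)[:2]
    return condition, low, high


def parse_wind_cell(text):
    """Split text such as '东风2级' (assumed format) into (direction, scale)."""
    direction = "".join(re.findall(r"^[\u4e00-\u9fa5]+", text))
    scale = "".join(re.findall(r"\d+", text))
    return direction, scale


def attach_years(month_days, start_year):
    """Prefix each 'MMDD' string with its year.

    Mirrors the rollover rule in get_15_weather: the entry '1231' keeps the
    current year, and every entry after it gets the following year.
    """
    year = int(start_year)
    dated = []
    for month_day in month_days:
        dated.append(f"{year}{month_day}")
        if month_day == "1231":
            year += 1
    return dated


print(parse_weather_cell("多云转小雨21~35℃"))
print(parse_wind_cell("东风2级"))
print(attach_years(["1230", "1231", "0101"], "2020"))
```

Keeping these steps as pure functions makes it easy to test the regular expressions against new cell formats if the site's markup changes.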