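The earlier examples introduced web-scraping techniques, but they were all single-threaded, so fetching pages was relatively slow. Is there a faster way? This section introduces the map method of ThreadPool, which runs a function over a list of URLs using several worker threads.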
Import the core package
from multiprocessing.dummy import Pool as ThreadPool
import requests  # used by getSource below
Wrap the request in a function
def getSource(url):
    # headers is assumed to be defined earlier, as in the previous examples
    return requests.request(method='GET', url=url, headers=headers, timeout=10, proxies={'http': '123.55.106.175:9999'})
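One thing worth keeping in mind: when a function like this is handed to map, an exception raised by any single call is re-raised by map itself, and the results of the other calls are lost. A minimal sketch of one way around that is to return the exception as a value instead. The names safe and flaky_fetch are hypothetical and not part of the original code; flaky_fetch stands in for getSource so the example runs without network access.

```python
from multiprocessing.dummy import Pool as ThreadPool

def safe(func):
    """Wrap func so that an exception is returned as a value instead of raised."""
    def wrapper(arg):
        try:
            return func(arg)
        except Exception as exc:
            return exc
    return wrapper

def flaky_fetch(url):
    # Hypothetical stand-in for getSource: fails for URLs containing "bad".
    if "bad" in url:
        raise ValueError("request failed: " + url)
    return "ok: " + url

pool = ThreadPool(2)
outcomes = pool.map(safe(flaky_fetch), ["http://example.com/a", "http://example.com/bad"])
pool.close()
pool.join()

# Each entry is either a normal result or the exception that the call raised.
for item in outcomes:
    print(item)
```

The caller can then check each entry with isinstance(item, Exception) and retry or skip the failed URLs.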
Put the URLs to be crawled into a list
urls = []
for div in _divs:
    href = div.xpath('h4/a/@href')
    if href:  # skip nodes that contain no link
        urls.append(href[0])
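Initialize the thread pool and run the requests
pool = ThreadPool(4)  # 4 worker threads
results = pool.map(getSource, urls)  # responses, in the same order as urls
pool.close()  # no more tasks will be submitted
pool.join()  # wait for the worker threads to finish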
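To see how map behaves without touching the network, here is a minimal sketch that replaces the real request with a simulated one. The function fake_fetch and the example.com URLs are hypothetical; the sleep stands in for network latency.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool

def fake_fetch(url):
    # Hypothetical stand-in for getSource: sleeps instead of making a network call.
    time.sleep(0.2)
    return "response for " + url

demo_urls = ["http://example.com/page/%d" % i for i in range(8)]

start = time.time()
pool = ThreadPool(4)
results = pool.map(fake_fetch, demo_urls)  # blocks until every call has finished
pool.close()
pool.join()
elapsed = time.time() - start

# Results come back in the same order as demo_urls, regardless of which
# thread finished first.
print(results[0])
print("elapsed: %.2f s" % elapsed)
```

With 4 threads and 8 simulated requests, the calls run in roughly two batches rather than eight sequential waits, which is where the speed-up for I/O-bound work such as HTTP requests comes from.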
When reposting, please cite the original address: https://ipadbbs.8miu.com/read-26553.html