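1. Create an HttpClient instance
HttpClient's core job is to execute HTTP request methods and obtain the response. Before executing any request, you must instantiate an HttpClient. There are five main ways to do so.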
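    // The five statements below are alternatives; use one of them.
    HttpClient httpClient = HttpClients.custom().build();
    HttpClient httpClient = HttpClientBuilder.create().build();
    HttpClient httpClient = HttpClients.createSystem();
    HttpClient httpClient = HttpClients.createMinimal();
    CloseableHttpClient httpClient = HttpClients.createDefault();

2. Create a request method instance
HttpClient supports the HTTP/1.1 methods GET, POST, HEAD, PUT, DELETE, OPTIONS and TRACE. Each method has a corresponding class: HttpGet, HttpPost, HttpHead, HttpPut, HttpDelete, HttpOptions and HttpTrace. In web crawlers, the most commonly used classes are HttpGet and HttpPost. The HttpClient source code shows that each of these classes can be instantiated in three ways, as the following code shows.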
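    // Option 1: no-argument constructor, then set the URI
    String personalUrl = "https://searchcustomerexperience.techtarget.com/info/news";
    URI uri = new URIBuilder(personalUrl).build(); // throws URISyntaxException
    HttpGet getMethod = new HttpGet();
    getMethod.setURI(uri);
    System.out.println(getMethod);

    // Option 2: pass a URI to the constructor
    HttpGet httpGetUri = new HttpGet(uri);
    System.out.println(httpGetUri);

    // Option 3: pass the URL string to the constructor
    HttpGet httpGetStr = new HttpGet(personalUrl);
    System.out.println(httpGetStr);

3. Execute the request
With an HttpClient instance in hand, call execute(HttpUriRequest request) to execute the request; the call returns an HttpResponse. HttpClient also offers three ways of doing this, as the following examples show.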
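    // Option 1: the placeholder response is overwritten by the result of execute(),
    // which throws IOException
    HttpResponse httpResponse = new BasicHttpResponse(HttpVersion.HTTP_1_1, HttpStatus.SC_OK, "OK");
    httpResponse = httpClient.execute(getMethod);

    // Option 2: execute with an HttpContext
    HttpResponse httpResponse = null;
    try {
        httpResponse = httpClient.execute(httpGet, localContext);
    } catch (IOException e) {
        e.printStackTrace();
    }

    // Option 3: CloseableHttpClient returns a CloseableHttpResponse
    CloseableHttpClient httpClient = HttpClients.createDefault();
    HttpGet httpGet = new HttpGet("https://searchcustomerexperience.techtarget.com/info/news");
    CloseableHttpResponse httpResponse = null;
    try {
        httpResponse = httpClient.execute(httpGet);
    } catch (IOException e) {
        e.printStackTrace();
    }

4. Get the response information
From the HttpResponse obtained in step 3, you can call further methods to get the status code, the response headers, the response entity and other information, as shown in Program 3-14. The request in that program is executed with an HttpContext, that is, the HTTP context.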
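    // Program 3-14
    // Imports assume the HttpClient 4.x package layout (org.apache.http).
    import java.io.IOException;
    import org.apache.http.Header;
    import org.apache.http.HttpEntity;
    import org.apache.http.HttpResponse;
    import org.apache.http.HttpStatus;
    import org.apache.http.ProtocolVersion;
    import org.apache.http.client.HttpClient;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.protocol.BasicHttpContext;
    import org.apache.http.protocol.HttpContext;
    import org.apache.http.util.EntityUtils;

    public class HttpclientInit {
        public static void main(String[] args) throws Exception {
            // Initialize the HttpContext
            HttpContext localContext = new BasicHttpContext();
            String url = "https://searchcustomerexperience.techtarget.com/info/news";
            // Initialize the HttpClient
            HttpClient httpClient = HttpClients.custom().build();
            HttpGet httpGet = new HttpGet(url);
            // Execute the request and obtain the HttpResponse
            HttpResponse httpResponse = null;
            try {
                httpResponse = httpClient.execute(httpGet, localContext);
            } catch (IOException e) {
                e.printStackTrace();
                return; // no response to inspect; avoids a NullPointerException below
            }
            // Print the response information
            System.out.println("response:" + httpResponse);
            // Status line
            String status = httpResponse.getStatusLine().toString();
            System.out.println("status:" + status);
            // Status code
            int statusCode = httpResponse.getStatusLine().getStatusCode();
            System.out.println("statusCode:" + statusCode);
            // Protocol version
            ProtocolVersion protocolVersion = httpResponse.getProtocolVersion();
            System.out.println("protocolVersion:" + protocolVersion);
            // Reason phrase, for example "OK"
            String phrase = httpResponse.getStatusLine().getReasonPhrase();
            System.out.println("phrase:" + phrase);
            // Response headers
            Header[] headers = httpResponse.getAllHeaders();
            System.out.println("Response headers:");
            for (int i = 0; i < headers.length; i++) {
                System.out.println(headers[i]);
            }
            System.out.println("End of response headers");
            if (statusCode == HttpStatus.SC_OK) { // status code 200 means the request succeeded
                // Get the entity content
                HttpEntity entity = httpResponse.getEntity();
                // Set the encoding to match the page
                String entityString = EntityUtils.toString(entity, "gbk");
                // Print the entity content
                System.out.println(entityString);
                EntityUtils.consume(httpResponse.getEntity());
            } else {
                // Close the entity's content stream
                EntityUtils.consume(httpResponse.getEntity());
            }
        }
    }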
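Program 3-14 decodes the entity with a hard-coded "gbk". The response headers it prints usually include a Content-Type header, which may name the charset the server actually used. The following is a minimal sketch, using only the JDK, of how a charset could be picked from such a header value, with a fallback when none is declared. The class CharsetSketch and its methods charsetFrom and decode are hypothetical helpers written for illustration; they are not part of HttpClient.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of HttpClient.
final class CharsetSketch {

    // Returns the charset named in a Content-Type header value, or the fallback
    // when the value is null, has no charset parameter, or names an unknown charset.
    static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // illegal or unsupported charset name
                }
            }
        }
        return fallback;
    }

    // Decodes a response body with the charset declared in the header, if any.
    static String decode(byte[] body, String contentType, Charset fallback) {
        return new String(body, charsetFrom(contentType, fallback));
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        byte[] body = "\u4e2d\u6587".getBytes(gbk); // two Chinese characters encoded as GBK
        // Decoded with GBK because the header value declares it
        System.out.println(decode(body, "text/html; charset=GBK", StandardCharsets.UTF_8));
        // No charset parameter, so the fallback is used
        System.out.println(charsetFrom("text/html", StandardCharsets.UTF_8));
    }
}
```

In a crawler, the header value would come from the response (for example, from the headers printed in Program 3-14) and the body bytes from the entity. Pages that declare their charset only in an HTML meta tag are not covered by this sketch.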