Using a Global Proxy with JSoup in a Java Crawler - yiniuyun's column


2020-07-27 17:27:12 Category: Python crawler

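Web crawlers are a major source of collected data. As internet big data grows rapidly, crawlers have to keep improving to keep up with constantly changing requirements, and a stable proxy IP is a prerequisite for that: a stable, fast proxy supports both the crawler's throughput and its reliability. This post shows how to set a global proxy for JSoup in a Java crawler. The code is fairly simple, and I am sharing it in the hope that it helps anyone who needs it.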
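"Global" here means JVM-wide: the proxy is set through system properties, so every connection that goes through the JDK's standard HTTP stack picks it up, not just one JSoup call. As a quick way to see that effect without sending a request, here is a minimal sketch that sets the properties and then asks the JDK's default `ProxySelector` which proxy it would choose. The class name `GlobalProxyCheck`, its two helper methods, and the `example.com` URL are illustrative assumptions, not part of the original post.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

// Hypothetical helper class, for illustration only
final class GlobalProxyCheck {

    /** Sets the JVM-wide proxy properties for both HTTP and HTTPS. */
    static void useGlobalProxy(String host, int port) {
        String portText = String.valueOf(port);
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", portText);
        System.setProperty("https.proxyHost", host);
        System.setProperty("https.proxyPort", portText);
    }

    /** Returns the first proxy the default ProxySelector proposes for the URL. */
    static Proxy proxyFor(String url) {
        List<Proxy> candidates = ProxySelector.getDefault().select(URI.create(url));
        return candidates.get(0);
    }

    public static void main(String[] args) {
        useGlobalProxy("t.16yun.cn", 31111);
        // example.com is a placeholder target
        Proxy proxy = proxyFor("https://example.com/");
        InetSocketAddress address = (InetSocketAddress) proxy.address();
        System.out.println(proxy.type() + " " + address.getHostString() + ":" + address.getPort());
    }
}
```

No network traffic is involved in this check, so it can be run before the crawler starts to confirm that the settings were applied.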

// Demo.java: fetch a page with JSoup through a JVM-wide (global) proxy
import java.io.IOException;
import java.util.Random;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Demo {

    // Proxy server (product site: www.16yun.cn)
    static final String ProxyHost = "t.16yun.cn";
    static final String ProxyPort = "31111";

    // Proxy credentials
    static final String ProxyUser = "username";
    static final String ProxyPass = "password";

    // Name of the header that controls IP switching
    static final String ProxyHeadKey = "Proxy-Tunnel";

    public static void main(String[] args) {
        // Placeholder target; replace with the page you want to crawl
        String url = "http://example.com/";

        // Route all HTTP and HTTPS traffic in this JVM through the proxy
        System.setProperty("http.proxyHost", ProxyHost);
        System.setProperty("https.proxyHost", ProxyHost);
        System.setProperty("http.proxyPort", ProxyPort);
        System.setProperty("https.proxyPort", ProxyPort);

        // Proxy credentials. These four keys are not among the JDK's documented
        // networking properties, so whether they take effect depends on the
        // HTTP stack in use.
        System.setProperty("http.proxyUser", ProxyUser);
        System.setProperty("http.proxyPassword", ProxyPass);
        System.setProperty("https.proxyUser", ProxyUser);
        System.setProperty("https.proxyPassword", ProxyPass);

        // Set Proxy-Tunnel to a random value in [0, 10000)
        Random random = new Random();
        int tunnel = random.nextInt(10000);
        String ProxyHeadVal = String.valueOf(tunnel);

        try {
            // Fetch the page with a 3-second timeout; add other options and error handling as needed
            Document doc = Jsoup.connect(url).timeout(3000).header(ProxyHeadKey, ProxyHeadVal).get();
            if (doc != null) {
                System.out.println(doc.body().html());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
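
If the proxy answers with 407 Proxy Authentication Required, one option worth trying is to supply the credentials through `java.net.Authenticator`, which is the mechanism the JDK's own HTTP stack consults for proxy challenges. The sketch below is a hedged alternative, not the method from the listing above: the class name `ProxyAuthSetup` and the `install` helper are hypothetical. It also clears `jdk.http.auth.tunneling.disabledSchemes`, because recent JDKs (8u111 and later) disable Basic authentication for HTTPS tunnels by default. Whether your proxy needs that depends on the authentication scheme it uses.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

// Hypothetical helper class, for illustration only
final class ProxyAuthSetup {

    /** Sets the global proxy properties and registers JVM-wide proxy credentials. */
    static void install(String host, String port, String user, String pass) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", port);
        System.setProperty("https.proxyHost", host);
        System.setProperty("https.proxyPort", port);

        // Allow Basic auth on HTTPS CONNECT tunnels (disabled by default since JDK 8u111).
        // Set this early, before the first request is made.
        System.setProperty("jdk.http.auth.tunneling.disabledSchemes", "");

        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                // Answer proxy challenges only, never challenges from the target site
                if (getRequestorType() == RequestorType.PROXY) {
                    return new PasswordAuthentication(user, pass.toCharArray());
                }
                return null;
            }
        });
    }

    public static void main(String[] args) {
        // Same placeholder values as in the Demo listing
        install("t.16yun.cn", "31111", "username", "password");
        System.out.println("Proxy configured: " + System.getProperty("https.proxyHost"));
    }
}
```

Calling `ProxyAuthSetup.install(...)` once at startup, before the first `Jsoup.connect(...)`, would replace the eight `System.setProperty` calls in `Demo`. The `Proxy-Tunnel` header is still set per request as before.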