HTTP Proxy Usage Examples¶
This document contains code samples for requesting an HTTP proxy server programmatically, for developers' reference.
About the samples
- The samples cannot be run as-is: the order number 1834987042xxxxxx, the proxy IP and port 117.69.63.102:43787, and the username/password placeholders are all fictitious. Replace them with your own details.
- The runtime requirements and caveats for each sample are described alongside it; please read them carefully before use.
- If you run into problems while using a sample, contact after-sales support and we will help you.
Notes
The samples below are basic examples. Running them does not guarantee a successful crawl, since target sites usually have anti-scraping measures, such as pages that require a CAPTCHA on redirect.
We recommend building on the basic samples with the following improvements:
- Add IP pool management.
- Throttle requests to the target site sensibly; as a rule of thumb, send no more than 1 request per second per proxy IP to the same site.
- Send complete header information with every HTTP request.
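The throttling and header advice above can be sketched in Python. This is a minimal, illustrative helper (the `RateLimiter` class and the header values are our own, not part of the provider's API):

```python
import time

# Browser-like headers to send with every request (illustrative values)
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "keep-alive",
}

class RateLimiter:
    """Enforce a minimum interval between requests through the same proxy IP."""

    def __init__(self, min_interval=1.0):  # 1 request/second per proxy IP
        self.min_interval = min_interval
        self._last = {}  # proxy_ip -> monotonic timestamp of the last request

    def wait(self, proxy_ip):
        """Sleep just long enough so proxy_ip is not used too frequently."""
        elapsed = time.monotonic() - self._last.get(proxy_ip, 0.0)
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last[proxy_ip] = time.monotonic()

# Usage: call limiter.wait(proxy) right before each requests.get(..., proxies=...)
limiter = RateLimiter(min_interval=0.1)  # short interval so the demo finishes quickly
for _ in range(3):
    limiter.wait("117.69.63.102:43787")
```

Combined with the samples below, you would pass `headers=DEFAULT_HEADERS` to each request and call `limiter.wait(...)` before sending it.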
Python3¶
requests¶
requests (recommended)
Usage tips
- The requests-based sample supports both http and https pages and is the recommended option.
- requests is not part of the Python standard library and must be installed first:
pip install requests
"""
Request through a proxy server with requests.
Works for both http and https pages.
"""
import requests

# Proxy extraction API: fetch 1 proxy IP
api_url = "http://v2.api.juliangip.com/dynamic/getips?num=1&pt=1&result_type=text&split=1&trade_no=1834987042xxxxxx&sign=9e489baa3bf149593f149d7252efd006"
# Proxy IP returned by the API
proxy_ip = requests.get(api_url).text
# Username/password authentication (dynamic / dedicated proxies)
username = "username"
password = "password"
proxies = {
    "http": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": proxy_ip},
    "https": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": proxy_ip},
}
# Whitelist mode (requires the whitelist to be configured in advance)
# proxies = {
#     "http": "http://%(proxy)s/" % {"proxy": proxy_ip},
#     "https": "http://%(proxy)s/" % {"proxy": proxy_ip},
# }
# Target page to visit
target_url = "https://www.juliangip.com/api/general/Test"
# Send the request through the proxy IP
response = requests.get(target_url, proxies=proxies)
# Print the page content
if response.status_code == 200:
    print(response.text)
urllib¶
urllib
Usage tips
- The urllib-based sample supports both http and https pages.
- Requires Python 3.x.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Request through a proxy server with urllib.
Works for both http and https pages.
"""
import urllib.request
import ssl

# Disable certificate verification globally to avoid errors on https pages
ssl._create_default_https_context = ssl._create_unverified_context
# Proxy extraction API: fetch 1 proxy IP
api_url = "http://v2.api.juliangip.com/dynamic/getips?num=1&pt=1&result_type=text&split=1&trade_no=1834987042xxxxxx&sign=9e489baa3bf149593f149d7252efd006"
# Proxy IP returned by the API
proxy_ip = urllib.request.urlopen(api_url).read().decode('utf-8')
# Username/password authentication (dynamic / dedicated proxies)
username = "username"
password = "password"
proxies = {
    "http": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": proxy_ip},
    "https": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": proxy_ip},
}
# Whitelist mode (requires the whitelist to be configured in advance)
# proxies = {
#     "http": "http://%(proxy)s/" % {"proxy": proxy_ip},
#     "https": "http://%(proxy)s/" % {"proxy": proxy_ip},
# }
# Target page to visit
target_url = "https://www.juliangip.com/api/general/Test"
# Send the request through the proxy IP
proxy_support = urllib.request.ProxyHandler(proxies)
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)
response = urllib.request.urlopen(target_url)
# Print the page content
if response.code == 200:
    print(response.read().decode('utf-8'))
aiohttp¶
aiohttp
Usage tips
- The aiohttp-based sample supports both http and https pages.
- aiohttp is not part of the standard library and must be installed first: pip install aiohttp
- aiohttp requires Python 3.5 or later.
- On Windows, Python 3.8 + aiohttp throws an exception when visiting https sites; call asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy()) right after import asyncio to fix it.
If aiohttp raises an ssl error, locate connector.py in the aiohttp package (in the package's root directory), comment out the line runtime_has_start_tls = self._loop_supports_start_tls(), and add a new line below it: runtime_has_start_tls = False if req.proxy.scheme != "https" else self._loop_supports_start_tls()
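The Windows note above can be made portable with a platform guard, so the same script runs unchanged on Linux and macOS (a minimal sketch):

```python
import sys
import asyncio

# On Windows (notably Python 3.8), aiohttp can fail on https requests under the
# default ProactorEventLoop; switch to the selector-based loop there.
if sys.platform == "win32":
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
# On other platforms the default policy is left untouched.
```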
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Request through a proxy server with aiohttp.
Works for both http and https pages.
"""
import random
import asyncio
# asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())  # call this on Windows if https requests fail
import aiohttp
import requests

page_url = "https://www.juliangip.com/api/general/Test"  # target page to visit
# Proxy extraction API, returning plain text, one "ip:port" per line
api_url = "http://v2.api.juliangip.com/dynamic/getips?num=1&pt=1&result_type=text&split=1&trade_no=1834987042xxxxxx&sign=9e489baa3bf149593f149d7252efd006"
# Proxy list returned by the API
proxy_list = requests.get(api_url).text.splitlines()
# Username/password authentication (dynamic / dedicated proxies)
username = "username"
password = "password"
proxy_auth = aiohttp.BasicAuth(username, password)

async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url, proxy="http://" + random.choice(proxy_list), proxy_auth=proxy_auth) as resp:
            content = await resp.read()
            print(f"status_code: {resp.status}, content: {content}")

def run():
    loop = asyncio.get_event_loop()
    # Send 5 requests concurrently
    tasks = [fetch(page_url) for _ in range(5)]
    loop.run_until_complete(asyncio.gather(*tasks))

if __name__ == '__main__':
    run()
httpx¶
httpx
Usage tips
- The httpx-based sample supports both http and https pages.
- httpx is not part of the standard library and must be installed first:
pip install httpx
- httpx requires Python 3.8+.
- httpx does not support SOCKS proxies yet.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Request through a proxy server with httpx.
Works for both http and https pages.
"""
import random
import asyncio
import httpx
import requests

page_url = "https://www.juliangip.com/api/general/Test"  # target page to visit
# Proxy extraction API, returning plain text, one "ip:port" per line
api_url = "http://v2.api.juliangip.com/dynamic/getips?num=1&pt=1&result_type=text&split=1&trade_no=1834987042xxxxxx&sign=9e489baa3bf149593f149d7252efd006"
# Proxy list returned by the API
proxy_list = requests.get(api_url).text.splitlines()
# Username/password authentication (dynamic / dedicated proxies)
username = "username"
password = "password"

async def fetch(url):
    proxy = random.choice(proxy_list)
    proxies = {
        "http://": f"http://{username}:{password}@{proxy}",
        "https://": f"http://{username}:{password}@{proxy}",
    }
    async with httpx.AsyncClient(proxies=proxies, timeout=10) as client:
        resp = await client.get(url)
        print(f"status_code: {resp.status_code}, content: {resp.content}")

def run():
    loop = asyncio.get_event_loop()
    # Send 5 requests concurrently
    tasks = [fetch(page_url) for _ in range(5)]
    loop.run_until_complete(asyncio.gather(*tasks))

if __name__ == '__main__':
    run()
websocket¶
websocket (persistent connection)
Usage tips
- Install the required client: pip install websocket-client
- Sends websocket requests through an HTTP proxy.
- Even while the IP is still valid, the server will close the connection if the client sends no messages for a long time.
- Requires Python 3.x.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Send websocket requests through an HTTP proxy.
"""
import gzip
import zlib
import websocket

OPCODE_DATA = (websocket.ABNF.OPCODE_TEXT, websocket.ABNF.OPCODE_BINARY)
url = "ws://echo.websocket.org/"
proxies = {
    "http_proxy_host": "117.69.63.102",
    "http_proxy_port": 43787,
    "http_proxy_auth": ("username", "password"),
}
ws = websocket.create_connection(url, **proxies)

def recv():
    try:
        frame = ws.recv_frame()
    except websocket.WebSocketException:
        return websocket.ABNF.OPCODE_CLOSE, None
    if not frame:
        raise websocket.WebSocketException("Not a valid frame %s" % frame)
    elif frame.opcode in OPCODE_DATA:
        return frame.opcode, frame.data
    elif frame.opcode == websocket.ABNF.OPCODE_CLOSE:
        ws.send_close()
        return frame.opcode, None
    elif frame.opcode == websocket.ABNF.OPCODE_PING:
        ws.pong(frame.data)
        return frame.opcode, frame.data
    return frame.opcode, frame.data

def recv_ws():
    opcode, data = recv()
    if opcode == websocket.ABNF.OPCODE_CLOSE:
        return
    if opcode == websocket.ABNF.OPCODE_TEXT and isinstance(data, bytes):
        data = str(data, "utf-8")
    if isinstance(data, bytes) and len(data) > 2 and data[:2] == b'\037\213':  # gzip magic number
        try:
            data = "[gzip] " + str(gzip.decompress(data), "utf-8")
        except Exception:
            pass
    elif isinstance(data, bytes):
        try:
            data = "[zlib] " + str(zlib.decompress(data, -zlib.MAX_WBITS), "utf-8")
        except Exception:
            pass
    if isinstance(data, bytes):
        data = repr(data)
    print("< " + data)

def main():
    print("Press Ctrl+C to quit")
    while True:
        message = input("> ")
        ws.send(message)
        recv_ws()

if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        print('\nbye')
    except Exception as e:
        print(e)
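Since the server closes connections that stay silent for too long (see the tips above), a background ping thread can keep the link alive. A minimal sketch; `start_keepalive` is a hypothetical helper (not part of the sample above), and the demo uses a stand-in object instead of a live `websocket-client` connection:

```python
import threading
import time

def start_keepalive(ws, interval=30.0, stop_event=None):
    """Ping `ws` every `interval` seconds until `stop_event` is set.
    `ws` only needs a ping() method (websocket-client's WebSocket has one)."""
    stop_event = stop_event or threading.Event()

    def loop():
        # Event.wait returns False on timeout, True once stop_event is set
        while not stop_event.wait(interval):
            try:
                ws.ping()
            except Exception:
                break  # connection is gone; let the caller reconnect

    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return stop_event

# Demo with a stand-in object instead of a live connection
class FakeWS:
    def __init__(self):
        self.pings = 0

    def ping(self):
        self.pings += 1

fake = FakeWS()
stop = start_keepalive(fake, interval=0.05)  # short interval for the demo
time.sleep(0.25)
stop.set()
```

With the persistent-connection sample above you would call `start_keepalive(ws)` right after `create_connection` and `stop.set()` before closing.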
websocket (short connection)
Usage tips
- Install the required client: pip install websocket-client
- Sends websocket requests through an HTTP proxy.
- Requires Python 3.x.
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
import ssl
import websocket

def on_message(ws, message):
    print(message)

def on_error(ws, error):
    print(error)

def on_open(ws):
    data = '{}'  # JSON payload to send to the target site, e.g. {"type":"web","data":{"_id":"xxxx"}}
    ws.send(data)

def on_close(*args):
    print("### closed ###")

proxies = {
    "http_proxy_host": "117.69.63.102",
    "http_proxy_port": 43787,
    "http_proxy_auth": ("username", "password"),
}

def start():
    websocket.enableTrace(True)
    target_url = 'ws://127.0.0.1:5000/socket.io/?EIO=4&transport=websocket'  # replace with your target site
    ws = websocket.WebSocketApp(
        url=target_url,
        header=[
            "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36"
        ],
        on_message=on_message,
        on_error=on_error,
        on_close=on_close,
    )
    ws.on_open = on_open
    ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE}, **proxies)

if __name__ == "__main__":
    start()
ProxyPool¶
ProxyPool
Usage tips
- This sample is a simple IP pool manager for dynamic proxies.
- requests is not part of the standard library and must be installed first: pip install requests
- Supports Python 2.7 and Python 3.
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
import time
import random
import threading
import requests

class ProxyPool():
    def __init__(self, orderid, proxy_count):
        self.orderid = orderid
        self.proxy_count = proxy_count if proxy_count < 50 else 50  # total IPs kept in the pool; no more than 50 is recommended
        self.alive_proxy_list = []  # list of live IPs

    def _fetch_proxy_list(self, count):
        try:
            res = requests.get("http://v2.api.juliangip.com/dynamic/getips?num=%s&pt=1&result_type=json&trade_no=%s&sign=686f43f4b2d89d74c1680577d4ffbe47" % (count, self.orderid))
            return [proxy.split(',') for proxy in res.json().get('data').get('proxy_list')]
        except:
            print("Failed to fetch IPs from the API; please check your order")
            return []

    def _init_proxy(self):
        """Initialize the IP pool"""
        self.alive_proxy_list = self._fetch_proxy_list(self.proxy_count)

    def add_alive_proxy(self, add_count):
        """Fetch new IPs; the argument is the number of IPs to add"""
        self.alive_proxy_list.extend(self._fetch_proxy_list(add_count))

    def get_proxy(self):
        """Pick an IP from the pool"""
        return random.choice(self.alive_proxy_list)[0] if self.alive_proxy_list else ""

    def run(self):
        sleep_seconds = 1
        self._init_proxy()
        while True:
            for proxy in self.alive_proxy_list[:]:  # iterate over a copy so removal is safe
                proxy[1] = float(proxy[1]) - sleep_seconds  # proxy[1] is the remaining lifetime of this IP
                if proxy[1] <= 3:
                    self.alive_proxy_list.remove(proxy)  # drop the IP when it has 3s left
            if len(self.alive_proxy_list) < self.proxy_count:
                self.add_alive_proxy(self.proxy_count - len(self.alive_proxy_list))
            time.sleep(sleep_seconds)

    def start(self):
        """Start a child thread that keeps the pool refreshed"""
        t = threading.Thread(target=self.run)
        t.setDaemon(True)  # daemon thread: the main thread does not wait for it, and it dies when the main thread exits
        t.start()

def parse_url(proxy):
    # Username/password authentication (dynamic / dedicated proxies)
    username = "username"
    password = "password"
    proxies = {
        "http": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": proxy},
    }
    # Whitelist mode (requires the whitelist to be configured in advance)
    # proxies = {
    #     "http": "http://%(proxy)s/" % {"proxy": proxy},
    # }
    # Target page to visit
    target_url = "https://www.juliangip.com/api/general/Test"
    # Send the request through the proxy IP
    response = requests.get(target_url, proxies=proxies)
    # Print the page content
    if response.status_code == 200:
        print(response.text)

if __name__ == '__main__':
    proxy_pool = ProxyPool('1834987042xxxxx', 30)  # order number, number of IPs kept in the pool
    proxy_pool.start()
    time.sleep(1)  # wait for the pool to initialize
    proxy = proxy_pool.get_proxy()  # take an IP from the pool
    if proxy:
        parse_url(proxy)
Python2¶
requests¶
requests (recommended)
Usage tips
- The requests-based sample supports both http and https pages and is the recommended option.
- requests is not part of the Python standard library and must be installed first: pip install requests
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Request through a proxy server with requests.
Works for both http and https pages.
"""
import requests

# Proxy extraction API: fetch 1 proxy IP
api_url = "http://v2.api.juliangip.com/dynamic/getips?num=1&pt=1&result_type=text&split=1&trade_no=1834987042xxxxxx&sign=9e489baa3bf149593f149d7252efd006"
# Proxy IP returned by the API
proxy_ip = requests.get(api_url).text
# Username/password authentication (dynamic / dedicated proxies)
username = "username"
password = "password"
proxies = {
    "http": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": proxy_ip},
    "https": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": proxy_ip},
}
# Whitelist mode (requires the whitelist to be configured in advance)
# proxies = {
#     "http": "http://%(proxy)s/" % {"proxy": proxy_ip},
#     "https": "http://%(proxy)s/" % {"proxy": proxy_ip},
# }
# Target page to visit
target_url = "https://www.juliangip.com/api/general/Test"
# Send the request through the proxy IP
response = requests.get(target_url, proxies=proxies)
# Print the page content
if response.status_code == 200:
    print response.text
urllib2¶
urllib2
Usage tips
- The urllib2-based sample supports both http and https pages.
- Requires Python 2.6 / 2.7.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Request through a proxy server with urllib2.
Works for both http and https pages.
"""
import urllib2
import ssl

# Disable certificate verification globally to avoid errors on https pages
ssl._create_default_https_context = ssl._create_unverified_context
# Proxy extraction API: fetch 1 proxy IP
api_url = "http://v2.api.juliangip.com/dynamic/getips?num=1&pt=1&result_type=text&split=1&trade_no=1834987042xxxxxx&sign=9e489baa3bf149593f149d7252efd006"
# Proxy IP returned by the API
proxy_ip = urllib2.urlopen(api_url).read()
# Username/password authentication (dynamic / dedicated proxies)
username = "username"
password = "password"
proxies = {
    "http": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": proxy_ip},
    "https": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": proxy_ip},
}
# Whitelist mode (requires the whitelist to be configured in advance)
# proxies = {
#     "http": "http://%(proxy)s/" % {"proxy": proxy_ip},
#     "https": "http://%(proxy)s/" % {"proxy": proxy_ip},
# }
# Target page to visit
target_url = "https://www.juliangip.com/api/general/Test"
# Send the request through the proxy IP
proxy_support = urllib2.ProxyHandler(proxies)
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
response = urllib2.urlopen(target_url)
# Print the page content
if response.code == 200:
    print response.read()
Python-Selenium¶
Chrome¶
Chrome (IP whitelist, recommended)
Usage tips
- Authenticates the proxy via whitelist with Selenium + Chrome.
- Requires Python 2/3 + selenium + Chrome + Chromedriver on Windows/Linux/macOS.
- Download chromedriver (the chromedriver version must match your Chrome version).
- selenium is not part of the standard library and must be installed first: pip install selenium
- Replace the placeholders in the code:
${ip:port}: proxy IP:port, e.g. "117.69.63.102:43787"
${chromedriver_path}: path to chromedriver on your machine, e.g. "C:\chromedriver.exe"
#!/usr/bin/env python
# encoding: utf-8
from selenium import webdriver
import time

options = webdriver.ChromeOptions()
options.add_argument('--proxy-server=http://${ip:port}')  # proxy IP:port
# ${chromedriver_path}: path to chromedriver
driver = webdriver.Chrome(executable_path="${chromedriver_path}", options=options)
driver.get("https://www.juliangip.com/api/general/Test")
# Print the page content
print(driver.page_source)
# Close the current window after 3 seconds; quits if it is the last window
time.sleep(3)
driver.close()
Chrome (username/password authentication)
Usage tips
- Authenticates the proxy via username/password with Selenium + Chrome (tested with Chrome 86).
- Requires Python 2/3 + selenium + Chrome + Chromedriver on Windows/Linux/macOS.
- Download chromedriver (the chromedriver version must match your Chrome version).
- selenium is not part of the standard library and must be installed first: pip install selenium
- Replace the placeholders in the code:
${proxy_ip}: proxy IP
${proxy_port}: port
${username}: username
${password}: password
${chromedriver_path}: path to chromedriver on your machine, e.g. "C:\chromedriver.exe"
#!/usr/bin/env python
# encoding: utf-8
from selenium import webdriver
import string
import zipfile
import time

def create_proxyauth_extension(proxy_host, proxy_port, proxy_username, proxy_password, scheme='http', plugin_path=None):
    """Build a Chrome extension that handles proxy authentication.
    args:
        proxy_host (str): proxy address or domain
        proxy_port (int): proxy port
        proxy_username (str): username (dynamic / dedicated proxies)
        proxy_password (str): password
    kwargs:
        scheme (str): proxy scheme, http by default
        plugin_path (str): absolute path of the generated extension
    return str -> plugin_path
    """
    if plugin_path is None:
        plugin_path = 'vimm_chrome_proxyauth_plugin.zip'
    manifest_json = """
    {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "Chrome Proxy",
        "permissions": [
            "proxy",
            "tabs",
            "unlimitedStorage",
            "storage",
            "<all_urls>",
            "webRequest",
            "webRequestBlocking"
        ],
        "background": {
            "scripts": ["background.js"]
        },
        "minimum_chrome_version":"22.0.0"
    }
    """
    background_js = string.Template(
        """
        var config = {
            mode: "fixed_servers",
            rules: {
                singleProxy: {
                    scheme: "${scheme}",
                    host: "${host}",
                    port: parseInt(${port})
                },
                bypassList: ["foobar.com"]
            }
        };
        chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
        function callbackFn(details) {
            return {
                authCredentials: {
                    username: "${username}",
                    password: "${password}"
                }
            };
        }
        chrome.webRequest.onAuthRequired.addListener(
            callbackFn,
            {urls: ["<all_urls>"]},
            ['blocking']
        );
        """
    ).substitute(
        host=proxy_host,
        port=proxy_port,
        username=proxy_username,
        password=proxy_password,
        scheme=scheme,
    )
    with zipfile.ZipFile(plugin_path, 'w') as zp:
        zp.writestr("manifest.json", manifest_json)
        zp.writestr("background.js", background_js)
    return plugin_path

proxyauth_plugin_path = create_proxyauth_extension(
    proxy_host="${proxy_ip}",  # proxy IP
    proxy_port="${proxy_port}",  # port
    # username/password (dynamic / dedicated proxies)
    proxy_username="${username}",
    proxy_password="${password}"
)
options = webdriver.ChromeOptions()
options.add_extension(proxyauth_plugin_path)
# ${chromedriver_path}: path to chromedriver
driver = webdriver.Chrome(executable_path="${chromedriver_path}", options=options)
driver.get("https://www.juliangip.com/api/general/Test")
# Print the page content
print(driver.page_source)
# Close the current window after 3 seconds; quits if it is the last window
time.sleep(3)
driver.close()
Usage tips
To authenticate the proxy with username/password in headless mode, use Selenium + PhantomJS.
PhantomJS¶
Username/password authentication + headless mode
Usage tips
- Authenticates the proxy via username/password in headless mode with Selenium + PhantomJS.
- Requires Python 2/3 + selenium + PhantomJS on Windows/Linux/macOS.
- Download PhantomJS here (version 2.1.1 recommended).
- ${executable_path}: path to PhantomJS on your machine, e.g. "C:\phantomjs-2.1.1-windows\bin\phantomjs.exe"
#!/usr/bin/env python
# encoding: utf-8
from selenium import webdriver
import time

# Download the PhantomJS package first, then fill in the path to phantomjs.exe (the path must not contain non-ASCII characters)
executable_path = '${executable_path}'
service_args = [
    '--proxy=host:port',  # replace with your proxy IP, e.g. 117.69.63.102:23918
    '--proxy-type=http',
    '--proxy-auth=username:password'  # username and password
]
driver = webdriver.PhantomJS(service_args=service_args, executable_path=executable_path)
driver.get('https://www.juliangip.com/api/general/Test')
print(driver.page_source)
time.sleep(3)
driver.close()
Firefox¶
Firefox (IP whitelist, recommended)
Usage tips
- Authenticates the proxy via whitelist with Selenium + Firefox.
- Requires Python 2/3 + selenium + Firefox + geckodriver on Windows/Linux/macOS.
- Download geckodriver (the geckodriver version must match your Firefox version).
- selenium is not part of the standard library and must be installed first: pip install selenium
- Replace the placeholders in the code:
${ip:port}: proxy IP:port, e.g. "117.69.63.102:43787"
${geckodriver_path}: path to geckodriver on your machine, e.g. "C:\geckodriver.exe"
#!/usr/bin/env python
# encoding: utf-8
from selenium import webdriver
import time

fp = webdriver.FirefoxProfile()
proxy = '${ip:port}'
ip, port = proxy.split(":")
port = int(port)
# Configure the proxy
fp.set_preference('network.proxy.type', 1)
fp.set_preference('network.proxy.http', ip)
fp.set_preference('network.proxy.http_port', port)
fp.set_preference('network.proxy.ssl', ip)
fp.set_preference('network.proxy.ssl_port', port)
driver = webdriver.Firefox(executable_path="${geckodriver_path}", firefox_profile=fp)
driver.get('https://www.juliangip.com/api/general/Test')
# Print the page content
print(driver.page_source)
# Close the current window after 3 seconds; quits if it is the last window
time.sleep(3)
driver.close()
Firefox (username/password authentication)
Usage tips
- Authenticates the proxy via username/password with Selenium + Firefox (through selenium-wire).
- Requires Python 2/3 + selenium + Firefox + geckodriver on Windows/Linux/macOS.
- Download geckodriver (the geckodriver version must match your Firefox version).
- selenium is not part of the standard library and must be installed first: pip install selenium
- Replace the placeholders in the code:
${ip:port}: proxy IP:port, e.g. "117.69.63.102:43787"
${geckodriver_path}: path to geckodriver on your machine, e.g. "C:\geckodriver.exe"
#!/usr/bin/env python
# encoding: utf-8
import time
from seleniumwire import webdriver  # pip install selenium-wire

username = 'username'  # replace with your username and password
password = 'password'
proxy_ip = '117.69.63.102:43787'  # replace with the proxy IP you extracted
options = {
    'proxy': {
        'http': "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": proxy_ip},
        'https': "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": proxy_ip}
    }
}
driver = webdriver.Firefox(seleniumwire_options=options, executable_path="${geckodriver_path}")
driver.get('https://www.juliangip.com/api/general/Test')
# Print the page content
print(driver.page_source)
# Close the current window after 3 seconds; quits if it is the last window
time.sleep(3)
driver.close()
Python-Scrapy¶
Usage tips
- Works for both http and https pages.
- scrapy is not part of the standard library and must be installed first: pip install scrapy
Scrapy project layout
Run scrapy startproject tutorial to create a new Scrapy project; it generates a tutorial directory with the following contents:
tutorial/
    scrapy.cfg            # project configuration file
    tutorial/             # the project's python module; your code goes here
        __init__.py
        items.py          # item definitions
        pipelines.py      # pipeline definitions
        settings.py       # project settings
        spiders/          # directory for spider code
            __init__.py
            ...
myextend.py
Add a custom extension: create a myextend.py file under the tutorial/ directory. To use it, just set api_url and adjust the IP extraction interval in the time.sleep call.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import time
import threading
import requests
from scrapy import signals

# API for extracting proxy IPs
api_url = 'http://v2.api.juliangip.com/dynamic/getips?num=1&pt=1&result_type=json&trade_no=1834987042738379&sign=686f43f4b2d89d74c1680577d4ffbe47'
foo = True

class Proxy:
    def __init__(self):
        self._proxy_list = requests.get(api_url).json().get('data').get('proxy_list')

    @property
    def proxy_list(self):
        return self._proxy_list

    @proxy_list.setter
    def proxy_list(self, list):
        self._proxy_list = list

pro = Proxy()
print(pro.proxy_list)

class MyExtend:
    def __init__(self, crawler):
        self.crawler = crawler
        # Bind our methods to scrapy signals so they start and stop with the spider engine
        # scrapy signals docs: https://www.osgeo.cn/scrapy/topics/signals.html
        # scrapy extensions docs: https://www.osgeo.cn/scrapy/topics/extensions.html
        crawler.signals.connect(self.start, signals.engine_started)
        crawler.signals.connect(self.close, signals.spider_closed)

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def start(self):
        t = threading.Thread(target=self.extract_proxy)
        t.start()

    def extract_proxy(self):
        while foo:
            pro.proxy_list = requests.get(api_url).json().get('data').get('proxy_list')
            # Extract a fresh batch of IPs every 15 seconds
            time.sleep(15)

    def close(self):
        global foo
        foo = False
middlewares.py
- Add ProxyDownloaderMiddleware, the proxy middleware, to middlewares.py.
- Replace the placeholders in the code: ${username}: username, ${password}: password
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from scrapy import signals
from .myextend import pro
from w3lib.http import basic_auth_header
import random

class ProxyDownloaderMiddleware:
    def process_request(self, request, spider):
        proxy = random.choice(pro.proxy_list)
        request.meta['proxy'] = "http://%(proxy)s" % {'proxy': proxy}
        # Username/password authentication (dynamic / dedicated proxies)
        request.headers['Proxy-Authorization'] = basic_auth_header('${username}', '${password}')  # comment out this line for whitelist authentication
        return None
settings.py
Enable ProxyDownloaderMiddleware and the custom extension in settings.py:
# -*- coding: utf-8 -*-
# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
DOWNLOADER_MIDDLEWARES = {
    'tutorial.middlewares.ProxyDownloaderMiddleware': 100,
}
# Mind the module path
EXTENSIONS = {
    'tutorial.myextend.MyExtend': 300,
}
Java¶
okhttp¶
okhttp3-3.8.1
Usage tips
- This sample supports both http and https pages.
- With username/password authentication you must override Authenticator's authenticate method.
- Add the okhttp dependency to your project.
import okhttp3.*;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;

public class TestProxyOKHttpClient {
    public static void main(String args[]) throws IOException {
        // target site
        String targetUrl = "https://www.juliangip.com/api/general/Test";
        // username/password authentication (dynamic / dedicated proxies)
        final String username = "username";
        final String password = "password";
        String ip = "117.69.63.102"; // proxy server IP
        int port = 43787;
        Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(ip, port));
        Authenticator authenticator = new Authenticator() {
            @Override
            public Request authenticate(Route route, Response response) throws IOException {
                String credential = Credentials.basic(username, password);
                return response.request().newBuilder()
                        .header("Proxy-Authorization", credential)
                        .build();
            }
        };
        OkHttpClient client = new OkHttpClient.Builder()
                .proxy(proxy)
                .proxyAuthenticator(authenticator)
                .build();
        Request request = new Request.Builder()
                .url(targetUrl)
                .addHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3100.0 Safari/537.36")
                .build();
        Response response = client.newCall(request).execute();
        System.out.println(response.body().string());
    }
}
Java-Selenium¶
Chrome (IP whitelist, recommended)
Usage tips
- Authenticates the proxy via whitelist with Selenium + Chrome.
- Requires JDK 1.8+ + selenium + Chrome + Chromedriver on Windows/Linux/macOS.
- Download chromedriver (the chromedriver version must match your Chrome version).
- selenium is not part of the JDK; add the relevant dependencies before use.
// Create the webdriver
// /home/chromedriver is the path to your chromedriver
System.setProperty("webdriver.chrome.driver", "/home/chromedriver");
ChromeOptions chromeOptions = new ChromeOptions();
chromeOptions.addArguments("--headless");
chromeOptions.addArguments("--no-sandbox");
chromeOptions.addArguments("--disable-gpu");
// Configure the proxy IP
// 123.123.123.123:56789 stands for your proxy ip:port
chromeOptions.addArguments("--proxy-server=http://123.123.123.123:56789");
WebDriver driver = new ChromeDriver(chromeOptions);
driver.get("https://cip.cc");
httpclient¶
HttpClient-4.5.6
Usage tips
- This sample supports both http and https pages.
- With username/password authentication, httpclient sends each request twice to complete the challenge, which increases latency; whitelist access is recommended.
- When authenticating with multiple username/password pairs, add AuthCacheValue.setAuthCache(new AuthCacheImpl()); to the code.
- Dependency downloads:
httpclient-4.5.6.jar
httpcore-4.4.10.jar
commons-codec-1.10.jar
commons-logging-1.2.jar
import java.net.URL;
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

/**
 * Request through a proxy server with httpclient; works for both http and https pages.
 */
public class TestProxyHttpClient {
    private static String pageUrl = "https://www.juliangip.com/api/general/Test"; // target page to visit
    private static String proxyIp = "117.69.63.102"; // proxy server IP
    private static int proxyPort = 43787; // port
    // username/password authentication (dynamic / dedicated proxies)
    private static String username = "username";
    private static String password = "password";

    public static void main(String[] args) throws Exception {
        // Since JDK 8u111, enable proxy username/password authentication for HTTPS target pages
        System.setProperty("jdk.http.auth.tunneling.disabledSchemes", "");
        CredentialsProvider credsProvider = new BasicCredentialsProvider();
        credsProvider.setCredentials(new AuthScope(proxyIp, proxyPort),
                new UsernamePasswordCredentials(username, password));
        CloseableHttpClient httpclient = HttpClients.custom().setDefaultCredentialsProvider(credsProvider).build();
        try {
            URL url = new URL(pageUrl);
            HttpHost target = new HttpHost(url.getHost(), url.getDefaultPort(), url.getProtocol());
            HttpHost proxy = new HttpHost(proxyIp, proxyPort);
            /*
             * Timeout settings differ slightly across httpclient versions; this is for 4.5.6.
             * setConnectTimeout: connection timeout
             * setConnectionRequestTimeout: timeout for obtaining a connection from the connection manager
             * setSocketTimeout: timeout for receiving data
             */
            RequestConfig config = RequestConfig.custom().setProxy(proxy).setConnectTimeout(6000)
                    .setConnectionRequestTimeout(2000).setSocketTimeout(6000).build();
            HttpGet httpget = new HttpGet(url.getPath());
            httpget.setConfig(config);
            httpget.addHeader("Accept-Encoding", "gzip"); // gzip-compress the transfer for faster access
            httpget.addHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36");
            CloseableHttpResponse response = httpclient.execute(target, httpget);
            try {
                System.out.println(response.getStatusLine());
                System.out.println(EntityUtils.toString(response.getEntity()));
            } finally {
                response.close();
            }
        } finally {
            httpclient.close();
        }
    }
}
jsoup¶
Requesting with jsoup
Usage tips
- This sample supports both http and https pages.
- When authenticating with multiple username/password pairs, add AuthCacheValue.setAuthCache(new AuthCacheImpl()); to the code.
- jsoup-1.13.1.jar
import java.io.IOException;
import java.net.Authenticator;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.Proxy;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class TestProxyJsoup {
    // username/password authentication (dynamic / dedicated proxies)
    final static String ProxyUser = "username";
    final static String ProxyPass = "password";
    // proxy IP and port
    final static String ProxyHost = "117.69.63.102";
    final static Integer ProxyPort = 43787;

    public static String getUrlProxyContent(String url) {
        Authenticator.setDefault(new Authenticator() {
            public PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication(ProxyUser, ProxyPass.toCharArray());
            }
        });
        Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(ProxyHost, ProxyPort));
        try {
            // handle exceptions and other parameters as needed
            Document doc = Jsoup.connect(url).followRedirects(false).timeout(3000).proxy(proxy).get();
            if (doc != null) {
                System.out.println(doc.body().html());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // target site
        String targetUrl = "https://www.juliangip.com/api/general/Test";
        // Since JDK 8u111, enable proxy username/password authentication for HTTPS target pages
        System.setProperty("jdk.http.auth.tunneling.disabledSchemes", "");
        getUrlProxyContent(targetUrl);
    }
}
hutool¶
Requesting with hutool
Usage tips
- This sample supports both http and https pages.
- With username/password authentication, each request is sent twice to complete the challenge, which increases latency; whitelist access is recommended.
- Dependency download:
hutool-all-5.5.4.jar
import java.net.Authenticator;
import java.net.PasswordAuthentication;
import cn.hutool.http.HttpResponse;
import cn.hutool.http.HttpRequest;

// proxy credentials
class ProxyAuthenticator extends Authenticator {
    private String user, password;

    public ProxyAuthenticator(String user, String password) {
        this.user = user;
        this.password = password;
    }

    protected PasswordAuthentication getPasswordAuthentication() {
        return new PasswordAuthentication(user, password.toCharArray());
    }
}

public class TestProxyHutool {
    // username/password authentication (dynamic / dedicated proxies)
    final static String ProxyUser = "username";
    final static String ProxyPass = "password";
    // proxy IP and port
    final static String ProxyHost = "117.69.63.102";
    final static Integer ProxyPort = 43787;

    public static void main(String[] args) {
        // target site
        String url = "https://www.juliangip.com/api/general/Test";
        // Since JDK 8u111, enable proxy username/password authentication for HTTPS target pages
        System.setProperty("jdk.http.auth.tunneling.disabledSchemes", "");
        // set the request credentials
        Authenticator.setDefault(new ProxyAuthenticator(ProxyUser, ProxyPass));
        // send the request
        HttpResponse result = HttpRequest.get(url)
                .setHttpProxy(ProxyHost, ProxyPort)
                .timeout(20000) // timeout in milliseconds
                .execute();
        System.out.println(result.body());
    }
}
GoLang¶
Standard library¶
Standard library
Usage tips
- Works for both http and https pages.
// Request through a proxy server.
// Works for both http and https pages.
package main

import (
    "compress/gzip"
    "fmt"
    "io"
    "io/ioutil"
    "net/http"
    "net/url"
    "os"
)

func main() {
    // username/password authentication (dynamic / dedicated proxies)
    username := "username"
    password := "password"
    // proxy server
    proxy_raw := "117.69.63.102:43787"
    proxy_str := fmt.Sprintf("http://%s:%s@%s", username, password, proxy_raw)
    proxy, err := url.Parse(proxy_str)
    // target page
    page_url := "https://www.juliangip.com/api/general/Test"
    // request the target page
    client := &http.Client{Transport: &http.Transport{Proxy: http.ProxyURL(proxy)}}
    req, _ := http.NewRequest("GET", page_url, nil)
    req.Header.Add("Accept-Encoding", "gzip") // gzip-compress the transfer for faster access
    res, err := client.Do(req)
    if err != nil {
        // the request failed
        fmt.Println(err.Error())
    } else {
        defer res.Body.Close() // make sure Body is closed in the end
        fmt.Println("status code:", res.StatusCode) // status code
        // when the response is gzip-compressed, decompress it before reading
        if res.Header.Get("Content-Encoding") == "gzip" {
            reader, _ := gzip.NewReader(res.Body) // gzip decompression
            defer reader.Close()
            io.Copy(os.Stdout, reader)
            os.Exit(0) // normal exit
        }
        // no gzip compression: read the body directly
        body, _ := ioutil.ReadAll(res.Body)
        fmt.Println(string(body))
    }
}
goKit¶
goKit
Usage tips
- Works for both http and https pages.
- go get github.com/xingcxb/goKit
package main

import (
    "fmt"
    "github.com/xingcxb/goKit/core/httpKit"
)

func main() {
    // GET requests
    // whitelist authentication
    fmt.Println(httpKit.HttpProxyGet("https://cip.cc", "22.33.44.55:59582"))
    fmt.Println("------------------>")
    // username/password authentication
    fmt.Println(httpKit.HttpProxyGetFull("https://cip.cc", nil, nil,
        "", 300, "http", "username", "password",
        "22.33.44.55:59582"))
    fmt.Println("------------------>")
    // POST requests
    // whitelist authentication
    fmt.Println(httpKit.HttpProxyPost("https://cip.cc", nil, "22.33.44.55:59582"))
    fmt.Println("------------------>")
    // username/password authentication
    fmt.Println(httpKit.HttpProxyPostFull("https://cip.cc", nil, nil,
        "", 300, "http", "username", "password", "22.33.44.55:59582"))
}
CSharp¶
Standard library¶
Standard library
Usage tips
- Works for both http and https pages.
using System;
using System.Text;
using System.Net;
using System.IO;
using System.IO.Compression;

namespace csharp_http
{
    class Program
    {
        static void Main(string[] args)
        {
            // target page to visit
            string page_url = "https://www.juliangip.com/api/general/Test";
            // build the request
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(page_url);
            request.Method = "GET";
            request.Headers.Add("Accept-Encoding", "Gzip"); // gzip-compress the transfer for faster access
            // proxy server
            string proxy_ip = "117.69.63.102";
            int proxy_port = 43787;
            // username/password authentication (dynamic / dedicated proxies)
            string username = "username";
            string password = "password";
            // set the proxy (open proxy, or dynamic/dedicated proxy with whitelist configured)
            // request.Proxy = new WebProxy(proxy_ip, proxy_port);
            // set the proxy (dynamic/dedicated proxy without whitelist)
            WebProxy proxy = new WebProxy();
            proxy.Address = new Uri(String.Format("http://{0}:{1}", proxy_ip, proxy_port));
            proxy.Credentials = new NetworkCredential(username, password);
            request.Proxy = proxy;
            // request the target page
            HttpWebResponse response = (HttpWebResponse)request.GetResponse();
            Console.WriteLine((int)response.StatusCode); // status code
            // decompress and read the response
            using (StreamReader reader = new StreamReader(new GZipStream(response.GetResponseStream(), CompressionMode.Decompress))) {
                Console.WriteLine(reader.ReadToEnd());
            }
        }
    }
}
Node.js¶
Standard library (http+url)¶
Standard library (works for both http and https)
Usage notes
· Works for both http and https pages
const http = require("http"); // built-in http module
const url = require("url");
// Target page to request
const targetUrl = "https://www.juliangip.com/api/general/Test";
const urlParsed = url.parse(targetUrl);
// Proxy server
const proxyIp = "proxyIp"; // proxy server ip
const proxyPort = "proxyPort"; // proxy server port
// Username/password authentication (dynamic/dedicated proxy)
const username = "username";
const password = "password";
const base64 = Buffer.from(username + ":" + password).toString("base64");
const options = {
host : proxyIp,
port : proxyPort,
path : targetUrl,
method : "GET",
headers : {
"Host" : urlParsed.hostname,
"Proxy-Authorization" : "Basic " + base64
}
};
http.request(options, (res) => {
console.log("got response: " + res.statusCode);
// Output the response body (gzip-compressed)
if (res.headers['content-encoding'] && res.headers['content-encoding'].indexOf('gzip') != -1) {
let zlib = require('zlib');
let unzip = zlib.createGunzip();
res.pipe(unzip).pipe(process.stdout);
} else {
// Output the response body (not gzip-compressed)
res.pipe(process.stdout);
}
})
.on("error", (err) => {
console.log(err);
})
.end()
;
Standard library (http+tls+util)¶
Standard library (works for both http and https)
Usage notes
· Works for both http and https pages
let http = require('http'); // built-in http module
let tls = require('tls'); // built-in tls module
let util = require('util');
// Username/password authentication (dynamic/dedicated proxy)
const username = 'username';
const password = 'password';
const auth = 'Basic ' + Buffer.from(username + ':' + password).toString('base64');
// Proxy server ip and port
let proxy_ip = '117.69.63.102';
let proxy_port = 43787;
// Target host and path to request
let remote_host = 'www.juliangip.com';
let remote_path = '/api/general/Test';
// Send the CONNECT request
let req = http.request({
host: proxy_ip,
port: proxy_port,
method: 'CONNECT',
path: util.format('%s:443', remote_host),
headers: {
"Host": remote_host,
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3100.0 Safari/537.36",
"Proxy-Authorization": auth,
"Accept-Encoding": "gzip" // gzip compression speeds up the transfer
}
});
req.on('connect', function (res, socket, head) {
// TLS handshake over the tunneled socket
let tlsConnection = tls.connect({
host: remote_host,
socket: socket
}, function () {
// Send the GET request
tlsConnection.write(util.format('GET %s HTTP/1.1\r\nHost: %s\r\n\r\n', remote_path, remote_host));
});
tlsConnection.on('data', function (data) {
// Output the response (the raw response message)
console.log(data.toString());
});
});
req.end();
request¶
request (works for both http and https)
Usage notes
· Install the request library first (note: request is deprecated upstream but still works): npm install request · Works for both http and https pages
let request = require('request'); // third-party request library
let util = require('util');
let zlib = require('zlib');
// Username/password authentication (dynamic/dedicated proxy)
const username = 'username';
const password = 'password';
// Target page to request
let page_url = 'https://www.juliangip.com/api/general/Test'
// Proxy server ip and port
let proxy_ip = '117.69.63.102';
let proxy_port = 43787;
// Full proxy server url
let proxy = util.format('http://%s:%s@%s:%d', username, password, proxy_ip, proxy_port);
// Send the request
request({
url: page_url,
method: 'GET',
proxy: proxy,
headers: {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3100.0 Safari/537.36",
"Accept-Encoding": "gzip" // gzip compression speeds up the transfer
},
encoding: null, // keep the body as a Buffer so it can be decompressed
}, function(error, res, body) {
if (!error && res.statusCode == 200) {
// Output the response body (gzip-compressed)
if (res.headers['content-encoding'] && res.headers['content-encoding'].indexOf('gzip') != -1) {
zlib.gunzip(body, function(err, dezipped) {
console.log(dezipped.toString());
});
} else {
// Output the response body (not gzip-compressed)
console.log(body);
}
} else {
console.log(error);
}
});
puppeteer¶
puppeteer (IP whitelist)
Usage notes
· IP-whitelist-based http/https proxy with Puppeteer · Requires Node 7.6.0+ and puppeteer · Install puppeteer first: npm i puppeteer
// puppeteer module
const puppeteer = require('puppeteer');
// Target page to request
const url = 'https://www.juliangip.com/api/general/Test';
// Extra headers
const headers = {
'Accept-Encoding': 'gzip' // gzip compression speeds up the transfer
};
// Proxy server ip and port
let proxy_ip = '117.69.63.102';
let proxy_port = 43787;
(async ()=> {
// Launch a browser instance
const browser = await puppeteer.launch({
headless: false, // whether to hide the window; defaults to true, false is handy for debugging
args: [
`--proxy-server=${proxy_ip}:${proxy_port}`,
'--no-sandbox',
'--disable-setuid-sandbox'
]
});
// Open a new page
const page = await browser.newPage();
// Set the headers
await page.setExtraHTTPHeaders(headers);
// Visit the target page
await page.goto(url);
})();
puppeteer (username/password authentication)
Usage notes
· Username/password-authenticated http/https proxy with Puppeteer · Requires Node 7.6.0+ and puppeteer · Install puppeteer first: npm i puppeteer
// puppeteer module
const puppeteer = require('puppeteer');
// Target page to request
const url = 'https://www.juliangip.com/api/general/Test';
// Extra headers
const headers = {
'Accept-Encoding': 'gzip' // gzip compression speeds up the transfer
};
// Proxy server ip and port
let proxy_ip = '117.69.63.102';
let proxy_port = 43787;
// Username/password authentication (dynamic/dedicated proxy)
const username = 'username';
const password = 'password';
(async ()=> {
// Launch a browser instance
const browser = await puppeteer.launch({
headless: false, // whether to hide the window; defaults to true, false is handy for debugging
args: [
`--proxy-server=${proxy_ip}:${proxy_port}`,
'--no-sandbox',
'--disable-setuid-sandbox'
]
});
// Open a new page
const page = await browser.newPage();
// Set the headers
await page.setExtraHTTPHeaders(headers);
// Username/password authentication
await page.authenticate({username: username, password: password});
// Visit the target page
await page.goto(url);
})();
axios¶
Works for both http and https
Usage notes
- Install the package first: npm install axios-https-proxy-fix
// axios-https-proxy-fix package
let axios = require('axios-https-proxy-fix')
// Target site to request
let targetUrl = 'https://cip.cc'
// Proxy server info
let serverURL = '***********' // proxy server address
let serverPort = '***********' // proxy server port
// Configure the proxy
let proxy = {
host: serverURL,
port: serverPort,
// the auth field can be omitted when using an IP whitelist
// auth: {
// username: authKey,
// password: authPwd
// }
}
axios.get(targetUrl, {proxy: proxy}) // pass the proxy config with the request
.then((res) => {
console.log(res.data)
}).catch((err) => {
console.log(err.message)
})
Ruby¶
net/http¶
net/http (IP whitelist)
Usage notes
- IP-whitelist-based http/https proxy with net/http
# -*- coding: utf-8 -*-
require 'net/http' # built-in net/http module
require 'zlib'
require 'stringio'
# Proxy server ip and port
proxy_ip = '117.69.63.102'
proxy_port = 43787
# Target page to request (example)
page_url = "https://www.juliangip.com/api/general/Test"
uri = URI(page_url)
# Create a proxy class
proxy = Net::HTTP::Proxy(proxy_ip, proxy_port)
# Create the request object
req = Net::HTTP::Get.new(uri)
# Set the User-Agent
req['User-Agent'] = 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50'
req['Accept-Encoding'] = 'gzip' # gzip compression speeds up the transfer
# Send the request through the proxy; set use_ssl to false for http pages
res = proxy.start(uri.hostname, uri.port, :use_ssl => true) do |http|
http.request(req)
end
# Output the status code
puts "status code: #{res.code}"
# Output the response body
if res['content-encoding'] == 'gzip'
gz = Zlib::GzipReader.new(StringIO.new(res.body.to_s))
puts "page content: #{gz.read}"
else
puts "page content: #{res.body}"
end
net/http (username/password authentication)
Usage notes
- Username/password-authenticated http/https proxy with net/http
# -*- coding: utf-8 -*-
require 'net/http' # built-in net/http module
require 'zlib'
require 'stringio'
# Proxy server ip and port
proxy_ip = '117.69.63.102'
proxy_port = 43787
# Username/password authentication (dynamic/dedicated proxy)
username = 'username'
password = 'password'
# Target page to request (example)
page_url = "https://www.juliangip.com/api/general/Test"
uri = URI(page_url)
# Create a proxy class; the credentials passed here are sent as Proxy-Authorization
proxy = Net::HTTP::Proxy(proxy_ip, proxy_port, username, password)
# Create the request object
req = Net::HTTP::Get.new(uri)
# Set the User-Agent
req['User-Agent'] = 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50'
req['Accept-Encoding'] = 'gzip' # gzip compression speeds up the transfer
# Send the request through the proxy; set use_ssl to false for http pages
res = proxy.start(uri.hostname, uri.port, :use_ssl => true) do |http|
http.request(req)
end
# Output the status code
puts "status code: #{res.code}"
# Output the response body
if res['content-encoding'] == 'gzip'
gz = Zlib::GzipReader.new(StringIO.new(res.body.to_s))
puts "page content: #{gz.read}"
else
puts "page content: #{res.body}"
end
PHP¶
curl¶
curl
Usage notes
1. This sample works for both http and https pages
2. curl is not a PHP built-in; install it first:
Ubuntu/Debian: apt-get install php5-curl
CentOS: yum install php-curl
<?php
// Target page to request
$page_url = "https://www.juliangip.com/api/general/Test";
$ch = curl_init();
$proxy_ip = "117.69.63.102";
$proxy_port = "43787";
$proxy = $proxy_ip.":".$proxy_port;
// Username/password authentication (dynamic/dedicated proxy)
$username = "username";
$password = "password";
curl_setopt($ch, CURLOPT_URL, $page_url);
// Send a POST request
$requestData["post"] = "send post request";
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($requestData));
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
// Set the proxy
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_HTTP);
curl_setopt($ch, CURLOPT_PROXY, $proxy);
// Set the proxy username and password
curl_setopt($ch, CURLOPT_PROXYAUTH, CURLAUTH_BASIC);
curl_setopt($ch, CURLOPT_PROXYUSERPWD, "{$username}:{$password}");
// Custom headers
$headers = array();
$headers[] = 'User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0);';
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
// Custom cookies
curl_setopt($ch, CURLOPT_COOKIE,'');
curl_setopt($ch, CURLOPT_ENCODING, 'gzip'); // gzip compression speeds up the transfer
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
$summary = curl_getinfo($ch);
curl_close($ch);
echo "$result"; // quote the variable when the script is executed from a web page
echo "\n\nfetch ".$summary['url']."\ntimeuse: ".$summary['total_time']."s\n\n";
?>
Don't see an example for the language you use?
There are more languages than we can cover; if you can't find an example for yours, please contact customer support!