Preface

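Our project recently needed to transfer data between the internal and external networks, which required a proxy. We had been using the third-party proxy library LittleProxy directly, but after running for a while it ran out of memory. Analyzing the heap dump showed that too many connections were the cause, and a search on GitHub turned up an issue reporting exactly this, but the project is no longer maintained. So I decided to implement a proxy service myself on top of Netty. An HTTP proxy has to parse every request, so I opted for a SOCKS proxy instead, and, for some measure of security, specifically SOCKS5, because SOCKS5 supports username/password authentication.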

Investigation

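Writing a SOCKS5 proxy with Netty is not very hard, and that part was done quickly. Next came adapting the client. The client uses HttpClient 4.5, which has no built-in SOCKS proxy support, but the JDK itself provides SOCKS proxying at the socket level, so the SOCKS proxy is transparent to HttpClient and can be used without any changes. A joint test on my local machine succeeded, and everything looked fine.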
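For reference, the JDK-level SOCKS support mentioned above is normally switched on with the standard `socksProxyHost`/`socksProxyPort` system properties, and SOCKS5 username/password credentials are supplied through a default `java.net.Authenticator`. A minimal sketch, assuming the settings are applied once at startup before any connection is opened (the class `JdkSocksConfig` and its parameters are hypothetical):

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

// Hypothetical helper: JVM-wide SOCKS5 settings, applied once at startup.
final class JdkSocksConfig {
    static void enableSocks(String proxyHost, int proxyPort, String user, char[] password) {
        // Read by the JDK's default ProxySelector for sockets created without an explicit Proxy.
        System.setProperty("socksProxyHost", proxyHost);
        System.setProperty("socksProxyPort", String.valueOf(proxyPort));
        // Consulted by SocksSocketImpl when the proxy selects USERNAME/PASSWORD (X'02').
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                // Only answer SOCKS5 requests; return null for HTTP auth and anything else.
                if ("SOCKS5".equals(getRequestingProtocol())) {
                    return new PasswordAuthentication(user, password);
                }
                return null;
            }
        });
    }
}
```

Once set, plain sockets that the JVM creates without an explicit `Proxy`, including the ones HttpClient opens, are routed through the SOCKS proxy. In the JDK 8 source, `SocksSocketImpl` requests credentials with the requesting protocol `"SOCKS5"`, which is why the authenticator filters on that string.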

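The next day I deployed the program to a real intranet environment for testing and ran into a problem. Network requests failed with the following exception: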

java.net.UnknownHostException: www.baidu.com
 at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
 at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:850)
 at java.net.InetAddress.getAddressFromNameService(InetAddress.java:1201)
 at java.net.InetAddress.getAllByName0(InetAddress.java:1154)
 at java.net.InetAddress.getAllByName(InetAddress.java:1084)
 at java.net.InetAddress.getAllByName(InetAddress.java:1020)
 at org.apache.http.impl.conn.DefaultClientConnectionOperator.resolveHostname(DefaultClientConnectionOperator.java:242)
 at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:130)
 at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:150)
 at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
 at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:575)
 at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
 at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
 at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
 at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)

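This is an intranet environment where domain names cannot be resolved, yet HttpClient insists on resolving the domain name before connecting. I found a workaround online for this pitfall. With that in place there was still a problem, only now the error was different: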
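In the JDK, `SocksSocketImpl` only sends ATYP = DOMAINNAME when the target `InetSocketAddress` is unresolved, so the key to letting the proxy, rather than the client, resolve names is never to resolve them locally. A minimal pure-JDK sketch of that idea; the class and method names are hypothetical, and the article does not say which workaround was actually used. With HttpClient 4.5 the same step would typically live in a custom `ConnectionSocketFactory`, paired with a `DnsResolver` that skips real lookups (an assumption, not the author's confirmed code):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.Socket;

// Hypothetical helper: connect through SOCKS5 without resolving the target locally.
final class SocksRemoteDns {
    // No DNS lookup happens here; isUnresolved() is true, so SocksSocketImpl
    // sends the host name with ATYP = DOMAINNAME and the proxy resolves it.
    static InetSocketAddress unresolvedTarget(String host, int port) {
        return InetSocketAddress.createUnresolved(host, port);
    }

    // proxyAddr should be an IP address reachable from the intranet.
    static Socket connectViaSocks(InetSocketAddress proxyAddr, String host, int port, int timeoutMillis)
            throws IOException {
        Socket socket = new Socket(new Proxy(Proxy.Type.SOCKS, proxyAddr));
        try {
            socket.connect(unresolvedTarget(host, port), timeoutMillis);
            return socket;
        } catch (IOException e) {
            socket.close(); // do not leak the socket if the SOCKS handshake fails
            throw e;
        }
    }
}
```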

Caused by: javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
    at sun.security.ssl.InputRecord.handleUnknownRecord(InputRecord.java:541)
    at sun.security.ssl.InputRecord.read(InputRecord.java:374)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:893)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1294)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:848)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)

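This error suggested the SSL handshake was failing. Not knowing SSL well, I read up on the protocol on the spot but made no progress. At a colleague's suggestion I switched to a plain HTTP request to see what happened, and that failed too: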

org.apache.http.client.ClientProtocolException
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:187)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
    at HttpTest.http(HttpTest.java:163)
    at HttpTest.main(HttpTest.java:116)
Caused by: org.apache.http.ProtocolException: The server failed to respond with a valid HTTP response
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:149)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
    at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
    at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    ... 4 more

It looked as if the response data was wrong. Turning on HttpClient's debug logging revealed the problem:

DEBUG [org.apache.http.client.protocol.RequestAddCookies] CookieSpec selected: default
DEBUG [org.apache.http.client.protocol.RequestAuthCache] Auth cache not set in the context
DEBUG [org.apache.http.impl.conn.PoolingHttpClientConnectionManager] Connection request: [route: {}->http://hc.apache.org:80][total available: 0; route allocated: 0 of 30; total allocated: 0 of 30]
DEBUG [org.apache.http.impl.conn.PoolingHttpClientConnectionManager] Connection leased: [id: 0][route: {}->http://hc.apache.org:80][total available: 0; route allocated: 1 of 30; total allocated: 1 of 30]
DEBUG [org.apache.http.impl.execchain.MainClientExec] Opening connection {}->http://hc.apache.org:80
DEBUG [org.apache.http.impl.conn.DefaultHttpClientConnectionOperator] Connecting to hc.apache.org/95.216.24.32:80
DEBUG [org.apache.http.impl.conn.DefaultHttpClientConnectionOperator] Connection established 10.6.252.194:53999<->0.0.0.0:80
DEBUG [org.apache.http.impl.conn.DefaultManagedHttpClientConnection] http-outgoing-0: set socket timeout to 30000
DEBUG [org.apache.http.impl.execchain.MainClientExec] Executing request GET / HTTP/1.1
DEBUG [org.apache.http.impl.execchain.MainClientExec] Target auth state: UNCHALLENGED
DEBUG [org.apache.http.impl.execchain.MainClientExec] Proxy auth state: UNCHALLENGED
DEBUG [org.apache.http.headers] http-outgoing-0 >> GET / HTTP/1.1
DEBUG [org.apache.http.headers] http-outgoing-0 >> Host: hc.apache.org
DEBUG [org.apache.http.headers] http-outgoing-0 >> Connection: Keep-Alive
DEBUG [org.apache.http.headers] http-outgoing-0 >> User-Agent: Apache-HttpClient/4.5.12 (Java/1.8.0_191)
DEBUG [org.apache.http.headers] http-outgoing-0 >> Accept-Encoding: gzip,deflate
DEBUG [org.apache.http.impl.conn.DefaultHttpResponseParser] Garbage in response: c.apache.orgPHTTP/1.1 200 OK
DEBUG [org.apache.http.impl.conn.DefaultHttpResponseParser] Garbage in response: Date: Thu, 14 May 2020 03:54:56 GMT
DEBUG [org.apache.http.impl.conn.DefaultHttpResponseParser] Garbage in response: Server: Apache/2.4.18 (Ubuntu)
DEBUG [org.apache.http.impl.conn.DefaultHttpResponseParser] Garbage in response: Last-Modified: Sat, 22 Feb 2020 12:48:20 GMT
DEBUG [org.apache.http.impl.conn.DefaultHttpResponseParser] Garbage in response: ETag: "3239-59f298d8029ef-gzip"
DEBUG [org.apache.http.impl.conn.DefaultHttpResponseParser] Garbage in response: Accept-Ranges: bytes
DEBUG [org.apache.http.impl.conn.DefaultHttpResponseParser] Garbage in response: Vary: Accept-Encoding
DEBUG [org.apache.http.impl.conn.DefaultHttpResponseParser] Garbage in response: Content-Encoding: gzip
DEBUG [org.apache.http.impl.conn.DefaultHttpResponseParser] Garbage in response: Access-Control-Allow-Origin: *
DEBUG [org.apache.http.impl.conn.DefaultHttpResponseParser] Garbage in response: Content-Length: 3050
DEBUG [org.apache.http.impl.conn.DefaultHttpResponseParser] Garbage in response: Keep-Alive: timeout=5, max=2000
DEBUG [org.apache.http.impl.conn.DefaultHttpResponseParser] Garbage in response: Connection: Keep-Alive
DEBUG [org.apache.http.impl.conn.DefaultHttpResponseParser] Garbage in response: Content-Type: text/html

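The status line received by the client had a few extra characters in front, c.apache.orgPHTTP/1.1 200 OK, so the response could not be parsed. But where did those characters come from? My first suspect was the SOCKS5 service I had written, but tests through a browser and through curl both worked, so the SOCKS5 service was fine. Further investigation showed that when the machine running HttpClient could resolve domain names, requests went through normally; when it could not, this problem appeared.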

Puzzled, I went back over the SOCKS5 protocol flow, which is specified in RFC 1928:

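  • Step 1: after the client connects to the proxy, it sends a request listing the authentication methods it supports, in the following format:
    +----+----------+----------+
    |VER | NMETHODS | METHODS  |
    +----+----------+----------+
    | 1  |    1     | 1 to 255 |
    +----+----------+----------+
    The first byte is the protocol version, X'05' for SOCKS5. The second byte is the number of authentication methods the client supports, and METHODS lists the methods themselves, NMETHODS bytes in total.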

  • Step 2: the proxy replies to the client with the authentication method it has chosen:
    +----+--------+
    |VER | METHOD |
    +----+--------+
    | 1  |   1    |
    +----+--------+
    VER is the version, as in step 1, and METHOD is one of the METHODS offered in step 1.
    The currently defined methods are:

    • X'00' NO AUTHENTICATION REQUIRED
    • X'01' GSSAPI
    • X'02' USERNAME/PASSWORD
    • X'03' to X'7F' IANA ASSIGNED
    • X'80' to X'FE' RESERVED FOR PRIVATE METHODS
    • X'FF' NO ACCEPTABLE METHODS
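  • Step 3 is optional: it performs the authentication (for example username/password) and is skipped if the proxy does not require authentication.
  • Step 4 is the response to step 3, stating whether authentication succeeded.
  • Step 5: the client sends a request telling the proxy which command to execute: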
    +----+-----+-------+------+----------+----------+
    |VER | CMD |  RSV  | ATYP | DST.ADDR | DST.PORT |
    +----+-----+-------+------+----------+----------+
    | 1  |  1  | X'00' |  1   | Variable |    2     |
    +----+-----+-------+------+----------+----------+
    
    • VER protocol version: X'05'
    • CMD
      • CONNECT X'01'
      • BIND X'02'
      • UDP ASSOCIATE X'03'
    • RSV RESERVED
    • ATYP address type of following address
      • IP V4 address: X'01'
      • DOMAINNAME: X'03'
      • IP V6 address: X'04'
    • DST.ADDR desired destination address
    • DST.PORT desired destination port in network octet order

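    Since the client cannot resolve domain names, in my scenario ATYP is X'03' (DOMAINNAME). For this type DST.ADDR is variable-length: its first byte gives the length of the name that follows.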

  • Step 6: the proxy connects to the target address given in the request.

  • Step 7: the proxy sends a reply telling the client the result.

    +----+-----+-------+------+----------+----------+
    |VER | REP |  RSV  | ATYP | BND.ADDR | BND.PORT |
    +----+-----+-------+------+----------+----------+
    | 1  |  1  | X'00' |  1   | Variable |    2     |
    +----+-----+-------+------+----------+----------+
    
    • VER protocol version: X'05'
    • REP Reply field:
      • X'00' succeeded
      • X'01' general SOCKS server failure
      • X'02' connection not allowed by ruleset
      • X'03' Network unreachable
      • X'04' Host unreachable
      • X'05' Connection refused
      • X'06' TTL expired
      • X'07' Command not supported
      • X'08' Address type not supported
      • X'09' to X'FF' unassigned
    • RSV RESERVED
    • ATYP address type of following address
    • BND.ADDR server bound address
    • BND.PORT server bound port in network octet order

    Because ATYP in step 5 was DOMAINNAME, my proxy also used DOMAINNAME here, returning the domain name and port to the client.
  • After that, the client talks to the internet and the proxy simply relays the data in both directions.
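The byte layouts above can be made concrete with a small client-side encoder/decoder. This is a sketch with hypothetical names (`Socks5Messages` and its methods are not part of any library); the point of `readReply` is that it consumes every byte of the step 7 reply, including the DOMAINNAME length byte, so nothing is left in the stream for the application protocol:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper for the RFC 1928 messages described above.
final class Socks5Messages {
    static final int VER = 0x05;
    static final int CMD_CONNECT = 0x01;
    static final int ATYP_IPV4 = 0x01;
    static final int ATYP_DOMAIN = 0x03;
    static final int ATYP_IPV6 = 0x04;

    // Step 1: offer NO AUTHENTICATION REQUIRED (X'00') and USERNAME/PASSWORD (X'02').
    static byte[] greeting() {
        return new byte[] {VER, 2, 0x00, 0x02};
    }

    // Step 5: CONNECT with ATYP = DOMAINNAME, so the proxy performs the DNS lookup.
    static byte[] connectDomain(String host, int port) {
        byte[] name = host.getBytes(StandardCharsets.ISO_8859_1);
        if (name.length == 0 || name.length > 255) {
            throw new IllegalArgumentException("host name must be 1..255 bytes");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(VER);
        out.write(CMD_CONNECT);
        out.write(0x00);                 // RSV
        out.write(ATYP_DOMAIN);
        out.write(name.length);          // DST.ADDR starts with a one-byte length
        out.write(name, 0, name.length);
        out.write((port >> 8) & 0xFF);   // DST.PORT in network byte order
        out.write(port & 0xFF);
        return out.toByteArray();
    }

    // Step 7: consume the whole reply (BND.ADDR included) and return the REP code.
    static int readReply(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw); // throws EOFException if truncated
        int ver = in.readUnsignedByte();
        int rep = in.readUnsignedByte();
        in.readUnsignedByte();                         // RSV
        int atyp = in.readUnsignedByte();
        if (ver != VER) {
            throw new IOException("not a SOCKS5 reply, VER = " + ver);
        }
        int addrLen;
        switch (atyp) {
            case ATYP_IPV4:   addrLen = 4; break;
            case ATYP_IPV6:   addrLen = 16; break;
            case ATYP_DOMAIN: addrLen = in.readUnsignedByte(); break; // length byte comes first
            default: throw new IOException("unknown ATYP " + atyp);
        }
        in.readFully(new byte[addrLen]); // BND.ADDR
        in.readUnsignedShort();          // BND.PORT
        return rep;
    }
}
```

`DataInputStream` does not read ahead, so after `readReply` returns, the underlying stream is positioned exactly at the first byte of the relayed data.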

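While going through the SOCKS5 protocol, I compared it against the JDK 8 source and found where the problem lies: java.net.SocksSocketImpl.connect(SocketAddress endpoint, int timeout). Here is the relevant code, annotated with the SOCKS5 steps: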

        // Step 1: greeting, offering two methods
        out.write(PROTO_VERS);
        out.write(2);
        out.write(NO_AUTH);    // no authentication
        out.write(USER_PASSW); // username/password authentication
        out.flush();
        byte[] data = new byte[2];
        int i = readSocksReply(in, data, deadlineMillis);
        if (i != 2 || ((int)data[0]) != PROTO_VERS) {
            // Maybe it's not a V5 sever after all
            // Let's try V4 before we give up
            // SOCKS Protocol version 4 doesn't know how to deal with
            // DOMAIN type of addresses (unresolved addresses here)
            if (epoint.isUnresolved())
                throw new UnknownHostException(epoint.toString());
            connectV4(in, out, epoint, deadlineMillis);
            return;
        }
        if (((int)data[1]) == NO_METHODS)
            throw new SocketException("SOCKS : No acceptable methods");
        // Steps 3 and 4: authentication
        if (!authenticate(data[1], in, out, deadlineMillis)) {
            throw new SocketException("SOCKS : authentication failed");
        }
        // Step 5: send the CONNECT request
        out.write(PROTO_VERS);
        out.write(CONNECT);
        out.write(0);
        /* Test for IPV4/IPV6/Unresolved */
        // The intranet cannot resolve names, so this DOMAINNAME branch is taken
        if (epoint.isUnresolved()) {
            out.write(DOMAIN_NAME);
            out.write(epoint.getHostName().length()); // length byte of DST.ADDR first
            try {
                out.write(epoint.getHostName().getBytes("ISO-8859-1")); // then the name itself
            } catch (java.io.UnsupportedEncodingException uee) {
                assert false;
            }
            out.write((epoint.getPort() >> 8) & 0xff);
            out.write((epoint.getPort() >> 0) & 0xff);
        } else if (epoint.getAddress() instanceof Inet6Address) {
            out.write(IPV6);
            out.write(epoint.getAddress().getAddress());
            out.write((epoint.getPort() >> 8) & 0xff);
            out.write((epoint.getPort() >> 0) & 0xff);
        } else {
            out.write(IPV4);
            out.write(epoint.getAddress().getAddress());
            out.write((epoint.getPort() >> 8) & 0xff);
            out.write((epoint.getPort() >> 0) & 0xff);
        }
        out.flush();
        data = new byte[4];
        // Read the first 4 bytes of the reply: VER, REP, RSV, ATYP
        i = readSocksReply(in, data, deadlineMillis);
        if (i != 4)
            throw new SocketException("Reply from SOCKS server has bad length");
        SocketException ex = null;
        int len;
        byte[] addr;
        // data[1] holds REP, which tells whether the request succeeded
        switch (data[1]) {
        case REQUEST_OK:
            // success!
            // data[3] holds ATYP, the type of BND.ADDR
            switch(data[3]) {
            case IPV4:
                addr = new byte[4]; // IPv4: read exactly four bytes
                i = readSocksReply(in, addr, deadlineMillis);
                if (i != 4)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                data = new byte[2]; // port
                i = readSocksReply(in, data, deadlineMillis);
                if (i != 2)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                break;
            case DOMAIN_NAME:
                len = data[1]; // BUG: len should be the next byte read from the stream, but data[1] is REP
                byte[] host = new byte[len];
                i = readSocksReply(in, host, deadlineMillis);
                if (i != len)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                data = new byte[2];
                i = readSocksReply(in, data, deadlineMillis);
                if (i != 2)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                break;
            case IPV6:
                len = data[1]; // BUG: an IPv6 address is always 16 bytes, not data[1]
                addr = new byte[len];
                i = readSocksReply(in, addr, deadlineMillis);
                if (i != len)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                data = new byte[2];
                i = readSocksReply(in, data, deadlineMillis);
                if (i != 2)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                break;
            default:
                ex = new SocketException("Reply from SOCKS server contains wrong code");
                break;
            }
            break;
        case GENERAL_FAILURE:
            ex = new SocketException("SOCKS server general failure");
            break;
        case NOT_ALLOWED:
            ex = new SocketException("SOCKS: Connection not allowed by ruleset");
            break;
        case NET_UNREACHABLE:
            ex = new SocketException("SOCKS: Network unreachable");
            break;
        case HOST_UNREACHABLE:
            ex = new SocketException("SOCKS: Host unreachable");
            break;
        case CONN_REFUSED:
            ex = new SocketException("SOCKS: Connection refused");
            break;
        case TTL_EXPIRED:
            ex =  new SocketException("SOCKS: TTL expired");
            break;
        case CMD_NOT_SUPPORTED:
            ex = new SocketException("SOCKS: Command not supported");
            break;
        case ADDR_TYPE_NOT_SUP:
            ex = new SocketException("SOCKS: address type not supported");
            break;
        }
        if (ex != null) {
            in.close();
            out.close();
            throw ex;
        }

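As the comments point out, the JDK's SOCKS5 implementation mishandles the reply: it does not read the whole reply, so the leftover bytes get glued onto the front of the HTTP response and the HTTP request fails. Concretely, on success data[1] (the REP field) is X'00', so len is 0: no host bytes are read, the length byte (13) and the 'h' of hc.apache.org are consumed as the port, and c.apache.org followed by the real port bytes 0x00 0x50 ('P') stays in the stream, which is exactly the garbage in the debug log. With HTTPS the same leftover bytes arrive before the TLS handshake data, which explains the earlier "Unrecognized SSL message" error. The IPv4 branch reads a fixed 4 + 2 bytes and is not affected.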
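The byte arithmetic can be checked with a small, self-contained simulation (the class name is hypothetical) that replays a DOMAINNAME reply for hc.apache.org:80 followed by an HTTP status line, then parses it once the way the JDK 8 code above does (`len = data[1]`) and once the fixed way:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: replays the bytes behind the "Garbage in response" log line.
final class SocksReplyBugDemo {
    // CONNECT reply echoing ATYP = DOMAINNAME for hc.apache.org:80, then the HTTP status line.
    static byte[] wireBytes() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] host = "hc.apache.org".getBytes(StandardCharsets.ISO_8859_1);
        out.write(0x05);                  // VER
        out.write(0x00);                  // REP = succeeded
        out.write(0x00);                  // RSV
        out.write(0x03);                  // ATYP = DOMAINNAME
        out.write(host.length);           // length byte = 13
        out.write(host, 0, host.length);  // "hc.apache.org"
        out.write(0x00);                  // BND.PORT high byte
        out.write(0x50);                  // BND.PORT low byte: 80 = 0x50 = 'P'
        byte[] http = "HTTP/1.1 200 OK\r\n".getBytes(StandardCharsets.ISO_8859_1);
        out.write(http, 0, http.length);
        return out.toByteArray();
    }

    // Returns whatever the SOCKS code leaves in the stream for HttpClient to parse.
    static String leftover(boolean buggy) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wireBytes()));
        byte[] data = new byte[4];
        in.readFully(data);                                // VER, REP, RSV, ATYP
        int len = buggy ? data[1] : in.readUnsignedByte(); // buggy: REP (0) instead of the length byte
        in.readFully(new byte[len]);                       // host name
        in.readFully(new byte[2]);                         // port (buggy mode eats 0x0D and 'h' here)
        ByteArrayOutputStream rest = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            rest.write(b);
        }
        return new String(rest.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

With `buggy = true` the leftover string is `c.apache.org`, a NUL byte, `P`, and then the status line; with the NUL invisible this is exactly the `c.apache.orgPHTTP/1.1 200 OK` line in the HttpClient log. With `buggy = false` only the status line remains.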

Searching the JDK bug database turned up the bug "SOCKS proxying does not work with IPv6 connections". It was fixed in JDK 9 b02 and OpenJDK 8u222, and the OpenJDK 8u222 source does indeed fix the problem:

        data = new byte[4];
        i = readSocksReply(in, data, deadlineMillis);
        if (i != 4)
            throw new SocketException("Reply from SOCKS server has bad length");
        SocketException ex = null;
        int len;
        byte[] addr;
        switch (data[1]) {
        case REQUEST_OK:
            // success!
            switch(data[3]) {
            case IPV4:
                addr = new byte[4];
                i = readSocksReply(in, addr, deadlineMillis);
                if (i != 4)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                data = new byte[2];
                i = readSocksReply(in, data, deadlineMillis);
                if (i != 2)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                break;
            case DOMAIN_NAME:
                byte[] lenByte = new byte[1]; // first read the one-byte length
                i = readSocksReply(in, lenByte, deadlineMillis);
                if (i != 1)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                len = lenByte[0] & 0xFF;
                byte[] host = new byte[len]; // then read the name itself
                i = readSocksReply(in, host, deadlineMillis);
                if (i != len)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                data = new byte[2];
                i = readSocksReply(in, data, deadlineMillis);
                if (i != 2)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                break;
            case IPV6:
                len = 16; // always 16 bytes
                addr = new byte[len];
                i = readSocksReply(in, addr, deadlineMillis);
                if (i != len)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                data = new byte[2];
                i = readSocksReply(in, data, deadlineMillis);
                if (i != 2)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                break;
            default:
                ex = new SocketException("Reply from SOCKS server contains wrong code");
                break;
            }
            break;
        case GENERAL_FAILURE:
            ex = new SocketException("SOCKS server general failure");
            break;
        case NOT_ALLOWED:
            ex = new SocketException("SOCKS: Connection not allowed by ruleset");
            break;
        case NET_UNREACHABLE:
            ex = new SocketException("SOCKS: Network unreachable");
            break;
        case HOST_UNREACHABLE:
            ex = new SocketException("SOCKS: Host unreachable");
            break;
        case CONN_REFUSED:
            ex = new SocketException("SOCKS: Connection refused");
            break;
        case TTL_EXPIRED:
            ex =  new SocketException("SOCKS: TTL expired");
            break;
        case CMD_NOT_SUPPORTED:
            ex = new SocketException("SOCKS: Command not supported");
            break;
        case ADDR_TYPE_NOT_SUP:
            ex = new SocketException("SOCKS: address type not supported");
            break;
        }

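The cause was found, but upgrading is hard: some of our environments run IBM's JDK, which makes an upgrade painful. In the end I settled on a shortcut: when the SOCKS5 service replies to the client's CONNECT request, it does not mirror the request's ATYP but always writes an IPv4-type reply (the system does not use IPv6 at the moment). Judging from the source, SocksSocketImpl only reads the bound address in the reply without validating or otherwise using it, and its IPv4 branch reads the correct number of bytes, so this sidesteps the bug. No other problems have turned up so far.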
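A minimal sketch of that workaround on the proxy side, written against plain byte arrays so as not to assume a particular Netty version; the class and method names are hypothetical. If the service is built on Netty's socksx codec, the same idea amounts to constructing the success `DefaultSocks5CommandResponse` with `Socks5AddressType.IPv4` instead of echoing the request's address type:

```java
import java.io.ByteArrayOutputStream;

// Hypothetical proxy-side helper: always answer CONNECT with ATYP = IPv4.
final class Socks5ReplyWorkaround {
    // Builds VER REP RSV ATYP BND.ADDR BND.PORT with ATYP fixed to X'01', whatever
    // ATYP the client used. JDK 8's SocksSocketImpl reads this branch correctly
    // (exactly 4 + 2 bytes after the 4-byte header).
    static byte[] successReplyIpv4(byte[] bndAddr, int bndPort) {
        if (bndAddr.length != 4) {
            throw new IllegalArgumentException("IPv4 address must be 4 bytes");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0x05);                   // VER
        out.write(0x00);                   // REP: succeeded
        out.write(0x00);                   // RSV
        out.write(0x01);                   // ATYP: IPv4, even if the request used DOMAINNAME
        out.write(bndAddr, 0, 4);          // BND.ADDR
        out.write((bndPort >> 8) & 0xFF);  // BND.PORT, network byte order
        out.write(bndPort & 0xFF);
        return out.toByteArray();
    }
}
```

The bound address can be the outbound socket's local IPv4 address, or simply 0.0.0.0 with port 0, since for CONNECT `SocksSocketImpl` only reads and discards it.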