Please help me figure out what's wrong with this web proxy code


Question

    I want to write a web proxy as an exercise, and this is the code I have so far:

    
    // returns a map that contains the port and the host
    def parseHostAndPort(String data) {
        def objMap // this has host and port as keys
        data.eachLine { line ->
            if(line =~ /^(?i)get|put|post|head|trace|delete/) {
                println line
                def components = line.split(" ")
                def resource = components[1]
                def colon = resource.indexOf(":")
                if(colon != -1) {
                    URL u = new URL(resource)
                    def pHost = u.host
                    def pPort = u.port
                    return (objMap = [host:pHost,port:pPort])
                }
                else {
                    return (objMap = [host:resource,port:80])
                }
            }
        }
        return objMap
    }
    
    // reads a http request from a client
    def readClientData(Socket clientSocket) {
        def actualBuffer = new StringBuilder()
        InputStream inStream = clientSocket.inputStream
        while(true) {
            def available = inStream.available()
            if(available == 0)
            break;
            println "available data $available"
            def buffer = new byte[available]
            def bytesRead = inStream.read(buffer,0,available)
            actualBuffer << new String(buffer)
        }
        return actualBuffer.toString()
    }
    
    def sock = new ServerSocket(9000)
    sock.reuseAddress = true
    while(true) {
        sock.accept { cli ->
            println "got a client"
            def data = readClientData(cli)
            def parsed = parseHostAndPort(data)
            def host = parsed["host"]
            def port = parsed["port"]
    
            println "got from client $data"
    
            def nsock = new Socket(host,port)
            nsock << data // send data received from client to the socket
            nsock.outputStream.flush() 
            def datax = readClientData(nsock)
            println "got back $datax"
            cli << datax // send the client the response
            cli.outputStream.flush()
            cli.close()
        }
    }
    
    

    Right now, all it does is:

    • read the HTTP request my browser sends

    • parse the host and port

    • connect to that host, and write the data received from the client

    • send the client back the data received from the host

    But ... it doesn't work all the time. Sometimes it will make a good request, sometimes not. I think it's a buffering issue, I'm not sure. The thing is, I added flush calls, and still nothing.

    Can you spot what I'm doing wrong?

    EDIT:

    • I noticed that if I add some sleep calls, the proxy seems to "work" on a higher number of requests, but not all of them.
    • To collect the bounty, help me find out what I'm doing wrong. What's the normal "algorithm" used for a web proxy? Where am I deviating from it? Thanks!

    Answer

    First, it's really difficult to know exactly what is going wrong here: "Sometimes it will make a good request, sometimes not" doesn't really describe what happens when the problem occurs!

    That said, I was still able to figure out what's going wrong for you.

    As you've said already, you're looking for the most basic solution that will work consistently, so I'll avoid anything unnecessary and won't get into the efficiency (or otherwise) of your code. Also, I'll give you the answer first and then describe what's causing the problem (it's long, but worth reading :)

    Solution

    The simple answer to your problem is that you need to do some HTTP protocol parsing to figure out whether all of the data has been sent by the client, and not rely on what available() or read() return. How much of a PITA this is depends on how completely you wish to support the HTTP protocol. To support GET requests, it's pretty easy. It's a little harder to support POSTs that specify a content length. It's much harder to support "other" encoding types (e.g. chunked or multipart/byteranges; see http://tools.ietf.org/html/rfc2616#section-4.4).

    Anyway, I assume you're just trying to get GETs working. To do that, you have to know that HTTP headers and bodies are separated by an "empty line", that HTTP's line delimiter is \r\n, and that GETs do not have a body. Therefore a client has finished sending a GET request when it transmits \r\n\r\n.

    Some code like this should handle GETs consistently for you (code is untested but it should get you to at least 90%):

    import java.nio.charset.Charset

    def readClientData(Socket clientSocket) {

        def actualBuffer = new StringBuilder()
        def eof = false

        // CR LF CR LF as byte values: the "empty line" that ends the HTTP headers
        byte[] emptyLine = [13, 10, 13, 10]
        def lastEmptyLineChar = 0

        InputStream inStream = clientSocket.inputStream
        while(!eof) {
            def available = inStream.available()
            println "available data $available"

            // try to read all available bytes, but always ask for at least one so that
            // read() blocks until data arrives instead of spinning on an empty buffer
            def buffer = new byte[Math.max(available, 1)]
            def bytesRead = inStream.read(buffer, 0, buffer.length)
            if( bytesRead == -1 ) {
                break // the client closed the connection before the empty line arrived
            }

            // check for empty line:
            //    * iterate through the buffer until the first element of emptyLine is found
            //    * continue iterating through buffer checking subsequent elements of buffer with emptyLine while consecutive elements match
            //    * if any element in buffer and emptyLine do not match, start looking for the first element of emptyLine again as the iteration through buffer continues
            //    * if the end of emptyLine is reached and matches with buffer, then the emptyLine has been found
            for( int i=0; i < bytesRead && !eof; i++ ) {
                if( buffer[i] == emptyLine[lastEmptyLineChar] ) {
                    lastEmptyLineChar++
                    eof = lastEmptyLineChar >= emptyLine.length
                }
                else {
                    // the mismatching byte may itself be the start of a new empty line
                    lastEmptyLineChar = (buffer[i] == emptyLine[0]) ? 1 : 0
                }
            }

            // decode only the bytes actually read, with a fixed charset, to avoid encoding issues
            actualBuffer << new String(buffer, 0, bytesRead, Charset.forName("US-ASCII"))
        }
        return actualBuffer.toString()
    }
    

    For POSTs, you need to add to this by also looking for the header "Content-Length: " and parsing the value after it. This value is the size of the HTTP body (i.e. the bit that comes after the \r\n\r\n end-of-header mark) in octets. So when you encounter the end of the headers, you just need to count that number of octets (bytes), and then you know that the POST request has completed transmission.

    You'll also need to determine the type of request (GET, POST etc.) - you can do this by inspecting the characters transmitted before the first space.
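The two steps above (finding the request type and the Content-Length value, then counting body bytes) could be sketched roughly as follows. This is only an illustration under some assumptions: the helper names parseRequestHead and readBody are made up for this example, the header block is assumed to be already decoded as text up to and including the empty line, and any body bytes that arrived in the same read as the headers would have to be counted separately.

```groovy
// Hypothetical helper (not from the original answer): extracts the request
// method and the declared body length (0 when there is no Content-Length).
def parseRequestHead(String head) {
    def lines = head.split('\r\n')
    // The method is everything before the first space of the request line.
    def method = lines[0].split(' ')[0].toUpperCase()
    int contentLength = 0
    for (int i = 1; i < lines.length; i++) {
        int colon = lines[i].indexOf(':')
        // Header names are case-insensitive, so compare in lower case.
        if (colon > 0 && lines[i].substring(0, colon).trim().toLowerCase() == 'content-length') {
            contentLength = Integer.parseInt(lines[i].substring(colon + 1).trim())
        }
    }
    return [method: method, contentLength: contentLength]
}

// Hypothetical helper: reads exactly `length` body bytes from the stream,
// or fewer if the peer closes the connection early.
byte[] readBody(InputStream inStream, int length) {
    def body = new byte[length]
    int total = 0
    while (total < length) {
        int n = inStream.read(body, total, length - total)
        if (n == -1) break
        total += n
    }
    return Arrays.copyOf(body, total)
}
```

With something like this, a request whose method is GET can be forwarded as soon as the empty line is seen, while a POST is forwarded once contentLength bytes have followed it.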

    Problem

    Your problem is that your readClientData function doesn't always read all of the data sent by the client. As a result, you're sometimes sending a partial request to the server, and it returns some kind of error. You should see incomplete requests printed to standard out if you replace

    println "available data $available"

    with

    println(new String(buffer))

    (placed after the read() call, so that the buffer has been filled) in the readClientData function.

    Why is this happening? It's because available() only tells you what's currently available to be read from the InputStream, not whether the client has sent all the data it's going to send. An InputStream, by its very nature, can never tell whether there will be more data. The exception is when there is no more underlying data to read (e.g. a socket is closed, or the end of an array or file has been reached); this is the only time read() will return -1 (i.e. EOF). Instead, it's up to higher-level code to decide whether it should read more data from the stream, and it makes this decision based on application-specific rules that apply to the data being read from the InputStream.
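A small experiment can make this behaviour of available() visible without any networking. The sketch below is an illustration only: it uses a pair of piped streams from java.io as a stand-in for the socket, and the request text and variable names are invented for the example. The "client" writes its request in two pieces, and available() drops to 0 in between even though more data is still to come.

```groovy
// Illustration only: a pipe stands in for the socket, and the "client"
// writes its request in two pieces.
def clientSide = new PipedOutputStream()
def proxySide = new PipedInputStream(clientSide)

clientSide.write('GET / HTTP/1.1\r\n'.getBytes('US-ASCII'))
int firstAvailable = proxySide.available()      // only the first piece has arrived
proxySide.read(new byte[firstAvailable], 0, firstAvailable)

int afterFirstRead = proxySide.available()      // nothing is buffered right now ...

clientSide.write('Host: example.com\r\n\r\n'.getBytes('US-ASCII'))
int afterSecondWrite = proxySide.available()    // ... yet more data shows up later

println "first=$firstAvailable, drained=$afterFirstRead, later=$afterSecondWrite"
```

A loop that stops as soon as available() returns 0 would have stopped after the first piece and forwarded a request without its Host header.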

    In this case, the application is HTTP, so you need to understand the basics of the HTTP protocol before you'll get this working (cmeerw, you were on the right track).

    When an HTTP request is made by a client, the client opens a socket to the server and sends a request. The client only closes the socket as a result of a timeout, or the underlying network connection being disconnected, or in response to a user action that requires the socket to be closed (application is closed, page is refreshed, stop button pushed, etc.). Otherwise, after sending the request, it just waits for the server to send a response. Once the server has sent the response, the server closes the connection [1].

    Where your code succeeds, data is being provided by the client quickly and consistently enough so that the InputStream receives additional data between your invocation of read() and your subsequent invocation of available() on the next iteration of the loop (remember that InputStream is being provided with data "in parallel" to your code that's invoking its read() method). Now in the other case, where your code fails, no data has yet been provided to InputStream, so when your code invokes available(), InputStream correctly returns 0 since no further data has been provided to it since you invoked read() and therefore it has 0 bytes available for you to read(). This is the race condition that Johnathan's talking about.

    Your code assumes that when available() returns 0 that all data has been sent by the client when, in fact, sometimes it has, and sometimes it has not (so sometimes you get a "good request" and other times not :).

    So you need something better than available() to determine whether or not the client has sent all of the data.

    Checking for EOF when you invoke read() (see R4an's answer [2]) isn't suitable either. It should be clear why this is the case: the only time read() is supposed to return EOF (-1) is when the socket is closed. This isn't supposed to happen until you've forwarded the request to the target server, received a response and sent that response to the client, but we know the socket can also exceptionally be closed by the client. In fact you're seeing this behaviour when you run the sample code: the proxy hangs until the stop button is clicked in the browser, causing the client to close the connection prematurely.

    The correct answer, which you now know, is to do some parsing of the HTTP and use that to determine the state of the connection.
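The response direction is a slightly different case. As described above, a server that isn't using keep-alive closes the connection once it has sent its response, so there the proxy can copy bytes until read() returns -1. A minimal sketch, assuming the server really does close the connection (for example because the forwarded request carried a Connection: close header); relayUntilClosed is a hypothetical name, not something from the question or from the JDK:

```groovy
// Hypothetical helper: copies everything from `from` to `to` until the
// sending side closes the stream, i.e. until read() returns -1.
long relayUntilClosed(InputStream from, OutputStream to) {
    def buffer = new byte[8 * 1024]
    long total = 0
    int n
    while ((n = from.read(buffer)) != -1) {
        to.write(buffer, 0, n)   // write only the bytes this read() delivered
        total += n
    }
    to.flush()
    return total
}
```

In the proxy this would be called as something like relayUntilClosed(nsock.inputStream, cli.outputStream). It copies raw bytes rather than building a String, so binary responses such as images pass through without being re-encoded.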

    Notes
    [1] It's beyond the scope of a proof-of-concept proxy, but since it was touched on already: if the HTTP connection is "keep-alive", the server will keep the connection open and wait for another request from the client.
    [2] There's an error in this code that causes readClientData to mangle the data:

    byte[] buffer = new byte[16 * 1024];
    while((bytesRead = inStream.read(buffer)) >= 0) { // -1 on EOF
        def bytesRead = inStream.read(buffer,0,bytesRead); 
        actualBuffer << new String(buffer)
    }
    

    The second inStream.read() invocation completely overwrites the data read by the first invocation of inStream.read(). Also bytesRead is being redefined here (not familiar enough with Groovy to know whether or not this would be an error). This line should either read:

    bytesRead = bytesRead + inStream.read(buffer, bytesRead, buffer.length - bytesRead);
    

    or be removed entirely.
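For reference, the "removed entirely" variant of that loop might look like the sketch below. It is an assumption-laden illustration rather than R4an's actual code: the function name readUntilEof is invented here, and US-ASCII is chosen only to match the charset used in the GET example. As explained above, it still only returns once the other side closes the connection.

```groovy
// Sketch of the loop with the second read() removed.
String readUntilEof(InputStream inStream) {
    def actualBuffer = new StringBuilder()
    def buffer = new byte[16 * 1024]
    int bytesRead
    while ((bytesRead = inStream.read(buffer)) >= 0) { // -1 on EOF
        // decode only the bytes this read() delivered, not the whole array
        actualBuffer << new String(buffer, 0, bytesRead, 'US-ASCII')
    }
    return actualBuffer.toString()
}
```

Passing the offset and length to the String constructor is what keeps stale bytes from earlier iterations, or the unused tail of the 16 KB array, out of the result.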
