Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error message for non compliant http servers missing \r\n (was Read error reported by fortio, but not curl/wrk) #913

Closed
lbrndnr opened this issue Apr 3, 2024 · 7 comments · Fixed by #914

Comments

@lbrndnr
Copy link

lbrndnr commented Apr 3, 2024

Hi,

I noticed that fortio reports lots of read errors when benchmarking a project of mine. Since I couldn't find the cause in my code, I double checked using curl and wrk. Both tools work fine, i.e. no read errors are reported. I threw together a little server written in C that makes the problem apparent. Here are some of the results:

curl:

$ curl -vv 127.0.0.1:3000/hi -: 127.0.0.1:3000/hi
*   Trying 127.0.0.1:3000...
* Connected to 127.0.0.1 (127.0.0.1) port 3000 (#0)
> GET /hi HTTP/1.1
> Host: 127.0.0.1:3000
> User-Agent: curl/7.81.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: webserver-c
< Content-Length: 26
< Content-type: text/html
< 
<html>hello, world</html>
* Connection #0 to host 127.0.0.1 left intact
* Found bundle for host 127.0.0.1: 0x563b0359dc30 [serially]
* Can not multiplex, even if we wanted to!
* Re-using existing connection! (#0) with host 127.0.0.1
* Connected to 127.0.0.1 (127.0.0.1) port 3000 (#0)
> GET /hi HTTP/1.1
> Host: 127.0.0.1:3000
> User-Agent: curl/7.81.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: webserver-c
< Content-Length: 26
< Content-type: text/html
< 
<html>hello, world</html>
* Connection #0 to host 127.0.0.1 left intact

wrk:

$  ./wrk -c 1 -t 1 -d 1s http://127.0.0.1:3000/hi
Running 1s test @ http://127.0.0.1:3000/hi
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   335.15us    1.42ms  17.29ms   96.98%
    Req/Sec    11.32k     1.83k   14.09k    60.00%
  11247 requests in 1.00s, 1.14MB read
Requests/sec:  11243.32
Transfer/sec:      1.14MB

But for fortio I get:

$ fortio load -n 1 -c 1 http://127.0.0.1:3000/hi
15:07:22.836 r1 [INF] scli.go:125> Starting, command="Φορτίο", version="1.63.5 h1:XUxnT4F1vJ1EXLFAn0AYwseH0UUhr+Ti1+4REJo/faI= go1.21.5 amd64 linux", go-max-procs=8
Fortio 1.63.5 running at 8 queries per second, 8->8 procs, for 1 calls: http://127.0.0.1:3000/hi
15:07:22.837 r1 [INF] httprunner.go:121> Starting http test, run=0, url="http://127.0.0.1:3000/hi", threads=1, qps="8.0", warmup="parallel", conn-reuse=""
15:07:22.842 r1 [WRN] periodic.go:440> Increasing the number of calls to the minimum of 2 with 1 thread. total duration will increase
Starting at 8 qps with 1 thread(s) [gomax 8] : exactly 1, 2 calls each (total 2 + -1)
15:07:25.846 r1 [ERR] http_client.go:1091> Read error, err={"Op":"read","Net":"tcp","Source":{"IP":"127.0.0.1","Port":37982,"Zone":""},"Addr":{"IP":"127.0.0.1","Port":3000,"Zone":""},"Err":{}}, size=106, dest={"IP":"127.0.0.1","Port":3000,"Zone":""}, url="http://127.0.0.1:3000/hi", thread=0, run=0
15:07:25.846 r1 [INF] periodic.go:851> T000 ended after 3.004239508s : 1 calls. qps=0.33286294163201585
Ended after 3.004333048s : 1 calls. qps=0.33285
15:07:25.846 r1 [INF] periodic.go:581> Run ended, run=0, elapsed=3004333048, calls=1, qps=0.3328525779342957
Sleep times : count 0 avg 0 +/- 0 min 0 max 0 sum 0
Aggregated Function Time : count 1 avg 3.0042336 +/- 0 min 3.004233618 max 3.004233618 sum 3.00423362
# range, mid point, percentile, count
>= 3.00423 <= 3.00423 , 3.00423 , 100.00, 1
# target 50% 3.00423
# target 75% 3.00423
# target 90% 3.00423
# target 99% 3.00423
# target 99.9% 3.00423
Error cases : count 1 avg 3.0042336 +/- 0 min 3.004233618 max 3.004233618 sum 3.00423362
# range, mid point, percentile, count
>= 3.00423 <= 3.00423 , 3.00423 , 100.00, 1
# target 50% 3.00423
# target 75% 3.00423
# target 90% 3.00423
# target 99% 3.00423
# target 99.9% 3.00423
# Socket and IP used for each connection:
[0]   1 socket used, resolved to 127.0.0.1:3000, connection timing : count 1 avg 0.00068778 +/- 0 min 0.00068778 max 0.00068778 sum 0.00068778
Connection time histogram (s) : count 1 avg 0.00068778 +/- 0 min 0.00068778 max 0.00068778 sum 0.00068778
# range, mid point, percentile, count
>= 0.00068778 <= 0.00068778 , 0.00068778 , 100.00, 1
# target 50% 0.00068778
# target 75% 0.00068778
# target 90% 0.00068778
# target 99% 0.00068778
# target 99.9% 0.00068778
Sockets used: 1 (for perfect keepalive, would be 1)
Uniform: false, Jitter: false, Catchup allowed: true
IP addresses distribution:
127.0.0.1:3000: 1
Code  -1 : 1 (100.0 %)
Response Header Sizes : count 1 avg 0 +/- 0 min 0 max 0 sum 0
Response Body/Total Sizes : count 1 avg 106 +/- 0 min 106 max 106 sum 106
All done 1 calls (plus 0 warmup) 3004.234 ms avg, 0.3 qps

basic http server source code (note that it doesn't close any connections, so it caps out at 1000 conns):

#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>
#include <errno.h>
#include <linux/tcp.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

const int MAX_NUM_CONN = 1000;

void net_addr_from_name(struct sockaddr_storage *ss, const char *host) {
    struct sockaddr_in *sin = (struct sockaddr_in *)ss;
    struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)ss;

    if (inet_pton(AF_INET, host, &sin->sin_addr) == 1) {
        sin->sin_family = AF_INET;
        return;
    }

    if (inet_pton(AF_INET6, host, &sin6->sin6_addr) == 1) {
        sin6->sin6_family = AF_INET6;
        return;
    }

    printf("inet_pton(%s)\n", host);
}

int net_bind_tcp(struct sockaddr_storage *ss) {
    int sd = socket(ss->ss_family, SOCK_STREAM | SOCK_NONBLOCK, IPPROTO_TCP);
    if (sd < 0) {
        printf("socket()\n");
        return -1;
    }

    int one = 1;
    int r = setsockopt(sd, SOL_SOCKET, SO_REUSEADDR, (char *)&one, sizeof(one));
    if (r < 0) {
        printf("setsockopt(SO_REUSEADDR)\n");
        return -1;
    }

    r = bind(sd, (struct sockaddr *)ss, sizeof(struct sockaddr_storage));
    if (r < 0) {
        printf("bind()\n");
        return -1;
    }

    listen(sd, 1024);
    return sd;
}

int net_parse_sockaddr(struct sockaddr_storage *ss, const char *addr) {
    memset(ss, 0, sizeof(struct sockaddr_storage));

    char *colon = strrchr(addr, ':');
    if (colon == NULL || colon[1] == '\0') {
        printf("%s doesn't contain a port number.\n", addr);
        return -1;
    }

    char *endptr;
    long port = strtol(&colon[1], &endptr, 10);
    if (port < 0 || port > 65535 || *endptr != '\0') {
        printf("Invalid port number %s\n", &colon[1]);
        return -1;
    }

    char host[255];
    int addr_len = colon - addr > 254 ? 254 : colon - addr;
    strncpy(host, addr, addr_len);
    host[addr_len] = '\0';
    net_addr_from_name(ss, host);

    struct sockaddr_in *sin = (struct sockaddr_in *)ss;
    struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)ss;

    switch (ss->ss_family) {
    case AF_INET:
        sin->sin_port = htons(port);
        break;
    case AF_INET6:
        sin6->sin6_port = htons(port);
        break;
    default:
        printf("\n");
    }
    return -1;
}

int main(int argc, char **argv) {
    int err;

    if (argc < 2) {
        fprintf(stderr,
                "Usage: %s <listen:port>\n",
                argv[0]);
        exit(1);
    }

    struct sockaddr_storage listen;
    net_parse_sockaddr(&listen, argv[1]);

    unsigned int num_fds = 0;
    struct pollfd *fds = (struct pollfd *)calloc(MAX_NUM_CONN, sizeof(struct pollfd));
    if (fds == NULL) {
        fprintf(stderr, "malloc failed\n");
        goto cleanup;
    }

    // start listening to incomming connections
    int lfd = net_bind_tcp(&listen);
    if (lfd < 0) {
        fprintf(stderr, "Bind failed\n");
        goto cleanup;
    }

accept:
    if (num_fds == MAX_NUM_CONN) {
        printf("No new connections are accepted. Polling...");
        goto poll;
    }

    struct sockaddr_storage client;
    socklen_t client_len = sizeof(struct sockaddr_storage);
    int fd = accept(lfd, (struct sockaddr *)&client, &client_len);
    if (fd < 0) {
        // no new connections
        // due to an error?
        if (errno == EAGAIN) {
            // just no new connection request poll again
            goto poll;
        }

        printf("Error accepting new connections: %s\n", strerror(errno));
        goto cleanup;
    }

    int on = 1;
    int r = setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
    if (r < 0) {
        printf("setsockopt(SO_KEEPALIVE)\n");
        goto cleanup;
    }

    struct sockaddr_in client_addr;
    int len = sizeof(client_addr);
    if (getpeername(fd, (struct sockaddr *)&client_addr, (socklen_t*)&len) < 0) {
        printf("Error getting peer name: %s\n", strerror(errno));
        goto cleanup;
    }

    printf("New connection accepted: [%s:%d] socket: %d\n", inet_ntoa(client_addr.sin_addr), htons(client_addr.sin_port), fd);

    fds[num_fds].fd = fd;
    fds[num_fds].events = POLLIN|POLLHUP|POLLERR;
    num_fds++;

poll:
    // printf("Polling...\n");
    int nfds = poll(fds, num_fds, 0.1);

    if (nfds == -1) {
        fprintf(stderr, "Error in poll syscall\n");
        goto cleanup;
    }
    else if (nfds == 0) {
        goto accept;
    }

    struct pollfd *new_fds = (struct pollfd *)calloc(MAX_NUM_CONN, sizeof(struct pollfd));
    unsigned int new_num_fds = 0;

    for (int i = 0; i < num_fds; i++) {
        if (fds[i].revents & POLLIN) {
            size_t buf_len = 1024;
            char buf[buf_len];
            memset(buf, 0, buf_len);

            // read the entire request
            ssize_t len = recv(fds[i].fd, buf, buf_len, 0);
            size_t req_len = strlen(buf);

            // EAGAIN means we have to wait for eBPF verdict first
            if (len < 0 && errno != EAGAIN) {
                printf("Error reading socket: %s\n", strerror(errno));
                goto cleanup;
            }
            
            // if we couldn't read yet, but also don't have an error
            // we will just try again :)
            if (len > 0) {
                printf("Received request length %ld after reading %ld: %s\n", req_len, len, buf);

                char resp[] = "HTTP/1.1 200 OK\n"
                  "Server: webserver-c\n"
                  "Content-Length: 26\n"
                  "Content-type: text/html\n\n"
                  "<html>hello, world</html>\n";
                memset(buf, 0, buf_len);
                strcpy(buf, resp);

                req_len = strlen(resp);
                size_t res_len = 0;
                do {
                    len = send(fds[i].fd, buf+res_len, req_len-res_len, 0);
                    if (len < 0) {
                        // the socket might not be ready
                        continue;
                    }
                    
                    if (len == 0) {
                        printf("message sent\n");
                        // we sent the msg
                        break;
                    }
                    printf("len: %ld\n", len);
                    res_len += len;
                } while (res_len < req_len);
                printf("Response sent\n");
            }
        }

        if (fds[i].revents & POLLHUP || fds[i].revents & POLLERR) {
            printf("Closing socket %d\n", fds[i].fd);
            close(fds[i].fd);
        }
        else {
            // we want to poll this fd again
            new_fds[new_num_fds] = fds[i];
            new_num_fds++;
        }
    }

    memset(fds, 0, num_fds);
    for (int i = 0; i < new_num_fds; i++) {
        fds[i] = new_fds[i];
        fds[i].events = POLLIN|POLLHUP|POLLERR;
    }
    num_fds = new_num_fds;
    free(new_fds);

    // printf("Got %d open sockets\n", num_fds);
    goto accept;

cleanup:
    for (int i = 0; i < num_fds; i++) {
        close(fds[i].fd);
    }
    free(fds);

    return -err;
}

You can compile and run it as follows:

gcc main.c
./a.out 0.0.0.0:3000

I firs thought that I do something wrong while handling the sockets, but since curl and wrk both seem to work fine, I'm not sure if fortio might be the problem? Either way, I'd really appreciate your thoughts on this :)
Cheers!

@lbrndnr lbrndnr changed the title Read error when reusing connection Read error reported by fortio, but not curl/wrk Apr 3, 2024
@ldemailly
Copy link
Member

Thanks, interesting, will try - can you check whether you get the same behavior with and without -stdclient ?

@ldemailly
Copy link
Member

well reading the code of the server: one issue is your server isn't http protocol compliant:

"Server: webserver-c\n"

should be \r\n for headers

but looks like curl is lenient about that, let me see if I can make fortio just warn about it

@lbrndnr
Copy link
Author

lbrndnr commented Apr 3, 2024

You're right, that actually fixes it! -stdclient also makes it work!

@lbrndnr
Copy link
Author

lbrndnr commented Apr 3, 2024

Interestingly, when running with -stdclient and multiple requests -n 10, I get the following warning:

$ fortio load -n 10 -c 1 -stdclient http://127.0.0.1:3000/hi
16:47:26.071 r1 [INF] scli.go:125> Starting, command="Φορτίο", version="1.63.5 h1:XUxnT4F1vJ1EXLFAn0AYwseH0UUhr+Ti1+4REJo/faI= go1.21.5 amd64 linux", go-max-procs=8
Fortio 1.63.5 running at 8 queries per second, 8->8 procs, for 10 calls: http://127.0.0.1:3000/hi
16:47:26.072 r1 [INF] httprunner.go:121> Starting http test, run=0, url="http://127.0.0.1:3000/hi", threads=1, qps="8.0", warmup="parallel", conn-reuse=""
Starting at 8 qps with 1 thread(s) [gomax 8] : exactly 10, 10 calls each (total 10 + 0)
16:47:26.073 r34 [WRN]> Unsolicited response received on idle HTTP channel starting with "\n"; err=<nil>, src="std"
16:47:26.212 r21 [WRN]> Unsolicited response received on idle HTTP channel starting with "\n"; err=<nil>, src="std"
16:47:26.352 r38 [WRN]> Unsolicited response received on idle HTTP channel starting with "\n"; err=<nil>, src="std"
16:47:26.490 r52 [WRN]> Unsolicited response received on idle HTTP channel starting with "\n"; err=<nil>, src="std"
16:47:26.628 r42 [WRN]> Unsolicited response received on idle HTTP channel starting with "\n"; err=<nil>, src="std"
16:47:26.768 r56 [WRN]> Unsolicited response received on idle HTTP channel starting with "\n"; err=<nil>, src="std"
16:47:26.906 r46 [WRN]> Unsolicited response received on idle HTTP channel starting with "\n"; err=<nil>, src="std"
16:47:27.046 r60 [WRN]> Unsolicited response received on idle HTTP channel starting with "\n"; err=<nil>, src="std"
16:47:27.184 r62 [WRN]> Unsolicited response received on idle HTTP channel starting with "\n"; err=<nil>, src="std"
16:47:27.323 r66 [WRN]> Unsolicited response received on idle HTTP channel starting with "\n"; err=<nil>, src="std"
16:47:27.323 r1 [INF] periodic.go:851> T000 ended after 1.251544663s : 10 calls. qps=7.990126357959628
Ended after 1.251628023s : 10 calls. qps=7.9896
16:47:27.323 r1 [INF] periodic.go:581> Run ended, run=0, elapsed=1251628023, calls=10, qps=7.989594205498227
Sleep times : count 9 avg 0.13722148 +/- 0.0003369 min 0.136759027 max 0.137654485 sum 1.23499336
Aggregated Function Time : count 10 avg 0.0011052665 +/- 0.0001827 min 0.00082008 max 0.001457551 sum 0.011052665
# range, mid point, percentile, count
>= 0.00082008 <= 0.001 , 0.00091004 , 20.00, 2
> 0.001 <= 0.00145755 , 0.00122878 , 100.00, 8
# target 50% 0.00117158
# target 75% 0.00131457
# target 90% 0.00140036
# target 99% 0.00145183
# target 99.9% 0.00145698
Error cases : no data
# Socket and IP used for each connection:
[0]  10 socket used, resolved to 127.0.0.1:3000, connection timing : count 10 avg 0.0003587242 +/- 8.997e-05 min 0.00021883 max 0.00058076 sum 0.003587242
Connection time histogram (s) : count 10 avg 0.0003587242 +/- 8.997e-05 min 0.00021883 max 0.00058076 sum 0.003587242
# range, mid point, percentile, count
>= 0.00021883 <= 0.00058076 , 0.000399795 , 100.00, 10
# target 50% 0.000379688
# target 75% 0.000480224
# target 90% 0.000540546
# target 99% 0.000576739
# target 99.9% 0.000580358
Sockets used: 10 (for perfect keepalive, would be 1)
Uniform: false, Jitter: false, Catchup allowed: true
IP addresses distribution:
127.0.0.1:3000: 10
Code 200 : 10 (100.0 %)
Response Header Sizes : count 10 avg 0 +/- 0 min 0 max 0 sum 0
Response Body/Total Sizes : count 10 avg 26 +/- 0 min 26 max 26 sum 260
All done 10 calls (plus 0 warmup) 1.105 ms avg, 8.0 qps

This does not happen when leaving -stdclient out.

@ldemailly ldemailly changed the title Read error reported by fortio, but not curl/wrk Improve error message for non compliant http servers missing \r\n (was Read error reported by fortio, but not curl/wrk) Apr 3, 2024
@ldemailly
Copy link
Member

I simplified the bad server code to:

package main

import (
	"fmt"
	"log"
	"net"
	"os"
)

const maxNumConn = 1000

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintf(os.Stderr, "Usage: %s <listen:port>\n", os.Args[0])
		os.Exit(1)
	}

	laddr, err := net.ResolveTCPAddr("tcp", os.Args[1])
	if err != nil {
		log.Fatalf("Failed to resolve address: %v", err)
	}

	listener, err := net.ListenTCP("tcp", laddr)
	if err != nil {
		log.Fatalf("Failed to listen on %s: %v", laddr, err)
	}
	defer listener.Close()

	fmt.Printf("Listening on %s\n", listener.Addr())

	for {
		conn, err := listener.AcceptTCP()
		if err != nil {
			log.Printf("Failed to accept connection: %v", err)
			continue
		}

		fmt.Printf("New connection accepted: %s\n", conn.RemoteAddr())

		go handleConnection(conn)
	}
}

func handleConnection(conn *net.TCPConn) {
	defer conn.Close()
	buf := make([]byte, 1024)
	for {
		read, err := conn.Read(buf)
		if err != nil {
			log.Printf("Error reading from connection: %v", err)
			return
		}
		fmt.Printf("Received request: %s\n", string(buf[:read]))

		// \r missing on purpose
		response := "HTTP/1.1 200 OK\n" +
			"Server: webserver-go\n" +
			"Content-Length: 26\n" +
			"Content-type: text/html\n\n" +
			"<html>hello, world</html>\n"

		n, err := conn.Write([]byte(response))
		if err != nil {
			log.Printf("Error sending response: %v", err)
			return
		}
		fmt.Printf("Response sent %d bytes\n", n)
	}
}

with which the error reproduces, including explanation when in debug logging:

$ fortio curl -loglevel debug http://localhost:3000/foo
[...]
10:01:00.407 [DBG] [0] Read ok 107 total 107 so far (-0 headers = 107 data) HTTP/1.1 200 OK\nServer: webserver-go\nContent-Length: 26\nContent-type: text/html\n\n<html>hello, world</html>\n
10:01:00.407 [DBG] [0] Code 200, looking for end of headers at 12 / 107, last CRLF 0
10:01:03.409 [ERR] Read error, err={"Op":"read","Net":"tcp","Source":{"IP":"127.0.0.1","Port":59061,"Zone":""},"Addr":{"IP":"127.0.0.1","Port":3000,"Zone":""},"Err":{}}, size=107, dest={"IP":"127.0.0.1","Port":3000,"Zone":""}, url="http://localhost:3000/foo", thread=0, run=0

the key there is looking for end of headers last CRLF 0

which could be in the error printout

@lbrndnr
Copy link
Author

lbrndnr commented Apr 4, 2024

Thanks for the quick fix! I have also noticed that fortio fails to reuse an open socket, something that wrk, curl and k6 seem to get right. I will come back to you if I can reproduce the problem.

@ldemailly
Copy link
Member

fortio does reuse sockets unless you tell it not to or the server closes them.

I'd like to see the output when/if it does. It does tell you the exact number of sockets used at the end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants