May 15, 2019

TCP socket send buffer deep dive




A typical TCP socket send buffer is composed of three parts: unacked-bytes, unsent-bytes, and free-buffer.
                               +----------------+
                               |                |
                               |                |
                               |  FREE BUFFER   |
                               |                |
                               |                |
                               +----------------+
                               |                |
                               |  UNSENT BYTES  |
                               |                |
                               +----------------+
                               |                |
                               |  UNACKED BYTES |
                               |                |
                               +----------------+

Total send buffer size

Total send buffer size = unacked-bytes + unsent-bytes + free-buffer.  
It can be obtained using the SO_SNDBUF socket option. The buffer size could dynamically change its size as seen needed by the OS. This works for both Linux and macOS.
        slen = sizeof(sndbufsiz);
        err = getsockopt(sd, SOL_SOCKET, SO_SNDBUF, &sndbufsiz, &slen);

Total in-flight bytes

Total inflight bytes = unacked-bytes + unsent-bytes. 
It can be obtained using SO_NWRITE socket option on macOS, and SIOCOUTQ ioctl on Linux.
int get_socket_used(int sd){
    int err;
    int used;
#ifdef __APPLE__
    socklen_t slen = sizeof(used);
    err = getsockopt(sd, SOL_SOCKET, SO_NWRITE, &used,&slen);
    if(err < 0) {
        perror("getsockopt 2");
        exit(1);
    }
#else
    err = ioctl(sd, SIOCOUTQ, &used);
    if(err < 0) {
        perror("ioctl SIOCOUTQ");
        exit(1);
    }
#endif
    return used;
}
On macOS, it seems that this can also be obtained using the TCP_INFO struct, but it is a private API.
u_int32_t       tcpi_snd_sbbytes;       /* bytes in snd buffer including data inflight */

unacked-bytes

On Linux, unacked-bytes can be obtained from the TCP_INFO structure, but the result is number of segments, or bytes. On macOS, TCP_INFO seems to contain this infomration (private API).
int get_socket_unacked(int sd){
    struct tcp_info tcp_info;
    socklen_t tcp_info_length = sizeof(tcp_info);
    if ( getsockopt(sd, IPPROTO_TCP, TCP_INFO, (void *)&tcp_info, &tcp_info_length ) == 0 ) {
        return tcp_info.tcpi_unacked;
    }
    return 0;
}

//For macOS, use TCP_INFO
    u_int64_t       tcpi_txunacked __attribute__((aligned(8)));    /* current number of bytes not acknowledged */
macOS tcp_info definition

unsent-bytes (not including un-acked bytes)

On Linux, unsent-bytes can be obtained from the tcpi_notsent_bytes field of the TCP_INFO structure. NOTE that this requires kernel version to be 4.6 or newer. For Android that means Android 8 or newer. On Linux, it can also be obtained using SIOCOUTQNSD ioctl. It’s not clear how to do this on macOS.
//defined in /usr/include/linux/sockios.h 
int get_socket_unsent(int sd){
    int err;
    int unsent;
    err = ioctl(sd, SIOCOUTQNSD, &unsent);
    if(err < 0) {
        perror("ioctl SIOCOUTQNSD");
        exit(1);
    }
    return unsent;
}

//OR 

int get_socket_unsent(int sd){
    struct tcp_info tcp_info;
    socklen_t tcp_info_length = sizeof(tcp_info);
    if ( getsockopt(sd, IPPROTO_TCP, TCP_INFO, (void *)&tcp_info, &tcp_info_length ) == 0 ) {
        return tcp_info.tcpi_notsent_bytes;
    }
    return 0;
}

Stackoverflow discussion on getting unsent bytes

epoll and kevent

epoll on Linux, and kevent on macOS, get triggered by the unsent-bytes, when the TCP option TCP_NOTSENT_LOWAT is set. On macOS, kevent() doesn’t report socket as writable until the unsent TCP data drops below specified threshold (typically 8 kilobytes).

No comments:

Post a Comment