November 6, 2019

AWS API Gateway can be expensive

As of 2019/11, AWS charges $0.25 per million connection minutes for API Gateway WebSocket connections. That sounds innocent, but is it?

If your business grows and you have up to 1 million devices that need to stay connected to an AWS API Gateway around the clock, the monthly cost for the connection fee alone would be:

$0.25*1440*30*1M/1M = $10,800 per month

In this case, running your own highly available cluster may be a lot more cost-effective.
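
The arithmetic above can be sketched as a small C helper. The function name and parameters are hypothetical and only restate the formula; the price, device count and 30-day month are the figures used in this post.

```c
/* Monthly connection-minute cost in US dollars (illustrative helper).
 *   price_per_million: price per one million connection minutes
 *   devices:           number of devices connected around the clock
 *   days:              number of days in the billing month
 */
double monthly_connection_cost(double price_per_million, double devices, int days)
{
    const double minutes_per_day = 24.0 * 60.0;              /* 1440 */
    double connection_minutes = devices * minutes_per_day * days;
    return connection_minutes / 1e6 * price_per_million;
}
```

Calling monthly_connection_cost(0.25, 1e6, 30) reproduces the $10,800 figure. Because the cost is linear in the device count, 100,000 devices would cost one tenth of that.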

October 30, 2019

AWS VPC: Private Subnet vs Public Subnet

In AWS, a private subnet is a subnet that has no direct route to the Internet. Its instances are reachable only from inside the VPC, e.g. a web server accessing internal database servers. If instances inside the private subnet need to reach the Internet, for example to update packages, their traffic has to hop through a device in a public subnet. This can be a NAT instance (a dedicated instance just for this purpose) or a NAT gateway (a managed service provided by AWS).
Public Subnet | Private Subnet
Instances have public IP addresses | Instances do not have public IP addresses (selectable during instance creation)
Route table attached to the subnet has a default route to an Internet Gateway | Route table attached to the subnet has no default route to an Internet Gateway

Internet Gateway

An AWS Internet Gateway performs one-to-one NAT between a private IP and a public IP; it is not a many-to-one NAT device like a home router. An instance inside the VPC subnet needs to have a public IP address associated with it. Otherwise, the Internet Gateway cannot route traffic for it.

Egress-only Internet Gateway

IPv6 Only

NAT gateway or NAT Instance

IPv4 Only

Internet Access for Default and Nondefault VPCs

The following table provides an overview of whether your VPC automatically comes with the components required for internet access over IPv4 or IPv6.

Component | Default VPC | Nondefault VPC
Internet gateway | Yes | Yes, if you created the VPC using the first or second option in the VPC wizard. Otherwise, you must manually create and attach the internet gateway.
Route table with route to internet gateway for IPv4 traffic | Yes | Yes, if you created the VPC using the first or second option in the VPC wizard. Otherwise, you must manually create the route table and add the route.
Route table with route to internet gateway for IPv6 traffic (::/0) | No | Yes, if you created the VPC using the first or second option in the VPC wizard, and if you specified the option to associate an IPv6 CIDR block with the VPC. Otherwise, you must manually create the route table and add the route.
Public IPv4 address automatically assigned to instance launched into subnet | Yes (default subnet) | No (nondefault subnet)
IPv6 address automatically assigned to instance launched into subnet | No (default subnet) | No (nondefault subnet)

October 17, 2019

R ggplot2 geom_bar order

In R, to plot a pie chart using a dataframe as the data source, one would first use geom_bar to plot a stacked bar graph, and then use coord_polar to convert it to a pie chart. It works, but there is a catch:
  • geom_bar sorts the fill values alphabetically and always puts the lowest value at the top of the bar. For example, “apple” would be on top of “pear”. This becomes a problem if you intend to label the pie chart.
  • The solution is to sort the dataframe in decreasing order by the fill variable. That way the label positions computed from the data match the stacking order.

Example R code with correct labels.


library(ggplot2)
# Sort by the fill variable in decreasing order, as described above, so the labels match the slices
df <- df[order(df$val, decreasing = TRUE), ]
df$Label <- paste(df$val, paste(round((df$Freq/sum(df$Freq))*100,0),"%",sep=""), sep="-")
p <- ggplot(df,aes(x=1,y=Freq,fill=val))+geom_bar(stat="identity", color = "black")
p1 <- p + coord_polar(theta='y') + theme(axis.ticks=element_blank())
p2 <- p1 + scale_y_continuous(labels= df$Label,breaks=cumsum(df$Freq) - df$Freq/ 2)
p3 <- p2 + labs(title=title,fill=legend_title)+theme(plot.title = element_text(hjust = 0.5))

Update on 10/23/2019

This article explains the order issue very clearly:

In short, the rule for the sort order is:
  • if factor, the order of factor levels is used
  • if character, an alphabetical order is used
So one can change the factor order to change the plot order.

October 15, 2019


jq is a very useful tool to process JSON.

jq is built around two ideas: pipes and data streams.

In the notes below, a JSON entity means a self-contained, complete JSON object/array.


  • When jq is invoked, the input JSON file is parsed as one or multiple JSON entities.
    • The input file can have multiple JSON entities, separated by new lines or spaces. One stream is created for each JSON entity.
    • If multiple files are supplied to jq, each file should contain one or more self-contained JSON entities.
    • To combine all the input JSON entities into one big JSON entity, pass the “--slurp” flag to jq. It automatically creates a containing JSON array and puts all input JSON entities into it. This covers both multiple input JSON entities in one input file and multiple input files.
  • In short, jq is started with one or multiple data streams which are from the input JSON entities.
  • Each stream is then passed through the specified filter independently.
  • One can pass the -c (compact) flag to jq to print each stream as one line.
  • Some key concepts of the filter.
    • . means the entire input data stream
    • .[] moves into the JSON object/array one layer, and splits the data stream into multiple streams, one for each entry in the object or array. For example, {"a":1,"b":2} will be broken into two data streams after passing this filter, becoming 1 and 2 respectively. Note that they are the values of the entries, without their keys; they are bare values rather than objects, which is okay but needs to be noted. The same stream splitting also happens for JSON arrays after passing this filter.
    • .["key"] selects the value stored under "key", generating one stream from the input stream. The shorthand is .key
    • , can be used to manually generate multiple streams.
    • To collect multiple streams into one stream, one can use [ ] to generate an array, or use { } to generate an object. Note that to generate an object, one has to manually specify the keys, e.g. {"key1":.a, "key2":.b}, or use the shorthand {key1,key2}, which is the same as {"key1":.key1,"key2":.key2}
    • ( ) can be used to instruct jq which part of the filter to be processed first, but it does not remove/combine/split the data streams.
    • | has lower priority than ,. Therefore, 1,2,3|3 first produces three streams (1, 2 and 3) and then passes each stream to the second filter 3, with the end result of three streams of 3. Using ( ) changes that order: 1,2,(4|3) processes 4|3 first, generating 3, and then processes 1,2,3, generating exactly that as three streams of output. 1|4,2,3 processes 1 first and then passes it to 4,2,3, so the end result is the three streams 4, 2 and 3.
    • The filter is not a programming language. Usually stream data is not output in the middle of the filter processing. It is usually only output at the end of the filter processing, one stream at a time.
    • One can use a variable assignment such as length as $array_length | . to have jq remember an interim value and use it in later stages of the filter.
    • At the most basic level, the data stream is passed to each filter unchanged. For example, .|.|.|.|. is the same as .
    • map(x) takes an array as input, and outputs a new array, after running that filter for each element of the input array.
    • The select() function can be used to remove undesired streams. [1,2,3] | map(select(. >= 2)) will give you [2,3]
    • One can use if-then-else-end in the filter. Note that the else clause must be there. Sometimes the empty function can help.
    • a//b, the alternative operator, can also be useful. It outputs a if a is not false or null; otherwise it outputs b.
    • input_filename can be used to find the input file name.
    • One can use numbers/objects/arrays/etc. to select results from multiple streams
  • The final output is the resulting streams, printed one by one.

Tags: jq

IPv6 DNS server


September 25, 2019

SSH: How to identify which key to use when agent forwarding is in use

You can use the public part of a key to specify which private key you want to use from the forwarded agent. This requires creating an extra file (the public part of the key) on any “intermediate” machines (machines to which you forward your local ssh-agent).
  1. Arrange for the intermediate machine to have a copy of the public part of the desired key in a convenient location (e.g. ~/.ssh/
    From any machine that already has the public part of the key:
    scp intermediate:.ssh/
    or, on the intermediate machine:
    ssh-add -L | grep something_unique > ~/.ssh/
    You may want to edit the trailing “comment” part of the public key to better identify the key’s origin/owner/purpose (or attempt to hide the same).
  2. Use the pathname to the above public key file with -i or IdentityFile.
  3. You may also need to use IdentitiesOnly yes (in .ssh/config or -o) to keep ssh from trying to offer any additional identities from your forwarded agent.

September 8, 2019

Lego EV3 Sound file .rsf format

Lego EV3 supports playing sound files, and the Lego programmer software comes with many built-in sound files. If you want to add new sound files, you can use the "Sound Editor" that comes with the software, or you can convert an existing sound file to the EV3 format.

The sound file ends with the .rsf extension (probably standing for Robotic Sound File). It has the following format:

[ 8 bytes of header]
[ raw sound data ]

The first 8 bytes of the file are metadata with the following meaning:

byte 0, byte 1: 0x01, 0x00
byte 2, byte 3: length, in big-endian, of the raw sound data
byte 4, byte 5: 0x1f, 0x40 (decimal 8000, the sampling rate)
byte 6, byte 7: 0x00, 0x00

raw sound data
The raw sound data is 8-bit PCM data with a sampling rate of 8000 samples per second.

Example Script

On a macOS computer, one can use the following command to generate an audio file:
say "hello world" -o hello.aiff

Then you can use "ffmpeg" (you need to use brew to install it) to convert it to raw audio
ffmpeg -i hello.aiff -acodec pcm_u8 -f u8 -ar 8000 hello.raw

Then you can use the "raw2rsf" tool (see the end of this post) to convert the raw file to an .rsf file
raw2rsf hello.raw > hello.rsf

Then you can copy the hello.rsf file to your Lego Programmer sound file directory and then use it from the programmer software!

raw2rsf.c: a simple C program to convert a .raw file to .rsf file.
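
The header layout described above can be sketched in a few lines of C. This is not the original raw2rsf.c, only a minimal illustration built from the byte layout listed in this post; the function name rsf_build_header is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill hdr[0..7] with an .rsf header for raw_len bytes of 8-bit PCM data.
 * Returns 0 on success, or -1 if raw_len does not fit into the
 * 16-bit big-endian length field. */
int rsf_build_header(uint8_t hdr[8], size_t raw_len)
{
    if (raw_len > 0xFFFF)
        return -1;
    hdr[0] = 0x01;                        /* bytes 0-1: fixed 0x01 0x00 */
    hdr[1] = 0x00;
    hdr[2] = (uint8_t)(raw_len >> 8);     /* bytes 2-3: length, high byte first */
    hdr[3] = (uint8_t)(raw_len & 0xFF);
    hdr[4] = 0x1F;                        /* bytes 4-5: 0x1f40 = 8000 samples/s */
    hdr[5] = 0x40;
    hdr[6] = 0x00;                        /* bytes 6-7: fixed 0x00 0x00 */
    hdr[7] = 0x00;
    return 0;
}
```

A converter would write these 8 bytes followed by the unmodified raw data. Since the length field is only 16 bits wide, a single file can hold at most 65535 samples, which is a little over 8 seconds at 8000 samples per second.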

August 23, 2019

Instant file sharing without logging in

1. Web-RTC based; Local LAN transfer doesn't leave LAN

2. HTTP streaming and Web-RTC based.

3. HTTP streaming, supporting curl based command line.

Source code at:, and a minimal golang version at

Blog at

4. HTTP Streaming

WebRTC Demo using peer.js:

July 30, 2019

Windows 10 spotlight image files location


July 18, 2019

Search Linux Kernel to find out when a feature was added

Use the LKDDB (Linux Kernel Driver DataBase):

Search for a CONFIG_xxx option and it will tell you in which Linux kernel version the option was added.

May 15, 2019

TCP socket send buffer deep dive

A typical TCP socket send buffer is composed of three parts: unacked-bytes, unsent-bytes, and free-buffer.
                               |                |
                               |                |
                               |  FREE BUFFER   |
                               |                |
                               |                |
                               |                |
                               |  UNSENT BYTES  |
                               |                |
                               |                |
                               |  UNACKED BYTES |
                               |                |

Total send buffer size

Total send buffer size = unacked-bytes + unsent-bytes + free-buffer.  
It can be obtained using the SO_SNDBUF socket option. The OS may change the buffer size dynamically as it sees fit. This works on both Linux and macOS.
        int sndbufsiz;
        socklen_t slen = sizeof(sndbufsiz);
        int err = getsockopt(sd, SOL_SOCKET, SO_SNDBUF, &sndbufsiz, &slen);

Total in-flight bytes

Total inflight bytes = unacked-bytes + unsent-bytes. 
It can be obtained using SO_NWRITE socket option on macOS, and SIOCOUTQ ioctl on Linux.
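/* needs <stdio.h>, <sys/socket.h>, <sys/ioctl.h>; on Linux also <linux/sockios.h> */
int get_socket_used(int sd){
    int err;
    int used = 0;
#ifdef __APPLE__
    /* macOS: SO_NWRITE reports the bytes still held in the send buffer */
    socklen_t slen = sizeof(used);
    err = getsockopt(sd, SOL_SOCKET, SO_NWRITE, &used,&slen);
    if(err < 0) {
        perror("getsockopt 2");
        return -1;
    }
#else
    /* Linux: SIOCOUTQ reports the same quantity */
    err = ioctl(sd, SIOCOUTQ, &used);
    if(err < 0) {
        perror("ioctl SIOCOUTQ");
        return -1;
    }
#endif
    return used;
}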
On macOS, it seems that this can also be obtained using the TCP_INFO struct, but it is a private API.
u_int32_t       tcpi_snd_sbbytes;       /* bytes in snd buffer including data inflight */


On Linux, unacked-bytes can be obtained from the TCP_INFO structure, but the result is the number of segments, not bytes. On macOS, TCP_INFO seems to contain this information in bytes (private API).
int get_socket_unacked(int sd){
    struct tcp_info tcp_info;
    socklen_t tcp_info_length = sizeof(tcp_info);
    if ( getsockopt(sd, IPPROTO_TCP, TCP_INFO, (void *)&tcp_info, &tcp_info_length ) == 0 ) {
        return tcp_info.tcpi_unacked;   /* Linux: counted in segments, not bytes */
    }
    return 0;
}

//For macOS, use TCP_INFO
    u_int64_t       tcpi_txunacked __attribute__((aligned(8)));    /* current number of bytes not acknowledged */
macOS tcp_info definition

unsent-bytes (not including un-acked bytes)

On Linux, unsent-bytes can be obtained from the tcpi_notsent_bytes field of the TCP_INFO structure. NOTE that this requires kernel version to be 4.6 or newer. For Android that means Android 8 or newer. On Linux, it can also be obtained using SIOCOUTQNSD ioctl. It’s not clear how to do this on macOS.
//SIOCOUTQNSD is defined in /usr/include/linux/sockios.h
int get_socket_unsent(int sd){
    int err;
    int unsent = 0;
    err = ioctl(sd, SIOCOUTQNSD, &unsent);
    if(err < 0) {
        perror("ioctl SIOCOUTQNSD");
        return -1;
    }
    return unsent;
}


//Alternative using TCP_INFO (Linux 4.6 or newer)
int get_socket_unsent(int sd){
    struct tcp_info tcp_info;
    socklen_t tcp_info_length = sizeof(tcp_info);
    if ( getsockopt(sd, IPPROTO_TCP, TCP_INFO, (void *)&tcp_info, &tcp_info_length ) == 0 ) {
        return tcp_info.tcpi_notsent_bytes;
    }
    return 0;
}

Stackoverflow discussion on getting unsent bytes
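
The three parts of the send buffer are related by simple arithmetic, which the following sketch makes explicit. The struct and function names are hypothetical. The inputs are assumed to be byte counts, such as SO_SNDBUF for the total, SIOCOUTQ or SO_NWRITE for the in-flight bytes, and SIOCOUTQNSD for the unsent bytes.

```c
/* Breakdown of a TCP send buffer, all values in bytes (illustrative). */
struct sndbuf_breakdown {
    int unacked;   /* sent but not yet acknowledged */
    int unsent;    /* queued but not yet sent */
    int free_buf;  /* room left for new writes */
};

/* total    = unacked + unsent + free
 * inflight = unacked + unsent
 * Negative intermediate results are clamped to 0, because the three
 * inputs are read at slightly different moments. */
struct sndbuf_breakdown split_sndbuf(int total, int inflight, int unsent)
{
    struct sndbuf_breakdown b;
    b.unsent   = unsent;
    b.unacked  = inflight - unsent;
    b.free_buf = total - inflight;
    if (b.unacked < 0)
        b.unacked = 0;
    if (b.free_buf < 0)
        b.free_buf = 0;
    return b;
}
```

On Linux, treat the free value as an estimate only: the SO_SNDBUF value returned by getsockopt includes kernel bookkeeping overhead, so it is larger than the payload the buffer can actually hold.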

epoll and kevent

epoll on Linux and kevent on macOS are triggered by the unsent-bytes count when the TCP option TCP_NOTSENT_LOWAT is set. On macOS, kevent() doesn’t report a socket as writable until the unsent TCP data drops below the specified threshold (typically 8 kilobytes).
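
Setting the threshold is a single setsockopt call. The sketch below assumes the system headers define TCP_NOTSENT_LOWAT, as recent Linux and macOS headers do; the wrapper name set_notsent_lowat is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Ask the kernel to report the socket as writable only while the number
 * of unsent bytes is below `bytes`.
 * Returns 0 on success, -1 on error or if the option is unavailable. */
int set_notsent_lowat(int sd, int bytes)
{
#ifdef TCP_NOTSENT_LOWAT
    return setsockopt(sd, IPPROTO_TCP, TCP_NOTSENT_LOWAT, &bytes, sizeof(bytes));
#else
    (void)sd;
    (void)bytes;
    return -1;   /* option not defined by these headers */
#endif
}
```

For example, set_notsent_lowat(sd, 8192) would apply the 8-kilobyte threshold mentioned above to a TCP socket sd before it is registered with epoll or kevent.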

May 9, 2019

Enable user-id based packet routing on macOS

If you would like all socket (TCP/UDP) traffic from processes run by a particular user on macOS to be routed differently, you can do that with pf.

1. Add the user to your Mac if not already done. In this example, I will add a user named "test1"
2. Run the command:
        sudo vi /private/etc/pf.conf
    and add the following line before the 'anchor "*"' line:
         pass out quick on en0 route-to { utun4 } user test1

   a) change en0 to your default network interface name on the Mac
   b) change utun4 to the network interface you would like these packets to be routed to

3. Restart pf by running:
    sudo pfctl -d; sudo pfctl -e -f /etc/pf.conf

Now traffic from all processes run by user test1 should be routed to the new interface as specified.

February 26, 2019

keep ssh running in background for tunneling

1. Write the following script to a file named "" and make it executable (make sure the user has public key authentication enabled on the remote host):

while true; do ssh -t -n -R "while true; do ps -ef; sleep 1; done" ; sleep 1; done

2.  Run the above script in a detached screen session:
    screen -S tunnel -d -m /path/to/

 That's all. This creates a background screen session that runs the script, which keeps restarting the ssh command so the tunnel stays up.