October 30, 2019

AWS VPC: Private Subnet vs Public Subnet

In AWS, a private subnet is a subnet whose route table has no direct route to the Internet, that is, no route to an internet gateway. Its instances are reachable only from inside the VPC, e.g. a web server accessing internal database servers. If instances inside the private subnet need to reach the Internet, for example to update packages, their traffic has to hop through a device inside a public subnet. This can be a NAT instance (a dedicated instance just for this purpose) or a NAT gateway (a managed service provided by AWS).
Public Subnet | Private Subnet
Instances have public IP addresses | Instances do not have public IP addresses (selectable during instance creation)
Route table attached to the subnet has a default route to an internet gateway | Route table attached to the subnet has no default route to an internet gateway (it may point to a NAT device instead)
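The routing distinction above can be sketched in a few lines of Python. This is only an illustration: the route dictionaries are hand-written, their field names (DestinationCidrBlock, GatewayId, NatGatewayId) are assumed to mirror what the EC2 route table API returns, and the resource IDs are made up.

```python
def has_default_route_to_igw(routes):
    """Return True if any default route (IPv4 or IPv6) targets an internet gateway."""
    for route in routes:
        is_default = (
            route.get("DestinationCidrBlock") == "0.0.0.0/0"
            or route.get("DestinationIpv6CidrBlock") == "::/0"
        )
        # Internet gateway IDs are assumed to start with "igw-".
        if is_default and route.get("GatewayId", "").startswith("igw-"):
            return True
    return False


def classify_subnet(routes):
    """Label a subnet 'public' or 'private' from its route table entries."""
    return "public" if has_default_route_to_igw(routes) else "private"


# Hypothetical route tables (IDs are invented for the example).
public_routes = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0123456789abcdef0"},
]
private_routes = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0123456789abcdef0"},
]

print(classify_subnet(public_routes))   # public
print(classify_subnet(private_routes))  # private
```

Note that the second table still has a default route; it simply points at a NAT gateway rather than an internet gateway, which is why the subnet counts as private.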

Internet Gateway

An AWS Internet Gateway is a one-to-one NAT device/service that maps a private IP to a public IP; it is not a many-to-one NAT device like a home router. An instance inside the VPC subnet needs to have a public IP address associated with it. If it does not, the Internet Gateway cannot route traffic for it.
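As a rough mental model only (not how AWS implements it), one-to-one NAT can be pictured as a static lookup table with one public address per private address. The addresses and the function name below are invented for the illustration.

```python
# Hypothetical one-to-one mapping: private IP -> public IP.
one_to_one_nat = {
    "10.0.1.10": "203.0.113.10",
    "10.0.1.11": "203.0.113.11",
}


def translate_outbound(private_ip, nat_table):
    """Return the public source address for an outbound packet.

    Returns None when the instance has no public IP associated, in which
    case this toy gateway has nothing to translate to.
    """
    return nat_table.get(private_ip)


print(translate_outbound("10.0.1.10", one_to_one_nat))  # 203.0.113.10
print(translate_outbound("10.0.1.12", one_to_one_nat))  # None
```

A home router, by contrast, shares one public address among many private hosts by rewriting ports, which this table deliberately does not do.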

Egress-only Internet Gateway

IPv6 Only

NAT gateway or NAT Instance

IPv4 Only

Internet Access for Default and Nondefault VPCs

The following table provides an overview of whether your VPC automatically comes with the components required for internet access over IPv4 or IPv6.

Component | Default VPC | Nondefault VPC
Internet gateway | Yes | Yes, if you created the VPC using the first or second option in the VPC wizard. Otherwise, you must manually create and attach the internet gateway.
Route table with route to internet gateway for IPv4 traffic (0.0.0.0/0) | Yes | Yes, if you created the VPC using the first or second option in the VPC wizard. Otherwise, you must manually create the route table and add the route.
Route table with route to internet gateway for IPv6 traffic (::/0) | No | Yes, if you created the VPC using the first or second option in the VPC wizard, and if you specified the option to associate an IPv6 CIDR block with the VPC. Otherwise, you must manually create the route table and add the route.
Public IPv4 address automatically assigned to instance launched into subnet | Yes (default subnet) | No (nondefault subnet)
IPv6 address automatically assigned to instance launched into subnet | No (default subnet) | No (nondefault subnet)



October 17, 2019

R ggplot2 geom_bar order

In R, to plot a pie chart using a dataframe as the data source, one would use geom_bar first to plot a stacked bar graph, and then use coord_polar to convert it to a pie chart. It works, but there is a catch:
  • geom_bar orders the fill values alphabetically and always puts the lowest value at the top of the stacked bar. For example, “apple” would be on top of “pear”. This becomes a problem if you intend to label the pie chart.
  • The solution is to sort the dataframe in decreasing order by the fill variable. The row order then matches the stacking order from the bottom up, so the label positions are calculated correctly.

Example R code with correct labels.

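library(ggplot2)

# Sample data; val is the fill variable
df <- data.frame(val=c("z","b","c"), c2=c("b","d","f"), Freq=c(5,3,4))
# Placeholder titles: the original snippet used title and legend_title without defining them
plot_title <- "Freq by val"
legend_title <- "val"

# Sort in decreasing order of the fill variable so that labels line up with the slices
df <- df[order(df$val, decreasing=TRUE),]
df$Label <- paste(df$val, paste(round((df$Freq/sum(df$Freq))*100,0),"%",sep=""), sep="-")
p <- ggplot(df,aes(x=1,y=Freq,fill=val))+geom_bar(stat="identity", color = "black")
p1 <- p + coord_polar(theta='y') + theme(axis.ticks=element_blank(),
                                axis.text.y=element_blank(),
                                axis.text.x=element_text(colour='black'),
                                axis.title=element_blank())
# Put each label at the midpoint of its slice
p2 <- p1 + scale_y_continuous(labels=df$Label, breaks=cumsum(df$Freq) - df$Freq/2)
p3 <- p2 + labs(title=plot_title, fill=legend_title) + theme(plot.title = element_text(hjust = 0.5))
print(p3)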
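The label arithmetic is easier to check outside of ggplot2. The following Python sketch redoes only the numbers for the sample data (sort by val in decreasing order, then place each label at cumsum(Freq) - Freq/2); the function name label_positions is made up for this illustration and nothing here draws a chart.

```python
from itertools import accumulate

# Same sample data as the R example: (val, Freq)
rows = [("z", 5), ("b", 3), ("c", 4)]

# Sort by the fill variable in decreasing order: z, c, b
rows.sort(key=lambda row: row[0], reverse=True)


def label_positions(rows):
    """Return (label, midpoint) pairs, where midpoint = cumsum(Freq) - Freq / 2."""
    total = sum(freq for _, freq in rows)
    ends = accumulate(freq for _, freq in rows)  # running end of each slice
    return [
        (f"{val}-{round(freq / total * 100)}%", end - freq / 2)
        for (val, freq), end in zip(rows, ends)
    ]


for label, midpoint in label_positions(rows):
    print(label, midpoint)
# z-42% 2.5
# c-33% 7.0
# b-25% 10.5
```

Each midpoint falls inside its own slice (0 to 5, 5 to 9, 9 to 12) only because the rows were sorted first; with the unsorted rows the midpoints would be computed for a slice order that does not match the chart.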

Update on 10/23/2019

This article explains the order issue very clearly:

https://sebastiansauer.github.io/ordering-bars/

In short, the rule for the sort order is:
  • if factor, the order of factor levels is used
  • if character, an alphabetical order is used
So one can change the factor order to change the plot order.

October 15, 2019

jq

jq is a very useful tool to process JSON.

jq is built around two ideas: pipes and data streams.

Note below that a JSON entity is a self-contained and complete JSON object/array.

Pipes

  • When jq is invoked, the input JSON file is parsed as one or multiple JSON entities.
    • The input file can have multiple JSON entities, separated by new lines or spaces. One stream is created for each JSON entity.
    • If multiple files are supplied to jq, each file should contain one or more self-contained JSON entities.
    • To combine all the input JSON entities into one big JSON entity, pass the --slurp flag to jq. It wraps all input JSON entities in a single containing JSON array. This covers both multiple JSON entities in one input file and multiple input files.
  • In short, jq is started with one or multiple data streams which are from the input JSON entities.
  • Each stream is then passed through the specified filter independently.
  • One can pass the -c (compact) flag to jq to print each stream as one line.
  • Some key concepts of the filter.
    • . means the entire input data stream
    • .[] moves into the JSON object/array one layer, and splits the data stream into multiple streams, one for each entry in the object or array. For example, {"a":1,"b":2} will be broken into two data streams after passing through this filter, becoming 1 and 2 respectively. Note that these are only the values of the entries: the keys are dropped, and each output is a bare value rather than the original object. The same stream splitting also happens for JSON arrays.
    • .["key"] generates one stream from the input stream. The shorthand is .key.
    • , can be used to manually generate multiple streams.
    • To collect multiple streams into one stream, one can use [ ] to generate an array, or use { } to generate an object. Note that to generate an object, one has to manually specify the keys, e.g. {"key1":.a, "key2":.b}, or use the shorthand {key1,key2}, which is the same as {"key1":.key1,"key2":.key2}.
    • ( ) can be used to instruct jq which part of the filter to be processed first, but it does not remove/combine/split the data streams.
    • | has lower priority than ,. Therefore, 1,2,3|3 first produces three streams, 1, 2 and 3, and then passes each of them to the second filter 3, giving three streams of 3. Parentheses change that order: 1,2,(4|3) processes 4|3 first, which yields 3, and then evaluates 1,2,3, producing exactly those three streams as output. 1|4,2,3 processes 1 first and passes it to 4,2,3, so the end result is the three streams 4, 2 and 3.
    • A filter does not run like a step-by-step program that prints as it goes. Stream data is usually not output in the middle of filter processing; it is usually output only at the end, one stream at a time.
    • One can use variable assignment such as length as $array_length | . to have jq remember an interim value and use it in later stages of the filter.
    • On the very basic level, data stream is passed to each filter unchanged. For example, .|.|.|.|. is the same as .
    • map(x) takes an array as input, and outputs a new array, after running the filter x on each element of the input array.
    • The select() function can be used to remove undesired streams. [1,2,3] | map(select(. >= 2)) will give you [2,3]
    • One can use if-then-else-end in the filter. Note that the else clause must be there (this applies to jq 1.6 and earlier; newer releases allow omitting it). Sometimes the empty function can help.
    • The alternative operator a//b can also be useful. It outputs a if a is not false or null; otherwise it outputs b.
    • input_filename can be used to find the input file name.
    • One can use numbers/objects/arrays/etc. to select results from multiple streams
  • The final output is the resulting streams, printed one by one.
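The stream behaviour described in the list can be imitated with Python generators. This is a toy model for building intuition, not how jq is implemented: every helper name below (identity, const, iterate, field, comma, pipe, collect, select, jq_map, run_filter) is invented for this sketch, and only a handful of filters are covered. Each filter is modelled as a function that takes one input value and yields zero or more output values.

```python
def identity(value):              # jq: .
    yield value


def const(constant):              # jq: a literal such as 3
    def run(value):
        yield constant
    return run


def iterate(value):               # jq: .[]
    if isinstance(value, dict):
        yield from value.values()
    else:
        yield from value


def field(key):                   # jq: .key (null input gives null)
    def run(value):
        yield None if value is None else value.get(key)
    return run


def comma(*filters):              # jq: f, g  (same input goes to every filter)
    def run(value):
        for f in filters:
            yield from f(value)
    return run


def pipe(*filters):               # jq: f | g  (each output of f feeds g)
    def run(value):
        streams = [value]
        for f in filters:
            streams = [out for v in streams for out in f(v)]
        yield from streams
    return run


def collect(f):                   # jq: [f]  (gather all outputs into one array)
    def run(value):
        yield list(f(value))
    return run


def select(predicate):            # jq: select(cond)  (drop streams that fail)
    def run(value):
        if predicate(value):
            yield value
    return run


def jq_map(f):                    # jq: map(f), modelled as [.[] | f]
    return collect(pipe(iterate, f))


def run_filter(f, inputs, slurp=False):
    """Run filter f over each input entity; slurp wraps all inputs in one array."""
    if slurp:
        inputs = [list(inputs)]
    return [out for value in inputs for out in f(value)]


# .[] on {"a":1,"b":2} gives two streams
print(run_filter(iterate, [{"a": 1, "b": 2}]))                                # [1, 2]
# 1,2,3|3
print(run_filter(pipe(comma(const(1), const(2), const(3)), const(3)), [None]))  # [3, 3, 3]
# 1,2,(4|3)
print(run_filter(comma(const(1), const(2), pipe(const(4), const(3))), [None]))  # [1, 2, 3]
# 1|4,2,3
print(run_filter(pipe(const(1), comma(const(4), const(2), const(3))), [None]))  # [4, 2, 3]
# [1,2,3] | map(select(. >= 2))
print(run_filter(jq_map(select(lambda v: v >= 2)), [[1, 2, 3]]))               # [[2, 3]]
```

Writing the precedence out as nested pipe(...) and comma(...) calls makes the grouping explicit: the comma binds tighter, so 1,2,3|3 is pipe(comma(1,2,3), 3) unless parentheses say otherwise.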

Tags: jq

IPv6 DNS server

2600::
2600::1
2600::2