Rahul Raj: 2021

Sep 22, 2021

kubernetes high level overview

Kubernetes is an open-source system for automating deployment, scaling, healing, networking, persistent storage options, and management of containerized applications. It provides features like Service Discovery, Load Balancing, Storage Orchestration, Automates Rollout/Rollback, Self-healing, Secret and Configuration Management, Horizontal Scaling, Zero downtime deployment, blue-green, canary, Run like the production on the development machine.

It consists of one or more master nodes that will manage one or more worker nodes that together work as a cluster. The master node will start something call pod on the worker node, which is a way to host container. You will need a Deployment/replica set to deploy the pod. Kubernetes does not deploy the container on a worker node, but it encapsulates it within a kubernetes object called pods. One pod will contain only one same type application container, but you can have different type application containers running in the same pod, like the front end and middle-tier container running in the same pod. In this case, the two containers can talk using localhost since they share the network space and also they share storage space.

Master Node Components (collectively known as the control plane)

API Server - This acts as a front-end to the Kubernetes, users, management devices, command-line interface which talks to the API Server to interact with Kubernetes.

ETCD store - It's a key-value pair database that stores information about the cluster

Scheduler - Responsible for distributing work or containers across nodes. It looks for a newly created container and assigns it to the node. node/pod running comes to life or goes away.

Controller - They are responsible for processing and responding when Node/container goes down, it makes a decision to bring up a new container. They are the brain behind the orchestration.

Worker node components (collectively known as the data plane)

Container Runtime - Underlying software which is used to run container.. like docker, rocket, etc

Kubelet - It's an agent which runs on each node in the cluster, agent is responsible for making sure that the container is running as expected. It runs on the node which registers that node with the cluster to report back and forth with the master node.

Kubectl Explain

Kubectl Explain is a handy command to List the fields for supported resources and also get detailed documentation around that. Its a good way to explore any kind of k8s object. Refer following for few example. Also refer kubectl-commands and cheatsheet

kubectl explain deployment.spec.strategy.rollingUpdate

kubectl explain deployment.spec.minReadySeconds

kubectl explain pod.spec.tolerations

kubectl explain pod.spec.serviceAccountName

kubectl explain pod.spec.securityContext

kubectl explain pod.spec.nodeSelector

kubectl explain pod.spec.nodeName

kubectl explain pod.spec.affinity

kubectl explain pod.spec.activeDeadlineSeconds

kubectl explain job.spec.

kubectl explain job.spec.ttlSecondsAfterFinished

kubectl explain job.spec.parallelism

kubectl explain job.spec.completions

kubectl explain job.spec.backoffLimit

kubectl explain job.spec.activeDeadlineSeconds

If you want to get more familiar with the imperative way of creating k8s resources refer CKAD Exercises as they can be useful during the CKAD exam. The imperative way of creating resources sometimes saves time which is critical in the exam. You can always patch or edit the resource if that's allowed.

Apr 12, 2021

Snowflake Introduction

Snowflake is a Fully Managed cloud data platform. This means that it provides everything you need to build your data solution, such as a full-feature data warehouse. It is cloud-agnostic and most importantly you can even replicate between the clouds.

The architecture comprises a hybrid of traditional shared-disk and shared-nothing architectures to offer the best of both.

The storage layer organizes the data into multiple micro partitions that are internally optimized and compressed. Data is stored in the cloud storage (storage is elastic) and works as a shared-disk model thereby providing simplicity in data management. The data objects stored by Snowflake are not directly visible nor accessible by customers; they are only accessible through SQL query operations run using Snowflake. As the storage layer is independent, we only pay for the average monthly storage used.

Compute nodes (Virtual Warehouse) connect with the storage layer to fetch the data for query processing. These are Massively Parallel Processing (MPP) compute clusters consisting of multiple nodes with CPU and Memory provisioned on the cloud by Snowflake. These can be started, stopped, or scaled at any time and can be set to auto-suspend or auto-resume for cost-saving.

Cloud Services Layer handles activities like authentication, security, metadata management of the loaded data, and query optimization

Data is automatically divided into micro-partitions and each micro-partition contains between 50 MB and 500 MB of uncompressed data. These are not required to be defined upfront. Snowflake stores metadata about all rows stored in a micro-partition.Columns are stored independently within micro-partitions, often referred to as columnar storage. Refer this for more details.

In addition to this, you can manually sort rows on key table columns, however, performing these tasks could be cumbersome and expensive. This is mostly useful for very large table