Showing posts with label AWS. Show all posts

Jan 27, 2023

AWS Lambda Cold Start

The Lambda function runs in an ephemeral environment. It spins up on demand, lasts a brief time, and is then torn down. The Lambda service creates and tears down these environments for your function; you have no control over them.

Invocation Requests 

        => AWS Lambda Service

                => 1. Create an execution environment to run the function

                      2. Download code into the environment and initialize the runtime

                      3. Download packages and dependencies

                      4. Initialize global variables

                      5. Initialize temp space

                                => Lambda runs your function starting from the handler method


When an invocation is complete, Lambda can reuse that initialized environment to fulfill the next request, if the next request comes in close behind the first one. In that case, the second request skips all of the initialization steps and goes directly to running the handler method.
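This reuse is why expensive setup belongs outside the handler. As an illustration (a minimal sketch in Python for brevity; the same idea applies to static constructors in .NET), code at module level runs once per execution environment, while the handler runs on every invocation:

```python
import json

# Module-level code runs once per execution environment (i.e., on a
# cold start). Warm invocations skip straight to the handler.
init_count = 0

def _expensive_init():
    """Stand-in for slow work: opening DB connections, loading config."""
    global init_count
    init_count += 1
    return {"greeting": "hello"}

CONFIG = _expensive_init()  # executed during INIT, not per invocation

def handler(event, context):
    # Warm invocations reuse CONFIG built during initialization.
    return {"statusCode": 200, "body": json.dumps(CONFIG["greeting"])}
```

Invoking handler repeatedly runs _expensive_init only once per environment, which is exactly the cold/warm duration gap visible in the logs below.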

If you pay attention to the CloudWatch logs, you can notice big differences in duration between a cold start and a warm start. The following is for a simple function, written in .NET 6, that returns a list of strings.

First Invocation

INIT_START Runtime Version: dotnet:6.v13

Duration: 24293.23 ms Billed Duration: 24294 ms Memory Size: 500 MB Max Memory Used: 90 MB

Second Invocation

Duration: 9.91 ms Billed Duration: 10 ms Memory Size: 500 MB Max Memory Used: 90 MB


Here are some steps you can take as a developer to mitigate cold starts:

  • Provisioned concurrency - Setting this keeps the desired number of environments always warm. Requests beyond the provisioned concurrency (spillover invocations) use the on-demand pool, which has to go through the cold start steps outlined above. This has a cost implication; you may want to analyze the calling pattern and update provisioned concurrency accordingly to minimize the cost. 
  • Deployment package - Minimize your deployment package size to its runtime necessities. This reduces the amount of time it takes for your deployment package to download and unpack ahead of invocation. This is particularly important for functions authored in compiled languages. Frameworks/languages which support AOT compilation and tree shaking can have an automated way to reduce the deployment package.
  • AOT - In .NET, a language-specific compiler converts the source code to an intermediate language, which is then converted into machine code by the JIT compiler. This machine code is specific to the computer environment that the JIT compiler runs on. The JIT compiler requires less memory, as only the methods required at run time are compiled into machine code, and it can perform code optimization based on statistical analysis while the code is running. On the other hand, the JIT compiler requires more startup time when the application first executes. To minimize this, you can take advantage of AOT support in .NET 7: you publish a self-contained app AOT-compiled for a specific environment such as Linux x64 or Windows x64. This can help reduce the cold start.
  • SnapStart - When you publish a function version, Lambda takes a snapshot of the memory and disk state of the initialized execution environment, encrypts the snapshot, and caches it for low-latency access. 
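For provisioned concurrency, the AWS CLI exposes a command to set the number of pre-warmed environments on a published version or alias (the function name, qualifier, and count below are placeholders):

```shell
# Keep 10 environments initialized for the alias "live" of my-function.
aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier live \
  --provisioned-concurrent-executions 10

# Inspect the current status (ALLOCATING -> READY).
aws lambda get-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier live
```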

Dec 6, 2022

serverless vs container

Serverless is a development approach where your cloud service provider manages the servers. This replaces the long-running virtual machine with computing power which comes into existence on demand and disappears immediately after use. The most common AWS serverless services are: Compute - Lambda, Fargate; Data Storage - S3, EFS, DynamoDB; Integration - API Gateway, SQS, SNS, EventBridge, and Step Functions.

Pros of serverless

  • Pay only for the time when the server is executing the action.
  • Allows the app to be elastic; it can automatically scale up/down.
  • Much less time managing servers: provisioning infrastructure, managing capacity, patching, monitoring, etc. For example, in the case of Lambda you configure only memory and storage, based on which you pay.
  • Helps reduce development time as you focus less on infrastructure and more on business logic. In my opinion, this is one of the key benefits.
  • Fits well with microservices to build loosely-coupled components.
  • You only need to take care of your own code in terms of testing, security, and vulnerabilities.


Cons of serverless

  • One of the biggest challenges is the cold start. A few techniques which can help are SnapStart (currently only supported in the Java runtime), AOT with .NET 7, and provisioned concurrency (extra cost).
  • Vendor lock-in: the same code will not run in Azure or GCP. Also, you will have to find a way to run it on the local developer machine. The Visual Studio template (like serverless.AspNetCoreWebAPI) helps you create a project with both a local entry point (for debugging on the dev machine) and a Lambda entry point. This also adds a code separation which can be helpful in case you need to use a different cloud provider.
  • The maximum timeout is 15 minutes, so if you have a long-running process, this may be a challenge. Leveraging Step Functions may be an option to break up long-running tasks.
  • You do not have much control over the server, which in most cases should be fine, but in special cases where you want a GPU for processing large videos or some machine-learning workload, this may not be the right choice.
  • For complex apps, you may have to perform a lot of coordination and manage dependencies between all of the serverless functions.
  • Can't scale over the hard limit. You may have to use a different account/region.


Pros of EKS

  • Portability - you can run it anywhere, so it is cloud agnostic. You can run the same code on the developer's machine using minikube or Kubernetes in Docker Desktop. Easy to replicate the environment.
  • You have better control over the underlying infrastructure. You define the entire container ecosystem and the servers they run on. You can manage all the resources, set all of the policies, oversee security, and determine how your application is deployed and behaves. You can debug and monitor in a standard way.
  • No timeout restriction.
  • You have greater control to optimize by instance type/size by defining affinity/tolerations. You can make use of spot instances to control cost. By optimizing the resources you can achieve the biggest savings, but it will definitely come at the cost of DevOps work. 


Cons of EKS

  • A lot more time to build and manage infrastructure. 
  • You will need to make sure you keep up to date with the container base image and any packages you are using. If you don't keep up with the short release cycle it can become difficult to maintain the application.
  • You need to manage scaling up and down. Technologies like Horizontal Pod Autoscaler (pod-based autoscaling) and Karpenter (node-based autoscaling) can help.
  • Containers are created/destroyed, so this adds complexity in terms of monitoring, data storage, etc. compared to running applications in a VM. You need to account for this during application design.

As with everything, it all depends on the use case. Here are some of the guidelines which I use. For any new project, my first preference is serverless, for the reasons above. If it's a long-running application and I am not willing to re-architect, my preference is a container. If cost is the deciding factor, you need to consider the extra DevOps time needed to develop/maintain the k8s solution, and for serverless, whether you are designing an application for continuous use (like a web service) or a one-off use case (only a few times a day). If you have a use case for multiple cloud providers, you need to give it thought: EKS has better portability, but on the other hand, other cloud providers also provide serverless support, and if you maintain separation of concerns this may not be a challenge.

May 6, 2020

SNS

SNS is the notification service provided by AWS, which manages the delivery of a message to any number of subscribers. It uses the publisher/subscriber model for push delivery of messages. Subscribers are notified using the following supported protocols: SQS, Lambda, HTTP/S, email, and SMS. 

To use SNS, you create a topic and define a policy that dictates who can publish and subscribe to it. You can configure conditions in the policy to give cross-account access. An SNS request has the Topic you want to publish to, Subject, Message, MessageAttributes, and MessageStructure. 

The subscriber can define a subscription filter policy and a dead-letter queue. By configuring the subscription filter policy, you can filter the messages sent to the subscriber based on rules defined on the message attributes. You can assign a redrive policy to Amazon SNS subscriptions by specifying the Amazon SQS queue that captures messages that can't be delivered to subscribers successfully. You can test this by deleting the endpoint (for example, the Lambda function). 

When you configure a dead-letter queue, you need to make sure that SNS has the necessary permission to publish messages to the queue, by adding a permission policy in SQS with the SNS ARN. Once a message is in the dead-letter queue, you can have a Lambda configured to process it, and also use CloudWatch metrics to monitor the dead-letter queue.
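To make the filter-policy behavior concrete, here is a local sketch of the basic exact-match rule (not the SNS implementation, which also supports numeric ranges, prefix, and anything-but matching): a message matches only if every attribute named in the policy is present and its value is in the policy's accepted list.

```python
def matches_filter_policy(policy, message_attributes):
    """Return True if the message attributes satisfy the filter policy.

    policy: dict of attribute name -> list of accepted string values
    message_attributes: dict of attribute name -> string value
    (Simplified to exact-match semantics only.)
    """
    for name, accepted in policy.items():
        if name not in message_attributes:
            return False  # attribute missing from the message -> no match
        if message_attributes[name] not in accepted:
            return False  # value not in the accepted list -> no match
    return True

policy = {"event_type": ["order_placed", "order_cancelled"]}
matches_filter_policy(policy, {"event_type": "order_placed"})   # True
matches_filter_policy(policy, {"event_type": "order_shipped"})  # False
```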

SNS -> Lambda vs SNS -> SQS -> Lambda

If you have SQS in between SNS and Lambda, it can give you the flexibility of reprocessing. You can set a redrive policy for SQS with Maximum Receives, which essentially means the message will be received by Lambda that many times before being sent to the dead-letter queue. If no redrive policy is set, then after every visibility timeout the message will be sent to Lambda again, until the message retention period expires. When SNS sends the message directly to Lambda, it is delivered only once, and if it fails it goes to the dead-letter queue if a redrive policy is set. With SQS you can have a retention period of up to 14 days.

An SQS retry happens after the visibility timeout occurs, and the visibility timeout should be longer than the Lambda timeout. This ensures the message is redelivered only after Lambda processing is completely done, which prevents duplicate processing of the message.
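That rule of thumb can be captured in a small sanity check (the function name and numbers are illustrative, not an AWS API):

```python
def visibility_timeout_ok(visibility_timeout_s, function_timeout_s):
    """The queue's visibility timeout should exceed the function timeout,
    so an in-flight message is not redelivered to another Lambda instance
    while the first is still processing it (duplicate processing)."""
    return visibility_timeout_s > function_timeout_s

visibility_timeout_ok(90, 30)  # True: safe configuration
visibility_timeout_ok(30, 60)  # False: risks duplicate processing
```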

In SQS (pull mechanism), messages are persisted for some (configurable) duration if no consumer is available, whereas in SNS (push mechanism), messages are sent to the subscribers that exist at the time the message arrives.


Mar 11, 2020

Running .NET Core 3.1 on AWS Lambda

AWS Lambda supports multiple languages through the use of runtimes. To use languages that are not natively supported, you can implement a custom runtime, which is a program that invokes the Lambda function's handler method. The runtime should be included in the deployment package in the form of an executable file named bootstrap. Here is the list of things you need to do in order to run .NET Core 3.1 on AWS Lambda.

bootstrap

Since this is not a supported runtime, you need to include a bootstrap file, which is a shell script that the Lambda host calls to start the custom runtime.
#!/bin/sh
/var/task/YourApplicationName

Changes to project file

You need a couple of NuGet packages from Amazon: Amazon.Lambda.AspNetCoreServer and Amazon.Lambda.RuntimeSupport. AspNetCoreServer provides the functionality to convert API Gateway's requests and responses to ASP.NET Core's requests and responses, and RuntimeSupport provides support for using custom .NET Core Lambda runtimes in Lambda.

<PackageReference Include="Amazon.Lambda.AspNetCoreServer" Version="4.1.0" />
<PackageReference Include="Amazon.Lambda.RuntimeSupport" Version="1.1.0" /> 

Apart from that, you need to make sure to include bootstrap in the package and change the project output type to exe.

<OutputType>Exe</OutputType>

<ItemGroup>
    <Content Include="bootstrap">
      <CopyToOutputDirectory>Always</CopyToOutputDirectory>
    </Content>
</ItemGroup> 

Add Lambda entry point

This class extends APIGatewayProxyFunction, which contains the method FunctionHandlerAsync, the actual Lambda function entry point. In this class, override the Init method, where you configure the startup class using the UseStartup<>() method. If you have any special requirements, you can override FunctionHandlerAsync and write your own handler. One example is a Lambda warmer, where you don't want the actual code to be executed, but rather want to respond directly from this method. The following code snippet is for reference purposes only; with provisioned concurrency now supported in AWS Lambda, you can achieve the same result.


// containerId is a static string field on the enclosing class (not shown)
public override async Task<APIGatewayProxyResponse> FunctionHandlerAsync(APIGatewayProxyRequest request, ILambdaContext lambdaContext)
{
    if (request.Resource == "WarmingLambda")
    {
        if (string.IsNullOrEmpty(containerId)) containerId = lambdaContext.AwsRequestId;
        Console.WriteLine($"containerId - {containerId}");

        var concurrencyCount = 1;
        int.TryParse(request.Body, out concurrencyCount);

        Console.WriteLine($"Warming instance {concurrencyCount}.");
        if (concurrencyCount > 1)
        {
            var client = new AmazonLambdaClient();
            await client.InvokeAsync(new Amazon.Lambda.Model.InvokeRequest
            {
                FunctionName = lambdaContext.FunctionName,
                InvocationType = InvocationType.RequestResponse,
                Payload = JsonConvert.SerializeObject(new APIGatewayProxyRequest
                {
                    Resource = request.Resource,
                    Body = (concurrencyCount - 1).ToString()
                })
            });
        }

        return new APIGatewayProxyResponse { };
    }

    return await base.FunctionHandlerAsync(request, lambdaContext);
}

Update Main function

In .NET Core 2.1, which is a native Lambda runtime, the LambdaEntryPoint is loaded by Lambda through reflection (through the handler configuration), but with a custom runtime this needs to be loaded by the Main function. To make sure the ASP.NET Core project still works locally using Kestrel, you can check if the AWS_LAMBDA_FUNCTION_NAME environment variable exists.


if (string.IsNullOrEmpty(Environment.GetEnvironmentVariable("AWS_LAMBDA_FUNCTION_NAME")))
{
    CreateHostBuilder(args).Build().Run();
}
else
{
    var lambdaEntry = new LambdaEntryPoint();
    var functionHandler = (Func<APIGatewayProxyRequest, ILambdaContext, Task<APIGatewayProxyResponse>>)(lambdaEntry.FunctionHandlerAsync);
    using (var handlerWrapper = HandlerWrapper.GetHandlerWrapper(functionHandler, new JsonSerializer()))
    using (var bootstrap = new LambdaBootstrap(handlerWrapper))
    {
        bootstrap.RunAsync().Wait();
    }
}

Add defaults file

The .NET Lambda command-line tools and the VS deployment wizard use a file called aws-lambda-tools-defaults.json for the settings used to package the Lambda project into a zip file ready for deployment, and for the deployment itself. Deployment under the hood uses CloudFormation. Run the following to explore more about the tool:
dotnet lambda help
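For reference, a minimal aws-lambda-tools-defaults.json for this custom-runtime setup might look like the following sketch (the values are placeholders, not taken from a real project):

```json
{
  "profile": "default",
  "region": "us-east-1",
  "configuration": "Release",
  "framework": "netcoreapp3.1",
  "function-runtime": "provided",
  "function-memory-size": 256,
  "function-timeout": 30
}
```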

CLI Commands

dotnet lambda package --output-package lambda-build/deploy-package.zip

Nov 23, 2019

AWS Lambda

AWS Lambda is a service that gives you the option to run functions written in .NET Core, Node.js, Python, Java, etc., on AWS. AWS Lambda executes code only when needed and scales automatically. You pay only for the compute time you consume and pay nothing when your code is not running, which is different from code running in an EC2 instance or ECS container. 

When a function is invoked, AWS Lambda creates an instance and runs its handler method. While the first event is being processed, if the function is invoked again and no instance is available, AWS Lambda will create another instance. After processing the event, the instance sticks around to process additional events. When a new instance is created, the response time increases; this is called a cold start. As more events come in, Lambda routes them to available instances and creates new instances as needed. Your function's concurrency is the number of instances serving requests at a given time. When the number of requests decreases, Lambda stops unused instances to free up scaling capacity for other functions. There is a concurrency limit per region. When requests come in faster than your function can scale, or when your function is at maximum concurrency, additional requests fail with a throttling error.
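As a toy model of the scaling behavior above (an illustration, not the Lambda algorithm): each request either reuses a free warm instance or forces a new one, so the number of environments created tracks the peak concurrency.

```python
def environments_created(concurrency_over_time):
    """Toy model: Lambda creates a new execution environment whenever a
    request arrives and no warm instance is free, so the number of
    environments ends up equal to the peak concurrency observed."""
    environments = 0
    for concurrent_requests in concurrency_over_time:
        if concurrent_requests > environments:
            environments = concurrent_requests  # cold starts happen here
    return environments

environments_created([1, 3, 2, 5, 4])  # 5: peak concurrency drives creation
```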

To ensure that a function can always reach a certain level of concurrency (the maximum allowed concurrency is shared across all functions in an account), you can configure the function with reserved concurrency. When a function has reserved concurrency, no other function can use that concurrency. Reserved concurrency also limits the maximum concurrency for the function.

To enable your function to scale without fluctuations in latency, you can use provisioned concurrency. It also integrates with auto scaling, so you can update provisioned concurrency based on demand. The overhead in starting a Lambda (cold start) consists of two parts: the time to set up an execution environment, which is entirely controlled by AWS, and code initialization, which involves things like initializing objects and frameworks. Provisioned concurrency targets both causes of cold-start latency.

You configure the amount of memory that will be available during execution. Similarly, you configure the timeout, which is the maximum time a function can run. All AWS Lambda functions run securely inside a default system-managed Virtual Private Cloud (VPC); however, you can configure them to run within a custom VPC, subnets (use at least two subnets for high availability), and security group. When you enable a VPC, your Lambda function loses default internet access. If you require external internet access for your function, make sure that your security group allows outbound connections and that your VPC has a NAT gateway.

The Lambda function and trigger (like API Gateway) are the core components of AWS Lambda. You also specify the execution role, which is the IAM role that AWS Lambda assumes when it executes your function.

Handy CLI Commands
aws lambda help
aws lambda update-function-code --profile <profile-name-if-not-default> --function-name  <function-name> --zip-file fileb://lambda-build/deploy-package.zip

Nov 10, 2019

AWS ECS

ECS is AWS's proprietary, managed container orchestration service that supports Docker, much like Google's open-sourced container orchestration platform Kubernetes. Amazon also supports Kubernetes with its EKS offering (more on that later). Running applications in containers rather than traditional VMs brings great value because containers are easily scalable and ephemeral. However, when operating at scale, a container orchestration platform that automates provisioning and deployment, redundancy and availability, scaling up/down based on a given load, resource allocation, health monitoring of containers and hosts, and seamless deployment of new application versions becomes necessary. There are two ways to launch containers for both ECS and EKS, and a factor in choosing between them is how much control you want to have over the cluster host.

EC2 - You are responsible for deploying and managing your own cluster of EC2 instances for running the containers. Here the billing is based on the cost of the underlying EC2 instances, and it's your responsibility to make sure your containers are densely packed and your instances have all updates. This is more suitable for a large workload which requires more CPU and memory (or any special requirement), where you can optimize pricing by taking advantage of spot instances or reserved instances.
AWS Fargate - You run your containers directly, without any EC2 instances. Here the billing is based on the CPU cores and memory your task requires, per second. This is more suitable for small workloads with occasional bursts and also for the case where you do not want to manage the overhead of the underlying host.

ECS Cluster

It's a logical grouping of tasks or services. There are two ways to create a cluster: Networking only, and EC2 with Networking. In the case of the EC2 launch type, it's a group of container instances. You can think of it as a group of container instances (in the case of the EC2 launch type you create and manage those instances; in the case of Fargate, Amazon does that for you) acting as a single resource. You can mix and match instance types within a cluster, but instances cannot join multiple clusters. A task is scheduled onto a cluster. Handy AWS CLI commands around ECS clusters:
aws ecs create-cluster --cluster-name mycluster
aws ecs list-clusters
aws ecs describe-clusters --cluster mycluster
aws ecs delete-cluster --cluster mycluster

Container Agent

It allows container instances to join a cluster, and it runs on the EC2 instance.

Container Instances

An EC2 instance registers to a cluster and connects via the container agent. The state can be active and connected, active and disconnected (if the instance is stopped), or inactive (terminated instance).

Task Definition

This describes how the Docker image should run. It can include one or more containers; grouped containers run on the same instance, which helps the interaction between containers in terms of latency. There are three main components of a task definition:

Family

Following are a few of the important configurations which belong to this:
  • Task Execution Role - This role is required by tasks to pull container images and publish container logs to Amazon CloudWatch on your behalf. 
  • Task Role - The IAM role that tasks can use to make API requests to authorized AWS services. For example, you can have a role configured to access an S3 bucket, and the running task can then access that bucket.
  • Network Mode - awsvpc, bridge, host, none

Container Definition 

This includes the following: 
  • Image
  • Port mapping - You cannot run two containers on the same EC2 instance using the same host port. Container port and protocol combinations must be unique within a task definition.
  • Environment variables and entry point
  • CPU - CPU units reserved for the container; maps to CpuShares in docker run
  • Memory - Amount of memory for a container; the sum of all container memory within a task should be less than the task memory. The container will die if it exceeds the memory limit. This maps to memory in docker run.
  • Memory reservation - When system memory is under heavy contention, Docker attempts to keep the container memory to this soft limit. The hard limit is the maximum it can use, after which the container will die. This maps to MemoryReservation in docker run.
  • Logging configuration - You can configure your containers to send log information to CloudWatch, Splunk, etc.

Volumes

This is used for persisting data generated by and used by Docker containers.

You can define multiple containers in a task definition, though you should be careful to create task definitions that group containers used for a common purpose, and separate different components into multiple task definitions.
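Putting the pieces above together, a task definition registered with ECS might look like the following sketch (family, image, sizes, and log settings are placeholders):

```json
{
  "family": "web",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "nginx:latest",
      "cpu": 256,
      "memory": 512,
      "memoryReservation": 256,
      "portMappings": [{ "containerPort": 80, "protocol": "tcp" }],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/web",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "web"
        }
      }
    }
  ]
}
```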

Task Networking in AWS Fargate Task Definition

The awsvpc network mode gives Amazon ECS tasks an ENI and a private IP, and also provides greater security for your containers by allowing you to use security groups and network monitoring tools (like VPC Flow Logs) at a more granular level within your tasks. Containers belonging to the same task can communicate over the localhost interface. When running a task in Fargate, there are two different forms of networking to consider:

Container (local) networking - Container networking is often used for tightly coupled application components. It bypasses the network interface hardware, and the operating system routes network calls from one process to the other directly, resulting in faster communication. Fargate uses a special container networking mode called awsvpc, which gives all the containers in a task a shared elastic network interface to use for communication. If you specify a port mapping for each container in the task, then the containers can communicate with each other on that port, like 127.0.0.1:8080. However, containers deployed as part of the same task are always deployed together, which removes the ability to independently scale different types of workload up and down.

External networking - External networking is used for network communications that go outside the task to other servers that are not part of the task, or network communications that originate from other hosts on the internet and are directed to the task.

Scheduler

The scheduler helps you utilize your cluster resources. You don't have to figure out which EC2 instance in the cluster will run the task unless you have a specific need. There are three ways to schedule something on your cluster:
  • Services - These are long-lived and stateless. You define how many task instances will be running. It plays nicely with a load balancer, which balances traffic to multiple task instances. Three steps of placing a service task into a cluster:
    • Based on the task definition, figure out which container instances are available to run the task
    • Figure out which AZ has the least number of service tasks running
    • Figure out which instance has the least number of tasks running
  • Task - These are short-lived/one-off tasks that exit when done. You can use the run-task command, which distributes tasks on your cluster and minimizes specific instances getting overloaded.
  • Starting Task - StartTask lets you pick where you want to run the task. It lets you build or use your own scheduler.
Both services and tasks have three states: pending, running, stopped. The container agent is responsible for state tracking.

AWS ECS Autoscaling

AWS Auto Scaling automates capacity management for various AWS services. For ECS, you specify the minimum/maximum number of tasks, which sets the boundaries of the autoscaling policy. You also need to specify the IAM role that authorizes ECS to use the autoscaling service. You can create an IAM role for the service application-autoscaling.amazonaws.com and attach the policy AmazonEC2ContainerServiceAutoscaleRole to it.

Autoscaling Policy

You can define a target-tracking or step-scaling policy. For target tracking, you specify a target value for Average CPU Utilization, Average Memory Utilization, or Request Count, based on which your service will scale out or in. For step scaling, you can use a CloudWatch alarm with a threshold, based on which you add or remove tasks. CloudWatch alarms for ECS Fargate support the CPU Utilization and Memory Utilization ECS service metrics. It also supports a cooldown period, which is the duration for which the next scaling operation is held off.



Handy AWS CLI commands around ECS services/tasks
aws ecs create-service --generate-cli-skeleton
aws ecs create-service --cluster mycluster --service-name web --task-definition web --desired-count 1
aws ecs list-services --cluster mycluster
aws ecs describe-services --cluster mycluster --services web
aws ecs update-service --cluster mycluster --service-name web --task-definition web --desired-count 2
aws ecs delete-service --cluster mycluster --service-name web 
aws ecs register-task-definition --generate-cli-skeleton
aws ecs run-task --cluster mycluster --task-definition web --count 1
aws ecs list-tasks --cluster mycluster
aws ecs stop-task --cluster mycluster --task arn
aws ecs list-container-instances --cluster mycluster
aws ecs start-task --cluster mycluster --task-definition web --container-instances arn

Aug 29, 2019

Terraform

Terraform is an open-source infrastructure-as-code software tool created by HashiCorp, using the HashiCorp Configuration Language. Similar to CloudFormation, Terraform is designed as a provisioning tool, which is different from tools like Chef and Puppet, which are primarily designed for configuration management. There is definitely some overlap between configuration management tools and provisioning tools, but it's important to understand what each is best at. Terraform and CloudFormation follow a declarative language: you specify the end state and the tool figures out the sequence and dependencies of the tasks. Tools like Ansible are procedural automation tools, where you write code that specifies the sequence of steps to achieve the end state. 

The main purpose of the Terraform language is declaring resources. All other features exist to make the definition of resources more flexible and convenient. A Terraform configuration consists of a root module, where evaluation begins, along with a tree of child modules created when one module calls another.

Terraform Components


variable

Input variables serve as parameters for a Terraform module. When you declare variables in the root module of your configuration, you can set their values using CLI options (-var="my_var=something"), in a .tfvars file, or in environment variables. When you declare them in child modules, the calling module should pass values in the module block. The value assigned to a variable can be accessed (var.my_var) only from expressions within the module where it was declared. Example types: string, number, bool, list, map, set.
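For example, a declaration in the root module might look like this (the name and default are illustrative), settable with -var="instance_count=3" or via a .tfvars file:

```hcl
variable "instance_count" {
  description = "Number of instances to create"
  type        = number
  default     = 1
}

# Referenced elsewhere in the same module as var.instance_count
```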

provider

A provider (like aws or google) requires configuration of its own, like authentication settings. You can have multiple providers: give them an alias and reference them by alias name (<PROVIDER NAME>.<ALIAS>). Every time a new provider is added, you need to download it by running terraform init.

resource

A resource block describes one or more infrastructure objects. Each resource is associated with a single resource type. Terraform handles most resource dependencies automatically, but in the rare cases where you need to define one explicitly, you can use the depends_on meta-argument. You can use the count meta-argument to create multiple resources. You can add provisioners to any resource; they are used to execute scripts on a local or remote machine.

output

This is like the return value of a module; it can be used by the calling module or for printing from the root module.

module 

This is a container for multiple resources and helps reusability of the code. Every Terraform configuration has at least a root module, which consists of the resources defined in the .tf files in the main working directory.

data

Data sources allow fetching data defined outside of Terraform. One very common use case is getting the AWS AZs:

data "aws_availability_zones" "available" {
  state = "available"
}

In the above example, we are storing the AWS AZ list, which can be accessed anywhere in the code as follows:

"${data.aws_availability_zones.available.names[0]}"

State

Terraform state can be considered a sort of database mapping the Terraform config to the real world. It also tracks resource dependencies. It can be saved locally or remotely, and the location can be configured with -state. Terraform locks the state during all operations that could write state.


Terraform Console

This provides an interactive way to evaluate an expression. It becomes very handy for testing built-in functions and interpolation. Just type terraform console on the command prompt and then play with it.

Dec 19, 2017

S3

S3 provides secure, durable, highly scalable object-based storage. The data is stored across multiple devices and facilities.
  • Files can be anywhere from 0 bytes to 5 TB.
  • Files are stored in buckets.
  • You can access a bucket with the URL https://s3.amazonaws.com/<bucketname>, so bucket names must be universally unique.
  • When you upload a file to an S3 bucket successfully, you get a 200 status code.
  • Read-after-write consistency for PUTS of new objects.
  • Eventual consistency for overwrite PUTS and DELETES. This is because objects are stored across multiple devices and facilities, so changes take time to propagate. Though propagation may take only milliseconds or a few seconds, at any point the data will be atomic, meaning you will get either the old data or the new data.
  • S3 is object-based storage, which means it is suitable for objects like PDFs, images, etc. It is not for installing an OS or DB. Each object consists of:
    • Key - Name of the object. You can add some random characters as a prefix.
    • Value - The data itself, which is made up of a sequence of bytes.
    • Version ID
    • Metadata
  • You can add access control at the bucket level or object level.
  • By default, buckets are private and all objects stored inside them are private.
  • An S3 bucket can be configured to create access logs, which record all requests made to the bucket; these logs can be written to another bucket.
  • An S3 bucket can be used to host a static website. The URL format is http://<bucketname>.s3-website-<region>.amazonaws.com

S3 Storage 

  • S3 - 99.99% availability, 11 9s durability; stored redundantly across multiple devices in multiple facilities and designed to sustain the concurrent loss of 2 facilities.
  • S3-IA - Here you are charged a retrieval fee. 99.9% availability, 11 9s durability; stored redundantly across multiple devices in multiple facilities and designed to sustain the concurrent loss of 2 facilities.
  • Reduced Redundancy Storage - 99.99% availability, 99.99% durability; suitable for files that can be reproduced if lost. Concurrent fault tolerance of 1 facility.
  • Glacier - Used for data archival; may need 3-5 hrs to retrieve data. 11 9s durability. It has a retrieval fee.

S3 Charges

  • Storage
  • Request
  • Storage management pricing - When you tag objects, Amazon charges on a per-tag basis.
  • Data transfer fee - When you replicate data or migrate from one region to another.
  • Transfer Acceleration - Takes advantage of Amazon CloudFront's globally distributed edge locations. Data transfers between the edge location and S3 over an optimized network path. AWS provides a speed-comparison tool.

Access

  • Owner Access
  • Other AWS Account
  • Public access

Encryption

Data is transferred using SSL.
  • AES-256 - Server-side encryption with Amazon S3-managed keys (SSE-S3)
  • AWS-KMS - Server-side encryption with AWS KMS-managed keys (SSE-KMS)
  • Server-side encryption with customer-provided keys (SSE-C)
  • Client-side encryption

Versioning

  • Stores all versions of objects.
  • Once enabled, versioning cannot be disabled, only suspended.
  • Versioning's MFA Delete capability uses multi-factor authentication for deletes.

Cross Region Replication

  • Versioning must be enabled on both the source and destination buckets.
  • Files already in the source bucket are not replicated automatically; all subsequent uploads are replicated automatically.
  • You cannot replicate to multiple buckets.
  • You cannot replicate to the same region.
  • Delete markers are replicated, but deletions of individual versions or delete markers are not replicated.

Life Cycle management

Lifecycle rules help you manage storage costs by controlling the lifecycle of your objects. You can create lifecycle rules to automatically transition your objects to Standard-IA, archive them to the Glacier storage class, and remove them after a specified period of time. You can use lifecycle rules to manage all versions of your objects.
  • Can be used in conjunction with versioning
  • Can be applied to current versions or previous versions
  • Transition to IA - min 128 KB and 30 days after creation
  • Archive to Glacier - 30 days after IA, or if going directly from Standard then 1 day after creation
  • You can expire current versions or permanently delete previous versions

Content Delivery Network

CloudFront is a global content delivery network service that securely delivers data to users with low latency and high transfer speeds. CloudFront also works with non-AWS origin servers.
  • Edge location - Content is cached here. This is separate from a region or AZ.
  • Origin - S3 bucket, EC2 instance, ELB, Route 53
  • Distribution - Name given to the CDN, which consists of a collection of edge locations.
    • Web Distribution - Typically used for websites
    • RTMP - Used for media streaming
  • Edge locations are not just for reads; you can even write to an edge location.
  • Objects are cached for the life of the TTL (time to live). Expiring before the TTL is possible but costs extra.
  • You can have multiple origins (like S3 buckets, etc.) in a CloudFront distribution.
  • You can have multiple behaviors, like routing a path pattern to a particular origin, etc.
  • Configure error pages.
  • Geo-restriction settings let you whitelist or blacklist countries.
  • Invalidation removes objects from edge locations. A less expensive approach is to use versioned object or directory names.

Storage Gateway

  • File Gateway (NFS)
  • Volume Gateway (iSCSI) - Data written to disk is asynchronously backed up as point-in-time snapshots and stored in the cloud as EBS snapshots. Snapshots are incremental and compressed to minimize storage charges. 1 GB - 16 TB.
    • Stored Volume
    • Cache Volume
  • Tape Gateway (VTL)

Transfer Acceleration

This utilizes the CloudFront edge network to accelerate your uploads to S3. When you enable transfer acceleration for a bucket, you get a distinct URL (<bucketname>.s3-accelerate.amazonaws.com) to upload directly to an edge location, which then transfers the file to the S3 bucket.

Static Website Hosting

Dec 16, 2017

VPC

Amazon Virtual Private Cloud lets you provision a logical section of AWS where you can launch AWS resources in a virtual network that you define. You have complete control over your VPC, including selection of the IP range (IPv4 CIDR block), creation of subnets, and configuration of route tables and network gateways. It is logically isolated from other virtual networks in the AWS cloud.

When you create a VPC, it automatically creates the following:
  • Route table
    • It creates a Main route table in the VPC. You cannot delete the Main route table; it is deleted automatically when you delete the VPC.
    • The Main route table will have a local target route with a destination of the VPC IPv4 CIDR, and also IPv6 if you selected an IPv6 CIDR block when you created the VPC.
    • Any subnet you create and do not explicitly associate with a route table is automatically associated with the Main route table.
  • Network ACL
    • A default network ACL is created which you cannot delete.
    • The default network ACL allows all inbound and outbound traffic. You have the option of changing rules to deny, or modifying the rules in it.
  • Security group
    • A default VPC security group is created.
    • By default it allows all outbound traffic, allows no inbound traffic, and allows instances associated with this SG to talk to each other.
    • You can edit security group rules by adding, removing, or updating them.
Using VPC peering you can connect one VPC to another via a direct network route using private IP addresses. This can be done with VPCs in another AWS account as well as other VPCs in the same account.

Subnets

A subnetwork, or subnet, is a logical subdivision of an IP network. The practice of dividing a network into two or more networks is called subnetting.
  • When you create a VPC you specify an IPv4 CIDR block (and an optional Amazon-provided IPv6 CIDR block). You can create a subnet in the VPC with a subset of the VPC's IPv4 CIDR block (and likewise for IPv6 if you choose to use it).
  • Based on the subnet's IPv4 CIDR block, you get IPv4 addresses in that subnet. One important thing to note is that the first four IP addresses and the last IP address in each subnet CIDR block are not available for you to use and cannot be assigned to an instance.
  • By default, any resource created in the subnet will not get a public IP address. If you want to change this behavior, you have to enable the auto-assign public IPv4 address setting.
  • The subnet will be associated with the Main route table and default network ACL. This can certainly be modified.
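The reserved-address rule above can be sanity-checked with a quick calculation. This sketch uses Python's standard ipaddress module and is not part of any AWS tooling:

```python
import ipaddress

def usable_addresses(cidr: str) -> int:
    """Usable addresses in a VPC subnet: total minus the 5 AWS reserves
    (network address, VPC router, DNS, 'future use', and broadcast)."""
    return ipaddress.ip_network(cidr).num_addresses - 5

print(usable_addresses("10.0.1.0/24"))  # 256 - 5 = 251
print(usable_addresses("10.0.0.0/28"))  # 16 - 5 = 11 (a /28 is the smallest subnet AWS allows)
```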

Route Table

A route table contains a set of rules, called routes, that determine where network traffic is directed. Each subnet in your VPC is associated with ONLY ONE route table. If you don't explicitly associate your subnet with a route table, it is associated with the Main route table.
  • Each route in a table specifies a destination CIDR and a target. For example, destination 10.0.0.0/16 with target local means traffic destined for any IP within 10.0.0.0/16 stays local. Similarly, to open all internet access you can choose 0.0.0.0/0 (which essentially means any IP) with the internet gateway as the target.
  • When you add an internet gateway, an egress-only internet gateway, a virtual private gateway, a NAT device, a peering connection, or a VPC endpoint in your VPC, you must update the route table for any subnet that uses these gateways or connections.
  • For a public subnet (instances to be served as web servers) you need a route with destination 0.0.0.0/0 and the internet gateway as the target.

Internet Gateway

An internet gateway serves two purposes: to provide a target in your VPC route tables for internet-routable traffic, and to perform network address translation (NAT) for instances that have been assigned public IP addresses (IPv4 and IPv6 traffic). One VPC can have only one internet gateway.

NAT Instance

You can use a network address translation (NAT) instance in a public subnet in your VPC to enable instances in the private subnet to initiate outbound IPv4 traffic to the Internet or other AWS services, but prevent the instances from receiving inbound traffic initiated by someone on the Internet. 

An EC2 instance performs source and destination checks, which means the instance must be the source or destination of any traffic it sends or receives. However, a NAT instance must be able to send and receive traffic when the source or destination is not itself. Therefore, source and destination checks must be disabled on a NAT instance.

NAT Gateway

You can use a network address translation (NAT) gateway to enable instances in a private subnet to connect to the internet or other AWS services, but prevent the internet from initiating a connection with those instances. For IPv6 use an egress-only Internet gateway. 


A NAT instance is an instance (you create one or more) that you have to manage, whereas a NAT gateway is a cluster of instances that Amazon manages, so you don't have to worry about maintenance. A NAT instance sits behind a security group, whereas a NAT gateway is outside security groups. Both need to be in a public subnet that allows internet traffic, and both need to be added to the route table associated with the private subnet; this is how resources within the private subnet reach the internet. The downside of a NAT instance is that all traffic from the private subnet goes through it, making it a bottleneck: if it goes down, it impacts all resources in your private subnet. A NAT instance can be used as a bastion server (meaning it can be used to RDP or SSH to servers in the private subnet), whereas a NAT gateway cannot. A NAT gateway is automatically assigned an IP address when you create it, and Amazon manages it. You should have a NAT gateway in multiple AZs. You cannot SSH or RDP into a NAT gateway.

Network ACL

A network access control list (ACL) is a layer of security for your VPC that acts as a firewall for controlling traffic in and out of one or more subnets inside the VPC.

  • By default everything is denied when you create a NACL.
  • Each subnet must be associated with a NACL; if you don't explicitly associate a subnet with a NACL, it is automatically associated with the VPC's default NACL.
  • You can associate a NACL with multiple subnets, but a subnet can be associated with only a single NACL; when you associate a new NACL with a subnet, the previous association is removed.
  • A NACL can be used across multiple AZs, whereas a subnet is in a single AZ.
  • An ACL contains a numbered list of rules that is evaluated in order, starting with the lowest-numbered rule.
  • Network ACLs are stateless: responses to allowed inbound traffic are subject to the rules for outbound traffic, and vice versa, meaning you need to specify both inbound and outbound rules explicitly. Security groups, which act as a firewall for controlling traffic in and out of EC2 instances, are stateful.
  • In a security group you can only allow, but in a NACL you can allow or deny.
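The ordered, first-match evaluation of NACL rules can be illustrated with a small sketch (this is a simplified model, not AWS code; rules here match on destination port only):

```python
def evaluate_nacl(rules, port):
    """Rules are (rule_number, (port_lo, port_hi), action) tuples.
    Evaluate in ascending rule-number order; the first match wins."""
    for number, (lo, hi), action in sorted(rules):
        if lo <= port <= hi:
            return action
    return "DENY"  # the implicit '*' rule denies anything unmatched

rules = [
    (100, (22, 22), "ALLOW"),       # allow SSH
    (200, (1024, 65535), "ALLOW"),  # allow ephemeral ports
]
print(evaluate_nacl(rules, 22))  # ALLOW
print(evaluate_nacl(rules, 80))  # DENY (falls through to the implicit deny)
```

Because rule 100 is checked before rule 200, a lower-numbered DENY would shadow a higher-numbered ALLOW for the same port.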

Here are some examples of the minimum network ACL rules needed to allow specific operations from a subnet.

To Allow ping

  • Inbound - All ICMP - IPv4 Allow, All Traffic Deny
  • Outbound - All ICMP - IPv4 Allow, All Traffic Deny

To Allow SSH

  • Inbound - SSH (22) Allow, All Traffic Deny
  • Outbound - Custom TCP Rule (1024-65535) (ephemeral ports) Allow, All Traffic Deny

To Allow SSH from Public subnet to private subnet

Since you cannot directly connect to an instance in a private subnet, you can create bastion instances, which act as jump boxes that you can use to administer (SSH or RDP) instances in the private subnet.
  • Public Subnet NACL
    • Inbound - SSH (22) Allow, Custom TCP Rule (1024-65535) (ephemeral ports) Allow, All Traffic Deny
    • Outbound - Custom TCP Rule (1024-65535) (ephemeral ports) Allow, SSH (22) Allow, All Traffic Deny
  • Private Subnet NACL
    • Inbound - SSH (22) Allow, All Traffic Deny
    • Outbound - Custom TCP Rule (1024-65535) (ephemeral ports) Allow, All Traffic Deny

Allow HTTP Access from subnet

  • Inbound - Custom TCP Rule (1024-65535) (ephemeral ports) Allow, All Traffic Deny
  • Outbound - HTTP (80) Allow (or HTTPS (443), e.g. for running aws s3 ls), All Traffic Deny

Allow HTTP Access to Subnet (instance acting as web server)

  • Inbound - HTTP (80) Allow, All Traffic Deny
  • Outbound - Custom TCP Rule (1024-65535) (ephemeral ports) Allow, All Traffic Deny

VPC Flow Log

It's a feature that enables you to capture information about the IP traffic going to and from network interfaces in your VPC. Flow log data is stored using Amazon CloudWatch Logs. It can be created at 3 levels:

  • VPC
  • Subnet
  • Network interface level

To set up a flow log:
  • you have to define a filter (all, accepted, rejected)
  • a role which can perform logs:CreateLogGroup, logs:CreateLogStream, logs:DescribeLogGroups, logs:DescribeLogStreams, and logs:PutLogEvents
  • assign a log group

  • You cannot enable flow logs for VPCs that are peered with your VPC unless the peer VPC is in your account.
  • You cannot tag a flow log.
  • After you have created a flow log, you cannot change its configuration; for example, you cannot associate a different IAM role with the flow log.
The following traffic is not monitored:
  • Traffic generated by instances when they contact the Amazon DNS server. If you use your own DNS server, then all traffic to that DNS server is logged.
  • Traffic generated by Windows instances for Amazon Windows license activation.
  • Traffic to and from 169.254.169.254 for instance metadata.
  • DHCP traffic.
  • Traffic to the reserved IP address for the default VPC router.

Dec 12, 2017

EC2

EC2 is a web service that provides resizable compute capacity in the cloud in minutes, allowing you to quickly scale capacity, both up and down, as your compute requirements change.

EC2 Options

  • OnDemand - Allows you to pay by the hour (or by the second). No upfront payment or commitment. For applications with short-term spikes or unpredictable workloads that cannot be interrupted, or apps being developed for the first time.
  • Reserved - You can reserve for 1-3 years. Price is less than OnDemand. For steady-state or predictable usage. A reservation is for a region, which cannot be changed, but you can change the AZ.
    • Standard - Price 75% off on-demand
    • Convertible RI - Price 54% off on-demand. You have the flexibility of changing some attributes of the EC2 instance, like general purpose to CPU optimized, or Windows to Linux.
    • Scheduled RI
  • Spot - If you have flexible start and end times. If your bid price is higher than the spot price, the EC2 instance will be provisioned. If the spot price goes higher than your bid, the instance will be terminated. Useful for data processing that can happen at, say, 3am. If you terminate, you pay full price; if AWS terminates because the spot price went above your bid, you get the hour in which it was terminated for free.
  • Dedicated Host - If you don't want a multi-tenant scenario, e.g. for regulatory requirements, or for licensing which does not support multi-tenancy or cloud deployment. Can be purchased on-demand or reserved.

EC2 Instance Types

  • D2 Dense storage, used for file servers, data warehousing, Hadoop
  • R4 Memory optimized, for memory-intensive apps
  • M4 General purpose app servers
  • C4 Compute optimized, CPU-intensive apps/DBs
  • G2 Graphics intensive, video encoding, 3D app streaming
  • I2 High-speed storage, NoSQL DBs, data warehousing
  • F1 Field programmable gate array, hardware acceleration for your code, change the underlying hardware to suit your need
  • T2 Lowest-cost general purpose, web servers / small DBs
  • P2 Graphics / general purpose GPU, machine learning
  • X1 Memory optimized, for SAP HANA / Apache Spark, extreme memory

Launching EC2

  • While launching an EC2 instance you will be asked to use a public (AWS stores) and private (you store) key pair. You need the private key to obtain the password for Windows RDP, and for Linux you can use it to SSH into your instance. You can use the same public/private key combination for multiple EC2 instances.
  • For each EC2 instance you get a public (and also a private, for internal use) IPv4 (or IPv6) IP address and DNS name, which you can use to RDP or SSH.
  • Termination Protection - will not allow you to terminate an instance until you change the instance setting.
  • System status check - just makes sure the instance is reachable. If this fails, there may be an issue with the infrastructure hosting your instance. You can restart or replace the instance.
  • Instance status check - verifies that the instance OS is accepting traffic. If this fails, you can restart the instance or change the OS configuration.
  • A security group is a virtual firewall where you specify what incoming/outgoing traffic is allowed. By default everything is blocked; you need to whitelist what you want to allow.

Elastic Block Store

This allows you to create storage volumes and attach them to EC2 instances. You can consider this a disk attached to your VM. This is block-based storage where you can deploy an OS, file system, or DB, whereas S3 is object storage, which is not suitable for installing an OS, DB, etc. An EBS volume is placed in a specific AZ and is automatically replicated within the AZ, which protects it from the failure of a single component. It cannot be mounted to multiple EC2 instances. All EBS volumes mounted on an EC2 instance will be in the same AZ.
  • General Purpose SSD - 3 IOPS per GB, with up to 10,000 IOPS
  • Provisioned IOPS - Designed for I/O-intensive apps like large relational or NoSQL DBs; use if you need more than 10,000 IOPS, it can go up to 20,000 IOPS
  • Magnetic storage - physical spinning disks
    • Throughput Optimized HDD (ST1) - Big data, data warehousing, log processing; can't be a boot volume; frequently accessed sequential data
    • Cold HDD (SC1) - Lowest-cost storage for infrequently accessed workloads, file servers; can't be a boot volume
    • Magnetic Standard - Lowest cost per GB and is bootable. Suitable where data is accessed infrequently

RAID

Redundant array of independent disks. You put multiple disks together and they act as a single disk to the OS. This is needed when you need more I/O than a single volume type provides. For example, you have a DB that is not supported by AWS and you are not getting enough I/O with the default EBS type. On Windows you can do this by RDPing into the instance and going to Disk Management. Taking a snapshot while the instance is running can exclude data held in cache by the application and OS. This tends not to matter for a single volume; however, for the multiple volumes of a RAID array it can be a problem. It can be solved by freezing the file system, unmounting the RAID array, or shutting down the EC2 instance, which is the easiest way.
  • RAID 0 - Striped, no redundancy, good performance. If one disk fails you lose everything.
  • RAID 1 - Mirrored, redundancy
  • RAID 5 - Good for reads, bad for writes; AWS does not recommend this.
  • RAID 10 - Striped and mirrored; a combination of RAID 1 and RAID 0

Volume

  • You can modify a volume's type (standard to IOPS, but not from Magnetic Standard) and size.
  • You can create a snapshot. While doing this you cannot change the encryption type.
  • You can detach a volume from an EC2 instance, after which you can delete it or attach it to another EC2 instance.
  • When terminating an instance, the root volume is deleted by default (unless you uncheck Delete on Termination while provisioning the instance), but other EBS volumes attached to the instance are not deleted.
  • The root volume of a public AMI cannot be encrypted, because the encryption key is held within your AWS account.
  • Additional volumes on an EC2 instance can be encrypted while creating the EC2 instance from a public AMI.
  • You can also use a third-party tool such as BitLocker for Windows to encrypt the root volume.

Snapshot

  • You can create a volume from a snapshot and update the volume type, size, and availability zone. You cannot encrypt the EBS volume this way.
  • You can create an AMI; while doing that you can add extra volumes, but you cannot encrypt the EBS volumes.
  • By default snapshots are private, but you can change permissions to make a snapshot public or share it with other AWS accounts, which gives them permission to copy the snapshot and create volumes from it.
  • You can copy a snapshot to another region or to the same region, and you also have the option to encrypt the snapshot.
  • Snapshots of encrypted volumes are automatically encrypted. Volumes (even root) restored from encrypted snapshots are encrypted. You can share a snapshot, but only if it is not encrypted, because the encryption key is associated with your account.
  • Snapshots exist on S3, though you will not be able to see them in a bucket. A snapshot is a point-in-time copy of the volume, and snapshots are incremental.
  • The first snapshot may take longer. It is advisable to stop the instance before taking a snapshot; however, you can take a snapshot even while the instance is running.
  • A snapshot has a createVolumePermission attribute that you can set to one or more AWS account IDs to share it.

AMI

  • An AMI can be created from a snapshot or an EC2 instance.
  • You can copy an AMI to another region or to the same region, and you also have the option to encrypt the target EBS snapshot.
  • You can launch an EC2 instance from an AMI.
  • You can create a spot request from an AMI.
  • You can delete an AMI by deregistering it.

EBS Vs Instance Store

Some Amazon EC2 instance types come with a form of directly attached, block-device storage known as the instance store. Instance store volumes are sometimes called ephemeral storage. Instance store-backed instances cannot be stopped; if the underlying host fails, you will lose the data, whereas EBS-backed instances can be stopped, and you will not lose the data if the instance is stopped. You can reboot both without losing data. By default both root volumes are deleted on termination; however, with an EBS volume you can tell AWS to keep the root device volume. Instance store volumes are less durable and are created from a template stored in S3, whereas an EBS volume is created from a snapshot. Instance store volumes cannot be added after the EC2 instance is created.

Load Balancer

Virtual appliance that spreads traffic across your different web servers.
  1. Classic Load Balancer - The AWS Classic Load Balancer (CLB) operates at Layer 4 of the OSI model. This means the load balancer routes traffic between clients and backend servers based on IP address and TCP port. For example, an ELB at a given IP address receives a request from a client on TCP port 80 (HTTP). It then routes that request, based on the rules configured when setting up the load balancer, to a specified port on one of a pool of backend servers. In a classic LB you register instances with the LB.
  2. Application Load Balancer - Operates at Layer 7, which means that not only can you route traffic based on IP address and TCP port, you can also add rules based on path, etc. In an application LB you register instances as targets in a target group.
  3. Network Load Balancer
To create a load balancer you configure the following:
  • Load balancer protocol (port), instance protocol (port)
  • Security group
  • Health check on the EC2 instances (response timeout, interval, unhealthy threshold, healthy threshold)
  • An Elastic Load Balancer has a public IP address, but Amazon manages it and you will never get the IP, as it changes internally. Here you get a public DNS name.
  • An instance monitored by the ELB is either in-service or out-of-service.
  • You can have only one subnet from each AZ, you should have at least two AZs in your LB, and all of your subnets should have an internet gateway if you are creating an internet-facing LB.
ELB Connection Draining causes the load balancer to stop sending new requests to the backend instances when the instances are being deregistered or become unhealthy, while ensuring that in-flight requests continue to be served. You can specify a maximum of 1 hr (default 300 sec) for the load balancer to keep connections alive before reporting the instance as deregistered.

The ELB session stickiness/affinity feature enables the LB to bind a user's session to a specific instance. It uses your app's session cookie, or you can configure the ELB to create its own session cookie.

Health Check

  • CPU Credit Usage, CPU SurplusCreditBalance, CPU SurplusCreditsCharged, CPUCreditBalance, CPUUtilization
  • DiskReadBytes, DiskReadOps, DiskWriteBytes, DiskWriteOps
  • NetworkIn, NetworkOut, NetworkPacketsIn, NetworkPacketsOut
  • StatusCheckFailed, StatusCheckFailed_Instance, StatusCheckFailed_System
  • For custom metrics like RAM utilization etc., you need to write code.

Cloud Watch

Here you can create dashboards, alarms, events (based on any event it can trigger some other activity), and logs (here you can go to the app layer and log any event). Standard monitoring is every 5 min; detailed monitoring (you pay extra) is every 1 min. CloudWatch is for monitoring and CloudTrail is for auditing.

CloudWatch can monitor resources such as EC2 instances, DynamoDB tables, RDS DB instances, custom metrics generated by your applications and services, and any log files your apps generate. You can use CloudWatch to gain system-wide visibility into resource utilization, app performance, and operational health. You can use these insights to react and keep your app running smoothly.

Bootstrap Script

While creating an EC2 instance you can specify a bootstrap script. Refer to the following example for a Linux machine:

 #!/bin/bash
 sudo su                 # elevate privileges to root
 yum update -y           # apply available updates first
 yum install httpd -y    # install the Apache web server
 # copy website content from an S3 bucket into Apache's document root
 aws s3 cp s3://rraj-test-bucket /var/www/html/ --recursive
 currentDate=`date`
 echo $HOSTNAME ": was created on - " $currentDate > /var/www/html/index.html
 curl http://www.google.com    # simple outbound-connectivity check
 service httpd start     # start Apache now
 chkconfig httpd on      # start Apache on every boot

Placement Group

It is a logical grouping of instances within a single availability zone. Using placement groups enables applications to participate in a low-latency, 10 Gbps network. It is recommended for apps that benefit from low network latency, high network throughput, or both. It cannot span multiple availability zones. The name of a placement group must be unique within your AWS account. Only certain instance types can be launched in a placement group (compute optimized, GPU, memory optimized, storage optimized). AWS recommends homogeneous instances (same size and same family) within a placement group. You can't merge placement groups. You can't move an existing instance into a placement group.

EFS

  • Supports the Network File System version 4 protocol
  • Only pay for the storage you use
  • Can support thousands of concurrent NFS connections
  • Data is stored across multiple AZs
  • EFS is file-based storage (unlike EBS, which is block-based)
  • Read-after-write consistency
  • Can scale up to petabytes
  • Can be connected to multiple EC2 instances

IAM Role

In order to access AWS services, you need to configure credentials by running aws configure and entering your AWS Access Key ID and Secret Access Key. Doing this stores the info in the .aws folder, and anyone who is able to SSH in will be able to access the key and secret. To avoid this you can specify an IAM role while creating the EC2 instance. You need to make sure you add the necessary policies to this role.

AWS Command Line

 aws s3 ls
 aws ec2 describe-instances
 aws ec2 help

In PuTTY, hit q to escape if the output is paginated and you don't want to scroll further.

If you create a user with S3 admin access and run aws configure with that user's access key and secret key, they are stored in the .aws folder, so if your EC2 instance is compromised someone can gain access to the keys. This can be prevented by creating a role for the EC2 service (as the EC2 service will use this role) and assigning it the AmazonS3FullAccess policy. Then, when you create a new EC2 instance, assign this role as the IAM role, or for an existing instance click attach/replace IAM role.

Instance Metadata - You can access this from the command line with the following curl commands:
curl http://169.254.169.254/latest/meta-data/public-ipv4
curl http://169.254.169.254/latest/meta-data/public-ipv4 > mypublicip.html

Launch Configuration and Auto Scaling

  • You can increase/decrease the group size based on alarms which you set.
  • Alarms can be set based on the average/min/max/sum/sample count of CPU utilization, disk read/write, or network in/out.