Amazon S3 (Simple Storage Service) provides secure, durable, highly scalable object-based storage. Data is stored redundantly across multiple devices and facilities.
- Files can be anywhere from 0 bytes to 5 TB in size.
- Files are stored in buckets.
- You can access a bucket at https://s3.amazonaws.com/<bucketname>, so bucket names must be globally unique.
- When a file is uploaded to an S3 bucket successfully, you get an HTTP 200 status code.
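- Read-after-write consistency for PUTs of new objects.
- Eventual consistency for overwrite PUTs and DELETEs, because an object stored across multiple devices and facilities can take time to propagate. Propagation usually takes milliseconds to a few seconds, and updates to a single key are atomic: a read returns either the old data or the new data, never a mix of the two.
- Note: since December 2020, S3 provides strong read-after-write consistency for all PUTs and DELETEs, so the eventual-consistency caveat above describes the older model.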
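- S3 is object-based storage, which makes it suitable for objects such as PDFs and images. It is not suitable for installing an operating system or a database. Each object consists of:
  - Key - The name of the object. Historically, AWS recommended adding random characters (e.g. a short hash prefix) to key names to spread high request rates across partitions.
  - Value - The data itself, made up of a sequence of bytes.
  - Version ID
  - Metadata - Data about the data, such as the content type.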
- You can apply access control at the bucket level or at the object level.
- By default, buckets are private and all objects stored in them are private.
- An S3 bucket can be configured to create access logs, which record every request made to the bucket; the logs can be delivered to another bucket.
- An S3 bucket can host a static website. The URL format is http://<bucketname>.s3-website-<region>.amazonaws.com (some regions use s3-website.<region> instead).
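To make the object anatomy and the atomic-overwrite behavior above concrete, here is a minimal Python sketch. `ToyBucket` and `S3Object` are hypothetical names for an in-memory model, not part of any AWS SDK, and the model ignores replication entirely; it only shows that each object bundles a key, value, version ID and metadata, and that an overwrite replaces the whole object (including its metadata) in one step.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class S3Object:
    """Toy model of the four parts of an S3 object."""
    key: str                # name of the object
    value: bytes            # the data, a sequence of bytes
    version_id: str         # identifies this particular write
    metadata: Dict[str, str] = field(default_factory=dict)


class ToyBucket:
    """Hypothetical in-memory stand-in for a bucket; not the real S3 API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: Dict[str, S3Object] = {}

    def put(self, key: str, value: bytes,
            metadata: Optional[Dict[str, str]] = None) -> int:
        obj = S3Object(key, value, uuid.uuid4().hex, dict(metadata or {}))
        # The whole object is swapped in one assignment, so a reader sees
        # either the old object or the new one, never a partial write.
        self._objects[key] = obj
        return 200  # mirrors the HTTP 200 returned for a successful upload

    def get(self, key: str) -> Optional[S3Object]:
        return self._objects.get(key)


bucket = ToyBucket("my-example-bucket")
bucket.put("report.pdf", b"v1", {"Content-Type": "application/pdf"})
bucket.put("report.pdf", b"v2")  # overwrite replaces value and metadata
print(bucket.get("report.pdf").value)
```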
S3 Storage Classes
- S3 Standard - 99.99% availability, 11 9s (99.999999999%) durability. Stored redundantly across multiple devices in multiple facilities and designed to sustain the concurrent loss of 2 facilities.
- S3 - IA (Infrequent Access) - For data that is accessed less often but needs rapid access when required; you are charged a retrieval fee. 99.9% availability, 11 9s durability, stored redundantly across multiple devices in multiple facilities and designed to sustain the concurrent loss of 2 facilities.
- Reduced Redundancy Storage (RRS) - 99.99% availability, 99.99% durability. Suitable for files that can be reproduced if lost. Designed to sustain the loss of 1 facility.
- Glacier - Used for data archival; retrieving data may take 3-5 hours. 11 9s durability. Retrieval fees apply.
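The durability figures are easier to compare as expected losses. The sketch below assumes durability is an annual per-object survival probability and computes the expected number of objects lost per year. With 11 9s and 10,000,000 objects that works out to 0.0001 objects a year (roughly one object every 10,000 years), versus about 1,000 objects a year at RRS's 99.99%.

```python
from decimal import Decimal


def expected_annual_losses(num_objects: int, durability_pct: str) -> Decimal:
    """Expected number of objects lost per year at a given annual durability.

    durability_pct is passed as a string such as "99.999999999" (11 nines)
    so that Decimal keeps the arithmetic exact.
    """
    annual_loss_rate = 1 - Decimal(durability_pct) / 100
    return num_objects * annual_loss_rate


# 11 nines (Standard, S3-IA, Glacier) vs. 99.99% (Reduced Redundancy)
print(expected_annual_losses(10_000_000, "99.999999999"))
print(expected_annual_losses(10_000_000, "99.99"))
```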
S3 Charges
- Storage
- Request
- Storage management pricing - e.g. object tags are charged on a per-tag basis.
- Data transfer fees - e.g. when you replicate data or migrate it from one region to another.
- Transfer Acceleration - Uses the globally distributed edge locations of Amazon CloudFront; data travels between the edge location and the S3 bucket over an optimized network path. AWS offers a speed comparison tool that compares accelerated and non-accelerated upload speeds.
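The charge categories above can be combined into a rough monthly estimate. This is a hypothetical sketch: every unit price is a caller-supplied placeholder (no real AWS prices are built in), and the billing units (per 1,000 requests, per 10,000 tags) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class PriceSheet:
    """Hypothetical unit prices; look up real figures for your region and class."""
    storage_per_gb_month: float
    per_1000_requests: float
    retrieval_per_gb: float       # retrieval fee (e.g. S3-IA, Glacier)
    per_10000_tags_month: float   # storage management (object tagging), assumed unit
    transfer_per_gb: float        # data transfer (e.g. cross-region replication)
    acceleration_per_gb: float    # Transfer Acceleration


@dataclass
class Usage:
    stored_gb: float
    requests: int
    retrieved_gb: float
    tags: int
    transferred_gb: float
    accelerated_gb: float


def estimate_monthly_cost(p: PriceSheet, u: Usage) -> float:
    """Sum the charge categories for one month, rounded to cents."""
    total = (
        u.stored_gb * p.storage_per_gb_month
        + u.requests / 1000 * p.per_1000_requests
        + u.retrieved_gb * p.retrieval_per_gb
        + u.tags / 10000 * p.per_10000_tags_month
        + u.transferred_gb * p.transfer_per_gb
        + u.accelerated_gb * p.acceleration_per_gb
    )
    return round(total, 2)


# Placeholder numbers only, not real AWS prices.
prices = PriceSheet(0.02, 0.005, 0.01, 0.01, 0.02, 0.04)
usage = Usage(stored_gb=100, requests=200_000, retrieved_gb=10,
              tags=50_000, transferred_gb=5, accelerated_gb=0)
print(estimate_monthly_cost(prices, usage))
```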
Access
- Owner access
- Other AWS accounts
- Public access
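Access for other accounts and for the public is commonly granted with a bucket policy. The sketch below builds two policies in the standard IAM policy JSON grammar; the helper names, the bucket name and the account ID `123456789012` are placeholders.

```python
import json


def public_read_policy(bucket: str) -> dict:
    """Bucket policy that lets anyone read objects (public access)."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    }


def cross_account_read_policy(bucket: str, account_id: str) -> dict:
    """Bucket policy granting another AWS account list and read access."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "CrossAccountRead",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
            "Action": ["s3:GetObject", "s3:ListBucket"],
            # ListBucket applies to the bucket ARN, GetObject to object ARNs.
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        }],
    }


print(json.dumps(public_read_policy("my-example-bucket"), indent=2))
```

A public-read policy like this only takes effect if S3 Block Public Access is turned off for the bucket and the account.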
Encryption
In transit: data is transferred using SSL/TLS.
At rest:
- SSE-S3 - Server-side encryption with Amazon S3-managed keys (AES-256)
- SSE-KMS - Server-side encryption with AWS KMS-managed keys
- SSE-C - Server-side encryption with customer-provided keys
- Client-side encryption
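The server-side options above are selected per request with HTTP headers. This sketch returns the headers for each mode; the header names are the documented S3 ones, while the `sse_headers` function itself is a hypothetical helper. Client-side encryption is not shown because the data is encrypted before upload and needs no S3 header. Omitting the KMS key ID makes S3 use the AWS managed key for S3.

```python
import base64
import hashlib
import os
from typing import Dict, Optional


def sse_headers(mode: str, customer_key: Optional[bytes] = None,
                kms_key_id: Optional[str] = None) -> Dict[str, str]:
    """Return the request headers for an S3 PUT with the given SSE mode."""
    if mode == "SSE-S3":
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:  # omit to fall back to the AWS managed key for S3
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            # The key and its MD5 digest are both sent base64-encoded.
            "x-amz-server-side-encryption-customer-key":
                base64.b64encode(customer_key).decode(),
            "x-amz-server-side-encryption-customer-key-MD5":
                base64.b64encode(hashlib.md5(customer_key).digest()).decode(),
        }
    raise ValueError(f"unknown mode: {mode}")


key = os.urandom(32)  # with SSE-C you generate and keep this key yourself
print(sse_headers("SSE-S3"))
print(sorted(sse_headers("SSE-C", customer_key=key)))
```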
Versioning
- Stores all versions of an object
- Once enabled, versioning cannot be disabled, only suspended
- MFA Delete requires multi-factor authentication to permanently delete object versions or change the bucket's versioning state
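The versioning rules can be modeled in a few lines. `VersionedBucket` is a hypothetical in-memory sketch, not the S3 API: with versioning enabled every PUT adds a version and a plain DELETE adds a delete marker, so older versions remain retrievable; once enabled, the bucket can only be suspended. It assumes S3's convention that objects written while versioning is off or suspended get the version ID `null`.

```python
import itertools
from typing import Dict, List, Optional, Tuple

DELETE_MARKER = None  # in this model, a version with no value is a delete marker


class VersionedBucket:
    """Hypothetical model of S3 versioning states and delete markers."""

    STATES = ("Unversioned", "Enabled", "Suspended")

    def __init__(self) -> None:
        self.status = "Unversioned"
        self._versions: Dict[str, List[Tuple[str, Optional[bytes]]]] = {}
        self._ids = itertools.count(1)

    def set_versioning(self, status: str) -> None:
        if status not in self.STATES:
            raise ValueError(f"unknown status: {status}")
        if status == "Unversioned" and self.status != "Unversioned":
            raise ValueError("versioning cannot be disabled, only suspended")
        self.status = status

    def _append(self, key: str, value: Optional[bytes]) -> str:
        history = self._versions.setdefault(key, [])
        if self.status == "Enabled":
            version_id = str(next(self._ids))
        else:
            # Unversioned and suspended buckets write the "null" version,
            # replacing any existing null version of the key.
            version_id = "null"
            history[:] = [v for v in history if v[0] != "null"]
        history.append((version_id, value))
        return version_id

    def put(self, key: str, value: bytes) -> str:
        return self._append(key, value)

    def delete(self, key: str) -> None:
        if self.status == "Unversioned":
            self._versions.pop(key, None)  # the object is really gone
        else:
            self._append(key, DELETE_MARKER)  # older versions survive

    def get(self, key: str, version_id: Optional[str] = None) -> Optional[bytes]:
        history = self._versions.get(key, [])
        if version_id is None:
            # Latest version; None if it is a delete marker or missing.
            return history[-1][1] if history else None
        for vid, value in history:
            if vid == version_id:
                return value
        return None


bucket = VersionedBucket()
bucket.set_versioning("Enabled")
v1 = bucket.put("notes.txt", b"first draft")
bucket.put("notes.txt", b"second draft")
bucket.delete("notes.txt")
print(bucket.get("notes.txt"))      # None: the latest version is a delete marker
print(bucket.get("notes.txt", v1))  # b'first draft': still retrievable
```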
Cross Region Replication
- Versioning must be enabled on both the source and destination buckets.
- Objects already in the bucket are not replicated automatically; only objects uploaded or updated after replication is enabled are replicated automatically.
- You cannot replicate to multiple buckets.
- You cannot replicate to the same region.
- Delete markers are replicated, but deleting individual versions or delete markers is not replicated.
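The prerequisites above can be checked before a replication rule is submitted. In this sketch `BucketInfo`, `crr_problems` and `replication_config` are hypothetical helpers; the config dict follows the shape of an S3 replication configuration using the legacy `Prefix` rule form, and the role ARN is a placeholder for the IAM role S3 assumes to copy objects.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BucketInfo:
    name: str
    region: str
    versioning: str  # "Enabled", "Suspended" or "Unversioned"


def crr_problems(source: BucketInfo, destination: BucketInfo) -> List[str]:
    """Check the CRR prerequisites from the notes; an empty list means OK."""
    problems = []
    if source.versioning != "Enabled":
        problems.append("source bucket must have versioning enabled")
    if destination.versioning != "Enabled":
        problems.append("destination bucket must have versioning enabled")
    if source.region == destination.region:
        problems.append("cross-region replication needs a different region")
    return problems


def replication_config(role_arn: str, destination: BucketInfo) -> dict:
    """One rule replicating every new object (empty prefix) to destination."""
    return {
        "Role": role_arn,
        "Rules": [{
            "ID": "replicate-all",
            "Status": "Enabled",
            "Prefix": "",
            "Destination": {"Bucket": f"arn:aws:s3:::{destination.name}"},
        }],
    }


src = BucketInfo("source-bucket", "us-east-1", "Enabled")
dst = BucketInfo("replica-bucket", "eu-west-1", "Enabled")
print(crr_problems(src, dst))
print(replication_config("arn:aws:iam::123456789012:role/replication-role", dst))
```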
Lifecycle Management
Lifecycle rules help you manage storage costs by controlling the lifecycle of your objects. You can create lifecycle rules to automatically transition objects to Standard-IA, archive them to the Glacier storage class, and remove them after a specified period of time. Lifecycle rules can manage all versions of an object.
- Can be used in conjunction with versioning
- Can be applied to current versions and/or previous versions
- Transition to IA - objects must be at least 128 KB and at least 30 days old
- Archive to Glacier - at least 30 days after the IA transition, or 1 day after creation if transitioning directly from Standard
- You can expire current versions or permanently delete previous versions
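A lifecycle rule is submitted as a JSON document (for example through the PutBucketLifecycleConfiguration API). The sketch below builds one rule in that shape and checks the day thresholds from these notes before returning it. The `lifecycle_rule` helper and the `logs/` prefix are hypothetical, and the 128 KB minimum is not checked because it depends on the objects, not the rule.

```python
import json


def lifecycle_rule(prefix: str, ia_days: int, glacier_days: int,
                   noncurrent_expire_days: int) -> dict:
    """Standard -> Standard-IA -> Glacier, and permanently delete
    previous (noncurrent) versions after a set number of days."""
    if ia_days < 30:
        raise ValueError("Standard-IA transition needs at least 30 days")
    if glacier_days < ia_days + 30:
        raise ValueError("Glacier transition must be 30+ days after IA")
    return {
        "ID": f"archive-{prefix.rstrip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
        "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_expire_days},
    }


config = {"Rules": [lifecycle_rule("logs/", 30, 60, 365)]}
print(json.dumps(config, indent=2))
```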
Content Delivery Network
CloudFront is a global content delivery network (CDN) service that securely delivers data to users with low latency and high transfer speeds. CloudFront also works with non-AWS origin servers.
- Edge location - Where content is cached. This is separate from an AWS Region or Availability Zone (AZ).
- Origin - The source of the files the CDN distributes: an S3 bucket, an EC2 instance, an Elastic Load Balancer, or a custom (non-AWS) HTTP server.
- Distribution - The name given to the CDN, which consists of a collection of edge locations.
  - Web Distribution - Typically used for websites
  - RTMP - Used for media streaming
- Edge locations are not just for reads; you can also write (PUT) objects to an edge location.
- Objects are cached for the life of the TTL (time to live). You can clear cached objects before the TTL expires, but this costs extra.
- You can have multiple origins (such as several S3 buckets) in one CloudFront distribution.
- You can define multiple behaviors, e.g. routing a path pattern to a particular origin.
- Configure custom error pages.
- Geo restriction settings: whitelist or blacklist countries.
- Invalidation removes objects from edge locations. A less expensive approach is to use versioned object or directory names.
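The TTL, invalidation and versioned-name behavior can be seen in a toy edge cache. `EdgeCache` is a hypothetical sketch, not CloudFront code: it serves a cached copy until the TTL runs out, then refetches from the origin. An injectable clock keeps the example deterministic, and writes through the edge location are not modeled.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy edge location that caches origin responses for `ttl` seconds."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[bytes, float]] = {}  # path -> (body, expiry)
        self.origin_hits = 0

    def get(self, path: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and entry[1] > now:
            return entry[0]                 # hit: still within the TTL
        body = self._fetch(path)            # miss or expired: go to the origin
        self.origin_hits += 1
        self._entries[path] = (body, now + self._ttl)
        return body

    def invalidate(self, path: str) -> None:
        """Drop an object before its TTL expires (CloudFront may charge for this)."""
        self._entries.pop(path, None)


origin = {"/app.css": b"v1"}
now = [0.0]  # fake clock
edge = EdgeCache(lambda path: origin[path], ttl=60, clock=lambda: now[0])
edge.get("/app.css")          # miss: fetched from the origin
origin["/app.css"] = b"v2"
print(edge.get("/app.css"))   # b'v1': still cached within the TTL
now[0] = 61
print(edge.get("/app.css"))   # b'v2': TTL expired, refetched
```

A versioned name such as `/app.v3.css` is simply a new cache key, so it is fetched fresh without any invalidation.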
Storage Gateway
- File Gateway (NFS)
- Volume Gateway (iSCSI) - Data written to disk is asynchronously backed up as point-in-time snapshots and stored in the cloud as EBS snapshots. Snapshots are incremental and compressed to minimize storage charges. Volume size 1 GB - 16 TB.
  - Stored Volumes
  - Cached Volumes
- Tape Gateway (VTL)
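The incremental, compressed snapshots described for Volume Gateway can be illustrated with a tiny block diff. This is a conceptual sketch, not how AWS implements EBS snapshots: it splits a volume into fixed-size blocks (4 bytes here, purely for illustration), keeps only the blocks that changed since the previous snapshot, and compresses them with zlib. It assumes both volume states are the same size.

```python
import zlib
from typing import Dict, List

BLOCK_SIZE = 4  # tiny block size for illustration only


def split_blocks(volume: bytes) -> List[bytes]:
    return [volume[i:i + BLOCK_SIZE] for i in range(0, len(volume), BLOCK_SIZE)]


def incremental_snapshot(previous: bytes, current: bytes) -> Dict[int, bytes]:
    """Store only blocks that changed since `previous`, compressed."""
    changed = {}
    for index, (old, new) in enumerate(zip(split_blocks(previous),
                                           split_blocks(current))):
        if old != new:
            changed[index] = zlib.compress(new)
    return changed


def restore(previous: bytes, snapshot: Dict[int, bytes]) -> bytes:
    """Rebuild the newer state by applying changed blocks to the older one."""
    blocks = split_blocks(previous)
    for index, compressed in snapshot.items():
        blocks[index] = zlib.decompress(compressed)
    return b"".join(blocks)


monday = b"AAAABBBBCCCCDDDD"
tuesday = b"AAAAXXXXCCCCDDDD"
snap = incremental_snapshot(monday, tuesday)
print(sorted(snap))                      # [1]: only block 1 changed
print(restore(monday, snap) == tuesday)  # True
```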
Transfer Acceleration
This uses the CloudFront edge network to accelerate uploads to S3. When you enable Transfer Acceleration on a bucket, you get a distinct URL (<bucketname>.s3-accelerate.amazonaws.com) for uploading to an edge location, which then transfers the file to the S3 bucket.
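The three endpoint formats mentioned in these notes can be built from the bucket name. This sketch only formats strings and makes no network calls; the helper names are hypothetical, and the naming check encodes the rule that a bucket used with Transfer Acceleration must have a DNS-compliant name with no periods.

```python
import re


def path_style_url(bucket: str) -> str:
    """Path-style bucket URL."""
    return f"https://s3.amazonaws.com/{bucket}"


def website_url(bucket: str, region: str) -> str:
    """Static website endpoint (dash form; some regions use a dot instead)."""
    return f"http://{bucket}.s3-website-{region}.amazonaws.com"


def accelerate_url(bucket: str) -> str:
    """Transfer Acceleration endpoint for a bucket."""
    # 3-63 chars of lowercase letters, digits and hyphens, starting and
    # ending with a letter or digit; the pattern excludes periods.
    if not re.fullmatch(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]", bucket):
        raise ValueError(f"bucket name not usable with acceleration: {bucket}")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"


print(path_style_url("my-bucket"))
print(website_url("my-bucket", "us-east-1"))
print(accelerate_url("my-bucket"))
```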
Static Website Hosting