Amazon Simple Storage Service (S3) is a secure, durable, and highly scalable cloud storage service. S3 is an easy-to-use object storage service with a simple interface that you can use to store and retrieve any amount of data from anywhere on the web.

In terms of pricing, S3 charges you only for the storage you actually use, which eliminates the capacity planning and capacity constraints associated with traditional storage.

Amazon S3 can also be used in conjunction with other AWS services. For example, it can serve as target storage for data streamed through Kinesis, and it can store snapshots of EBS volumes and RDS databases.

Common use cases for Amazon S3 storage include:

  • Backup and archive for on-premises or cloud data
  • Content, media, and software storage and distribution
  • Big data analytics
  • Static website hosting
  • Cloud-native mobile and Internet application hosting
  • Disaster recovery

Amazon S3 offers different storage classes for various use cases such as general-purpose storage, infrequent access, and archiving. AWS also provides lifecycle policies with which you can migrate data to the appropriate class over time.

Amazon Simple Storage Service (S3) Basics

Buckets

A bucket is a container (think of it as a top-level folder) for storing objects (files) in Amazon S3. Every S3 object must be contained in a bucket, and buckets form the top-level namespace for Amazon S3.

The important thing to remember is that bucket names must be globally unique, that is, unique across all AWS accounts.

The S3 data model is a flat structure: there are no hierarchies or folders within a bucket. However, you can create logical hierarchies using key name prefixes, e.g. folder/object.

Ownership of an S3 bucket is not transferable. You can create multiple buckets, but by default you can have up to 100 buckets per account. This is a soft limit, which means you can contact AWS to increase it.
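
As a quick, minimal sketch using the boto3 SDK for Python (the bucket name and region below are placeholders), creating a bucket looks like this:

    import boto3

    s3 = boto3.client("s3", region_name="eu-west-1")

    # Bucket names must be globally unique across all AWS accounts.
    s3.create_bucket(
        Bucket="geeksradar-demo-bucket",
        CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},  # omit this argument for us-east-1
    )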

Objects

Objects are the entities or files that you store in an S3 bucket. You can save objects of virtually any format. A single object can be up to 5 TB in size, and a bucket can store an unlimited number of objects, so an S3 bucket can hold a virtually infinite amount of data.

Each object consists of data (the file itself) and metadata (data about the file). The data portion of an Amazon S3 object is opaque to Amazon S3, meaning it is treated as simply a stream of bytes. Amazon S3 doesn't know or care what type of data you are storing, and the service doesn't act differently for text data versus binary data.

The metadata associated with an S3 object is a set of name-value pairs that describe the object. Amazon S3 has two types of metadata:

System Metadata:

  • Metadata such as the Last-Modified date is controlled by the system; only Amazon S3 can modify it.
  • Some system metadata can be controlled by the user, e.g., the storage class configured for the object.

User Metadata:

  • User-defined metadata can be assigned while uploading the object or after it has been uploaded.
  • User-defined metadata is stored with the object and is returned when the object is downloaded.
  • S3 does not process user-defined metadata.
  • User-defined metadata keys must begin with the prefix "x-amz-meta-"; otherwise, S3 will not treat the name-value pair as user-defined metadata.
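
As a minimal boto3 sketch (the bucket and key names are placeholders), user-defined metadata is passed as a simple dictionary and comes back with a HEAD or GET request:

    import boto3

    s3 = boto3.client("s3")

    # Upload an object with user-defined metadata (sent as x-amz-meta-* headers).
    s3.put_object(
        Bucket="geeksradar-demo-bucket",
        Key="notes/hello.txt",
        Body=b"hello from geeksradar",
        Metadata={"project": "geeksradar", "owner": "editorial"},
    )

    # The metadata is returned with a HEAD (or GET) request; keys come back lowercased.
    head = s3.head_object(Bucket="geeksradar-demo-bucket", Key="notes/hello.txt")
    print(head["Metadata"])  # {'project': 'geeksradar', 'owner': 'editorial'}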

Keys

Every object stored in a bucket is identified by a key. You can think of the key as a filename. A key can be up to 1024 bytes of Unicode UTF-8 characters, including embedded slashes, backslashes, dots, and dashes.

A key must be unique within a bucket, but different buckets can contain objects with the same key. The combination of bucket, key, and optional version ID uniquely identifies an object in Amazon S3.
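
As a small boto3 sketch (placeholder names again), a key prefix plus the "/" delimiter makes the flat key space behave like a folder listing:

    import boto3

    s3 = boto3.client("s3")

    # Keys are flat, but a prefix plus the "/" delimiter behaves like a folder listing.
    resp = s3.list_objects_v2(
        Bucket="geeksradar-demo-bucket",
        Prefix="notes/",
        Delimiter="/",
    )
    for obj in resp.get("Contents", []):
        print(obj["Key"], obj["Size"])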

Object URL

Every object in Amazon S3 is addressed by a unique URL formed from the web service endpoint, the bucket name, and the object key. For example, the URL for the geeksradar.png file stored in a bucket named "images" would be:

http://images.s3.amazonaws.com/geeksradar.png

Data Consistency

S3 achieves high availability by replicating data across multiple servers within Amazon’s data centers.

  • S3 provides read-after-write consistency for PUTs of new objects.
    • For a PUT request, S3 synchronously stores data across multiple facilities before returning SUCCESS.
    • A process that writes a new object to S3 can immediately read that object.
    • However, if a process writes a new object and immediately lists the keys in its bucket, the object might not appear in the list until the change is fully propagated.
  • S3 provides eventual consistency for overwrite PUTs and DELETEs in all regions.
    • For updates and deletes of objects, the changes are eventually reflected and may not be visible immediately.
    • If a process replaces an existing object and immediately attempts to read it, S3 might return the prior data until the change is fully propagated.
    • If a process deletes an existing object and immediately tries to read it, S3 might return the deleted data until the deletion is fully propagated.
    • If a process deletes an existing object and immediately lists the keys in its bucket, S3 might still list the deleted object until the deletion is fully propagated.

Amazon S3 Storage Classes

Amazon S3 offers a range of storage classes suitable for different use cases.

Amazon S3 Standard (S3 Standard)

S3 Standard offers high durability, availability, and performance object storage for frequently accessed data. Because it delivers low latency and high throughput, S3 Standard is appropriate for a wide variety of use cases, including cloud applications, dynamic websites, content distribution, mobile and gaming applications, and big data analytics.

Key Features:

  • Low latency and high throughput performance
  • Designed for durability of 99.999999999% of objects across multiple Availability Zones
  • Resilient against events that impact an entire Availability Zone
  • Designed for 99.99% availability over a given year
  • Backed with the Amazon S3 Service Level Agreement for availability
  • Supports SSL for data in transit and encryption of data at rest
  • S3 Lifecycle management for automatic migration of objects to other S3 Storage Classes

Amazon S3 Intelligent-Tiering (S3 Intelligent-Tiering)

The S3 Intelligent-Tiering storage class is designed to optimize costs by automatically moving data to the most cost-effective access tier, without performance impact or operational overhead. It works by storing objects in two access tiers: one tier that is optimized for frequent access and another lower-cost tier that is optimized for infrequent access.

For a small monthly monitoring and automation fee per object, Amazon S3 monitors access patterns of the objects in S3 Intelligent-Tiering and moves the ones that have not been accessed for 30 consecutive days to the infrequent access tier. If an object in the infrequent access tier is accessed, it is automatically moved back to the frequent access tier.

There are no retrieval fees when using the S3 Intelligent-Tiering storage class, and no additional tiering fees when objects are moved between access tiers. It is the ideal storage class for long-lived data with access patterns that are unknown or unpredictable.

Key Features:

  • Same low latency and high throughput performance of S3 Standard
  • Small monthly monitoring and auto-tiering fee
  • Automatically moves objects between two access tiers based on changing access patterns
  • Designed for durability of 99.999999999% of objects across multiple Availability Zones
  • Resilient against events that impact an entire Availability Zone
  • Designed for 99.9% availability over a given year
  • Backed with the Amazon S3 Service Level Agreement for availability
  • Supports SSL for data in transit and encryption of data at rest
  • S3 Lifecycle management for automatic migration of objects to other S3 Storage Classes

Amazon S3 Standard-Infrequent Access (S3 Standard-IA)

S3 Standard-IA is for data that is accessed less frequently but requires rapid access when needed. S3 Standard-IA offers the high durability, high throughput, and low latency of S3 Standard, with a low per-GB storage price and a per-GB retrieval fee. This combination of low cost and high performance makes S3 Standard-IA ideal for long-term storage, backups, and as a data store for disaster recovery files.

Key Features:

  • Same low latency and high throughput performance of S3 Standard
  • Designed for durability of 99.999999999% of objects across multiple Availability Zones
  • Resilient against events that impact an entire Availability Zone
  • Designed for 99.9% availability over a given year
  • Backed with the Amazon S3 Service Level Agreement for availability
  • Supports SSL for data in transit and encryption of data at rest
  • S3 Lifecycle management for automatic migration of objects to other S3 Storage Classes
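
As a minimal boto3 sketch (placeholder bucket and key), you can place an object directly into S3 Standard-IA, or any other class, by setting the StorageClass parameter at upload time:

    import boto3

    s3 = boto3.client("s3")

    # Store an object directly in Standard-IA; other values include
    # "INTELLIGENT_TIERING", "ONEZONE_IA" and "GLACIER".
    s3.put_object(
        Bucket="geeksradar-demo-bucket",
        Key="backups/2019-01-01.tar.gz",
        Body=b"backup bytes go here",
        StorageClass="STANDARD_IA",
    )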

Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA)

S3 One Zone-IA is for data that is accessed less frequently but requires rapid access when needed. Unlike other S3 Storage Classes which store data in a minimum of three Availability Zones (AZs), S3 One Zone-IA stores data in a single AZ and costs 20% less than S3 Standard-IA.

S3 One Zone-IA is ideal for customers who want a lower-cost option for infrequently accessed data but do not require the availability and resilience of S3 Standard or S3 Standard-IA. It’s a good choice for storing secondary backup copies of on-premises data or easily re-creatable data. You can also use it as cost-effective storage for data that is replicated from another AWS Region using S3 Cross-Region Replication.

Key Features:

  • Same low latency and high throughput performance of S3 Standard
  • Designed for durability of 99.999999999% of objects in a single Availability Zone
  • Designed for 99.5% availability over a given year
  • Backed with the Amazon S3 Service Level Agreement for availability
  • Supports SSL for data in transit and encryption of data at rest
  • S3 Lifecycle management for automatic migration of objects to other S3 Storage Classes

Amazon S3 Glacier (S3 Glacier)

S3 Glacier is a secure, durable, and low-cost storage class for data archiving. You can reliably store any amount of data at costs that are competitive with or cheaper than on-premises solutions. To keep costs low yet suitable for varying needs, S3 Glacier provides three retrieval options that range from a few minutes to hours.

You can upload objects directly to S3 Glacier, or use S3 Lifecycle policies to transfer data between any of the S3 storage classes for active data (S3 Standard, S3 Intelligent-Tiering, S3 Standard-IA, and S3 One Zone-IA) and S3 Glacier. For more information, see the Amazon S3 Glacier documentation.

Key Features:

  • Designed for durability of 99.999999999% of objects across multiple Availability Zones
  • Data is resilient in the event of one entire Availability Zone destruction
  • Supports SSL for data in transit and encryption of data at rest
  • The low-cost design is ideal for long-term archive
  • Configurable retrieval times, from minutes to hours
  • S3 PUT API for direct uploads to S3 Glacier and S3 Lifecycle management for automatic migration of objects

Features of Amazon S3

Object Lifecycle Management

Amazon S3 provides Object Lifecycle Management, which is similar to automated storage tiering in traditional IT storage infrastructures. Data can be categorized into three stages: it starts as "hot" (frequently accessed) data, moves to "warm" (less frequently accessed) data as it ages, and ends life as "cold" (long-term backup or archive) data before eventual deletion.

Amazon S3 lifecycle management can significantly reduce your storage cost by automatically transitioning data from one storage class to another, or by deleting objects after a specified period. For example, the lifecycle rules for backup data might be:

  • Store backup data initially in Amazon S3 Standard.
  • After 30 days, transition to Amazon S3 Standard-IA.
  • After 90 days, transition to Amazon S3 Glacier.
  • After 3 years, delete.

Lifecycle configurations are attached to the bucket and can apply to all objects in the bucket or only to objects specified by a prefix.
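
As a rough boto3 sketch of the backup example above (the bucket name and prefix are placeholders), a lifecycle configuration with those transitions might look like this:

    import boto3

    s3 = boto3.client("s3")

    # Standard -> Standard-IA after 30 days -> Glacier after 90 days -> delete after ~3 years,
    # applied only to keys under the "backups/" prefix.
    s3.put_bucket_lifecycle_configuration(
        Bucket="geeksradar-demo-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "backup-tiering",
                    "Filter": {"Prefix": "backups/"},
                    "Status": "Enabled",
                    "Transitions": [
                        {"Days": 30, "StorageClass": "STANDARD_IA"},
                        {"Days": 90, "StorageClass": "GLACIER"},
                    ],
                    "Expiration": {"Days": 1095},
                }
            ]
        },
    )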

Encryption

Encryption is essential for sensitive data saved in Amazon S3, both in flight and at rest.

Encryption in Transit

Amazon S3 supports Secure Sockets Layer (SSL) API endpoints to encrypt data in flight. This ensures that data is encrypted in transit to and from Amazon S3 using HTTPS.

Encryption at Rest

Amazon S3 supports three types of server-side encryption as well as client-side encryption, depending on the use case.

Server Side Encryption

With server-side encryption, Amazon S3 encrypts the object before saving it to disk in its data centers and decrypts it when the object is downloaded. With client-side encryption, you encrypt the data on the client and upload the already encrypted data to S3; in that case, you manage the encryption process, the encryption keys, and related tools.

SSE-S3 (AWS-Managed Keys)

SSE-S3 is a fully integrated encryption solution in which AWS handles key management and key protection. With SSE-S3, each object is encrypted with a unique key, and a separate master key encrypts all of these object keys. S3 server-side encryption uses one of the strongest block ciphers available, 256-bit Advanced Encryption Standard (AES-256), to encrypt the data.

AWS rotates the master keys regularly (at least monthly). The encrypted data, the encryption keys, and the master keys are stored separately, which adds another level of protection.
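
A minimal boto3 sketch (placeholder names) of requesting SSE-S3 at upload time:

    import boto3

    s3 = boto3.client("s3")

    # Ask S3 to encrypt the object at rest with S3-managed keys (AES-256).
    s3.put_object(
        Bucket="geeksradar-demo-bucket",
        Key="reports/q1.pdf",
        Body=b"report bytes go here",
        ServerSideEncryption="AES256",
    )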

SSE-KMS (AWS KMS Keys)

SSE-KMS is similar to SSE-S3, but it uses the AWS Key Management Service (KMS): AWS handles key storage and protection for Amazon S3, while you retain control over the customer master keys (CMKs). SSE-KMS offers several additional benefits compared to SSE-S3, including:

  1. SSE-KMS lets you assign separate permissions to the master keys, which provides an additional layer of security and protection against unauthorized access to objects in S3.
  2. SSE-KMS lets you audit the use of the keys, to prove they are being used correctly, by inspecting logs in AWS CloudTrail.
  3. AWS KMS also lets you see any unsuccessful attempts to access data by users who did not have permission to decrypt it.

Keep in mind that these benefits come with additional charges compared to SSE-S3.
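
A minimal boto3 sketch (the bucket and KMS key ARN are placeholders) of requesting SSE-KMS with a specific CMK:

    import boto3

    s3 = boto3.client("s3")

    # Encrypt with a specific CMK held in AWS KMS; the key ARN is a placeholder.
    s3.put_object(
        Bucket="geeksradar-demo-bucket",
        Key="reports/q1.pdf",
        Body=b"report bytes go here",
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId="arn:aws:kms:us-east-1:111122223333:key/your-key-id",
    )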

SSE-C (Customer Provided Keys)

With SSE-C, the encryption keys are managed and provided by the customer, while S3 handles the encryption as it writes to disk and the decryption when you access the objects. SSE-C is useful when you want to keep control of your own keys but don't want to implement or manage an encryption library.

SSE-C allows you to encrypt each object and each of its versions with a different key. You are responsible for maintaining the mapping between each object and the encryption key used for it.

Amazon S3 does not store the encryption key itself; it stores a randomly salted HMAC value of the key to validate future requests. The salted HMAC value cannot be used to recover the original key or to decrypt the object. This means that if you lose the encryption key, you lose the object.
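
A minimal boto3 sketch of SSE-C (placeholder names; the key is generated locally and must be kept by you):

    import os

    import boto3

    s3 = boto3.client("s3")

    # A 256-bit key that you generate and keep; S3 uses it but never stores it.
    key = os.urandom(32)

    s3.put_object(
        Bucket="geeksradar-demo-bucket",
        Key="reports/q1.pdf",
        Body=b"report bytes go here",
        SSECustomerAlgorithm="AES256",
        SSECustomerKey=key,
    )

    # The same key must be supplied again to read the object back.
    obj = s3.get_object(
        Bucket="geeksradar-demo-bucket",
        Key="reports/q1.pdf",
        SSECustomerAlgorithm="AES256",
        SSECustomerKey=key,
    )
    print(obj["Body"].read())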

Client-Side Encryption

Client-side encryption means encrypting the object before sending it to S3 and decrypting it after downloading it.

You have two options for managing the client-side encryption keys:

AWS KMS-Managed Keys

The customer maintains the encryption CMK in AWS KMS and provides the CMK ID to the client, which uses it to encrypt the data.

Client-Side Master Key

The encryption master keys are maintained entirely on the client side. The master keys and unencrypted data are never sent to Amazon S3. If the master key is lost, the encrypted objects can never be decrypted or recovered.
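
As a simplified client-side encryption sketch, the example below uses the cryptography library's Fernet helper rather than an AWS encryption SDK, just to show that S3 only ever sees ciphertext (all names are placeholders):

    import boto3
    from cryptography.fernet import Fernet

    s3 = boto3.client("s3")

    # Client-side master key, kept entirely by you; losing it makes the object unrecoverable.
    master_key = Fernet.generate_key()
    fernet = Fernet(master_key)

    plaintext = b"top secret report"
    s3.put_object(
        Bucket="geeksradar-demo-bucket",
        Key="private/secret.bin",
        Body=fernet.encrypt(plaintext),  # S3 only ever sees ciphertext
    )

    # Download and decrypt locally with the same master key.
    ciphertext = s3.get_object(Bucket="geeksradar-demo-bucket", Key="private/secret.bin")["Body"].read()
    print(fernet.decrypt(ciphertext) == plaintext)  # True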

Object Versioning

Amazon S3 provides object versioning, which means you can store multiple versions of the same object in the same bucket. Each version is associated with a unique version ID.

Note that S3 versions are not incremental: each version is stored as a complete copy of the object, so you pay for the storage of every version you keep.

Versioning helps you recover from accidental overwrites or deletions by allowing you to retrieve any previous version of the object.

Versioning is turned on at the bucket level and applies to all objects in that bucket. Once enabled, versioning cannot be removed from the bucket; it can only be suspended.
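
A minimal boto3 sketch (placeholder bucket) of enabling versioning and listing the versions of a key:

    import boto3

    s3 = boto3.client("s3")

    # Turn on versioning for the whole bucket; it can later be suspended but not removed.
    s3.put_bucket_versioning(
        Bucket="geeksradar-demo-bucket",
        VersioningConfiguration={"Status": "Enabled"},
    )

    # Every version of a key is listed with its unique version ID.
    versions = s3.list_object_versions(Bucket="geeksradar-demo-bucket", Prefix="notes/hello.txt")
    for v in versions.get("Versions", []):
        print(v["Key"], v["VersionId"], v["IsLatest"])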

MFA Delete

Amazon S3 provides MFA Delete, which adds an extra layer of security on top of bucket versioning. MFA Delete requires additional authentication to permanently delete an object version or to change the versioning state of the bucket.

If MFA Delete is enabled, the request must include a one-time code generated by the account's MFA device. The critical thing to remember is that MFA Delete can be enabled only by the root account.
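
A minimal boto3 sketch of enabling MFA Delete; the MFA device ARN and the six-digit code are placeholders and must come from the root account's MFA device:

    import boto3

    s3 = boto3.client("s3")

    # MFA Delete can only be enabled by the root account. The MFA argument is
    # "<device serial or ARN> <current code>"; both values below are placeholders.
    s3.put_bucket_versioning(
        Bucket="geeksradar-demo-bucket",
        MFA="arn:aws:iam::111122223333:mfa/root-account-mfa-device 123456",
        VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
    )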

Pre-signed URL

All objects in S3 buckets are private by default, which means only the owner can access them. However, Amazon S3 gives you the option to share an object by making it publicly accessible.

Sometimes you need to share an object only with specific users and only for a limited time. In such cases, pre-signed URLs can be used to share the object. You create a pre-signed URL for an object by providing your security credentials, the bucket name, the object key, the HTTP method, and the expiration date and time. The URL is valid only for that predefined interval.

Pre-signed URLs are widely used on media-sharing platforms to distribute content differently to premium and regular users. They can also be used as a protection against content scraping.
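
A minimal boto3 sketch (placeholder names) of generating a pre-signed GET URL that expires after one hour:

    import boto3

    s3 = boto3.client("s3")

    # A GET URL for a private object, valid for one hour.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "geeksradar-demo-bucket", "Key": "notes/hello.txt"},
        ExpiresIn=3600,  # seconds
    )
    print(url)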

Multipart Upload

You can upload objects of up to 5 TB to Amazon S3, but uploading large files can be difficult, for example over low-bandwidth or unreliable connections.

To overcome this, Amazon S3 provides the Multipart Upload API. It allows you to upload a large object as a set of parts, which gives you better network utilization, the ability to pause and resume the upload, and the ability to upload objects whose size is initially unknown.

Multipart upload is a three-step process: initiation, uploading the parts, and completion (or abort). The parts can be uploaded independently and in any order; Amazon S3 assembles them to create the final object.

Multipart upload is generally recommended for objects larger than 100 MB, and it is required for objects larger than 5 GB. A multipart upload supports 1 to 10,000 parts, and each part can be from 5 MB to 5 GB; only the last part is allowed to be smaller than 5 MB.

With the low-level API, you have to break the object into parts and keep track of each part yourself. With the high-level APIs and the high-level AWS CLI commands (aws s3 cp, aws s3 mv, and aws s3 sync), multipart upload is performed automatically for large objects.
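
As a minimal boto3 sketch (placeholder file and bucket names), the high-level upload_file call switches to multipart upload automatically once the file crosses the configured threshold:

    import boto3
    from boto3.s3.transfer import TransferConfig

    s3 = boto3.client("s3")

    # upload_file switches to multipart automatically once the file exceeds
    # multipart_threshold and uploads the parts in parallel.
    config = TransferConfig(
        multipart_threshold=100 * 1024 * 1024,  # start multipart at 100 MB
        multipart_chunksize=64 * 1024 * 1024,   # 64 MB parts
        max_concurrency=8,
    )
    s3.upload_file("big-backup.tar", "geeksradar-demo-bucket", "backups/big-backup.tar", Config=config)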

Cross-Region Replication

Cross-Region Replication is a bucket-level feature that allows you to replicate objects between buckets in different regions. If you enable Cross-Region Replication, objects are asynchronously replicated from the source bucket to the destination bucket.

During replication, the object's metadata and permissions are replicated along with it. One important thing to remember is that versioning must be enabled on both the source and the destination bucket before replication can be enabled.

Cross-Region Replication use cases:

  1. Minimizing the latency required to access objects, by keeping a copy closer to your users.
  2. Compliance requirements that mandate backing up data in a specific region.
  3. Operational reasons, such as compute clusters in two different regions that analyze the same set of objects.

Remember that only objects created after replication is enabled are replicated. Pre-existing objects are not replicated automatically; you have to copy them yourself, for example with the AWS CLI.
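
A rough boto3 sketch of enabling replication (the bucket names and IAM role ARN are placeholders; the role must allow S3 to replicate on your behalf, and both buckets must already have versioning enabled):

    import boto3

    s3 = boto3.client("s3")

    # Both buckets must already have versioning enabled; the IAM role must allow
    # S3 to replicate objects on your behalf. All names/ARNs are placeholders.
    s3.put_bucket_replication(
        Bucket="geeksradar-demo-bucket",
        ReplicationConfiguration={
            "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
            "Rules": [
                {
                    "ID": "replicate-all",
                    "Prefix": "",
                    "Status": "Enabled",
                    "Destination": {"Bucket": "arn:aws:s3:::geeksradar-demo-bucket-replica"},
                }
            ],
        },
    )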

Logging

Amazon S3 provides server access logging, a bucket-level feature with which you can track requests made to your bucket. Logging is turned off by default but can easily be enabled. When you enable logging, you choose the target location where the log files will be delivered; this can be the same bucket or a different bucket.

Access log information can be useful in security and access audits. You can also use access logs to understand your customer base and your Amazon S3 bill.

Logs include the following information:

  • Requestor account and IP address
  • Bucket name
  • Request time
  • Action (GET, PUT, LIST, and so forth)
  • Response status or error code (if any)
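
A minimal boto3 sketch (placeholder bucket names) of enabling server access logging; the target bucket must grant the S3 log delivery group permission to write:

    import boto3

    s3 = boto3.client("s3")

    # Deliver access logs for the source bucket into a "logs/" prefix of a separate
    # logging bucket; that bucket must grant the S3 log delivery group write access.
    s3.put_bucket_logging(
        Bucket="geeksradar-demo-bucket",
        BucketLoggingStatus={
            "LoggingEnabled": {
                "TargetBucket": "geeksradar-log-bucket",
                "TargetPrefix": "logs/",
            }
        },
    )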

Event Notification

Amazon S3 event notifications can be used to send a notification when certain actions are taken on objects uploaded to or stored in S3. These notifications can in turn trigger responses in other AWS services. For example, whenever you upload a video file to S3, an event notification can trigger the transcoding of the uploaded video.

Event notifications are set up at the bucket level. You can configure them through the AWS console, through the REST API, or by using an AWS SDK.

Amazon S3 can publish the following events:

  • New object created events
    • Can be enabled for PUT, POST, or COPY operations
    • You will not receive event notifications for failed operations
  • Object removal events
    • Can publish delete events for object deletion, versioned object deletion, or insertion of a delete marker
    • You will not receive event notifications for automatic deletes from lifecycle policies or for failed operations
  • Reduced Redundancy Storage (RRS) object lost events
    • Can be used to reproduce/recreate the object

Notification messages can be sent to Amazon Simple Notification Service (SNS) or Amazon Simple Queue Service (SQS), or they can invoke an AWS Lambda function directly.
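
A rough boto3 sketch of the video-transcoding example (the bucket name and Lambda ARN are placeholders, and the Lambda function must already permit S3 to invoke it):

    import boto3

    s3 = boto3.client("s3")

    # Invoke a Lambda function whenever a new .mp4 object is created in the bucket.
    s3.put_bucket_notification_configuration(
        Bucket="geeksradar-demo-bucket",
        NotificationConfiguration={
            "LambdaFunctionConfigurations": [
                {
                    "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:transcode-video",
                    "Events": ["s3:ObjectCreated:*"],
                    "Filter": {
                        "Key": {"FilterRules": [{"Name": "suffix", "Value": ".mp4"}]}
                    },
                }
            ]
        },
    )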

Static Website Hosting

Amazon S3 can be used to host static websites with client-side scripts. A static website is one that contains only static content and does not require server-side processing.

Static websites have many advantages compared to dynamic websites: they are fast and can be more secure. Hosting a static website on Amazon S3 also gives you the security, durability, availability, and scalability of Amazon S3.

Every Amazon S3 bucket has its own URL, so it is very easy to turn a bucket into a website. To host a website, you configure the bucket for website hosting, upload the website content, and you are good to go.

Steps involved in hosting a website on Amazon S3:

  • Create a bucket with the same name as the desired website hostname.
  • Upload the static files to the bucket.
  • Make all the files public (world readable).
  • Enable static website hosting for the bucket. It includes specifying an Index document and an Error document.
  • The website will now be available at the S3 website URL: <bucket-name>.s3-website-<AWS-region>.amazonaws.com.
  • Create a friendly DNS name in your domain for the website using a DNS CNAME, or an Amazon Route 53 alias that resolves to the Amazon S3 website URL.
  • The website will now be available at your website domain name.

You can easily configure Amazon S3 with Amazon Route 53 to host a website on an S3 bucket; the bucket's website address becomes the domain's endpoint.
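
A minimal boto3 sketch (placeholder bucket/host name) of enabling static website hosting with an index and error document:

    import boto3

    s3 = boto3.client("s3")

    # The bucket name matches the desired hostname; the index and error documents
    # must already be uploaded and publicly readable.
    s3.put_bucket_website(
        Bucket="www.example.com",
        WebsiteConfiguration={
            "IndexDocument": {"Suffix": "index.html"},
            "ErrorDocument": {"Key": "error.html"},
        },
    )
    # The site is then served at http://www.example.com.s3-website-<AWS-region>.amazonaws.com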

That's it for the theory part of Amazon S3. In upcoming posts, I will cover the practical side of Amazon S3. If you think we missed any topic or feature, feel free to comment below or send us an email!
