Amazon S3
from Wikipedia

Amazon Simple Storage Service (S3) is a service offered by Amazon Web Services (AWS) that provides object storage through a web service interface.[1][2] Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its e-commerce network.[3] Amazon S3 can store any type of object, which allows uses like storage for Internet applications, backups, disaster recovery, data archives, data lakes for analytics, and hybrid cloud storage. AWS launched Amazon S3 in the United States on March 14, 2006,[1][4] then in Europe in November 2007.[5]

Technical details

Design

Amazon S3 manages data with an object storage architecture[6] which aims to provide scalability, high availability, and low latency with high durability.[3] The basic storage units of Amazon S3 are objects, which are organized into buckets. Each object is identified by a unique, user-assigned key.[7] Buckets can be managed using the console provided by Amazon S3, programmatically with the AWS SDKs, or through the REST application programming interface. Objects can be up to five terabytes in size.[8][9] Requests are authorized using an access control list associated with each object and bucket, and buckets support versioning,[10] which is disabled by default.[11] Since buckets are typically the size of an entire file system mount in other systems, this access control scheme is very coarse-grained; unique access controls cannot be associated with individual files.[citation needed]

Amazon S3 can be used to replace static web-hosting infrastructure with HTTP client-accessible objects,[12] with index document and error document support.[13] The AWS authentication mechanism allows the creation of authenticated URLs that are valid for a specified amount of time. Every item in a bucket can also be served as a BitTorrent feed: the Amazon S3 store can act as a seed host for a torrent, and any BitTorrent client can retrieve the file, which can drastically reduce the bandwidth cost of downloading popular objects. A bucket can be configured to save HTTP log information to a sibling bucket, which can be used in data mining operations.[14]

There are various Filesystem in Userspace (FUSE)–based file systems for Unix-like operating systems (for example, Linux) that can be used to mount an S3 bucket as a file system. Because the semantics of the Amazon S3 object store are not those of a POSIX file system, such file systems may not behave entirely as expected.[15]
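
As an illustration of the static web-hosting, index-document, and error-document support described above, here is a minimal sketch using the AWS SDK for Python (Boto3); the bucket name and document keys are placeholders, and the bucket is assumed to already exist and permit public reads.

python

import boto3

s3 = boto3.client("s3")
bucket = "my-example-site-bucket"   # placeholder bucket name

# Enable website hosting with an index and an error document.
s3.put_bucket_website(
    Bucket=bucket,
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},   # served for directory-style requests
        "ErrorDocument": {"Key": "error.html"},      # served for 4xx errors
    },
)

# Upload the index document so the site has something to serve.
s3.put_object(
    Bucket=bucket,
    Key="index.html",
    Body=b"<html><body>Hello from S3</body></html>",
    ContentType="text/html",
)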

Amazon S3 storage classes

Amazon S3 offers nine different storage classes designed for different durability, availability, and performance requirements.[16]

  • Amazon S3 Standard is the default class: general-purpose storage for frequently accessed data.
  • Amazon S3 Express One Zone provides single-digit-millisecond latency for frequently accessed data and latency-sensitive applications; it stores data in only one availability zone.[17]

Amazon S3 also provides Glacier storage classes for archival data; these are distinct from Amazon Glacier, which is a separate product with its own APIs.

File size limits

An object in S3 can be between 0 bytes and 5 TB in size; data larger than 5 TB must be divided into multiple objects before uploading. A single upload operation is limited to 5 GB, so objects larger than 5 GB must be uploaded via the S3 multipart upload API.[18]
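
A minimal sketch of the multipart upload flow with Boto3, assuming a hypothetical bucket name and local file; in practice, Boto3's higher-level upload_file helper performs these steps automatically for large files.

python

import boto3

s3 = boto3.client("s3")
bucket, key = "my-example-bucket", "backups/archive.bin"   # placeholder names
part_size = 100 * 1024 * 1024                              # 100 MiB parts (5 MiB minimum, except the last)

# 1. Start the multipart upload and collect the upload ID.
upload = s3.create_multipart_upload(Bucket=bucket, Key=key)
parts = []

# 2. Upload each part; S3 returns an ETag that must be echoed back on completion.
with open("archive.bin", "rb") as f:
    part_number = 1
    while True:
        chunk = f.read(part_size)
        if not chunk:
            break
        resp = s3.upload_part(
            Bucket=bucket, Key=key, PartNumber=part_number,
            UploadId=upload["UploadId"], Body=chunk,
        )
        parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
        part_number += 1

# 3. Combine the parts into a single object (up to 10,000 parts, 5 TB total).
s3.complete_multipart_upload(
    Bucket=bucket, Key=key, UploadId=upload["UploadId"],
    MultipartUpload={"Parts": parts},
)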

Scale

As of 2024, S3 stores 400 trillion objects, serves 150 million requests per second, and peaks at about 1 petabyte per second in bandwidth.[19]

Uses

Notable users

  • Photo hosting service SmugMug has used Amazon S3 since April 2006. They experienced a number of initial outages and slowdowns, but after one year they described it as being "considerably more reliable than our own internal storage" and claimed to have saved almost $1 million in storage costs.[20]
  • Netflix uses Amazon S3 as its system of record. Netflix implemented a tool, S3mper,[21] to address Amazon S3's former eventual-consistency limitations.[22] S3mper stores the filesystem metadata (filenames, directory structure, and permissions) in Amazon DynamoDB.[23]
  • Reddit is hosted on Amazon S3.[24]
  • Bitcasa[25] and Tahoe-LAFS-on-S3,[26] among others, use Amazon S3 for online backup and synchronization services. In 2016, Dropbox stopped using Amazon S3 services and developed its own cloud storage infrastructure.[27][28]
  • Swiftype's CEO has mentioned that the company uses Amazon S3.[29]

S3 API and competing services

The broad adoption of Amazon S3 and related tooling has given rise to competing services based on the S3 API. These services use the standard programming interface but are differentiated by their underlying technologies and business models.[30] A standard interface enables better competition from rival providers and allows economies of scale in implementation, among other benefits.[31] Users are not required to go directly to Amazon, as several storage providers such as Cloudian, Backblaze B2, and Wasabi offer S3-compatible storage with options for on-premises and private cloud deployments.[32]

History

At the 2013 AWS Summit in New York City, CTO Werner Vogels announced that 2 trillion objects were stored in S3.

Amazon Web Services introduced Amazon S3 in 2006.[33][34]

Date Number of Items Stored
October 2007 10 billion[35]
January 2008 14 billion[35]
October 2008 29 billion[36]
March 2009 52 billion[37]
August 2009 64 billion[38]
March 2010 102 billion[39]
April 2013 2 trillion[40]
March 2021 100 trillion[41]
March 2023 280 trillion[42]
November 2024 400 trillion[43]

In November 2017, AWS added default encryption capabilities at bucket level.[44]

from Grokipedia
Amazon Simple Storage Service (Amazon S3) is a cloud-based object storage service provided by Amazon Web Services (AWS) that enables users to store and retrieve any amount of data at any time from anywhere on the web. Designed for scalability without the need for provisioning storage capacity, Amazon S3 organizes data as objects within containers called buckets, allowing for virtually unlimited storage and automatic scaling to handle high volumes of requests. It supports key features such as data versioning to preserve multiple variants of objects; fine-grained access controls through bucket policies, AWS Identity and Access Management (IAM), and access control lists (ACLs); and multiple storage classes tailored to different access patterns and cost requirements. Additionally, S3 integrates encryption at rest by default, server-side and client-side encryption options, and comprehensive auditing tools to ensure data security and compliance.

Among its notable benefits, Amazon S3 delivers 99.999999999% (11 nines) durability over a given year by automatically replicating data across multiple devices and facilities within an AWS Region, alongside 99.99% availability for the S3 Standard storage class. Its pay-as-you-go model eliminates upfront costs, making it cost-effective for diverse workloads, while its performance supports an average of over 150 million requests per second globally as of December 2024. These attributes have made S3 a foundational service for building data lakes, enabling backup and restore operations, disaster recovery, archiving, and powering generative AI applications, as used by organizations such as NASCAR, Ancestry, Netflix, and Airbnb. As of March 2025, Amazon S3 stores over 400 trillion objects and manages exabytes of data, underscoring its role in supporting cloud-native applications and mobile apps.

Overview

Introduction

Amazon Simple Storage Service (Amazon S3) is a scalable object storage service offered by Amazon Web Services (AWS) that allows users to store and retrieve any amount of data from anywhere on the web using a simple web services interface. The service is designed for developers and IT teams to upload, organize, and access data as discrete objects within storage containers called buckets, with each object identified by a unique key, eliminating the need to manage complex infrastructure. Launched by AWS on March 14, 2006, Amazon S3 pioneered cloud-based object storage, providing a foundational building block for modern applications. As of 2025, Amazon S3 stores over 400 trillion objects comprising exabytes of data and processes an average of 150 million requests per second. It also supports up to 1 petabyte per second in bandwidth to handle massive transfer demands.

Key Characteristics

Amazon S3 is designed for elastic scalability, automatically expanding and contracting to accommodate unlimited amounts of data without the need for users to provision storage capacity in advance. This capability ensures seamless handling of varying workloads, from small datasets to petabyte-scale storage, as the service manages resource allocation dynamically behind the scenes.

A core attribute of Amazon S3 is its pay-as-you-go pricing model, which charges users solely for the resources consumed, including storage volume, requests, data retrievals, and outbound data transfer, with no minimum fees or long-term commitments required. This approach aligns costs directly with usage patterns, making it economical for both intermittent and continuous data storage needs.

Amazon S3 provides high performance through low-latency data access, facilitated by integration with AWS's global edge locations for optimized content delivery and by multi-AZ replication that ensures consistent availability across Availability Zones. These features enable rapid read and write operations, supporting applications that demand quick response times without performance degradation at scale.

Data in Amazon S3 is organized using buckets as top-level logical containers, each serving as a globally unique namespace for storing objects, which are the fundamental units of data with a maximum size of 5 terabytes. Objects are addressed via unique keys within a flat namespace, allowing flexible organization through prefixes that mimic hierarchical structures without imposing a true folder hierarchy.

Lifecycle management in Amazon S3 enables automated policies that transition objects between storage tiers based on predefined rules, such as age or access frequency, to optimize costs and storage efficiency over time. These rules can also handle object expiration, ensuring data is retained only as long as necessary while complying with retention requirements. Complementing these characteristics, Amazon S3 is engineered for exceptional durability, targeting 99.999999999% (11 nines) over a given year through redundant storage across multiple facilities.

Technical Architecture

Design Principles

Amazon S3 employs an object-based storage model, where data is stored as discrete, immutable objects rather than as files within a traditional file system hierarchy. Each object consists of a key (a unique identifier), the data itself (up to 5 terabytes in size), and associated metadata in the form of name-value pairs that describe the object for management and retrieval purposes. This flat namespace design eliminates the need for directories or folders, using key prefixes to simulate hierarchy if desired, which simplifies scaling and avoids the complexities of hierarchical namespace management. Objects are immutable, meaning any modification requires uploading a new object with an updated key or version, ensuring consistency in a distributed environment.

The architecture of Amazon S3 is fundamentally distributed to achieve high availability and reliability, with data automatically replicated across multiple devices within a single facility and further across multiple Availability Zones (AZs) for redundancy. Availability Zones are isolated locations engineered with independent power, cooling, and networking to minimize correlated failures, and S3 spreads objects across at least three AZs (except for one-zone storage classes) to protect against facility-wide outages. An elastic repair mechanism proactively detects and mitigates failures, such as disk errors, by re-replicating data to healthy storage, scaling operations proportionally to the total data volume stored. This cell-based design confines potential issues, like software updates or hardware faults, to small partitions of the system, limiting the blast radius and maintaining overall service availability.

Amazon S3 provides a RESTful interface for all operations, leveraging standard HTTP methods to ensure simplicity, interoperability, and ease of integration with web-based applications and tools. Core operations include PUT for uploading objects, GET for retrieving them, and DELETE for removal, all authenticated via AWS Signature Version 4 to secure requests over HTTPS. This design adheres to REST principles, treating buckets and objects as resources addressable via URLs, which enables stateless interactions and compatibility with a wide range of clients without requiring proprietary protocols.

As a dedicated storage service, Amazon S3 intentionally avoids server-side processing capabilities, focusing exclusively on durable storage and retrieval while delegating any computational needs to complementary AWS services. This allows S3 to optimize for storage efficiency and scalability, integrating seamlessly with services like AWS Lambda for event-driven processing or Amazon EC2 for custom compute workloads triggered by S3 events.

Since December 2020, Amazon S3 has implemented a strong read-after-write consistency model across all operations, ensuring that any subsequent read immediately reflects the results of a successful write, overwrite, delete, or metadata update without requiring application changes. This upgrade from the prior eventual consistency model, which guaranteed read-after-write consistency only for new object writes, provides predictable behavior for applications, particularly those involving real-time data access or listings, while preserving the service's high performance and availability.
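
The RESTful operations above map directly onto SDK calls. The following Boto3 sketch, with placeholder bucket and key names, performs a PUT, an immediately consistent GET, and a DELETE; request signing (SigV4) is handled automatically by the SDK.

python

import boto3

s3 = boto3.client("s3")                 # credentials and SigV4 signing handled by the SDK
bucket, key = "my-example-bucket", "reports/2024/summary.txt"   # placeholders

# PUT: upload an object (objects are immutable; re-uploading the key writes a new object or version).
s3.put_object(Bucket=bucket, Key=key, Body=b"quarterly summary")

# GET: with strong read-after-write consistency, this read reflects the PUT above.
body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
print(body.decode())

# DELETE: remove the object (with versioning enabled, this writes a delete marker instead).
s3.delete_object(Bucket=bucket, Key=key)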

Storage Classes

Amazon S3 provides multiple storage classes tailored to different access frequencies and performance requirements, allowing users to balance cost efficiency with retrieval needs while maintaining consistent durability across all classes at 99.999999999% (11 nines) over a given year. These classes support data redundancy across multiple Availability Zones (AZs) except for single-AZ options, and most enable seamless transitions via S3 Lifecycle policies. The following list summarizes the key characteristics of each storage class (primary access pattern; typical retrieval time; designed availability; SLA availability; notes):

  • S3 Standard: frequently accessed data; millisecond retrieval; 99.99% designed availability (99.9% SLA). Low-latency, high-throughput access; data stored across at least three AZs; supports lifecycle transitions.
  • S3 Intelligent-Tiering: unknown or changing access patterns; millisecond retrieval in the frequent-access tiers, varying for infrequent and archive tiers; 99.9% designed availability (99% SLA). Automatically moves objects between frequent, infrequent, and archive instant access tiers after 30, 90, or 180 days of no access; no retrieval fees; a per-object monitoring and automation fee applies; stored across at least three AZs; supports lifecycle transitions.
  • S3 Express One Zone: latency-sensitive, frequently accessed data in a single AZ; single-digit-millisecond retrieval; 99.95% designed availability (99.9% SLA). High performance for demanding workloads; supports up to millions of requests per second; uses directory buckets; single AZ only; no support for lifecycle transitions; introduced in 2023.
  • S3 Standard-Infrequent Access (IA): infrequently accessed data needing quick access; millisecond retrieval; 99.9% designed availability (99% SLA). Suitable for objects larger than 128 KB stored for at least 30 days; retrieval fees apply; stored across at least three AZs; supports lifecycle transitions.
  • S3 One Zone-Infrequent Access (IA): infrequently accessed, re-creatable data; millisecond retrieval; 99.5% designed availability (99% SLA). Lower redundancy in a single AZ for cost savings; suitable for objects larger than 128 KB; retrieval fees apply; supports lifecycle transitions.
  • S3 Glacier Instant Retrieval: rarely accessed data requiring immediate access; millisecond retrieval; 99.9% designed availability (99% SLA). Low-cost archival option; minimum object size of 128 KB; 90-day minimum storage duration; stored across at least three AZs; supports lifecycle transitions.
  • S3 Glacier Flexible Retrieval: rarely accessed data for backup or disaster recovery; minutes to hours (expedited, standard, and bulk options); 99.99% designed availability (99.9% SLA). Retrieval flexibility with free bulk retrievals; 90-day minimum storage; stored across at least three AZs; supports lifecycle transitions.
  • S3 Glacier Deep Archive: very rarely accessed long-term archival data; 12-48 hours standard, 48-72 hours bulk; 99.99% designed availability (99.9% SLA). Lowest-cost storage for compliance or long-term retention; 180-day minimum storage; stored across at least three AZs; supports lifecycle transitions.
S3 Lifecycle policies enable automated management by defining rules to transition objects between storage classes based on age, access patterns, or other criteria, such as moving from S3 Standard to S3 Glacier Deep Archive after 365 days of storage. These policies apply to buckets and can include expiration rules to delete objects after a specified period, optimizing storage without manual intervention.
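
A sketch of such a lifecycle rule applied with Boto3, assuming a hypothetical bucket and log prefix: objects transition to S3 Standard-IA after 30 days, to S3 Glacier Deep Archive after 365 days, and expire after roughly two years.

python

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",          # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire-logs",
                "Filter": {"Prefix": "logs/"},          # rule applies only to this prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
                "Expiration": {"Days": 730},            # delete after about two years
            }
        ]
    },
)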

Limits and Scalability

Amazon S3 imposes specific limits on object sizes to ensure efficient storage and retrieval. Individual objects can range from 0 bytes up to a maximum of 5 tebibytes (TiB), with multipart uploads enabling the handling of large files by dividing them into parts ranging from 5 mebibytes (MiB) to 5 gibibytes (GiB), up to a total of 10,000 parts per upload. Bucket creation is limited by default to 10,000 general purpose buckets per AWS account, though this quota can be increased upon request, with support for up to 1 million buckets. Each bucket can store an unlimited number of objects, allowing for virtually boundless data accumulation without predefined caps on object count.

Request rates are designed for high throughput, with Amazon S3 supporting at least 3,500 PUT, COPY, POST, or DELETE requests per second and 5,500 GET or HEAD requests per second per prefix in a bucket. These rates scale horizontally by distributing requests across multiple prefixes, enabling applications to achieve significantly higher throughput, such as 55,000 GET requests per second with 10 prefixes, without fixed upper bounds. At a global level, Amazon S3 handles massive scale through features like cross-region replication for data distribution across multiple AWS Regions and integration with Amazon CloudFront for edge caching, which reduces latency for worldwide access. The service processes an average of over 100 million requests per second while storing more than 350 trillion objects, demonstrating its elastic architecture that automatically adjusts to varying workloads. To maintain performance at scale, Amazon S3 employs automatic partitioning strategies, including sharding of the object key space into prefixes for even load distribution across the underlying infrastructure. This approach ensures balanced request handling and prevents bottlenecks, with gradual scaling that may involve temporary throttling via HTTP 503 errors during traffic spikes.

Amazon S3 also enforces limits on event notification configurations, with a quota of 100 such configurations per bucket, which is not adjustable. This restriction prevents the use of direct per-prefix event notifications in scenarios requiring a large number of prefixes or queues, such as 50,000, as it would exceed the configuration limit. Common workarounds include configuring a single event notification and applying consumer-side filtering, or using producer-side Amazon Simple Notification Service (SNS) messaging with attributes to route events appropriately.
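
A minimal sketch of the prefix-spreading idea, with placeholder bucket and object names: hashing each key into one of several prefixes lets aggregate request rates scale with the number of prefixes, since the per-second limits apply per prefix.

python

import hashlib
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"     # placeholder
NUM_PREFIXES = 10                # 10 prefixes -> roughly 10x the per-prefix request limits

def sharded_key(name: str) -> str:
    """Prepend a stable hash-derived shard so keys spread evenly across prefixes."""
    shard = int(hashlib.sha256(name.encode()).hexdigest(), 16) % NUM_PREFIXES
    return f"shard-{shard:02d}/{name}"

for i in range(100):
    name = f"images/photo-{i}.jpg"
    s3.put_object(Bucket=bucket, Key=sharded_key(name), Body=b"...")   # payload elided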

Features and Capabilities

Durability and Availability

Amazon S3 achieves exceptional data durability through its architecture, which is designed to deliver 99.999999999% (11 nines) durability of objects over a given year by automatically storing data redundantly across multiple devices and at least three distinct Availability Zones (AZs) within an AWS Region. This multi-fold replication ensures that the annual risk of data loss due to hardware failure, errors, or disasters is extraordinarily low, with the system engineered to sustain the concurrent loss of data in multiple facilities without losing objects.

To maintain this durability, Amazon S3 employs advanced integrity mechanisms, including automatic error correction and verification using checksums to detect and repair issues such as bit rot or corruption. These checksums are computed on upload and used to validate data at rest, enabling proactive repairs to restore redundancy when degradation is identified. Additionally, options like S3 Cross-Region Replication (CRR) allow users to further enhance resilience by asynchronously copying objects to a different AWS Region for disaster recovery.

Availability in Amazon S3 varies by storage class but is optimized for high uptime; for example, the S3 Standard class is designed for 99.99% availability over a year, meaning objects are accessible for requests with minimal interruption. In contrast, classes like S3 One Zone-IA, which store data within a single AZ, offer lower designed availability of 99.5% to balance cost and performance needs. These guarantees are backed by the Amazon S3 Service Level Agreement (SLA), which commits to a monthly uptime percentage of at least 99.9% for S3 Standard and similar classes, with service credits provided as compensation: 10% of monthly fees for uptime below 99.9% but at or above 99.0%, 25% for below 99.0% but at or above 95.0%, and 100% for below 95.0%. For classes like S3 One Zone-IA, the SLA threshold is 99.0%, reflecting their single-AZ design. The uptime percentage is calculated based on error rates in 5-minute intervals, excluding factors like customer-induced issues or force majeure events.

Users can monitor object integrity and replication status through built-in features such as S3 Versioning, which preserves multiple versions of objects to enable recovery from overwrites or deletions, and replication metrics available via Amazon CloudWatch for tracking completion and errors in replication jobs. These tools provide visibility into data persistence without requiring manual intervention.
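
A sketch of two of these integrity features in Boto3, assuming a placeholder bucket: enabling versioning and asking S3 to verify an upload with a SHA-256 checksum.

python

import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"     # placeholder

# Turn on versioning so overwrites and deletes are recoverable.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Upload with an end-to-end SHA-256 checksum; S3 validates it on receipt and stores it.
s3.put_object(
    Bucket=bucket,
    Key="critical/ledger.csv",
    Body=b"id,amount\n1,100\n",
    ChecksumAlgorithm="SHA256",
)

# List the versions now retained for the key.
versions = s3.list_object_versions(Bucket=bucket, Prefix="critical/ledger.csv")
for v in versions.get("Versions", []):
    print(v["VersionId"], v["LastModified"])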

Security and Compliance

Amazon S3 provides robust security features to protect data at rest, in transit, and during access, including encryption, fine-grained access controls, and comprehensive auditing mechanisms. These features are designed to help users meet organizational security requirements and regulatory standards while leveraging AWS-managed infrastructure.

Encryption

Amazon S3 supports multiple encryption options to secure data, ensuring confidentiality against unauthorized access. Server-side encryption (SSE) is applied automatically to objects upon upload, with three primary variants: SSE-S3 uses keys managed by Amazon S3, SSE-KMS integrates with AWS Key Management Service (KMS) for customer-managed keys with additional control and auditing, and SSE-C allows users to provide their own keys for each operation. Client-side encryption, where users encrypt data before uploading it using tools like the Amazon S3 Encryption Client or the AWS Encryption SDK, offers further flexibility for sensitive workloads. Since January 2023, all new S3 buckets have default server-side encryption enabled with SSE-S3 to establish a baseline level of protection without additional configuration. For advanced scenarios, dual-layer server-side encryption with AWS KMS keys (DSSE-KMS) combines S3-managed encryption with a second layer using customer- or AWS-managed KMS keys, enhancing protection for high-stakes applications. In the context of emerging workloads like vector data storage in S3 Vectors, data protection incorporates multiple controls for data at rest and in transit, including automatic encryption with AWS-managed keys.
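
A sketch of the server-side options in Boto3, with a placeholder bucket and KMS key ARN: setting SSE-KMS as the bucket default and uploading one object with an explicit per-request key.

python

import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"                                         # placeholder
kms_key_arn = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY"   # placeholder ARN

# Make SSE-KMS the default for all new objects in the bucket.
s3.put_bucket_encryption(
    Bucket=bucket,
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                "BucketKeyEnabled": True,   # reduces KMS request costs
            }
        ]
    },
)

# Or request SSE-KMS explicitly on a single upload.
s3.put_object(
    Bucket=bucket,
    Key="secrets/report.pdf",
    Body=b"...",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId=kms_key_arn,
)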

Access Controls

Access to S3 resources is managed through a combination of identity and policy-based mechanisms to enforce least-privilege principles. AWS Identity and Access Management (IAM) policies allow users to define permissions for principals like users, roles, and services, specifying actions such as read, write, or delete on buckets and objects. Here is an example of an AWS IAM identity-based policy that grants read-only access to a single S3 bucket following the principle of least privilege. It allows listing the bucket contents (s3:ListBucket) and retrieving objects (s3:GetObject), with no write or delete permissions. Replace my-example-bucket with your actual bucket name.

json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::my-example-bucket"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::my-example-bucket/*"]
    }
  ]
}

This policy is the minimal set for read-only access. If versioning is used, add s3:GetObjectVersion. For console navigation, additional permissions like s3:GetBucketLocation may be needed, but they are not required for API/CLI read access. Bucket policies provide resource-level controls directly on S3 buckets, enabling conditions like IP restrictions or time-based access, while access control lists (ACLs) offer legacy object and bucket-level permissions, though AWS recommends transitioning to policies for finer granularity. Note that support for creating new Email Grantee ACLs ended on October 1, 2025. To prevent accidental public exposure, the S3 Block Public Access feature blocks public access at the account, bucket, and access point levels; since April 2023, it is enabled by default for all new buckets, and ACLs are disabled to simplify ownership and reduce misconfiguration risks. In restricted network environments, such as those with corporate firewalls, uploads to Amazon S3 can be facilitated using HTTPS presigned URLs, which operate on port 443 and utilize standard HTTPS traffic that is rarely blocked. Whitelisting the *.amazonaws.com domains is a common approach to enable broader access, or users can generate an HTTPS URL and inspect its host for more targeted configurations. For private connectivity without public internet traversal, AWS VPC endpoints can be utilized.
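
A sketch of the presigned upload pattern described above, with placeholder bucket and key names: server-side code signs a PUT URL, and the client then uploads over plain HTTPS on port 443 with any HTTP client, no AWS credentials required.

python

import boto3

s3 = boto3.client("s3")

# Generate a URL that permits exactly one operation (PUT to this key) for 15 minutes.
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "my-example-bucket", "Key": "incoming/user-upload.zip"},  # placeholders
    ExpiresIn=900,
)

print(upload_url)
# The client can now upload without credentials, for example:
#   curl -X PUT --upload-file user-upload.zip "<upload_url>"
# Inspecting the URL's host shows the exact amazonaws.com domain to whitelist.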

Auditing and Logging

Amazon S3 offers detailed logging capabilities to track access and operations for security monitoring and incident response. S3 server access logs capture detailed records of requests to buckets and objects, including requester identity, bucket name, request time, and response status, which can be delivered to another S3 bucket for analysis. For API-level auditing, integration with AWS CloudTrail logs management events (like bucket creation) by default and optional data events (like object-level Get or Put requests), providing a comprehensive audit trail of who performed actions, when, and from where. These logs support compliance requirements by enabling forensic analysis and anomaly detection when combined with tools like Amazon Athena for querying.

Compliance Certifications

Amazon S3 adheres to numerous industry standards and regulations through third-party audits and built-in features that facilitate compliance. It holds certifications including SOC 1, SOC 2, and SOC 3 for controls relevant to financial reporting and security, PCI DSS for payment card data handling, HIPAA/HITECH for protected health information, and support for GDPR through data residency and processing controls. To enable write-once-read-many (WORM) storage for retention policies, S3 Object Lock allows users to lock objects for a specified retention period or indefinitely, preventing deletion or modification and helping meet requirements for immutable records in regulations like SEC Rule 17a-4.
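
A sketch of configuring WORM retention with Boto3, assuming a placeholder bucket created with Object Lock enabled; the default rule keeps every new object version immutable for seven years.

python

import boto3

s3 = boto3.client("s3")
bucket = "my-example-records-bucket"   # placeholder name

# Object Lock can only be enabled at bucket creation time.
s3.create_bucket(Bucket=bucket, ObjectLockEnabledForBucket=True)

# Apply a default WORM retention rule: new object versions cannot be deleted or overwritten for 7 years.
s3.put_object_lock_configuration(
    Bucket=bucket,
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": 7}},
    },
)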

Recent Enhancements

In 2025, Amazon S3 introduced S3 Metadata, a fully managed feature that automatically generates and maintains queryable tables of metadata for all objects in a bucket, enhancing visibility for security assessments, inventory, and compliance audits by tracking attributes like size, tags, and encryption status without manual processing. This feature supports security use cases such as identifying unprotected objects or monitoring changes over time. In July 2025, Amazon introduced S3 Vectors (preview), the first cloud object store with native vector support for storing large vector datasets with subsecond query performance, optimized for AI applications.

Pricing Model

Amazon S3 operates on a pay-as-you-go model, charging users only for the resources they consume without minimum fees or long-term commitments. Costs are determined by factors such as the volume and type of storage, number of requests, management operations, and outbound data transfers. Pricing varies by AWS region, with the US East (N. Virginia) region serving as a common reference point.

Storage costs are tiered based on the selected storage class and volume stored, billed per GB per month. For instance, S3 Standard storage costs $0.023 per GB for the first 50 TB, $0.022 per GB for the next 450 TB, and $0.021 per GB for volumes over 500 TB (as of November 2025), while S3 Glacier Deep Archive offers lower rates at $0.00099 per GB for the first 50 TB. S3 Intelligent-Tiering includes monitoring and automation fees of $0.0025 per 1,000 objects per month in addition to tier-specific storage rates starting at $0.023 per GB for frequent access. These classes, which balance cost and access needs, are detailed further in the storage classes section.

Request fees apply to operations like reading or writing objects, with GET requests charged at $0.0004 per 1,000 for S3 Standard and PUT, COPY, POST, or LIST requests at $0.005 per 1,000. Data transfer fees primarily affect outbound traffic, where the first 100 GB per month to the internet is free, followed by $0.09 per GB for the next 10 TB (with tiered reductions for larger volumes). Additional charges include retrieval fees for infrequent or archival storage classes to account for the higher operational costs of accessing less frequently used data. For example, S3 Standard-Infrequent Access incurs $0.01 per GB retrieved, S3 Glacier Flexible Retrieval charges $0.01 per GB for standard retrieval and $0.0025 per GB for bulk, and S3 Glacier Deep Archive retrieval is $0.02 per GB for standard or $0.0025 per GB for bulk (as of November 2025). Minimum storage duration charges may also apply, enforcing 30 days for Standard-IA, 90 days for Glacier Flexible Retrieval, and 180 days for Glacier Deep Archive to discourage short-term use of low-cost tiers.

To optimize costs, Amazon S3 provides tools such as S3 Storage Lens, which offers free basic metrics and customizable dashboards for analyzing storage usage and identifying savings opportunities across buckets and regions. AWS Savings Plans allow eligible customers to commit to usage for discounted rates on S3 requests and data transfers, potentially reducing expenses by up to 72% compared to on-demand pricing. New AWS accounts include a free tier for S3, providing 5 GB of Standard storage, 20,000 GET requests, 2,000 PUT/COPY/POST/LIST requests, 100 DELETE requests, and 100 GB of data transfer out to the internet per month for the first 12 months.
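
A small worked example of the tiered S3 Standard storage charge, using the US East (N. Virginia) rates quoted above; the figures are illustrative, and regional prices change over time.

python

# Tiered monthly S3 Standard storage cost (USD per GB-month), per the rates cited above.
TIERS = [
    (50 * 1024, 0.023),       # first 50 TB (expressed in GB)
    (450 * 1024, 0.022),      # next 450 TB
    (float("inf"), 0.021),    # everything over 500 TB
]

def monthly_storage_cost(gb: float) -> float:
    cost, remaining = 0.0, gb
    for tier_size, rate in TIERS:
        used = min(remaining, tier_size)
        cost += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return cost

# Example: 600 TB stored for one month.
print(f"${monthly_storage_cost(600 * 1024):,.2f}")   # 50 TB @ 0.023 + 450 TB @ 0.022 + 100 TB @ 0.021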

Use Cases and Applications

Common Use Cases

Amazon S3 serves as a reliable platform for backup and restore operations, providing offsite storage with built-in versioning that enables recovery from accidental deletions or modifications. This supports disaster recovery by allowing users to replicate data across regions and integrate with AWS Backup for automated policies that meet recovery time objectives (RTO) and recovery point objectives (RPO). Organizations leverage S3's 99.999999999% (11 nines) durability to safeguard critical data against hardware failures or site disasters, ensuring minimal downtime during restoration processes.

In data lakes and analytics, S3 functions as a centralized repository for storing vast amounts of structured and unstructured data at petabyte scale, facilitating querying and analysis without upfront schema definitions. It supports tools like Amazon Athena for serverless SQL queries directly on S3 data and Amazon Redshift for data warehousing, enabling cost-effective processing of logs, IoT streams, and application data. With features like S3 Select for in-storage filtering, users can reduce data transfer costs and accelerate insights from diverse datasets.

For archiving and compliance, S3 offers long-term retention through storage classes like S3 Glacier and S3 Glacier Deep Archive, which provide retrieval times ranging from minutes to hours at significantly lower costs than standard storage. S3 Object Lock implements write-once-read-many (WORM) policies to prevent alterations or deletions, helping ensure compliance with regulations such as GDPR, HIPAA, and SEC Rule 17a-4. This setup allows organizations to retain data for 7 to 10 years or longer while optimizing costs via lifecycle transitions based on access patterns.

Media and content distribution represent another core application, where S3 hosts static websites and serves as scalable storage for images, videos, and audio files. By enabling public bucket policies and integrating with Amazon CloudFront for global edge caching, S3 delivers low-latency content to end users, supporting high-traffic scenarios like video streaming or e-commerce assets. Its ability to handle millions of requests per second ensures reliable performance for dynamic content delivery without managing servers.

S3-compatible buckets also integrate with digital file sale automation platforms, enabling secure delivery of purchased content. Users generate presigned or signed URLs via the S3 API to provide temporary, secure access for customer downloads, typically for one-time use following a purchase. No-code automation tools such as Zapier, Make.com, and Pipedream facilitate this by triggering workflows on sales webhooks from payment processors like Stripe or Gumroad, automatically generating the URL and emailing it to the buyer. For more customized implementations, developers employ SDKs in languages like Node.js or Python to script URL generation and integration. This approach scales effectively with S3's versioning for file management and can incorporate content delivery networks like Cloudflare to enhance download speeds.

In machine learning and AI workloads, S3 stores datasets for training models, including vector embeddings via S3 Vectors, which provide native support for high-dimensional queries with sub-second latency. It accommodates generative AI applications by hosting large-scale datasets and enabling efficient access for common machine learning frameworks. Innovations like Amazon S3 Tables, introduced in 2024, optimize tabular storage with Apache Iceberg integration, improving query performance for analytics and AI pipelines by up to 3x through automated compaction. The storage classes described earlier help tailor these uses, matching infrequent access patterns to lower-cost tiers for cost efficiency.

Notable Users and Examples

NASCAR utilizes Amazon S3 to store and manage its extensive media library, which includes race videos, audio, and images accumulated over decades of motorsport events. The organization migrated a 15-petabyte archive from legacy LTO tapes to S3 in just over one year, leveraging storage classes such as S3 Standard for active high-resolution mezzanine files, S3 Glacier Instant Retrieval for frequently accessed content, and S3 Glacier Deep Archive for long-term retention of proxy files. This setup handles an annual growth of 1.5 to 2 petabytes, enabling cost-effective scalability and rapid retrieval for fan engagement and production needs.

The British Broadcasting Corporation (BBC) employed Amazon S3 Glacier to digitize and centralize its 100-year archive of broadcasting content, transitioning from tape-based systems to cloud storage for improved preservation and accessibility. In a 10-month project, the BBC migrated 25 petabytes of data, averaging 120 terabytes per day, to S3 Glacier Instant Retrieval and S3 Intelligent-Tiering, retiring half of its physical infrastructure while reducing operational costs and enhancing data durability. This migration supported the archival of diverse media assets, ensuring long-term integrity without the vulnerabilities of physical tapes.

Ancestry leverages Amazon S3 Glacier to efficiently restore and process vast collections of historical images, facilitating AI-driven enhancements for genealogy research. The company handles hundreds of terabytes of such images, using S3 Glacier's improved throughput to complete restorations in hours rather than days, which accelerates the training of AI models for tasks such as image enhancement of digitized records. This capability has enabled Ancestry to deliver higher-quality, searchable historical photos to millions of users, transforming faded or damaged artifacts into accessible family history resources.

Netflix relies on Amazon S3 as a foundational component of its global data and analytics infrastructure, managing exabyte-scale data lakes to support personalized streaming recommendations and performance optimization. S3 stores petabytes of video assets and user interaction logs, enabling the processing of billions of hours of monthly content delivery across devices while powering real-time analytics on viewer behavior. This architecture allows Netflix to scale storage elastically, handling daily ingestions that contribute to its massive data footprint for machine learning-driven personalization.

Airbnb employs Amazon S3 for robust backup and storage of operational data, including user content and system logs essential for platform reliability and recovery. The company maintains 10 terabytes of user pictures and other static files in S3, alongside daily processing of 50 gigabytes of log data via integrated services like Amazon EMR, ensuring durable retention for disaster recovery and analytics. This implementation supports Airbnb's high-traffic environment by providing scalable, low-latency access to backups without managing on-premises hardware.

Integrations and Ecosystem

AWS Integrations

Amazon S3 integrates closely with AWS compute services to enable efficient data access and processing. Amazon Elastic Compute Cloud (EC2) instances can directly access S3 buckets by attaching IAM roles that grant the necessary permissions, allowing applications to store and retrieve data without embedding credentials. This setup supports use cases like hosting static websites or running data-intensive workloads on EC2. AWS Lambda extends this capability through serverless execution, where S3 event notifications, such as object uploads or deletions, trigger Lambda functions to process data automatically, facilitating real-time transformations without managing servers. However, S3 buckets are limited to 100 event notification configurations, which can constrain scalability for applications requiring many notifications; see the Limits and Scalability section for details.

For analytics workloads, S3 serves as a foundational data lake storage layer integrated with services like Amazon Athena and Amazon EMR. Athena enables interactive querying of S3 data using standard SQL, eliminating the need for ETL preprocessing or infrastructure management, and supports features like federated queries across data sources. Amazon EMR, on the other hand, treats S3 as a scalable file system via the S3A connector, allowing users to run Apache Hadoop, Spark, and other frameworks directly on S3-stored data for large-scale processing tasks like ETL and machine learning model training.

Backup and management integrations enhance S3's operational resilience and efficiency. AWS Backup provides centralized, policy-based protection for S3 buckets, supporting continuous backups for point-in-time recovery and periodic backups for cost-optimized archival, with seamless integration across other AWS services. Complementing this, S3 Batch Operations allow bulk execution of actions on billions of objects, such as copying, tagging, or invoking Lambda functions, streamlining large-scale data management without custom scripting.

Networking features ensure secure and performant connectivity to S3. VPC endpoints, specifically gateway endpoints for S3, enable private access from resources within a virtual private cloud (VPC) without traversing the public internet or incurring data transfer fees, improving security and latency. For hybrid environments, AWS Direct Connect facilitates dedicated, private fiber connections from on-premises data centers to S3, bypassing the public internet for consistent, high-bandwidth data transfers.

A notable recent advancement is Amazon S3 Tables, launched in 2024, which optimizes S3 for tabular data using the open Apache Iceberg table format and integrates natively with AWS Glue for metadata cataloging and schema evolution, as well as with Amazon SageMaker for building and deploying models on Iceberg tables stored in S3. This integration automates tasks like compaction and table maintenance, enabling analytics engines to query S3 data as managed tables. Access to these integrations is governed by AWS Identity and Access Management (IAM) policies, ensuring fine-grained control over permissions. In July 2025, Amazon announced Amazon S3 Vectors in preview, the first cloud object store with native support for storing and querying large-scale vector datasets for AI applications. It integrates with Amazon Bedrock Knowledge Bases for cost-effective Retrieval-Augmented Generation (RAG), Amazon SageMaker Unified Studio for building generative AI apps, and Amazon OpenSearch Service for low-latency vector searches, reducing costs by up to 90% compared to general-purpose storage.
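
A sketch of the event-driven pattern described above, assuming a placeholder bucket and Lambda function ARN (and that the function's resource policy already allows S3 to invoke it): objects created under uploads/ trigger the function.

python

import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="my-example-bucket",   # placeholder
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "Id": "process-new-uploads",
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:process-upload",  # placeholder ARN
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": "uploads/"}]}
                },
            }
        ]
    },
)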

Third-Party Compatibility

Amazon S3's API serves as a de facto standard for object storage, enabling compatibility with various third-party solutions for on-premises and hybrid deployments. MinIO, an open-source object storage system, implements the S3 API to provide high-performance, scalable storage that mimics S3's behavior for cloud-native applications. Similarly, Ceph's Object Gateway (RGW) supports a RESTful interface compatible with the core data access model of the Amazon S3 API, allowing seamless integration for distributed storage environments.

Developers can interact with S3 using official AWS SDKs available in multiple languages, facilitating integration into diverse applications without proprietary dependencies. The AWS SDK for Java offers APIs for S3 operations, enabling Java-based applications to handle uploads, downloads, and bucket management efficiently. For Python, the Boto3 library provides a high-level interface to S3, supporting tasks like object manipulation and multipart uploads. The AWS SDK for .NET similarly equips .NET developers with libraries for S3 interactions, including asynchronous operations and error handling. Additionally, the AWS Command Line Interface (CLI) allows command-line access to S3 for scripting and automation, such as listing objects or syncing directories.

S3 integrates with third-party content management and business systems to serve as a backend for file storage and delivery. Salesforce data can reach S3 through connectors like Amazon AppFlow, which transfers data from Salesforce to S3 buckets for analytics and archiving. Adobe Experience Platform uses S3 as a source and destination for data ingestion, supporting authentication via access keys or assumed roles to manage files in workflows. S3-compatible buckets facilitate digital file sale automation by generating presigned or signed URLs via the S3 API for temporary, secure customer downloads, enabling one-time access post-purchase. No-code tools such as Zapier, Make.com, and Pipedream allow workflows that trigger on sales webhooks from payment platforms like Stripe or Gumroad to generate these URLs and deliver them to buyers via email. Custom scripts using AWS SDKs in Node.js or Python can also implement this functionality. This scales effectively with S3's versioning for file management and can be combined with CDNs like Cloudflare for accelerated delivery.

For large-scale data imports, S3 supports migration tools that bridge external environments to AWS storage. AWS Snowball devices enable physical shipment of petabyte-scale data to S3, ideal for offline transfers where network bandwidth is limited. AWS Transfer Family provides managed file transfer protocols (SFTP, FTPS, FTP) directly to S3, securing imports from on-premises or legacy systems.

S3's support for open table formats enhances interoperability with data analytics ecosystems, particularly through Apache Iceberg. In 2025, S3 introduced sort and z-order compaction strategies for Iceberg tables, optimizing query performance by reorganizing data partitions in both S3 Tables and general-purpose buckets via AWS Glue. These enhancements, building on the December 2024 launch of built-in Iceberg support in S3 Tables, allow automatic maintenance to reduce scan times and storage costs in open data lakes.
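
Because third-party stores expose the same API, the official SDKs can target them by overriding the endpoint. A Boto3 sketch, where the endpoint URL and credentials are placeholders for a self-hosted S3-compatible service:

python

import boto3

# Point the standard S3 client at an S3-compatible endpoint instead of AWS.
compat = boto3.client(
    "s3",
    endpoint_url="https://objectstore.example.internal:9000",   # placeholder self-hosted endpoint
    aws_access_key_id="EXAMPLE_ACCESS_KEY",                     # placeholder credentials
    aws_secret_access_key="EXAMPLE_SECRET_KEY",
)

# The same calls used against AWS work unchanged.
compat.create_bucket(Bucket="test-bucket")
compat.put_object(Bucket="test-bucket", Key="hello.txt", Body=b"hello")
print(compat.list_objects_v2(Bucket="test-bucket")["KeyCount"])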

S3 API

API Overview

The Amazon S3 API is a RESTful interface that enables developers to interact with S3 storage through standard HTTP methods such as GET, PUT, POST, and DELETE, using regional endpoints formatted as <bucket-name>.s3.<region>.amazonaws.com for virtual-hosted-style requests or path-style requests like s3.<region>.amazonaws.com/<bucket-name>. Path-style requests remain supported but are legacy and scheduled for future discontinuation. This structure supports operations across buckets and objects, with key actions including ListBuckets to retrieve a list of all buckets owned by the authenticated user and GetObject to retrieve the content and metadata of a specified object from a bucket. Developers typically access the API via AWS SDKs, CLI tools, or direct HTTP requests, with recommendations to use SDKs for handling complexities like request signing and error management.

Authentication for S3 API requests relies on AWS Signature Version 4, which signs requests using access keys and includes elements like the request timestamp, payload hash, and canonicalized resource path to ensure integrity and authenticity. For scenarios requiring temporary access without sharing credentials, presigned URLs can be generated, embedding the signature in query parameters to grant time-limited permissions for operations like uploading or downloading objects, valid for up to seven days. This mechanism allows secure delegation of access, such as enabling client-side uploads directly to S3 buckets or providing temporary secure customer downloads for one-time access post-purchase in digital file sale automation scenarios. Presigned URLs utilize HTTPS on port 443, the standard secure web port, which is often permitted in restricted network environments where other ports may be blocked, facilitating uploads through corporate firewalls. In such cases, network administrators can inspect the host component of the generated presigned URL to identify specific domains, such as regional subdomains of amazonaws.com, for targeted whitelisting.

Advanced features in the S3 API include multipart uploads, which break large objects into parts for parallel uploading, initiated via CreateMultipartUpload, followed by individual part uploads and completion with CompleteMultipartUpload, supporting objects up to 5 terabytes. Additionally, Amazon S3 Select, introduced in 2018, allows in-place querying of objects in CSV, JSON, or Apache Parquet formats using SQL-like expressions through the SelectObjectContent operation, reducing data transfer costs by retrieving only relevant subsets without full downloads. The API supports versioning through operations like PutObject with versioning enabled on the bucket, automatically assigning unique version IDs to objects for preserving multiple iterations and enabling retrieval via GetObject with a versionId parameter. Tagging is managed via dedicated calls such as PutObjectTagging to add key-value metadata tags to objects for categorization and cost allocation, with limits of up to 10 tags per object and retrieval through GetObjectTagging. In 2025, enhancements to S3 Batch Operations expanded support for processing up to 20 billion objects in a single job for actions like copying, tagging, and invoking Lambda functions, facilitated by on-demand manifest generation for targeted large-scale operations.
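
A sketch of generating a time-limited download link, with placeholder bucket and key names; the signature is embedded in the query string, so the link works in any browser or HTTP client until it expires.

python

import boto3

s3 = boto3.client("s3")

download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-example-bucket", "Key": "purchases/order-1234/ebook.pdf"},  # placeholders
    ExpiresIn=3600,   # valid for one hour (the maximum allowed is seven days)
)
print(download_url)   # email or redirect the customer to this URL
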
Further updates in 2025 include the discontinuation of support for Email Grantee Access Control Lists (ACLs) as of October 1, 2025; the limitation of S3 Object Lambda access to existing customers only, effective November 7, 2025; the introduction of Amazon S3 Vectors in preview (announced July 15, 2025) for native storage and querying of vector datasets with subsecond performance for AI applications; and the planned removal of the Owner.DisplayName field from API responses starting November 21, 2025, requiring applications to use canonical user IDs instead.

Competing Services

Google Cloud Storage serves as a primary competitor to Amazon S3, offering similar storage classes such as Standard for frequently accessed data and Nearline for infrequently accessed data with retrieval fees. It supports multi-region replication for redundancy and integrates seamlessly with Google Cloud Platform's AI services, including Vertex AI for document summarization and analytics workflows. This integration enables AI-driven data processing directly within the Google ecosystem, providing an alternative for users prioritizing such applications.

Microsoft Azure Blob Storage competes with S3 through its tiered structure, including Hot for active data, Cool for less frequent access, and Archive for long-term retention. It features strong integration with Microsoft Entra ID (Azure Active Directory) for identity and authentication, enhancing enterprise security in hybrid environments. Azure also offers lower egress fees compared to S3 in certain scenarios, appealing to data transfer-heavy workloads.

Open-source alternatives provide self-hosted options compatible with the S3 API standard. MinIO is an S3-compatible object storage system designed for high-performance, on-premises deployments, supporting distributed architectures without vendor lock-in. Ceph offers distributed storage with S3 compatibility via its Object Gateway, enabling scalable, software-defined storage across clusters for block, file, and object needs.

Other notable services include Backblaze B2, which emphasizes low-cost storage without API request fees, making it suitable for budget-conscious backups and archiving. Cloudflare R2 provides zero-egress object storage, eliminating data transfer costs out of the platform while maintaining S3 API compatibility for a range of use cases. Amazon S3 differentiates itself through its extensive ecosystem depth, including deep integrations with AWS services like Lambda and EC2, which surpass competitors in breadth for complex cloud-native applications, whereas rivals often excel in pricing simplicity or specialized features like zero egress.

Development History

2006-2010: The API-First Era Without GUI

Amazon Simple Storage Service (S3) was publicly launched on March 14, 2006, marking it as the first major infrastructure service offered by Amazon Web Services (AWS) and establishing the foundation for cloud-based object storage. The service entered a beta testing phase earlier that year, allowing select developers to experiment with its capabilities before general availability. At launch, S3 provided basic object storage designed for scalability, with a focus on durability rated at 99.999999999% (11 nines) and availability of 99.99%, enabling users to store and retrieve unlimited amounts of data via a simple web services API without upfront infrastructure management.

In this initial era, S3 emphasized an API-first approach with no official graphical user interface (GUI). Interactions were limited to API calls using REST, SOAP, and BitTorrent protocols. Developers relied on third-party tools such as S3Fox Organizer and CloudBerry Explorer for management until the AWS Management Console provided web-based access in 2010. Early adoption was driven by developers seeking cost-effective, on-demand storage for internet-scale applications. The August 2006 launch of Amazon Elastic Compute Cloud (EC2) enabled direct access to S3 buckets from EC2 instances, accelerating cloud-native architectures.

The service faced challenges, including major outages in February and July 2008 that disrupted access for hours due to software bugs and network issues in the US East region. These incidents prompted enhancements in monitoring, error detection, and failover mechanisms. Despite setbacks, growth was rapid; by the end of the third quarter of 2009, S3 stored over 82 billion objects, processing peak request rates exceeding 100,000 per second.

2010-2012: Foundation of Core Features

From 2010 to 2012, AWS rapidly introduced essential features that formed the core capabilities of S3. Object versioning was added in February 2010 to preserve multiple variants of objects. The AWS Management Console launched in June 2010, providing the first official GUI. Bucket policies and notifications (via Amazon SNS) were introduced in 2010, followed by multipart upload for large objects in November 2010. In 2011, static website hosting and Server-Side Encryption (SSE) with AES-256 were added. Lifecycle rules for object expiration and transitions, along with MFA Delete protection, further enhanced management and security. In 2012, Amazon Glacier was launched as a low-cost archival storage service integrated with S3 via lifecycle policies, offering 99.999999999% durability and retrieval options from minutes to hours at significantly lower cost. These features expanded S3's utility for diverse workloads, including long-term retention and basic security.

2013-2015: Ecosystem Expansion Period

The period from 2013 to 2015 focused on integrating S3 with the growing AWS ecosystem and enhancing security and connectivity. AWS CloudTrail integration in 2013 enabled logging of API activity to S3 buckets. Server-Side Encryption options expanded with customer-provided keys (SSE-C) in 2014 and AWS Key Management Service (SSE-KMS) later that year. Event notifications extended to SQS and Lambda. Cross-Region Replication (CRR) launched in March 2015 for asynchronous copying across regions. VPC Endpoints provided private connectivity. In August 2015, S3 improved consistency to support read-after-write for new objects in all regions, reducing eventual consistency issues for applications. These developments strengthened S3's role in secure, interconnected cloud architectures.

2016-2019: Enterprise Feature Enhancement

Between 2016 and 2019, S3 added enterprise-grade features for analytics, performance, and governance. Object tagging and tag-based lifecycle rules appeared in 2016, along with S3 Transfer Acceleration for faster data transfer over long distances and IPv6 support. S3 Select became generally available in 2018, enabling SQL queries on objects in place for formats like CSV, JSON, and Parquet, reducing data transfer costs. Other additions included S3 Intelligent-Tiering for automatic cost optimization, S3 Inventory for reporting, and S3 Batch Operations (2019) for bulk processing of objects. Same-Region Replication (SRR) launched in 2019. These enhancements supported large-scale analytics, compliance, and efficient management in enterprise environments.

2020-2023: Modernization Period

From 2020 onward, S3 evolved to meet demands for consistency, observability, and advanced data processing. The most impactful update was strong read-after-write consistency for all new object PUTs, overwrites, and deletes across all regions in December 2020, eliminating retry logic for many applications. S3 Storage Lens launched in 2020 for organization-wide usage metrics and recommendations. Multi-Region Access Points (2021) provided a single global endpoint for multi-region data access. S3 Object Lambda (2021) allowed custom code to process data on retrieval. AWS Backup support for S3 was added, and S3 Express One Zone (2023) offered high-performance, low-latency storage in a single Availability Zone for machine learning and analytics workloads. These advancements solidified S3 as a foundational service for modern cloud applications.

Recent Developments (2024–present)

In December 2024, Amazon S3 Tables introduced native Apache Iceberg support for managed tabular data storage optimized for analytics, with schema evolution, time travel, and ACID transactions on petabyte-scale datasets. S3 Metadata reached general availability in January 2025, providing near real-time, queryable metadata for objects across attributes like size, tags, and encryption. In June 2025, S3 Tables added Iceberg compaction with sort and z-order strategies to improve query performance. S3 Batch Operations scaled to handle up to 20 billion objects per job. In July 2025, S3 Vectors entered preview for native vector dataset storage and querying with sub-second similarity searches for AI applications. S3 Tables compaction fees were reduced by up to 90%. In October 2025, S3 Batch Operations added on-demand manifest generation. In November 2025, tags were added to S3 Tables for ABAC and cost allocation, while S3 Object Lambda entered maintenance mode for existing customers only.
