Amazon S3
Amazon Simple Storage Service (S3) is a service offered by Amazon Web Services (AWS) that provides object storage through a web service interface.[1][2] Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its e-commerce network.[3] Amazon S3 can store any type of object, which allows uses like storage for Internet applications, backups, disaster recovery, data archives, data lakes for analytics, and hybrid cloud storage. AWS launched Amazon S3 in the United States on March 14, 2006,[1][4] then in Europe in November 2007.[5]
Technical details
Design
Amazon S3 manages data with an object storage architecture[6] which aims to provide scalability, high availability, and low latency with high durability.[3] The basic storage units of Amazon S3 are objects, which are organized into buckets. Each object is identified by a unique, user-assigned key.[7] Buckets can be managed using the console provided by Amazon S3, programmatically with the AWS SDK, or with the REST application programming interface. Objects can be up to five terabytes in size.[8][9] Requests are authorized using an access control list associated with each object and bucket. Buckets also support versioning,[10] which is disabled by default.[11] Since buckets are typically the size of an entire file system mount in other systems, this access control scheme is very coarse-grained; unique access controls cannot be associated with individual files.[citation needed]
Amazon S3 can be used to replace static web-hosting infrastructure with HTTP client-accessible objects,[12] index document support, and error document support.[13] The Amazon AWS authentication mechanism allows the creation of authenticated URLs, valid for a specified amount of time. Every item in a bucket can also be served as a BitTorrent feed. The Amazon S3 store can act as a seed host for a torrent, and any BitTorrent client can retrieve the file. This can drastically reduce the bandwidth cost for the download of popular objects.
A bucket can be configured to save HTTP log information to a sibling bucket; this can be used in data mining operations.[14]
There are various Filesystem in Userspace (FUSE)–based file systems for Unix-like operating systems (for example, Linux) that can be used to mount an S3 bucket as a file system. The semantics of the Amazon S3 file system are not those of a POSIX file system, so the file system may not behave entirely as expected.[15]
Amazon S3 storage classes
Amazon S3 offers nine different storage classes with different levels of durability, availability, and performance requirements.[16]
- Amazon S3 Standard is the default. It is general purpose storage for frequently accessed data.
- Amazon S3 Express One Zone provides single-digit millisecond latency for frequently accessed data and latency-sensitive applications. It stores data in only one availability zone.[17]
- Amazon S3 Intelligent-Tiering automatically moves objects between access tiers based on observed access patterns.
- Amazon S3 Standard-Infrequent Access (Standard-IA) stores infrequently accessed data that still requires millisecond retrieval.
- Amazon S3 One Zone-Infrequent Access (One Zone-IA) stores infrequently accessed, re-creatable data in a single availability zone at lower cost.
- Amazon S3 Glacier Instant Retrieval is an archival class for rarely accessed data that still requires millisecond retrieval.
- Amazon S3 Glacier Flexible Retrieval is an archival class with retrieval times ranging from minutes to hours, suited to backups and disaster recovery.
- Amazon S3 Glacier Deep Archive is the lowest-cost class, intended for long-term archival data with retrieval times of hours.
- Amazon S3 on Outposts delivers object storage to on-premises AWS Outposts environments.
The Amazon S3 Glacier storage classes above are distinct from Amazon Glacier, which is a separate product with its own APIs.
File size limits
An object in S3 can be between 0 bytes and 5 TB; data larger than 5 TB must be divided into multiple objects. A single upload (PUT) operation is limited to 5 GB, so objects larger than 5 GB must be uploaded via the S3 multipart upload API.[18]
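As an illustrative sketch, the AWS SDK for Python (Boto3) can handle the multipart mechanics automatically once a file exceeds a configured threshold; the bucket name, object key, and size thresholds below are placeholders, and the underlying CreateMultipartUpload, UploadPart, and CompleteMultipartUpload calls can also be issued manually for finer control.

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Switch to multipart upload for files above 100 MB, sending 100 MB parts in parallel
# (bucket, key, and sizes are illustrative placeholders)
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=100 * 1024 * 1024,
    max_concurrency=8,
)
s3.upload_file("backup-2024.tar", "example-bucket", "backups/backup-2024.tar", Config=config)
```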
Scale
As of 2024, S3 stores 400 trillion objects, serves 150 million requests per second, and peaks at about 1 petabyte per second in bandwidth.[19]
Uses
Notable users
- Photo hosting service SmugMug has used Amazon S3 since April 2006. They experienced a number of initial outages and slowdowns, but after one year they described it as being "considerably more reliable than our own internal storage" and claimed to have saved almost $1 million in storage costs.[20]
- Netflix uses Amazon S3 as their system of record. Netflix implemented a tool, S3mper,[21] to address the Amazon S3 limitations of eventual consistency.[22] S3mper stores the filesystem metadata: filenames, directory structure, and permissions in Amazon DynamoDB.[23]
- Reddit is hosted on Amazon S3.[24]
- Bitcasa,[25] and Tahoe-LAFS-on-S3,[26] among others, use Amazon S3 for online backup and synchronization services. In 2016, Dropbox stopped using Amazon S3 services and developed its own cloud server.[27][28]
- Swiftype's CEO has mentioned that the company uses Amazon S3.[29]
S3 API and competing services
The broad adoption of Amazon S3 and related tooling has given rise to competing services based on the S3 API. These services use the standard programming interface but are differentiated by their underlying technologies and business models.[30] A standard interface enables better competition from rival providers and allows economies of scale in implementation, among other benefits.[31] Users are not required to go directly to Amazon, as several storage providers such as Cloudian, Backblaze B2, and Wasabi offer S3-compatible storage with options for on-premises and private cloud deployments.[32]
History
Amazon Web Services introduced Amazon S3 in 2006.[33][34]
| Date | Number of Items Stored |
|---|---|
| October 2007 | 10 billion[35] |
| January 2008 | 14 billion[35] |
| October 2008 | 29 billion[36] |
| March 2009 | 52 billion[37] |
| August 2009 | 64 billion[38] |
| March 2010 | 102 billion[39] |
| April 2013 | 2 trillion[40] |
| March 2021 | 100 trillion[41] |
| March 2023 | 280 trillion[42] |
| November 2024 | 400 trillion[43] |
In November 2017, AWS added default encryption capabilities at bucket level.[44]
See also
References
Citations
[edit]- ^ a b "Amazon Web Services Launches "Amazon S3"" (Press release). 2006-03-14. Archived from the original on 2018-11-15. Retrieved 2018-11-14.
- ^ Huang, Dijiang; Wu, Huijun (2017-09-08). Mobile Cloud Computing: Foundations and Service Models. Morgan Kaufmann. p. 67. ISBN 9780128096444. Archived from the original on 2018-11-15. Retrieved 2018-11-15.
- ^ a b "Cloud Object Storage - Store & Retrieve Data Anywhere - Amazon Simple Storage Service". Amazon Web Services, Inc. Archived from the original on 2018-05-17. Retrieved 2018-05-17.
- ^ "5 Key Events in the history of Cloud Computing - DZone Cloud". dzone.com. Archived from the original on 2018-09-29. Retrieved 2018-09-28.
- ^ "Amazon Web Services Offers European Storage for Amazon S3" (Press release). 2007-11-06. Archived from the original on 2018-11-15. Retrieved 2018-11-14.
- ^ "What is Cloud Object Storage? – AWS". Amazon Web Services, Inc. 2019-10-16. Archived from the original on 2018-09-20. Retrieved 2018-07-09.
- ^ "Tech Blog » Starting Websphere in Cloud and saving the data in S3". techblog.aasisvinayak.com. Archived from the original on 2010-03-12.
- ^ "open-guides/og-aws". GitHub. Archived from the original on 2018-01-03. Retrieved 2018-05-17.
- ^ "Error Responses - Amazon Simple Storage Service". docs.aws.amazon.com. Archived from the original on 2017-12-24. Retrieved 2018-05-21.
- ^ "Using versioning in S3 buckets - Amazon Simple Storage Service". Archived from the original on 2022-02-22. Retrieved 2022-02-22.
- ^ "Introduction to Amazon S3 - Amazon Simple Storage Service". docs.aws.amazon.com. Archived from the original on 2018-05-12. Retrieved 2018-05-17.
- ^ "How to use Amazon S3 for Web Hosting". bucketexplorer.com. Archived from the original on 2008-04-08. Retrieved 2008-05-06.
- ^ Amazon Simple Storage Service Archived 2011-02-20 at the Wayback Machine Docs.amazonwebservices.com. Retrieved on 2013-08-09.
- ^ "Server Access Logging". docs.aws.amazon.com. Archived from the original on 2014-12-23.
- ^ "Comparison of S3QL and other S3 file systems". Archived from the original on 2012-08-05. Retrieved 2012-06-29.
- ^ "Cloud Storage Classes – Amazon Simple Storage Service (S3) – AWS". Amazon Web Services, Inc. Archived from the original on 2018-06-13. Retrieved 2018-05-17.
- ^ "Announcing the new Amazon S3 Express One Zone high performance storage class | AWS News Blog". aws.amazon.com. 2023-11-28. Retrieved 2023-12-01.
- ^ "How to Upload Large Files to S3". June 21, 2022. Archived from the original on October 1, 2022. Retrieved June 22, 2022.
- ^ AWS re:Invent 2024 - Dive deep on Amazon S3 (STG302). youtube.com. 5 Dec 2024. Event occurs at 2m.
- ^ "Amazon S3: Show Me the Money". SmugMug Blog. SmugMug. November 10, 2006. Archived from the original on 2017-03-03. Retrieved 2017-03-03.
- ^ "S3mper: Consistency in the Cloud". Archived from the original on 2016-04-24. Retrieved 2016-05-01.
- ^ "Introduction to Amazon S3". Amazon. Archived from the original on 2017-12-25. Retrieved 28 December 2017.
- ^ Hern, Alex (2017-02-02). "Amazon Web Services: the secret to the online retailer's future success". the Guardian. Archived from the original on 2018-05-02. Retrieved 2018-04-23.
- ^ "AWS Case Study: reddit". aws.amazon.com. 2015. Archived from the original on 2015-03-17. Retrieved March 18, 2015.
- ^ "Bitcasa Legal". May 16, 2013. Archived from the original on 2013-06-28. Retrieved 2013-05-16.
- ^ "What is Tahoe-LAFS-on-S3?". August 21, 2012. Archived from the original on 2013-05-06. Retrieved 2012-08-21.
- ^ "The Epic Story of Dropbox's Exodus From the Amazon Cloud Empire". WIRED. Archived from the original on 2018-01-25. Retrieved 2018-04-23.
- ^ "Dropbox saved almost $75 million over two years by building its own tech infrastructure". GeekWire. 2018-02-23. Archived from the original on 2018-04-23. Retrieved 2018-04-23.
- ^ "Swiftype Explains Their Cloud Stack". July 1, 2013. Archived from the original on 2014-12-08. Retrieved 2014-12-08.
- ^ Watters, Audrey (12 July 2010). "Cloud Community Debates, Is Amazon S3's API the Standard? (And Should It Be?)". SAY Media, Inc. Archived from the original on 2013-02-17. Retrieved 19 December 2012.
- ^ Crossroads of Information Technology Standards. Committee on Standards Workshop Planning, Board on Telecommunications and Computer Applications, Commission on Engineering and Technical Systems, National Research Council. Washington, DC: The National Academies Press, 1990. pp. 36–37. doi:10.17226/10440. ISBN 978-0-309-58171-4. Archived from the original on 2014-03-25. Retrieved 2014-03-25.
- ^ "How to use S3-compatible storage | TechTarget". Search Storage. Retrieved 2025-08-09.
- ^ Overview of Amazon Web Services, 2018, https://docs.aws.amazon.com/whitepapers/latest/aws-overview/introduction.html Archived 2017-11-18 at the Wayback Machine
- ^ Garfinkel, Simson L. 2007. An Evaluation of Amazon's Grid Computing Services: EC2, S3, and SQS. Harvard Computer Science Group Technical Report TR-08-07. https://dash.harvard.edu/bitstream/handle/1/24829568/tr-08-07.pdf?sequence=1 Archived 2018-07-29 at the Wayback Machine
- ^ a b Vogels, Werner (2008-03-19). "Happy Birthday, Amazon S3!". All Things Distributed. Archived from the original on 2008-05-09. Retrieved 2008-05-23.
- ^ "Amazon S3 - Busier Than Ever". 2008-10-08. Archived from the original on 2008-10-11. Retrieved 2008-10-09.
- ^ "Celebrating S3's Third Birthday With Special Anniversary Pricing - Amazon Web Services". typepad.com. 31 March 2009. Archived from the original on 2011-07-07. Retrieved 2009-04-01.
- ^ "Amazon's Head Start in the Cloud Pays Off". eweek.com. Archived from the original on January 23, 2013.
- ^ "Amazon S3 Now Hosts 100 Billion Objects". datacenterknowledge.com. 9 March 2010. Archived from the original on 2010-03-12. Retrieved 2010-03-09.
- ^ "Amazon S3 – Two Trillion Objects, 1.1 Million Requests / Second - Amazon Web Services". typepad.com. 18 April 2013. Archived from the original on 30 September 2013. Retrieved 4 October 2013.
- ^ "Celebrate 15 Years of Amazon S3 with 'Pi Week' Livestream Events". amazon.com. 14 March 2021.
- ^ "Celebrate Amazon S3's 17th birthday at AWS Pi Day 2023". amazon.com. 14 March 2023.
- ^ "Adapting to change with data patterns on AWS: The "aggregate" cloud data pattern | AWS Storage Blog". 20 December 2024.
- ^ "AWS re:Invent 2024 - Dive deep on Amazon S3 (STG302)". YouTube. 9 December 2024.
Sources
[edit]- "Server Access Logging". Archived from the original on 2014-12-23. Retrieved 2014-12-23.
- "Amazon S3 Developer Guide". 2006-03-01.
- "Amazon S3 Introduces Storage Pricing Tiers". 2008-10-08.
- "RightScale Ruby library to access Amazon CloudFront, EC2, S3, SQS, and SDB". 2007-10-27. Archived from the original on 2008-11-03. Retrieved 2009-01-07.
External links
Amazon S3
Overview
Introduction
Amazon Simple Storage Service (Amazon S3) is a scalable object storage service offered by Amazon Web Services (AWS) that allows users to store and retrieve any amount of data from anywhere on the web using a simple web services interface.[1] This service is designed for developers and IT teams to upload, organize, and access data as discrete objects within storage containers called buckets, each object identified by a unique key, eliminating the need to manage complex infrastructure.[6] Launched by AWS on March 14, 2006, Amazon S3 pioneered cloud-based object storage, providing a foundational building block for modern cloud applications.[7] As of 2025, Amazon S3 stores over 400 trillion objects comprising exabytes of data and processes an average of 150 million requests per second.[5] It also supports up to 1 petabyte per second in bandwidth to handle massive data transfer demands.[8]
Key Characteristics
Amazon S3 is designed for elastic scalability, automatically expanding and contracting to accommodate unlimited amounts of data without the need for users to provision storage capacity in advance. This capability ensures seamless handling of varying workloads, from small datasets to petabyte-scale storage, as the service manages resource allocation dynamically behind the scenes.[1]
A core attribute of Amazon S3 is its pay-as-you-go pricing model, which charges users solely for the resources consumed, including storage volume, API requests, data retrieval, and outbound data transfer, with no minimum fees or long-term commitments required. This approach aligns costs directly with usage patterns, making it economical for both intermittent and continuous data storage needs.[9]
Amazon S3 provides high performance through low-latency data access, facilitated by integration with AWS's global edge locations for optimized content delivery and multi-AZ replication that ensures consistent availability across Availability Zones. These features enable rapid read and write operations, supporting applications that demand quick response times without performance degradation at scale.[10][11]
Data in Amazon S3 is organized using buckets as top-level logical containers, each serving as a globally unique namespace for storing objects, which are the fundamental units of data with a maximum size of 5 terabytes. Objects are addressed via unique keys within a flat namespace, allowing flexible organization through prefixes that mimic hierarchical structures without imposing a true folder system.[12][13]
Lifecycle management in Amazon S3 enables automated policies that transition objects between storage tiers based on predefined rules, such as age or access frequency, to optimize costs and storage efficiency over time. These rules can also handle object expiration, ensuring data is retained only as long as necessary while complying with retention requirements.[14]
Complementing these characteristics, Amazon S3 is engineered for exceptional durability, targeting 99.999999999% (11 nines) over a given year through redundant storage across multiple facilities.[15]
Technical Architecture
Design Principles
Amazon S3 employs an object-based storage model, where data is stored as discrete, immutable objects rather than files within a traditional hierarchical file system. Each object consists of a key (a unique identifier), the data itself (up to 5 terabytes in size), and associated metadata in the form of name-value pairs that describe the object for management and retrieval purposes.[12] This flat namespace design eliminates the need for directories or folders, using key prefixes to simulate hierarchy if desired, which simplifies scalability and avoids the complexities of file system management.[12] Objects are immutable, meaning any modification requires uploading a new object with an updated key or version, ensuring data integrity in a distributed environment.[12]
The architecture of Amazon S3 is fundamentally distributed to achieve high fault tolerance and reliability, with data automatically replicated across multiple devices within a single facility and further across multiple Availability Zones (AZs) for redundancy.[16] Availability Zones are isolated locations engineered with independent power, cooling, and networking to minimize correlated failures, and S3 spreads objects across at least three AZs (except for one-zone storage classes) to protect against facility-wide outages.[16] An elastic repair mechanism proactively detects and mitigates failures, such as disk errors, by re-replicating data to healthy storage, scaling operations proportionally to the total data volume stored.[16] This cell-based design confines potential issues, like software updates or hardware faults, to small partitions of the system, limiting the blast radius and maintaining overall service availability.[16]
Amazon S3 provides a RESTful interface for all operations, leveraging standard HTTP methods to ensure simplicity, interoperability, and ease of integration with web-based applications and tools. Core operations include PUT for uploading objects, GET for retrieving them, and DELETE for removal, all authenticated via AWS Signature Version 4 to secure requests over HTTPS.[17] This API design adheres to REST principles, treating buckets and objects as resources addressable via URLs, which enables stateless interactions and compatibility with a wide range of clients without requiring proprietary protocols.[18]
As a dedicated object storage service, Amazon S3 intentionally avoids server-side processing capabilities, focusing exclusively on durable data storage and retrieval while delegating any computational needs to complementary AWS services. This separation of concerns allows S3 to optimize for storage efficiency and scalability, integrating seamlessly with services like AWS Lambda for event-driven processing or Amazon EC2 for custom compute workloads triggered by S3 events.[1]
Since December 2020, Amazon S3 has implemented a strong read-after-write consistency model across all operations, ensuring that any subsequent read immediately reflects the results of a successful write, overwrite, delete, or metadata update without requiring application changes.[19] This upgrade from the prior eventual consistency for new object writes provides predictable behavior for applications, particularly those involving real-time data access or listings, while preserving the service's high performance and availability.[20]
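The basic request model can be illustrated with a short sketch using the AWS SDK for Python (Boto3), which the ecosystem section below mentions; the bucket and key names are placeholders, and credentials are assumed to be available in the environment. With strong read-after-write consistency, the GET immediately returns the object written by the preceding PUT.

```python
import boto3

s3 = boto3.client("s3")  # credentials and region are resolved from the environment

# PUT: upload a small object (bucket and key are illustrative placeholders)
s3.put_object(Bucket="example-bucket", Key="notes/hello.txt", Body=b"hello, s3")

# GET: strong read-after-write consistency means the new object is immediately readable
response = s3.get_object(Bucket="example-bucket", Key="notes/hello.txt")
print(response["Body"].read())  # b'hello, s3'

# DELETE: remove the object
s3.delete_object(Bucket="example-bucket", Key="notes/hello.txt")
```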
Storage Classes
Amazon S3 provides multiple storage classes tailored to different access frequencies and performance requirements, allowing users to balance cost efficiency with retrieval needs while maintaining consistent durability across all classes at 99.999999999% (11 nines) over a given year.[21] These classes support data redundancy across multiple Availability Zones (AZs) except for single-AZ options, and most enable seamless transitions via S3 Lifecycle policies.[14] The following table summarizes the key characteristics of each storage class:
| Storage Class | Primary Access Patterns | Retrieval Time | Designed Availability | SLA Availability | Key Features and Notes |
|---|---|---|---|---|---|
| S3 Standard | Frequently accessed data | Milliseconds | 99.99% | 99.9% | Low-latency, high-throughput access; data stored across at least 3 AZs; supports lifecycle transitions.[21] |
| S3 Intelligent-Tiering | Unknown or changing access patterns | Milliseconds (frequent tiers); varies for infrequent/archive | 99.9% | 99% | Automatically moves objects between frequent, infrequent, and archive instant access tiers after 30, 90, or 180 days of no access; no retrieval fees; monitoring applies; stored across at least 3 AZs; supports lifecycle transitions.[21][22] |
| S3 Express One Zone | Latency-sensitive, frequently accessed data in a single AZ | Single-digit milliseconds | 99.95% | 99.9% | High-performance for demanding workloads; supports up to millions of requests per second; uses directory buckets; single AZ only; no support for lifecycle transitions; introduced in 2023.[21] |
| S3 Standard-Infrequent Access (IA) | Infrequently accessed data needing quick access | Milliseconds | 99.9% | 99% | Suitable for objects larger than 128 KB stored for at least 30 days; retrieval fees apply; stored across at least 3 AZs; supports lifecycle transitions.[21][23] |
| S3 One Zone-Infrequent Access (IA) | Infrequently accessed, re-creatable data | Milliseconds | 99.5% | 99% | Lower redundancy in a single AZ for cost savings; suitable for objects larger than 128 KB; retrieval fees apply; supports lifecycle transitions.[21][23] |
| S3 Glacier Instant Retrieval | Rarely accessed data requiring immediate access | Milliseconds | 99.9% | 99% | Archival option with low cost; minimum object size of 128 KB; 90-day minimum storage duration; stored across at least 3 AZs; supports lifecycle transitions.[21][24] |
| S3 Glacier Flexible Retrieval | Rarely accessed data for backup or disaster recovery | Minutes to hours (expedited, standard, bulk options) | 99.99% | 99.9% | Retrieval flexibility with free bulk options; 90-day minimum storage; stored across at least 3 AZs; supports lifecycle transitions.[21][24] |
| S3 Glacier Deep Archive | Very rarely accessed long-term archival data | 12–48 hours (standard); 48–72 hours (bulk) | 99.99% | 99.9% | Lowest-cost storage for compliance or digital preservation; 180-day minimum storage; stored across at least 3 AZs; supports lifecycle transitions.[21][24] |
Limits and Scalability
Amazon S3 imposes specific limits on object sizes to ensure efficient storage and retrieval. Individual objects can range from 0 bytes up to a maximum of 5 tebibytes (TiB), with multipart uploads enabling the handling of large files by dividing them into parts ranging from 5 mebibytes (MiB) to 5 gibibytes (GiB), up to a total of 10,000 parts per upload.[26][27]
Bucket creation is limited by default to 10,000 general purpose buckets per AWS account, though this quota can be increased upon request, with support for up to 1 million buckets. Each bucket can store an unlimited number of objects, allowing for virtually boundless data accumulation without predefined caps on object count.[28][29]
Request rates are designed for high throughput, with Amazon S3 supporting at least 3,500 PUT, COPY, POST, or DELETE requests per second and 5,500 GET or HEAD requests per second per prefix in a bucket. These rates scale horizontally by distributing requests across multiple prefixes, enabling applications to achieve significantly higher performance, such as 55,000 GET requests per second with 10 prefixes, without fixed upper bounds.[30]
At a global level, Amazon S3 handles massive scale through features like cross-region replication for data distribution across multiple AWS Regions and integration with Amazon CloudFront for edge caching, which reduces latency for worldwide access. The service processes an average of over 100 million requests per second while storing more than 350 trillion objects, demonstrating its elastic architecture that automatically adjusts to varying workloads.[1]
To maintain performance at scale, Amazon S3 employs automatic partitioning strategies, including sharding of the object namespace into prefixes for even load distribution across underlying infrastructure. This approach ensures balanced request handling and prevents bottlenecks, with gradual scaling that may involve temporary throttling via HTTP 503 errors during traffic spikes.[30]
Amazon S3 also enforces limits on event notification configurations, with a quota of 100 such configurations per bucket, which is not adjustable. This restriction prevents the use of direct per-prefix event notifications in scenarios requiring a large number of prefixes or queues, such as 50,000, as it would exceed the configuration limit. Common workarounds include configuring a single event notification and applying consumer-side filtering, or using producer-side Amazon Simple Notification Service (SNS) messaging with attributes to route events appropriately.[27][31]
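Because per-prefix request limits add up across prefixes, a common application-side pattern is to spread keys over several hash-derived prefixes. The sketch below is illustrative only; the shard count and naming scheme are arbitrary choices, not an AWS requirement.

```python
import hashlib

def prefixed_key(original_key: str, shards: int = 10) -> str:
    """Spread object keys across N hash-derived prefixes so per-prefix request limits add up."""
    shard = int(hashlib.md5(original_key.encode("utf-8")).hexdigest(), 16) % shards
    return f"shard-{shard:02d}/{original_key}"

# Each shard-NN/ prefix can independently sustain the per-prefix request rates
print(prefixed_key("logs/2024/11/14/events.json"))
```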
Features and Capabilities
Durability and Availability
Amazon S3 achieves exceptional data durability through its architecture, which is designed to deliver 99.999999999% (11 9's) durability of objects over a given year by automatically storing data redundantly across multiple devices and at least three distinct Availability Zones (AZs) within a region.[15][21] This multi-fold replication ensures that the annual risk of data loss due to hardware failure, errors, or disasters is extraordinarily low, with the system engineered to sustain the concurrent loss of multiple facilities without data loss.[15]
To maintain this durability, Amazon S3 employs advanced redundancy mechanisms, including automatic error correction and data integrity verification using checksums to detect and repair issues such as bit rot or corruption.[32][15] These checksums are computed on upload and used to validate data at rest, enabling proactive repairs to restore redundancy when degradation is identified. Additionally, options like S3 Cross-Region Replication (CRR) allow users to further enhance durability by asynchronously copying objects to a different AWS region for disaster recovery.[33]
Availability in Amazon S3 varies by storage class but is optimized for high uptime; for example, the S3 Standard class is designed for 99.99% availability over a year, meaning objects are accessible for requests with minimal interruption.[21] In contrast, classes like S3 One Zone-IA, which store data within a single AZ, offer lower designed availability of 99.5% to balance cost and performance needs.[21]
These guarantees are backed by the Amazon S3 Service Level Agreement (SLA), which commits to a monthly uptime percentage of at least 99.9% for S3 Standard and similar classes, with service credits provided as compensation: 10% of monthly fees for uptime below 99.9% but at or above 99.0%, 25% for below 99.0% but at or above 95.0%, and 100% for below 95%.[34] For classes like S3 One Zone-IA, the SLA is 99.0%, reflecting their single-AZ design.[34] The uptime is calculated based on error rates in 5-minute intervals, excluding factors like customer-induced issues or force majeure events.[34]
Users can monitor object integrity and replication status through built-in features such as S3 Versioning, which preserves multiple versions of objects to enable recovery from overwrites or deletions, and replication metrics available via Amazon CloudWatch for tracking completion and errors in replication jobs.[35][36] These tools provide visibility into data persistence without requiring manual intervention.[37]
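As a brief sketch of the versioning feature mentioned above, the following Boto3 calls enable versioning on a bucket and list the versions retained under a prefix; the bucket name and prefix are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Enable versioning so that overwrites and deletions preserve prior object versions
s3.put_bucket_versioning(
    Bucket="example-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# List the versions retained under a key prefix
versions = s3.list_object_versions(Bucket="example-bucket", Prefix="reports/")
for version in versions.get("Versions", []):
    print(version["Key"], version["VersionId"], version["IsLatest"])
```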
Security and Compliance
Amazon S3 provides robust security features to protect data at rest, in transit, and during access, including encryption, fine-grained access controls, and comprehensive auditing mechanisms.[38] These features are designed to help users meet organizational security requirements and regulatory standards while leveraging AWS-managed infrastructure.[39]
Encryption
Amazon S3 supports multiple encryption options to secure data, ensuring confidentiality against unauthorized access. Server-side encryption (SSE) is applied automatically to objects upon upload, with three primary variants: SSE-S3 uses keys managed by Amazon S3, SSE-KMS integrates with AWS Key Management Service (KMS) for customer-managed keys with additional control and auditing, and SSE-C allows users to provide their own encryption keys for each operation. Client-side encryption, where users encrypt data before upload using tools like the Amazon S3 Encryption Client or AWS Encryption Library, offers further flexibility for sensitive workloads.[40]
Since January 2023, all new S3 buckets have default server-side encryption enabled with SSE-S3 to establish a baseline level of protection without additional configuration. For advanced scenarios, dual-layer server-side encryption with AWS KMS keys (DSSE-KMS) combines S3-managed encryption with a second layer using customer or AWS-managed KMS keys, enhancing security for high-stakes applications.[41]
In the context of emerging workloads like vector data storage in S3 Vectors, dual-layer security incorporates multiple controls for data at rest and in transit, including automatic encryption with AWS-managed keys.[42]
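A minimal Boto3 sketch of requesting SSE-KMS on upload is shown below; the bucket name, object key, and KMS key ARN are placeholders, and omitting the encryption parameters falls back to the default SSE-S3 behavior described above.

```python
import boto3

s3 = boto3.client("s3")

# Upload an object encrypted server-side with a customer-managed KMS key
# (bucket, key, and KMS key ARN are illustrative placeholders)
with open("report.pdf", "rb") as data:
    s3.put_object(
        Bucket="example-bucket",
        Key="sensitive/report.pdf",
        Body=data,
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId="arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
    )

# Without these parameters, new buckets still apply SSE-S3 (AES-256) by default
```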
Access Controls
Access to S3 resources is managed through a combination of identity and policy-based mechanisms to enforce least-privilege principles. AWS Identity and Access Management (IAM) policies allow users to define permissions for principals like users, roles, and services, specifying actions such as read, write, or delete on buckets and objects.[43] Here is an example of an AWS IAM identity-based policy that grants read-only access to a single S3 bucket following the principle of least privilege. It allows listing the bucket contents (s3:ListBucket) and retrieving objects (s3:GetObject), with no write or delete permissions.
Replace my-example-bucket with your actual bucket name.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::my-example-bucket"
]
},
{
"Effect": "Allow",
"Action": [
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::my-example-bucket/*"
]
}
]
}
If the bucket has versioning enabled and specific object versions must be readable, also grant s3:GetObjectVersion. For console navigation, additional permissions like s3:GetBucketLocation may be needed, but they are not required for API/CLI read access.[44]
Bucket policies provide resource-level controls directly on S3 buckets, enabling conditions like IP restrictions or time-based access, while access control lists (ACLs) offer legacy object and bucket-level permissions, though AWS recommends transitioning to policies for finer granularity.[45][46] Note that support for creating new Email Grantee ACLs ended on October 1, 2025.
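As an illustrative sketch of the resource-level conditions mentioned above, the following Boto3 call attaches a bucket policy that limits object reads to a single IP range; the bucket name and CIDR block are placeholders.

```python
import json
import boto3

s3 = boto3.client("s3")

# Bucket policy allowing object reads only from a specific network range
# (bucket name and CIDR block are illustrative placeholders)
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowReadFromCorporateNetwork",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-example-bucket/*",
            "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}},
        }
    ],
}

s3.put_bucket_policy(Bucket="my-example-bucket", Policy=json.dumps(policy))
```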
To prevent accidental public exposure, the S3 Block Public Access feature blocks public access at the account, bucket, and access point levels; since April 2023, it is enabled by default for all new buckets, and ACLs are disabled to simplify ownership and reduce misconfiguration risks.[47]
In restricted network environments, such as those with corporate firewalls, uploads to Amazon S3 can be facilitated using HTTPS presigned URLs, which operate on port 443 and utilize standard HTTPS traffic that is rarely blocked. Whitelisting the *.amazonaws.com domains is a common approach to enable broader access, or users can generate an HTTPS URL and inspect its host for more targeted configurations. For private connectivity without public internet traversal, AWS VPC endpoints can be utilized.[48][49]
Auditing and Logging
Amazon S3 offers detailed logging capabilities to track access and operations for security monitoring and incident response. S3 server access logs capture detailed records of requests to buckets and objects, including requester identity, bucket name, request time, and response status, which can be delivered to another S3 bucket for analysis.
For API-level auditing, integration with AWS CloudTrail logs management events (like bucket creation) by default and optional data events (like object-level Get or Put requests), providing a comprehensive audit trail of who performed actions, when, and from where.[50] These logs support compliance requirements by enabling forensic analysis and anomaly detection when combined with tools like Amazon Athena for querying.[51]
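A brief Boto3 sketch of enabling server access logging follows; the source and target bucket names and the log prefix are placeholders, and the target bucket is assumed to already grant the S3 logging service permission to write.

```python
import boto3

s3 = boto3.client("s3")

# Deliver server access logs for one bucket into a separate logging bucket
# (bucket names and prefix are illustrative placeholders)
s3.put_bucket_logging(
    Bucket="example-data-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "example-log-bucket",
            "TargetPrefix": "access-logs/example-data-bucket/",
        }
    },
)
```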
Compliance Certifications
Amazon S3 adheres to numerous industry standards and regulations through third-party audits and built-in features that facilitate compliance. It holds certifications including SOC 1, SOC 2, and SOC 3 for controls relevant to financial reporting and security, PCI DSS for payment card data handling, HIPAA/HITECH for protected health information, and support for GDPR through data residency and processing controls.[52][53] To enable write-once-read-many (WORM) storage for retention policies, S3 Object Lock allows users to lock objects for a specified retention period or indefinitely, preventing deletion or modification and helping meet requirements for immutable records in regulations like SEC Rule 17a-4.[54]
Recent Enhancements
In 2025, Amazon S3 introduced S3 Metadata, a fully managed service that automatically generates and maintains queryable tables of metadata for all objects in a bucket, enhancing visibility for security assessments, data governance, and compliance audits by tracking attributes like size, tags, and encryption status without manual processing.[55] This feature supports security use cases such as identifying unprotected objects or monitoring changes over time.[56] In July 2025, Amazon introduced S3 Vectors (preview), the first cloud object storage with native vector support for storing large vector datasets and subsecond query performance, optimized for AI applications.[57]
Pricing Model
Amazon S3 operates on a pay-as-you-go pricing model, charging users only for the resources they consume without minimum fees or long-term commitments.[9] Costs are determined by factors such as the volume and type of storage, number of requests, data retrieval operations, and outbound data transfers.[9] Pricing varies by AWS region, with the US East (N. Virginia) region serving as a common reference point.[9]
Storage costs are tiered based on the selected storage class and volume stored, billed per GB per month. For instance, S3 Standard storage costs $0.023 per GB for the first 50 TB, $0.022 per GB for the next 450 TB, and $0.021 per GB for volumes over 500 TB (as of November 2025), while S3 Glacier Deep Archive offers lower rates at $0.00099 per GB for the first 50 TB.[9] S3 Intelligent-Tiering includes monitoring and automation fees of $0.0025 per 1,000 objects per month in addition to tier-specific storage rates starting at $0.023 per GB for frequent access.[9] These classes, which balance cost and access needs, are detailed further in the storage classes section.[21]
Request fees apply to operations like reading or writing objects, with GET requests charged at $0.0004 per 1,000 for S3 Standard and PUT, COPY, POST, or LIST requests at $0.005 per 1,000.[9] Data transfer fees primarily affect outbound traffic, where the first 100 GB per month to the internet is free, followed by $0.09 per GB for the next 10 TB (with tiered reductions for larger volumes).[9]
Additional charges include retrieval fees for infrequent or archival storage classes to account for the higher operational costs of accessing less frequently used data. For example, S3 Standard-Infrequent Access incurs $0.01 per GB retrieved, S3 Glacier Flexible Retrieval charges $0.01 per GB for standard retrieval and $0.0025 per GB for bulk, and S3 Glacier Deep Archive retrieval is $0.02 per GB for standard or $0.0025 per GB for bulk (as of November 2025).[9] Minimum storage duration charges may also apply, enforcing 30 days for Standard-IA, 90 days for Glacier Flexible Retrieval, and 180 days for Deep Archive to discourage short-term use of low-cost tiers.[9]
To optimize costs, Amazon S3 provides tools such as S3 Storage Lens, which offers free basic metrics and customizable dashboards for analyzing storage usage and identifying savings opportunities across buckets and regions.[58] AWS Savings Plans allow eligible customers to commit to usage for discounted rates on S3 requests and data transfers, potentially reducing expenses by up to 72% compared to on-demand pricing. New AWS accounts include a free tier for S3, providing 5 GB of Standard storage, 20,000 GET requests, 2,000 PUT/COPY/POST/LIST requests, 100 DELETE requests, and 100 GB of data transfer out to the internet per month for the first 12 months.[59]
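To make the rate structure concrete, the short Python estimate below combines the US East (N. Virginia) figures quoted above for a hypothetical workload; the workload numbers are invented for illustration, and actual bills depend on region, tier boundaries, and current prices.

```python
# Hypothetical monthly workload, priced with the US East (N. Virginia) rates quoted above
storage_gb = 10 * 1024        # 10 TB in S3 Standard (within the first 50 TB tier)
put_requests = 1_000_000      # PUT/COPY/POST/LIST requests
get_requests = 10_000_000     # GET requests
egress_gb = 1024              # 1 TB transferred out to the internet

storage_cost = storage_gb * 0.023                  # $0.023 per GB-month
put_cost = put_requests / 1000 * 0.005             # $0.005 per 1,000 requests
get_cost = get_requests / 1000 * 0.0004            # $0.0004 per 1,000 requests
egress_cost = max(egress_gb - 100, 0) * 0.09       # first 100 GB free, then $0.09 per GB

total = storage_cost + put_cost + get_cost + egress_cost
print(f"Estimated monthly cost: ${total:,.2f}")    # roughly $328 with these figures
```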
Use Cases and Applications
Common Use Cases
Amazon S3 serves as a reliable platform for backup and restore operations, providing offsite storage with built-in versioning that enables point-in-time recovery from accidental deletions or modifications. This feature supports disaster recovery by allowing users to replicate data across regions and integrate with AWS Backup for automated policies that meet recovery time objectives (RTO) and recovery point objectives (RPO). Organizations leverage S3's 99.999999999% (11 9's) durability to safeguard critical data against hardware failures or site disasters, ensuring minimal data loss during restoration processes.
In data lakes and analytics, S3 functions as a centralized repository for storing vast amounts of structured and unstructured data at petabyte scale, facilitating querying and analysis without upfront schema definitions. It supports tools like Amazon Athena for serverless SQL queries directly on S3 data and Amazon Redshift for data warehousing, enabling cost-effective processing of logs, IoT streams, and application data. With features like S3 Select for in-storage filtering, users can reduce data transfer costs and accelerate insights from diverse datasets.
For archiving and compliance, S3 offers long-term retention through storage classes like S3 Glacier and S3 Glacier Deep Archive, which provide retrieval times ranging from minutes to hours at significantly lower costs than standard storage. S3 Object Lock implements write-once-read-many (WORM) policies to prevent alterations or deletions, ensuring compliance with regulations such as GDPR, HIPAA, and SEC Rule 17a-4. This setup allows organizations to retain data for 7 to 10 years or longer while optimizing costs via lifecycle transitions based on access patterns.
Media and content distribution represent another core application, where S3 hosts static websites and serves as scalable storage for images, videos, and audio files. By enabling public bucket policies and integrating with Amazon CloudFront for global edge caching, S3 delivers low-latency content to end-users, supporting high-traffic scenarios like video streaming or e-commerce assets. Its ability to handle millions of requests per second ensures reliable performance for dynamic content delivery without managing servers.
S3-compatible buckets also integrate with digital file sale automation platforms, enabling secure delivery of purchased content. Users generate presigned or signed URLs via the S3 API to provide temporary, secure access for customer downloads, typically for one-time use following a purchase. No-code automation tools such as Zapier, Make.com, and Pipedream facilitate this by triggering workflows on sales webhooks from payment processors like Stripe or Gumroad, automatically generating the URL and emailing it to the buyer. For more customized implementations, developers employ SDKs in languages like Node.js or Python to script URL generation and integration. This approach scales effectively with S3's versioning for file management and can incorporate content delivery networks like Cloudflare to enhance download speeds.[48][60][61]
In big data and AI workloads, S3 stores datasets for machine learning models, including vector embeddings via S3 Vectors, which provide native support for high-dimensional data queries with sub-second latency. It accommodates generative AI applications by hosting large-scale training datasets and enabling efficient access for frameworks like TensorFlow or PyTorch.
Recent innovations like Amazon S3 Tables, introduced in 2024, optimize tabular data storage with Apache Iceberg integration, improving query performance for analytics and AI pipelines by up to 3x through automated compaction. The storage classes described earlier help tailor these uses to infrequent access patterns for cost efficiency.[62][63][64]
Notable Users and Examples
NASCAR utilizes Amazon S3 to store and manage its extensive media library, which includes race videos, audio, and images accumulated over decades of motorsport events. The organization migrated a 15-petabyte archive from legacy LTO tapes to S3 in just over one year, leveraging storage classes such as S3 Standard for active high-resolution mezzanine files, S3 Glacier Instant Retrieval for frequently accessed content, and S3 Glacier Deep Archive for long-term retention of proxy files. This setup handles an annual growth of 1.5 to 2 petabytes, enabling cost-effective scalability and rapid retrieval for fan engagement and production needs.[65]
The British Broadcasting Corporation (BBC) employed Amazon S3 Glacier to digitize and centralize its 100-year archive of broadcasting content, transitioning from tape-based systems to cloud storage for improved preservation and accessibility. In a 10-month project, the BBC migrated 25 petabytes of data (averaging 120 terabytes per day) to S3 Glacier Instant Retrieval and S3 Intelligent-Tiering, retiring half of its physical infrastructure while reducing operational costs and enhancing data durability. This migration supported the archival of diverse media assets, ensuring long-term integrity without the vulnerabilities of physical tapes.[66]
Ancestry leverages Amazon S3 Glacier to efficiently restore and process vast collections of historical images, facilitating AI-driven enhancements for genealogy research. The company handles hundreds of terabytes of such images, using S3 Glacier's improved throughput to complete restorations in hours rather than days, which accelerates the training of AI models for tasks like handwriting recognition on digitized records. This capability has enabled Ancestry to deliver higher-quality, searchable historical photos to millions of users, transforming faded or damaged artifacts into accessible family history resources.[67]
Netflix relies on Amazon S3 as a foundational component of its global content delivery network and analytics infrastructure, managing exabyte-scale data lakes to support personalized streaming recommendations and performance optimization. S3 stores petabytes of video assets and user interaction logs, enabling the processing of billions of hours of monthly content delivery across devices while powering real-time analytics on viewer behavior. This architecture allows Netflix to scale storage elastically, handling daily ingestions that contribute to its massive data footprint for machine learning-driven personalization.[68][69]
Airbnb employs Amazon S3 for robust backup and storage of operational data, including user-generated content and system logs essential for platform reliability and analytics. The company maintains 10 terabytes of user pictures and other static files in S3, alongside daily processing of 50 gigabytes of log data via integrated services like Amazon EMR, ensuring durable retention for disaster recovery and business intelligence. This implementation supports Airbnb's high-traffic environment by providing scalable, low-latency access to backups without managing on-premises hardware.[70]
Integrations and Ecosystem
AWS Integrations
Amazon S3 integrates closely with AWS compute services to enable efficient data access and processing. Amazon Elastic Compute Cloud (EC2) instances can directly access S3 buckets by attaching IAM roles that grant the necessary permissions, allowing applications to store and retrieve data without embedding credentials.[71] This setup supports use cases like hosting static websites or running data-intensive workloads on EC2. AWS Lambda extends this capability through serverless execution, where S3 event notifications (such as object uploads or deletions) trigger Lambda functions to process data automatically, facilitating real-time transformations without managing servers. However, S3 buckets are limited to 100 event notification configurations, which can constrain scalability for applications requiring many notifications; see the Limits and Scalability section for details.[72][27]
For analytics workloads, S3 serves as a foundational data lake storage layer integrated with services like Amazon Athena and Amazon EMR. Athena enables interactive querying of S3 data using standard SQL, eliminating the need for ETL preprocessing or infrastructure management, and supports features like federated queries across data sources.[73] Amazon EMR, on the other hand, treats S3 as a scalable file system via the S3A connector, allowing users to run Apache Hadoop, Spark, and other frameworks directly on S3-stored data for large-scale processing tasks like ETL and machine learning model training.[74]
Backup and management integrations enhance S3's operational resilience and efficiency. AWS Backup provides centralized, policy-based protection for S3 buckets, supporting continuous backups for point-in-time recovery and periodic backups for cost-optimized archival, with seamless integration across other AWS services.[75] Complementing this, S3 Batch Operations allow bulk execution of actions on billions of objects, such as copying, tagging, or invoking Lambda functions, streamlining large-scale data management without custom scripting.[76]
Networking features ensure secure and performant connectivity to S3. VPC endpoints, specifically gateway endpoints for S3, enable private access from resources within a Virtual Private Cloud (VPC) without traversing the public internet or incurring data transfer fees, improving security and latency.[77] For hybrid environments, AWS Direct Connect facilitates dedicated, private fiber connections from on-premises data centers to S3, bypassing the internet for consistent, high-bandwidth data transfers.[78]
A notable recent advancement is Amazon S3 Tables, launched in 2024, which optimizes S3 for tabular data using the open Apache Iceberg format and integrates natively with AWS Glue for metadata cataloging and schema evolution, as well as Amazon SageMaker for building and deploying machine learning models on Iceberg tables stored in S3.[79] This integration automates tasks like compaction and time travel, enabling analytics engines to query S3 data as managed tables. Access to these integrations is governed by AWS Identity and Access Management (IAM) policies, ensuring fine-grained control over permissions.
In July 2025, Amazon announced Amazon S3 Vectors in preview, the first cloud object store with native support for storing and querying large-scale vector datasets for AI applications. It integrates with Amazon Bedrock Knowledge Bases for cost-effective Retrieval-Augmented Generation (RAG), Amazon SageMaker Unified Studio for building generative AI apps, and Amazon OpenSearch Service for low-latency vector searches, reducing costs by up to 90% compared to general-purpose storage.[80]
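To illustrate the event-driven pattern described above, here is a minimal sketch of an AWS Lambda handler (Python runtime) invoked by an S3 event notification; the bucket, keys, and processing step are placeholders, and the function merely reads each newly created object's metadata.

```python
import urllib.parse
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """Invoked by an S3 event notification; processes each newly created object."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in event payloads are URL-encoded
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        obj = s3.get_object(Bucket=bucket, Key=key)
        print(f"Processing s3://{bucket}/{key} ({obj['ContentLength']} bytes)")
```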
Third-Party Compatibility
Amazon S3's API has become a de facto standard for object storage, enabling compatibility with various third-party solutions for on-premises and hybrid deployments. MinIO, an open-source object storage system, implements the S3 API to provide high-performance, scalable storage that mimics S3's behavior for cloud-native applications.[81] Similarly, Ceph's Object Gateway (RGW) supports a RESTful API compatible with the core data access model of the Amazon S3 API, allowing seamless integration for distributed storage environments.[82]
Developers can interact with S3 using official AWS SDKs available in multiple languages, facilitating integration into diverse applications without proprietary dependencies. The AWS SDK for Java offers APIs for S3 operations, enabling Java-based applications to handle uploads, downloads, and bucket management efficiently.[83] For Python, the Boto3 library provides a high-level interface to S3, supporting tasks like object manipulation and multipart uploads.[84] The AWS SDK for .NET similarly equips .NET developers with libraries for S3 interactions, including asynchronous operations and error handling.[85] Additionally, the AWS Command Line Interface (CLI) allows command-line access to S3 for scripting and automation, such as listing objects or syncing directories.
S3 integrates with third-party content management systems to serve as a backend for file storage and delivery. Salesforce leverages S3 through connectors like Amazon AppFlow, which transfers data from Salesforce to S3 buckets for analytics and archiving.[86] Adobe Experience Platform uses S3 as a source and destination for data ingestion, supporting authentication via access keys or assumed roles to manage files in workflows.[87]
S3-compatible buckets facilitate digital file sale automation by generating presigned or signed URLs via the S3 API for temporary, secure customer downloads, enabling one-time access post-purchase.[48] No-code tools such as Zapier, Make.com, and Pipedream allow workflows that trigger on sales webhooks from payment platforms like Stripe or Gumroad to generate these URLs and deliver them to buyers via email.[60][88][89][90] Custom scripts using AWS SDKs in Node.js or Python can also implement this functionality.[61][91] This scales effectively with S3's versioning for file management and can be combined with CDNs like Cloudflare for accelerated delivery.[92]
For large-scale data imports, S3 supports migration tools that bridge external environments to AWS storage. AWS Snowball devices enable physical shipment of petabyte-scale data to S3, ideal for offline transfers where network bandwidth is limited.[93] AWS Transfer Family provides managed file transfer protocols (SFTP, FTPS, FTP) directly to S3, securing imports from on-premises or legacy systems.
S3's support for open table formats enhances interoperability with data analytics ecosystems, particularly through Apache Iceberg. In 2025, S3 introduced sort and z-order compaction strategies for Iceberg tables, optimizing query performance by reorganizing data partitions in both S3 Tables and general-purpose buckets via AWS Glue.[94] These enhancements, building on the December 2024 launch of built-in Iceberg support in S3 Tables, allow automatic maintenance to reduce scan times and storage costs in open data lakes.[63]
S3 API
API Overview
The Amazon S3 API is a RESTful interface that enables developers to interact with S3 storage through standard HTTP methods such as GET, PUT, POST, and DELETE, using regional endpoints formatted as <bucket-name>.s3.<region>.amazonaws.com for virtual-hosted-style requests or s3.<region>.amazonaws.com/<bucket-name> for path-style requests.[17] Path-style requests remain supported but are legacy and scheduled for future discontinuation.[95] This structure supports operations across buckets and objects, with key actions including ListBuckets to retrieve a list of all buckets owned by the authenticated user and GetObject to download the content and metadata of a specified object from a bucket.[96] Developers typically access the API via AWS SDKs, CLI tools, or direct HTTP requests, with recommendations to use SDKs for handling complexities like request signing and error management.[96]
Authentication for S3 API requests relies on AWS Signature Version 4, which signs requests using access keys and includes elements like the request timestamp, payload hash, and canonicalized resource path to ensure integrity and authenticity.[97] For scenarios requiring temporary access without sharing credentials, presigned URLs can be generated, embedding the signature in query parameters to grant time-limited permissions for operations like uploading or downloading objects, valid for up to seven days.[48] This mechanism allows secure delegation of access, such as enabling client-side uploads directly to S3 buckets or providing temporary secure customer downloads for one-time access post-purchase in digital file sale automation scenarios.[48] Presigned URLs utilize HTTPS on port 443, the standard secure web port, which is often permitted in restricted network environments where other ports may be blocked, facilitating uploads in corporate firewalls. In such cases, network administrators can inspect the host component of the generated presigned URL to identify specific domains, such as regional subdomains of amazonaws.com, for targeted whitelisting.[27]
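As a short sketch of the presigned URL mechanism, the Boto3 calls below generate time-limited GET and PUT URLs; the bucket and key names are placeholders, and the SDK signs the URLs with the caller's own credentials.

```python
import boto3

s3 = boto3.client("s3")

# Presigned GET URL valid for one hour (bucket and key are illustrative placeholders);
# expirations of up to seven days are supported
download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "example-bucket", "Key": "downloads/report.pdf"},
    ExpiresIn=3600,
)
print(download_url)

# Presigned PUT URL letting a client upload directly over HTTPS (port 443)
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "example-bucket", "Key": "uploads/incoming.bin"},
    ExpiresIn=900,
)
```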
Advanced features in the S3 API include multipart uploads, which break large objects into parts for parallel uploading, initiated via CreateMultipartUpload, followed by individual part uploads and completion with CompleteMultipartUpload, supporting objects up to 5 terabytes.[98] Additionally, Amazon S3 Select, introduced in 2018, allows in-place querying of objects in CSV, JSON, or Parquet formats using SQL-like expressions through the SelectObjectContent operation, reducing data transfer costs by retrieving only relevant subsets without full downloads.[99][100]
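A minimal sketch of the SelectObjectContent operation via Boto3 follows; the bucket, object key, column names, and SQL expression are illustrative, and the object is assumed to be a CSV file with a header row.

```python
import boto3

s3 = boto3.client("s3")

# Query a CSV object in place, returning only the matching rows
# (bucket, key, and column names are illustrative placeholders)
response = s3.select_object_content(
    Bucket="example-bucket",
    Key="data/orders.csv",
    ExpressionType="SQL",
    Expression="SELECT s.order_id, s.total FROM S3Object s WHERE CAST(s.total AS FLOAT) > 100",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "NONE"},
    OutputSerialization={"CSV": {}},
)

# The result arrives as an event stream; Records events carry the selected rows
for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"))
```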
The API supports versioning through operations like PutObject with versioning enabled on the bucket, automatically assigning unique version IDs to objects for preserving multiple iterations and enabling retrieval via GetObject with a versionId parameter.[101] Tagging is managed via dedicated calls such as PutObjectTagging to add key-value metadata tags to objects for organization and cost allocation, with limits of up to 10 tags per object and retrieval through GetObjectTagging.[102]
In 2025, enhancements to S3 Batch Operations expanded support for processing up to 20 billion objects in jobs for actions like copying, tagging, and invoking Lambda functions, facilitated by on-demand manifest generation for targeted large-scale operations.[103] Further updates in 2025 include the discontinuation of support for Email Grantee Access Control Lists (ACLs) as of October 1, 2025; the limitation of S3 Object Lambda access to existing customers only, effective November 7, 2025; the introduction of Amazon S3 Vectors in preview (announced July 15, 2025) for native storage and querying of vector datasets with subsecond performance for AI applications; and the planned removal of the Owner.DisplayName field from API responses starting November 21, 2025, requiring applications to use canonical user IDs instead.[104][105][80][106]
