Tutorial: Using Multipart Upload With AWS CLI (s3)
Multipart upload allows for the efficient and reliable upload of large objects to a bucket. It works by breaking an object down into smaller parts that are uploaded in parallel.
The feature is supported by both high-level s3 commands and low-level s3api commands. Using multipart upload through s3 commands simplifies the process for users seeking a straightforward method.
For operations requiring more granular control, see Using Multipart Upload With AWS CLI (s3api).
Retrieving the MD5 Checksum Value of Your Object
Before you begin: Install and configure AWS CLI. For more information, see Installing and Configuring AWS CLI.
An MD5 checksum is a 32-character hexadecimal number that acts as a digital fingerprint for your object. For a single-part upload (usually objects that are 5 GB or smaller), the MD5 checksum of the object automatically matches the ETag returned upon successful upload.
That is not the case during the multipart upload process, which is automatically initiated for large objects when using high-level commands. This is because large objects are split into smaller parts, each of which is uploaded separately. Once all parts are uploaded, they are combined to form the original object. Because S3 calculates the ETag of a multipart object from the checksums of its individual parts rather than from the object as a whole, the returned ETag does not match the MD5 checksum of the original object.
Before initiating a multipart upload using high-level s3 commands, you therefore need to calculate the MD5 checksum of your object so that you can later verify its integrity.
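To see why a multipart ETag differs from a plain MD5 checksum, you can reproduce the calculation locally. This is a minimal sketch with placeholder file names and content; it assumes a bash environment where md5sum is available (on macOS, use md5 -q instead):

```shell
# Build a sample "object" out of two parts (placeholder content).
printf 'first part of the object '  > part.1
printf 'second part of the object' > part.2
cat part.1 part.2 > whole-object

# MD5 of the whole object: this is what a single-part upload's ETag matches.
whole_md5=$(md5sum whole-object | awk '{print $1}')

# Multipart-style ETag: MD5 of the concatenated binary digests of each part,
# suffixed with the number of parts (here, 2).
part_hex="$(md5sum part.1 | awk '{print $1}')$(md5sum part.2 | awk '{print $1}')"
digest_bytes=$(printf '%s' "$part_hex" | sed 's/../\\x&/g')
multipart_etag="$(printf '%b' "$digest_bytes" | md5sum | awk '{print $1}')-2"

echo "single-part MD5:      $whole_md5"
echo "multipart-style ETag: $multipart_etag"
```

The two values differ, which is why the ETag returned after a multipart upload cannot serve as an integrity check on its own.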
- Navigate to the directory where the large object you want to upload is located:
  $ cd LOCAL_PATH
- Run the following command to calculate the MD5 checksum of your object:
  $ md5 LARGE_OBJECT_TO_UPLOAD
  The MD5 checksum of your object is returned.
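The md5 command shown above is the macOS variant; on most Linux distributions, the equivalent command is md5sum. A quick sketch with a placeholder file name:

```shell
# sample-object.bin stands in for your large object (placeholder name).
printf 'example content\n' > sample-object.bin

# Linux: md5sum prints "<checksum>  <filename>". On macOS, run
# `md5 sample-object.bin` instead.
checksum=$(md5sum sample-object.bin | awk '{print $1}')
echo "MD5 checksum: $checksum"
```

Save this value: after the upload, you can compare it with the checksum you stored in the object's metadata.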
Creating and Completing a Multipart Upload With AWS CLI (s3)
Before you begin: Install and configure AWS CLI. For more information, see Installing and Configuring AWS CLI.
To upload a large object to your bucket and automatically trigger the multipart upload process, use the cp command with the following syntax:
$ aws s3 cp LOCAL_PATH/LARGE_OBJECT_TO_UPLOAD s3://BUCKET/ \
--profile YOUR_PROFILE \
--debug \
--acl public-read \
--grant-full-control "id=USER_ID, id=USER_ID" \
--grant-read "id=USER_ID, id=USER_ID" \
--grant-read-acp "id=USER_ID, id=USER_ID" \
--grant-write "id=USER_ID, id=USER_ID" \
--cache-control "no-cache" \
--content-disposition "inline" \
--content-encoding "gzip" \
--content-language "en" \
--content-type text/plain \
--expires "2024-03-01T00:00:00Z" \
--metadata "{\"md5\": \"98ecc26f79229e2388dba73f8f521a95\"}" \
--metadata-directive "REPLACE" \
--expected-size 123456789 \
--website-redirect "/new-page.html" \
--endpoint https://oos.eu-west-2.outscale.com
Some of the attributes used in this request sample are optional, but they are highly recommended if you want to keep track of the multipart upload process.
This command contains the following attributes that you need to specify:
- s3://BUCKET/LARGE_OBJECT_TO_UPLOAD: The path to the large object you want to upload in the bucket. S3 automatically performs a multipart upload for large objects by splitting them into smaller parts for efficient and reliable upload, and then reassembles these parts at the designated location in the bucket.
- (optional) profile: The named profile you want to use, created when configuring AWS CLI. For more information, see Installing and Configuring AWS CLI.
- (optional) debug: When included, returns the detailed log of the operation, which is useful to analyze and troubleshoot issues. In the context of the multipart upload process, it allows you to see exactly how your object was partitioned and then reassembled, as well as the ETags that were assigned to each part.
- (optional) acl: The permissions you grant for the uploaded object (private | public-read | public-read-write | authenticated-read).
  - If you do not specify a permission, the object is automatically set to private.
  - When specifying new permissions, all the previous permissions are replaced. Therefore, you need to specify both the existing permissions that you want to keep and the new permissions that you want to grant in a single command. For more information about existing permissions, see Getting Information About a Bucket ACL and Getting Information About an Object ACL.
- (optional) grant-full-control: One or more IDs of users to whom you grant the full-control permission.
- (optional) grant-read: One or more IDs of users to whom you grant the read permission.
- (optional) grant-read-acp: One or more IDs of users to whom you grant the read-acp permission.
- (optional) grant-write: One or more IDs of users to whom you grant the write permission.
- (optional) grant-write-acp: One or more IDs of users to whom you grant the write-acp permission.
  - You need to specify S3 user IDs. You can retrieve S3 user IDs via the Listing Your Buckets and Listing the Objects of a Bucket methods.
  - You can also specify user email addresses using the emailaddress=name@domain.com format.
- (optional) cache-control: How you want the object to be handled in terms of caching (max-age | max-stale | min-fresh | no-cache | no-store | no-transform | only-if-cached | stale-if-error).
- (optional) content-disposition: How you want the object to be displayed when accessed via a browser ("inline" | "attachment" | "attachment; filename=\"<NAME_OF_DOWNLOADED_FILE>\"").
  - inline: When possible, prompts the browser to display the content within the browser window itself. This is useful for images, PDFs, and other media types that browsers can easily render.
  - attachment: Regardless of file type, prompts the browser to download the content instead of displaying it inline. The file is thus saved locally.
  - "attachment; filename=\"<NAME_OF_DOWNLOADED_FILE>\"": Regardless of file type, prompts the browser to download the content instead of displaying it inline. The file is thus saved locally with the specified filename.
- (optional) content-encoding: The encoding format of the object (gzip | compress | deflate | identity | br).
- (optional) content-language: The language the content of the object is in, as a language code (ISO 639 format).
- (optional) content-type: The MIME (Multipurpose Internet Mail Extensions) type of the object. An inaccurately set or absent content-type attribute can cause objects to be misinterpreted or mishandled by browsers. As a result, you may encounter difficulties accessing or viewing your objects in your preferred browser.
- (optional) expected-size: The expected size of the upload, in bytes. This is useful for objects larger than 50 GB, since it allows the CLI to optimize part allocation and resource usage during the multipart upload process.
- (optional) expires: The date and time, in UTC format, after which the object is considered stale and should no longer be cached.
- (optional) metadata: A map of additional metadata entries you can freely specify.
  - key: The name of the metadata entry.
  - value: The value of the metadata entry.
  As shown in the request sample, you can for example store the MD5 checksum value of your object in a metadata entry named md5.
- (optional) metadata-directive: Whether you want the metadata to be copied from the source object or replaced with the metadata provided in the request (COPY or REPLACE).
- (optional) website-redirect: If you configured the destination bucket as a website, redirects requests for this object to another object in the same bucket or to an external URL.
- endpoint: The endpoint corresponding to the Region you want to send the request to. For more information, see Installing and Configuring AWS CLI.
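The full request sample above exercises every attribute; a minimal invocation that still records the MD5 checksum of the object for later verification could look like the following sketch (same placeholder names as the request sample):

```shell
$ aws s3 cp LOCAL_PATH/LARGE_OBJECT_TO_UPLOAD s3://BUCKET/ \
    --profile YOUR_PROFILE \
    --metadata "{\"md5\": \"98ecc26f79229e2388dba73f8f521a95\"}" \
    --endpoint https://oos.eu-west-2.outscale.com
```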
Your large object is uploaded to your bucket, with the multipart upload process automatically handled in the background. To retrieve information about the object, see Getting Information About the Metadata of an Object.
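Once the upload completes, you can verify the integrity of the object by comparing the MD5 checksum you calculated beforehand with the one stored in the object's metadata. This sketch assumes the same placeholder names as the request sample and that the md5 metadata entry was set during the upload:

```shell
$ aws s3api head-object \
    --bucket BUCKET \
    --key LARGE_OBJECT_TO_UPLOAD \
    --profile YOUR_PROFILE \
    --endpoint https://oos.eu-west-2.outscale.com

# The response contains a "Metadata" map; check that its "md5" value matches
# the checksum you calculated before the upload.
```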
AWS™ and Amazon Web Services™ are trademarks of Amazon Technologies, Inc. or its affiliates in the United States and/or other countries.