Tutorial: Using Multipart Upload With AWS CLI (s3)

Multipart upload allows for the efficient and reliable upload of large objects to a bucket. It works by breaking down an object into smaller parts that are uploaded in parallel.

The feature is supported by both high-level s3 commands and low-level s3api commands. Using multipart upload through s3 commands simplifies the uploading process for users seeking a straightforward method.

For operations requiring more granular control, see Using Multipart Upload With AWS CLI (s3api).

Retrieving the MD5 Checksum Value of Your Object

Before you begin: Install and configure AWS CLI and set up your profile. For more information, see Installing and Configuring AWS CLI.

An MD5 checksum is a 32-character hexadecimal number that acts as a digital fingerprint for your object. For a single-part upload (usually objects that are 5GB or smaller), the MD5 checksum of the object will automatically match the ETag provided upon successful upload.

That is not the case during the multipart upload process that is automatically initiated for large objects when using high-level commands. Large objects are split into smaller parts, each of which is uploaded separately. Once all parts are uploaded, they are combined to form the original object. Because S3 derives the ETag of a multipart object from the checksums of its individual parts rather than from the object as a whole, the returned ETag will not match the original MD5 checksum of the object.
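This mismatch can be reproduced locally. The following Python sketch mimics the commonly documented way S3 derives multipart ETags (the MD5 of the concatenated binary MD5 digests of each part, suffixed with the number of parts); it is an illustration of the observed behavior, not an algorithm officially guaranteed by S3:

```python
import hashlib

def multipart_etag(data: bytes, part_size: int) -> str:
    """Reproduce how S3 is commonly documented to derive the ETag of a
    multipart object: the MD5 of the concatenated binary MD5 digests of
    each part, suffixed with the number of parts."""
    digests = [
        hashlib.md5(data[i:i + part_size]).digest()
        for i in range(0, len(data), part_size)
    ]
    return hashlib.md5(b"".join(digests)).hexdigest() + f"-{len(digests)}"

data = b"x" * 20_000_000                       # a 20 MB object
whole_md5 = hashlib.md5(data).hexdigest()       # checksum of the whole object
etag = multipart_etag(data, 8 * 1024 * 1024)    # 8 MiB parts -> 3 parts
print(whole_md5)
print(etag)  # carries a "-3" part-count suffix and differs from whole_md5
```

The trailing `-N` in a returned ETag is therefore a quick way to tell that an object was uploaded in N parts.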

Before initiating a multipart upload using high-level s3 commands, you therefore need to calculate the MD5 checksum of your object so that you can later verify its integrity.

  1. Navigate to the directory where the large object you want to upload is located.

    Request sample
    $ cd LOCAL_PATH
  2. Run the following command to calculate the MD5 checksum of your object (the md5 command ships with macOS; on most Linux distributions, use md5sum instead):

    Request sample
    $ md5 LARGE_OBJECT_TO_UPLOAD

The MD5 checksum of your object is returned.
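If neither md5 nor md5sum is available, the checksum can also be computed with a short Python script. This sketch reads the file in chunks so that even very large objects do not need to fit in memory:

```python
import hashlib

def file_md5(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 checksum of a file without loading it fully
    into memory; the output matches that of md5/md5sum."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read 1 MiB at a time until EOF (read() returns b"").
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

For example, `file_md5("LARGE_OBJECT_TO_UPLOAD")` returns the same 32-character hexadecimal string as the md5 command.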

Creating and Completing a Multipart Upload With AWS CLI (s3)

Before you begin: Install and configure AWS CLI and set up your profile. For more information, see Installing and Configuring AWS CLI.

To upload a large file to your bucket and automatically trigger the multipart upload process, use the cp command following this syntax:

Request sample
$ aws s3 cp LOCAL_PATH/LARGE_OBJECT_TO_UPLOAD s3://BUCKET/ \
   --profile YOUR_PROFILE \
   --debug \
   --acl public-read \
   --grants full=id=USER_ID read=id=USER_ID readacl=id=USER_ID writeacl=id=USER_ID \
   --cache-control "no-cache" \
   --content-disposition "inline" \
   --content-encoding "gzip" \
   --content-language "en" \
   --content-type text/plain \
   --expires "2024-03-01T00:00:00Z" \
   --metadata "{\"md5\": \"98ecc26f79229e2388dba73f8f521a95\"}" \
   --metadata-directive "REPLACE" \
   --expected-size 123456789 \
   --website-redirect "/new-page.html" \
   --endpoint https://oos.eu-west-2.outscale.com

Some of the attributes in this request sample are optional but highly recommended if you want to keep track of the multipart upload process, namely debug, metadata, and expected-size.

This command contains the following attributes that you need to specify:

  • LOCAL_PATH/LARGE_OBJECT_TO_UPLOAD and s3://BUCKET/: The local path of the large object you want to upload and its destination in the bucket. S3 automatically performs a multipart upload for large objects by splitting them into smaller parts for efficient and reliable upload, and then reassembles these parts at the designated location in the bucket.

  • (optional) profile: The named profile you want to use, created when configuring AWS CLI. For more information, see Installing and Configuring AWS CLI.

  • (optional) debug: When included, returns detailed logs of the operation, which are useful for analyzing and troubleshooting issues. In the context of the multipart upload process, it allows you to see exactly how your object was partitioned and then reassembled, as well as the ETags that were assigned to each part.

  • (optional) acl: The permissions for your object (private | public-read | public-read-write | authenticated-read | bucket-owner-read | bucket-owner-full-control).

    • When specifying new permissions, all the previous permissions are replaced. Therefore, you need to specify both the existing permissions that you want to keep (including for yourself) and the new permissions that you want to grant in a single command.

    • If you are the owner of the bucket, you can lose your own permissions but not the ability to manage the ACL itself.

    For more information about existing permissions, see Getting Information About a Bucket ACL and Getting Information About an Object ACL.

  • (optional) grants: One or more permission grants, each of the form PERMISSION=id=USER_ID, where PERMISSION can be read, readacl, writeacl, or full. Note that the high-level s3 commands use this single grants attribute; the separate grant-read, grant-full-control, and similar attributes belong to the low-level s3api commands.

  • (optional) cache-control: How you want the object to be handled in terms of caching (max-age | max-stale | min-fresh | no-cache | no-store | no-transform | only-if-cached | stale-if-error).

  • (optional) content-disposition: How you want the content to be displayed (inline | attachment | attachment; filename="<NAME_OF_FILE>").

  • (optional) content-encoding: The encoding format of the object (gzip | compress | deflate | identity | br).

  • (optional) content-language: The language the content of the object is in, in language code (ISO 639 format).

  • (optional) content-type: The MIME (Multipurpose Internet Mail Extensions) type of the object.

    An inaccurately set or absent content-type attribute can cause objects to be misinterpreted or mishandled by browsers. As a result, you may encounter difficulties accessing or viewing your objects using your preferred browser.

  • (optional) expected-size: Specifies the expected size of the upload, in bytes. This attribute is only needed when the size cannot be determined automatically, such as when uploading from a stream, and is particularly important for uploads larger than 50GB so that the CLI can choose an appropriate part size.

  • (optional) expires: The date and time, in ISO 8601 format, at which you consider that the object can no longer be cached and is considered stale.

  • (optional) metadata: A map of additional metadata entries you can freely specify.

    • key: The name of the metadata.

    • value: The value of the metadata.

As shown in the request sample above, you can store the MD5 checksum value of your object using the metadata attribute.

  • (optional) metadata-directive: Whether you want the metadata to be copied from the source object or replaced with the metadata provided in the request (COPY | REPLACE). This attribute only applies when copying an object that is already stored in a bucket; it has no effect when uploading a local file.

  • (optional) website-redirect: If you configured the destination bucket as a website, redirects requests for this object to another object in the same bucket or to an external URL.

  • endpoint: The endpoint corresponding to the Region you want to send the request to.
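To see why the expected size matters, recall that a multipart upload can contain at most 10,000 parts, so the part size must grow with the object size. The sketch below illustrates that constraint (the 5 MiB minimum part size and the 10,000-part limit are standard S3 limits; the exact sizing heuristic used by the CLI is an internal detail):

```python
import math

MAX_PARTS = 10_000                 # S3 limit on parts per multipart upload
MIN_PART_SIZE = 5 * 1024 * 1024    # 5 MiB minimum for all but the last part

def minimum_part_size(object_size: int) -> int:
    """Smallest part size that keeps the upload within 10,000 parts."""
    return max(MIN_PART_SIZE, math.ceil(object_size / MAX_PARTS))

print(minimum_part_size(123_456_789))    # small object: the 5 MiB floor applies
print(minimum_part_size(100 * 1024**3))  # 100 GiB: parts must exceed 10 MiB
```

Without a size hint, a streamed upload could start with parts too small to fit the whole object within 10,000 parts, which is why expected-size helps the CLI plan ahead.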

Your large object is uploaded to your bucket with the multipart upload process being automatically enabled in the background. To retrieve information about the object, see Getting Information About the Metadata of an Object.
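After the upload, you can verify the integrity of the object by comparing the MD5 checksum stored in its metadata with the one you computed locally. A minimal sketch, assuming you have retrieved the object's metadata as JSON (for example, the response described in Getting Information About the Metadata of an Object); the Metadata field follows the standard head-object response shape:

```python
import json

def metadata_md5_matches(head_object_json: str, local_md5: str) -> bool:
    """Compare the md5 value stored in the object's user-defined
    metadata with a locally computed checksum."""
    response = json.loads(head_object_json)
    stored = response.get("Metadata", {}).get("md5")
    return stored is not None and stored.lower() == local_md5.lower()

# Example head-object response, truncated to the relevant field:
sample = '{"Metadata": {"md5": "98ecc26f79229e2388dba73f8f521a95"}}'
print(metadata_md5_matches(sample, "98ecc26f79229e2388dba73f8f521a95"))  # True
```

If the two values differ, the safest course is to delete the object and upload it again.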

Related Pages

AWS™ and Amazon Web Services™ are trademarks of Amazon Technologies, Inc. or its affiliates in the United States and/or other countries.