AWS S3 in Elixir with ExAws
In this article we see how to store and retrieve files on AWS S3 using Elixir, with the help of ExAws. (If you want to use ExAws with DigitalOcean Spaces instead, you can read ExAws with DigitalOcean Spaces)
We start by setting up an AWS account and credentials, configure an Elixir application and see the basic upload and download operations with small files.
Then, we see how to deal with large files, making multipart uploads and using presigned urls to create a download stream, processing data on the fly.
Create an IAM user, configure permissions and credentials
If you don't have an Amazon Web Services account yet, you can create one on https://aws.amazon.com/ and use the free tier for the first 12 months, which includes up to 5GB of free S3 storage.
Be sure to check all the limits of the free tier before you start using the service, and always take a look at the billing page to keep track of your usage.
To access AWS S3 resources, we first need to create an AWS IAM (Identity and Access Management) user with limited permissions.
Once logged into the AWS console, go to the users section of the security credentials page and click on Add user.
When creating a user, we need to set a username and, most importantly, enable Programmatic access: this means the user can programmatically access AWS resources via the API.
Then we set the permissions, attaching the AmazonS3FullAccess policy and limiting the user to just the S3 service.
Now, this policy is fine for this demo, but it's still too broad: a user, or an app, can access all the buckets, files and settings of S3.
By creating a custom policy, we can limit the user's permissions to only the needed S3 actions and buckets; a sketch of such a policy is shown below. More on this at AWS User Policy Examples.
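For example, a custom policy along these lines restricts the user to a handful of object operations on a single bucket (here the poeticoding-aws-elixir bucket we create below). This is a minimal sketch – adapt the actions to what your app actually needs:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::poeticoding-aws-elixir"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::poeticoding-aws-elixir/*"]
    }
  ]
}
```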
Once the user is created, we can download the Access Key ID and the Secret Access Key. You must keep these keys secret because whoever has them can access your AWS S3 resources.
To create an S3 bucket using the AWS console, go to the S3 section and click on Create bucket, set a bucket name (I've used poeticoding-aws-elixir) and be sure to block all public access.
Configure ex_aws and environment variables
Let's create a new Elixir application and add the dependencies to make ex_aws and ex_aws_s3 work
```elixir
# mix.exs
def deps do
  [
    {:ex_aws, "~> 2.1"},
    {:ex_aws_s3, "~> 2.0"},
    {:hackney, "~> 1.15"},
    {:sweet_xml, "~> 0.6"},
    {:jason, "~> 1.1"}
  ]
end
```
ExAws, by default, uses the hackney HTTP client to make requests to AWS.
We create the config/config.exs configuration file, where we set the access key ID and the secret access key
```elixir
# config/config.exs
import Config

config :ex_aws,
  json_codec: Jason,
  access_key_id: {:system, "AWS_ACCESS_KEY_ID"},
  secret_access_key: {:system, "AWS_SECRET_ACCESS_KEY"}
```
The default ExAws JSON codec is Poison. If we want to use another library, like Jason, we need to explicitly set the json_codec option.
We don't want to write our keys in the configuration file: first, because anyone with access to the code can see them; second, because we want to make them easy to change.
We can use environment variables: by passing the {:system, "AWS_ACCESS_KEY_ID"} and {:system, "AWS_SECRET_ACCESS_KEY"} tuples, the application gets the keys from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
If you are on a Unix/Unix-like system (like macOS or Linux), you can set these environment variables in a script
```bash
# .env file
export AWS_ACCESS_KEY_ID="your access key"
export AWS_SECRET_ACCESS_KEY="your secret access key"
```
and load them with source
```
$ source .env
$ iex -S mix
```
Keep this script secret. If you are using git, remember to add this script to .gitignore to avoid committing it.
If you don't want to keep these keys in a script, you can always pass them when launching the application or iex
```
$ AWS_ACCESS_KEY_ID="..." \
  AWS_SECRET_ACCESS_KEY="..." \
  iex -S mix
```
If you're on a Windows machine, you can set the environment variables using the Command Prompt or PowerShell
```
# Windows CMD
set AWS_ACCESS_KEY_ID="..."

# Windows PowerShell
$env:AWS_ACCESS_KEY_ID="..."
```
List the buckets
Now we have everything ready: credentials, application dependencies and ex_aws configured with environment variables. So let's try the first request.
```
# load the environment variables
$ source .env

# run iex
$ iex -S mix

iex> ExAws.S3.list_buckets()
%ExAws.Operation.S3{
  http_method: :get,
  parser: &ExAws.S3.Parsers.parse_all_my_buckets_result/1,
  path: "/",
  service: :s3,
  ...
}
```
The ExAws.S3.list_buckets() function doesn't send the request itself; it returns an ExAws.Operation.S3 struct. To make the request, we use ExAws.request or ExAws.request!
```
iex> ExAws.S3.list_buckets() |> ExAws.request!()
%{
  body: %{
    buckets: [
      %{
        creation_date: "2019-11-25T17:48:16.000Z",
        name: "poeticoding-aws-elixir"
      }
    ],
    owner: %{...}
  },
  headers: [
    ...
    {"Content-Type", "application/xml"},
    {"Transfer-Encoding", "chunked"},
    {"Server", "AmazonS3"},
    ...
  ],
  status_code: 200
}
```
ExAws.request! returns a map with the HTTP response from S3. With get_in/2 we can get just the bucket list
```
iex> ExAws.S3.list_buckets() |> ExAws.request!() |> get_in([:body, :buckets])
[%{creation_date: "2019-11-25T17:48:16.000Z", name: "poeticoding-aws-elixir"}]
```
put, list, get and delete
With ExAws, the easiest way to upload a file to S3 is with ExAws.S3.put_object/4
```
iex> local_image = File.read!("elixir_logo.png")
<<137, 80, 78, 71, 13, 10, 26, 10, 0, 0, ...>>

iex> ExAws.S3.put_object("poeticoding-aws-elixir", "images/elixir_logo.png", local_image) \
...> |> ExAws.request!()
%{
  body: "",
  headers: [...],
  status_code: 200
}
```
The first argument is the bucket name, then we pass the object key (the path) and the third is the file's content, local_image. As a fourth argument we can pass a list of options like storage class, meta, encryption etc.
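For instance, we could set the content type and attach some metadata while uploading. This is a minimal sketch; the option names (:content_type, :meta) are the ones I'd expect from the ExAws.S3 docs, so double-check them against the version you use:

```elixir
ExAws.S3.put_object(
  "poeticoding-aws-elixir",
  "images/elixir_logo.png",
  local_image,
  # assumed options - verify against your ExAws.S3 version
  content_type: "image/png",
  meta: [uploaded_by: "poeticoding"]
)
|> ExAws.request!()
```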
Using the AWS management console, on the S3 bucket's page, we can see the file we've just uploaded.
We list the bucket's objects with ExAws.S3.list_objects
```
iex> ExAws.S3.list_objects("poeticoding-aws-elixir") \
...> |> ExAws.request!() \
...> |> get_in([:body, :contents])
[
  %{
    e_tag: "\"...\"",
    key: "images/elixir_logo.png",
    last_modified: "2019-11-26T14:40:34.000Z",
    owner: %{...},
    size: "29169",
    storage_class: "STANDARD"
  }
]
```
Passing the bucket name and object key to ExAws.S3.get_object/2, we get the file's content.
```
iex> resp = ExAws.S3.get_object("poeticoding-aws-elixir", "images/elixir_logo.png") \
...> |> ExAws.request!()
%{
  body: <<137, 80, 78, 71, 13, 10, 26, ...>>,
  headers: [
    {"Last-Modified", "Tue, 26 Nov 2019 14:40:34 GMT"},
    {"Content-Type", "application/octet-stream"},
    {"Content-Length", "29169"},
    ...
  ],
  status_code: 200
}
```
The request returns a response map with the whole file's content in :body.
```
iex> File.read!("elixir_logo.png") == resp.body
true
```
We can delete the object with ExAws.S3.delete_object/2.
```
iex> ExAws.S3.delete_object("poeticoding-aws-elixir", "images/elixir_logo.png") \
...> |> ExAws.request!()
%{
  body: "",
  headers: [
    {"Date", "Tue, 26 Nov 2019 15:04:35 GMT"},
    ...
  ],
  status_code: 204
}
```
Listing the objects again, we see, as expected, that the bucket is now empty.
```
iex> ExAws.S3.list_objects("poeticoding-aws-elixir")
...> |> ExAws.request!()
...> |> get_in([:body, :contents])
[]
```
Multipart upload and large files
The image in the example above is just ~30 KB, so we can simply use put_object and get_object to upload and download it, but there are some limits:
- with these two functions the file is fully kept in memory, for both upload and download.
- put_object uploads the file in a single operation, and we can only upload objects up to 5 GB in size.
S3 and the ExAws client support multipart uploads. This means that a file is divided into parts (5 MB parts by default) which are sent separately and in parallel to S3! If a part's upload fails, ExAws retries the upload of that 5 MB part only.
With multipart uploads we can upload objects from 5 MB up to 5 TB – ExAws uses file streams, avoiding keeping the whole file in memory.
Let's consider numbers.txt, a relatively large text file we've already seen in another article – Elixir Stream and large HTTP responses: processing text (you can download it from this url https://www.poeticoding.com/downloads/httpstream/numbers.txt).
numbers.txt is 125 MB, much smaller than the 5GB limit imposed by a single PUT operation, but to me this file is large enough to benefit from a multipart upload!
```
iex> ExAws.S3.Upload.stream_file("numbers.txt") \
...> |> ExAws.S3.upload("poeticoding-aws-elixir", "numbers.txt") \
...> |> ExAws.request!()

# returned response
%{
  body: "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n\n<CompleteMultipartUploadResult>...",
  headers: [
    {"Date", "Tue, 26 Nov 2019 16:34:08 GMT"},
    {"Content-Type", "application/xml"},
    {"Transfer-Encoding", "chunked"}
  ],
  status_code: 200
}
```
- First we create a file stream with ExAws.S3.Upload.stream_file/2
- The stream is passed to ExAws.S3.upload/4, along with the bucket name and object key
- ExAws.request! initializes the multipart upload and uploads the parts
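Both steps accept options. Here is a minimal sketch of how we could tune the part size and the upload concurrency – :chunk_size on stream_file/2 and :max_concurrency on upload/4 are options I'd expect from the ExAws docs, so verify them against your version:

```elixir
# assumed options: ~10 MB parts instead of the 5 MB default,
# up to 8 parts in flight at the same time
ExAws.S3.Upload.stream_file("numbers.txt", chunk_size: 10 * 1024 * 1024)
|> ExAws.S3.upload("poeticoding-aws-elixir", "numbers.txt", max_concurrency: 8)
|> ExAws.request!()
```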
To get an idea of what ExAws is doing, we can enable the debug_requests option in the ex_aws configuration
```elixir
# config/config.exs
config :ex_aws,
  debug_requests: true,
  json_codec: Jason,
  access_key_id: {:system, "AWS_ACCESS_KEY_ID"},
  secret_access_key: {:system, "AWS_SECRET_ACCESS_KEY"}
```
We should see multiple parts being sent at the same time
```
17:11:24.586 [debug] ExAws: Request URL: "...?partNumber=2&uploadId=..." Attempt: 1
17:11:24.589 [debug] ExAws: Request URL: "...?partNumber=1&uploadId=..." Attempt: 1
```
Multipart upload timeout
When the file is big, the upload can take time. To upload the parts concurrently, ExAws uses Elixir Tasks – the default timeout for a part's upload is set to 30 seconds, which may not be enough on a slow connection.
```
** (exit) exited in: Task.Supervised.stream(30000)
    ** (EXIT) time out
```
We can change the timeout by passing a new :timeout to ExAws.S3.upload/4 – 120 seconds in this example.
```elixir
ExAws.S3.Upload.stream_file("numbers.txt")
|> ExAws.S3.upload(
  "poeticoding-aws-elixir", "numbers.txt",
  [timeout: 120_000]
)
|> ExAws.request!()
```
Download a large file
To download a big file it's better to avoid get_object, which holds the whole file's content in memory. With ExAws.S3.download_file/4 instead, we can download the data in chunks, saving them directly into a file.
```elixir
ExAws.S3.download_file(
  "poeticoding-aws-elixir",
  "numbers.txt",
  "local_file.txt"
)
|> ExAws.request!()
```
presigned urls and download streams – process a file on the fly
Unfortunately we can't use ExAws.S3.download_file/4 to get a download stream and process the file on the fly.
However, we can generate a presigned url to get a unique and temporary URL, and then download the file with a library like mint or HTTPoison.
```
iex> ExAws.Config.new(:s3) \
...> |> ExAws.S3.presigned_url(:get, "poeticoding-aws-elixir", "numbers.txt")
{:ok, "https://...?X-Amz-Credential=...&X-Amz-Expires=3600"}
```
By default, the URL expires after one hour – with the :expires_in option we can set a different expiration time (in seconds).
```
iex> ExAws.Config.new(:s3) \
...> |> ExAws.S3.presigned_url(:get, "poeticoding-aws-elixir", "numbers.txt", [expires_in: 300])
# 300 seconds
{:ok, "https://...?X-Amz-Credential=...&X-Amz-Expires=300"}
```
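To quickly verify that the URL works, we can fetch it with a plain HTTP client. A sketch assuming HTTPoison is among the dependencies; note that a plain get loads the whole body in memory, so treat it only as a sanity check:

```elixir
{:ok, url} =
  ExAws.Config.new(:s3)
  |> ExAws.S3.presigned_url(:get, "poeticoding-aws-elixir", "numbers.txt")

# downloads the whole file in memory - fine as a check, not for large files
{:ok, %HTTPoison.Response{status_code: 200, body: body}} = HTTPoison.get(url)
byte_size(body)
```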
Now that we have the URL, we can use Elixir Streams to process the data on the fly and calculate the sum of all the lines in numbers.txt. In this article you can find the HTTPStream code and how it works.
```elixir
# generate the presigned URL
ExAws.Config.new(:s3)
|> ExAws.S3.presigned_url(:get, "poeticoding-aws-elixir", "numbers.txt")

# returning just the URL string to the next step
|> case do
  {:ok, url} -> url
end

# using HTTPStream to download the file in chunks,
# getting a stream of lines
|> HTTPStream.get()
|> HTTPStream.lines()

# converting each line to an integer
|> Stream.map(fn line ->
  case Integer.parse(line) do
    {num, _} -> num
    :error -> 0
  end
end)

# sum the numbers
|> Enum.sum()
|> IO.inspect(label: "result")
```
In the first two lines we generate a presigned url. Then, with HTTPStream.get we create a stream that lazily downloads the file chunk by chunk, transforming the chunks into lines with HTTPStream.lines, mapping the lines to integers and summing all the numbers. The result should be 12468816.
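For reference, here is a minimal, hypothetical sketch of the two HTTPStream helpers used above, built on HTTPoison's async requests – the full implementation and explanation are in the linked article:

```elixir
defmodule HTTPStream do
  # Lazily emits the response body chunk by chunk, using HTTPoison's
  # async: :once mode so each chunk is requested only when needed.
  def get(url) do
    Stream.resource(
      fn ->
        {:ok, resp} = HTTPoison.get(url, [], stream_to: self(), async: :once)
        resp
      end,
      fn %HTTPoison.AsyncResponse{id: id} = resp ->
        receive do
          %HTTPoison.AsyncChunk{id: ^id, chunk: chunk} ->
            HTTPoison.stream_next(resp)
            {[chunk], resp}

          %HTTPoison.AsyncEnd{id: ^id} ->
            {:halt, resp}

          # status and headers messages: skip and ask for the next one
          _status_or_headers ->
            HTTPoison.stream_next(resp)
            {[], resp}
        end
      end,
      fn %HTTPoison.AsyncResponse{id: id} -> :hackney.stop_async(id) end
    )
  end

  # Re-chunks a stream of binaries into a stream of lines.
  # Note: in this sketch a trailing line without a final newline is dropped.
  def lines(chunks) do
    Stream.transform(chunks, "", fn chunk, partial ->
      parts = String.split(partial <> chunk, "\n")
      {complete, [rest]} = Enum.split(parts, -1)
      {complete, rest}
    end)
  end
end
```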
Source: https://www.poeticoding.com/aws-s3-in-elixir-with-exaws/