# Amazon S3

![](https://3973873749-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FtS4Hn3nHPTmtsuvGS3cV%2Fuploads%2Fgit-blob-cb97215e83d07d55b4ad66dcfdd9b0e83eb3a9d8%2Fimage%20\(9\).png?alt=media)

The Amazon S3 output plugin allows you to ingest your records into the [S3](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html) cloud object store.

The plugin can upload data to S3 using the [multipart upload API](https://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html) or using S3 [PutObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html). Multipart is the default and is recommended; Fluent Bit will stream data in a series of 'parts'. This limits the amount of data it has to buffer on disk at any point in time. By default, every time 5 MiB of data have been received, a new 'part' will be uploaded. The plugin can create files up to gigabytes in size from many small chunks/parts using the multipart API. All aspects of the upload process are configurable using the configuration options.

The plugin allows you to specify a maximum file size, and a timeout for uploads. A file will be created in S3 when the max size is reached, or the timeout is reached- whichever comes first.

Records are stored in files in S3 as newline delimited JSON.

See [here](https://github.com/fluent/fluent-bit-docs/tree/43c4fe134611da471e706b0edb2f9acd7cdfdbc3/administration/aws-credentials.md) for details on how AWS credentials are fetched.

**NOTE**: *The* [*Prometheus success/retry/error metrics values*](https://github.com/fluent/fluent-bit-docs/blob/master/pipeline/outputs/administration/monitoring.md) *outputted by Fluent Bit's built-in http server are meaningless for the S3 output*. This is because S3 has its own buffering and retry mechanisms. The Fluent Bit AWS S3 maintainers apologize for this feature gap; you can [track our progress fixing it on GitHub](https://github.com/fluent/fluent-bit/issues/6141).

## Configuration Parameters

| Key                              | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | Default                                 |
| -------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------- |
| region                           | The AWS region of your S3 bucket                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | us-east-1                               |
| bucket                           | S3 Bucket name                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | None                                    |
| json\_date\_key                  | Specify the name of the time key in the output record. To disable the time key just set the value to `false`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | date                                    |
| json\_date\_format               | Specify the format of the date. Supported formats are *double*, *epoch*, *iso8601* (eg: *2018-05-30T09:39:52.000681Z*) and *java\_sql\_timestamp* (eg: *2018-05-30 09:39:52.000681*)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | iso8601                                 |
| total\_file\_size                | Specifies the size of files in S3. Minimum size is 1M. With `use_put_object On` the maximum size is 1G. With multipart upload mode, the maximum size is 50G.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | 100M                                    |
| upload\_chunk\_size              | The size of each 'part' for multipart uploads. Max: 50M                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | 5,242,880 bytes                         |
| upload\_timeout                  | Whenever this amount of time has elapsed, Fluent Bit will complete an upload and create a new file in S3. For example, set this value to 60m and you will get a new file every hour.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | 10m                                     |
| store\_dir                       | Directory to locally buffer data before sending. When multipart uploads are used, data will only be buffered until the `upload_chunk_size` is reached. S3 will also store metadata about in progress multipart uploads in this directory; this allows pending uploads to be completed even if Fluent Bit stops and restarts. It will also store the current $INDEX value if enabled in the S3 key format so that the $INDEX can keep incrementing from its previous value after Fluent Bit restarts.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | /tmp/fluent-bit/s3                      |
| store\_dir\_limit\_size          | The size of the limitation for disk usage in S3. Limit the amount of s3 buffers in the `store_dir` to limit disk usage. Note: Use `store_dir_limit_size` instead of `storage.total_limit_size` which can be used to other plugins, because S3 has its own buffering system.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | 0, which means unlimited                |
| s3\_key\_format                  | Format string for keys in S3. This option supports a UUID, strftime time formatters, a syntax for selecting parts of the Fluent log tag using a syntax inspired by the rewrite\_tag filter. Add $UUID in the format string to insert a random string. Add $INDEX in the format string to insert an integer that increments each upload. The $INDEX value will be saved in the store\_dir so that if Fluent Bit restarts the value will keep incrementing from the previous run. Add $TAG in the format string to insert the full log tag; add $TAG\[0] to insert the first part of the tag in the s3 key. The tag is split into “parts” using the characters specified with the `s3_key_format_tag_delimiters` option. Add extension directly after the last piece of the format string to insert a key suffix. If you want to specify a key suffix and you are in `use_put_object` mode, you must specify $UUID as well. More explanations can be found in the S3 Key Format explainer section further down in this document. See the in depth examples and tutorial in the documentation. Time in s3\_key is the timestamp of the first record in the S3 file. | /fluent-bit-logs/$TAG/%Y/%m/%d/%H/%M/%S |
| s3\_key\_format\_tag\_delimiters | A series of characters which will be used to split the tag into 'parts' for use with the s3\_key\_format option. See the in depth examples and tutorial in the documentation.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | .                                       |
| static\_file\_path               | Disables behavior where UUID string is automatically appended to end of S3 key name when $UUID is not provided in s3\_key\_format. $UUID, time formatters, $TAG, and other dynamic key formatters all work as expected while this feature is set to true.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | false                                   |
| use\_put\_object                 | Use the S3 PutObject API, instead of the multipart upload API. When this option is on, key extension is only available when $UUID is specified in `s3_key_format`. If $UUID is not included, a random string will be appended at the end of the format string and the key extension cannot be customized in this case.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | false                                   |
| role\_arn                        | ARN of an IAM role to assume (ex. for cross account access).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | None                                    |
| endpoint                         | Custom endpoint for the S3 API. An endpoint can contain scheme and port.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | None                                    |
| sts\_endpoint                    | Custom endpoint for the STS API.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | None                                    |
| profile                          | Option to specify an AWS Profile for credentials.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | default                                 |
| canned\_acl                      | [Predefined Canned ACL policy](https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl) for S3 objects.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | None                                    |
| compression                      | Compression type for S3 objects. 'gzip' is currently the only supported value by default. If Apache Arrow support was enabled at compile time, you can also use 'arrow'. For gzip compression, the Content-Encoding HTTP Header will be set to 'gzip'. Gzip compression can be enabled when `use_put_object` is 'on' or 'off' (PutObject and Multipart). Arrow compression can only be enabled with `use_put_object On`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | None                                    |
| content\_type                    | A standard MIME type for the S3 object; this will be set as the Content-Type HTTP header.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | None                                    |
| send\_content\_md5               | Send the Content-MD5 header with PutObject and UploadPart requests, as is required when Object Lock is enabled.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | false                                   |
| auto\_retry\_requests            | Immediately retry failed requests to AWS services once. This option does not affect the normal Fluent Bit retry mechanism with backoff. Instead, it enables an immediate retry with no delay for networking errors, which may help improve throughput when there are transient/random networking issues.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | true                                    |
| log\_key                         | By default, the whole log record will be sent to S3. If you specify a key name with this option, then only the value of that key will be sent to S3. For example, if you are using Docker, you can specify log\_key log and only the log message will be sent to S3.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | None                                    |
| preserve\_data\_ordering         | Normally, when an upload request fails, there is a high chance for the last received chunk to be swapped with a later chunk, resulting in data shuffling. This feature prevents this shuffling by using a queue logic for uploads.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | true                                    |
| storage\_class                   | Specify the [storage class](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html#AmazonS3-PutObject-request-header-StorageClass) for S3 objects. If this option is not specified, objects will be stored with the default 'STANDARD' storage class.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | None                                    |
| retry\_limit                     | Integer value to set the maximum number of retries allowed. Note: this configuration is released since version 1.9.10 and 2.0.1. For previous version, the number of retries is 5 and is not configurable.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | 1                                       |
| external\_id                     | Specify an external ID for the STS API, can be used with the role\_arn parameter if your role requires an external ID.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | None                                    |

## TLS / SSL

To skip TLS verification, set `tls.verify` as `false`. For more details about the properties available and general configuration, please refer to the [TLS/SSL](https://docs.fluentbit.io/manual/3.0/administration/transport-security) section.

## Permissions

The plugin requires the following AWS IAM permissions:

```
{
	"Version": "2012-10-17",
	"Statement": [{
		"Effect": "Allow",
		"Action": [
			"s3:PutObject"
		],
		"Resource": "*"
	}]
}
```

## Differences between S3 and other Fluent Bit outputs

The s3 output plugin is special because its use case is to upload files of non-trivial size to an Amazon S3 bucket. This is in contrast to most other outputs which send many requests to upload data in batches of a few Megabytes or less.

When Fluent Bit recieves logs, it stores them in chunks, either in memory or the filesystem depending on your settings. A chunk is usually around 2 MB in size. Fluent Bit sends the chunks in order to each output that matches their tag. Most outputs then send the chunk immediately to their destination. A chunk is sent to the output's "flush callback function", which must return one of `FLB_OK`, `FLB_RETRY`, or `FLB_ERROR`. Fluent Bit keeps count of the return values from each outputs "flush callback function"; these counters are the data source for Fluent Bit's error, retry, and success metrics available in prometheus format via its monitoring interface.

The S3 output plugin is a Fluent Bit output plugin and thus it conforms to the Fluent Bit output plugin specification. However, since the S3 use case is to upload large files, generally much larger than 2 MB, its behavior is different. The S3 "flush callback function" simply buffers the incoming chunk to the filesystem, and returns an `FLB_OK`. *Consequently, the prometheus metrics available via the Fluent Bit http server are meaningless for S3.* In addition, the `storage.total_limit_size` parameter is not meaningful for S3 since it has its own buffering system in the `store_dir`. Instead, use `store_dir_limit_size`. Finally, *S3 always requires a write-able filesystem*; running Fluent Bit on a read-only filesystem will not work with the S3 output.

S3 uploads are primarily initiated via the S3 "timer callback function", which runs separately from its "flush callback function". Because S3 has its own system of buffering and its own callback to upload data, the normal sequential data ordering of chunks provided by the Fluent Bit engine may be compromised. Consequently, S3 has the `presevere_data_ordering` option which will ensure data is uploaded in the original order it was collected by Fluent Bit.

### Summary: Uniqueness in S3 Plugin

1. *The HTTP Monitoring interface output metrics are not meaningful for S3*: AWS understands that this is non-ideal; we have [opened an issue with a design](https://github.com/fluent/fluent-bit/issues/6141) that will allow S3 to manage its own output metrics.
2. *You must use `store_dir_limit_size` to limit the space on disk used by S3 buffer files*.
3. *The original ordering of data inputted to Fluent Bit may not be preserved unless you enable `preserve_data_ordering On`*.

## S3 Key Format and Tag Delimiters

In Fluent Bit, all logs have an associated tag. The `s3_key_format` option lets you inject the tag into the s3 key using the following syntax:

* `$TAG` => the full tag
* `$TAG[n]` => the nth part of the tag (index starting at zero). This syntax is copied from the rewrite tag filter. By default, “parts” of the tag are separated with dots, but you can change this with `s3_key_format_tag_delimiters`.

In the example below, assume the date is January 1st, 2020 00:00:00 and the tag associated with the logs in question is `my_app_name-logs.prod`.

```
[OUTPUT]
    Name                         s3
    Match                        *
    bucket                       my-bucket
    region                       us-west-2
    total_file_size              250M
    s3_key_format                /$TAG[2]/$TAG[0]/%Y/%m/%d/%H/%M/%S/$UUID.gz
    s3_key_format_tag_delimiters .-
```

With the delimiters as . and -, the tag will be split into parts as follows:

* `$TAG[0]` = my\_app\_name
* `$TAG[1]` = logs
* `$TAG[2]` = prod

So the key in S3 will be `/prod/my_app_name/2020/01/01/00/00/00/bgdHN1NM.gz`.

### Allowing a file extension in the S3 Key Format with $UUID

The Fluent Bit S3 output was designed to ensure that previous uploads will never be over-written by a subsequent upload. Consequently, the `s3_key_format` supports time formatters, `$UUID`, and `$INDEX`. `$INDEX` is special because it is saved in the `store_dir`; if you restart Fluent Bit with the same disk, then it can continue incrementing the index from its last value in the previous run.

For files uploaded with the PutObject API, the S3 output requires that a unique random string be present in the S3 key. This is because many of the use cases for PutObject uploads involve a short time period between uploads such that a timestamp in the S3 key may not be unique enough between uploads. For example, if you only specify minute granularity timestamps in the S3 key, with a small upload size, it is possible to have two uploads that have timestamps set in the same minute. This "requirement" can be disabled with `static_file_path On`.

There are three cases where the PutObject API is used:

1. When you explicitly set `use_put_object On`
2. On startup when the S3 output finds old buffer files in the `store_dir` from a previous run and attempts to send all of them at once.
3. On shutdown, when to prevent data loss the S3 output attempts to send all currently buffered data at once.

Consequently, you should always specify `$UUID` somewhere in your S3 key format. Otherwise, if the PutObject API is used, S3 will append a random 8 character UUID to the end of your S3 key. This means that a file extension set at the end of an S3 key will have the random UUID appended to it. This behavior can be disabled with `static_file_path On`.

Let's walk through this via an example. First case, we attempt to set a `.gz` extension without specifying `$UUID`.

```
[OUTPUT]
    Name                         s3
    Match                        *
    bucket                       my-bucket
    region                       us-west-2
    total_file_size              50M
    use_put_object               Off
    compression                  gzip
    s3_key_format                /$TAG/%Y/%m/%d/%H_%M_%S.gz
```

In the case where pending data is uploaded on shutdown, if the tag was `app`, the S3 key in the S3 bucket might be:

```
/app/2022/12/25/00_00_00.gz-apwgylqg
```

The S3 output appended a random string to the "extension", since this upload on shutdown used the PutObject API.

There are two ways of disabling this behavior. Option 1, use `static_file_path`:

```
[OUTPUT]
    Name                         s3
    Match                        *
    bucket                       my-bucket
    region                       us-west-2
    total_file_size              50M
    use_put_object               Off
    compression                  gzip
    s3_key_format                /$TAG/%Y/%m/%d/%H_%M_%S.gz
    static_file_path             On
```

Option 2, explicitly define where the random UUID will go in the S3 key format:

```
[OUTPUT]
    Name                         s3
    Match                        *
    bucket                       my-bucket
    region                       us-west-2
    total_file_size              50M
    use_put_object               Off
    compression                  gzip
    s3_key_format                /$TAG/%Y/%m/%d/%H_%M_%S/$UUID.gz
```

## Reliability

The `store_dir` is used to temporarily store data before it is uploaded. If Fluent Bit is stopped suddenly it will try to send all data and complete all uploads before it shuts down. If it can not send some data, on restart it will look in the `store_dir` for existing data and will try to send it.

Multipart uploads are ideal for most use cases because they allow the plugin to upload data in small chunks over time. For example, 1 GB file can be created from 200 5MB chunks. While the file size in S3 will be 1 GB, only 5 MB will be buffered on disk at any one point in time.

There is one minor drawback to multipart uploads- the file and data will not be visible in S3 until the upload is completed with a [CompleteMultipartUpload](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CompleteMultipartUpload.html) call. The plugin will attempt to make this call whenever Fluent Bit is shut down to ensure your data is available in s3. It will also store metadata about each upload in the `store_dir`, ensuring that uploads can be completed when Fluent Bit restarts (assuming it has access to persistent disk and the `store_dir` files will still be present on restart).

### Using S3 without persisted disk

If you run Fluent Bit in an environment without persistent disk, or without the ability to restart Fluent Bit and give it access to the data stored in the `store_dir` from previous executions- some considerations apply. This might occur if you run Fluent Bit on [AWS Fargate](https://aws.amazon.com/fargate/).

In these situations, we recommend using the PutObject API, and sending data frequently, to avoid local buffering as much as possible. This will limit data loss in the event Fluent Bit is killed unexpectedly.

The following settings are recommended for this use case:

```
[OUTPUT]
     Name s3
     Match *
     bucket your-bucket
     region us-east-1
     total_file_size 1M
     upload_timeout 1m
     use_put_object On
```

## S3 Multipart Uploads

With `use_put_object Off` (default), S3 will attempt to send files using multipart uploads. For each file, S3 first calls [CreateMultipartUpload](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CreateMultipartUpload.html), then a series of calls to [UploadPart](https://docs.aws.amazon.com/AmazonS3/latest/API/API_UploadPart.html) for each fragment (targeted to be `upload_chunk_size` bytes), and finally [CompleteMultipartUpload](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CompleteMultipartUpload.html) to create the final file in S3.

### Fallback to PutObject

S3 [requires](https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts.html) each [UploadPart](https://docs.aws.amazon.com/AmazonS3/latest/API/API_UploadPart.html) fragment to be at least 5,242,880 bytes, otherwise the upload is rejected.

Consequently, the S3 output must sometimes fallback to the [PutObject API](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html).

Uploads are triggered by three settings:

1. `total_file_size` and `upload_chunk_size`: When S3 has buffered data in the `store_dir` that meets the desired `total_file_size` (for `use_put_object On`) or the `upload_chunk_size` (for Multipart), it will trigger an upload operation.
2. `upload_timeout`: Whenever locally buffered data has been present on the filesystem in the `store_dir` longer than the configured `upload_timeout`, it will be sent. This happens regardless of whether or not the desired byte size has been reached. Consequently, if you configure a small `upload_timeout`, your files may be smaller than the `total_file_size`. The timeout is evaluated against the time at which S3 started buffering data for each unqiue tag (that is, the time when new data was buffered for the unique tag after the last upload). The timeout is also evaluated against the [CreateMultipartUpload](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CreateMultipartUpload.html) time, so a multipart upload will be completed after `upload_timeout` has elapsed, even if the desired size has not yet been reached.

If your `upload_timeout` triggers an upload before the pending buffered data reaches the `upload_chunk_size`, it may be too small for a multipart upload. S3 will consequently fallback to use the [PutObject API](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html).

When you enable compression, S3 applies the compression algorithm at send time. The size settings noted above trigger uploads based on the size of buffered data, not the final compressed size. Consequently, it is possible that after compression, buffered data no longer meets the required minimum S3 [UploadPart](https://docs.aws.amazon.com/AmazonS3/latest/API/API_UploadPart.html) size. If this occurs, you will see a log message like:

```
[ info] [output:s3:s3.0] Pre-compression upload_chunk_size= 5630650, After compression, chunk is only 1063320 bytes, the chunk was too small, using PutObject to upload
```

If you encounter this frequently, use the numbers in the messages to guess your compression factor. For example, in this case, the buffered data was reduced from 5,630,650 bytes to 1,063,320 bytes. The compressed size is 1/5 the actual data size, so configuring `upload_chunk_size 30M` should ensure each part is large enough after compression to be over the min required part size of 5,242,880 bytes.

The S3 API allows the last part in an upload to be less than the 5,242,880 byte minimum. Therefore, if a part is too small for an existing upload, the S3 output will upload that part and then complete the upload.

### upload\_timeout constrains total multipart upload time for a single file

The `upload_timeout` is evaluated against the [CreateMultipartUpload](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CreateMultipartUpload.html) time. So a multipart upload will be completed after `upload_timeout` has elapsed, even if the desired size has not yet been reached.

### Completing uploads

When [CreateMultipartUpload](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CreateMultipartUpload.html) is called, an `UploadID` is returned. S3 stores these IDs for active uploads in the `store_dir`. Until [CompleteMultipartUpload](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CompleteMultipartUpload.html) is called, the uploaded data will not be visible in S3.

On shutdown, S3 output will attempt to complete all pending uploads. If it fails to complete an upload, the ID will remain buffered in the `store_dir` in a directory called `multipart_upload_metadata`. If you restart the S3 output with the same `store_dir` it will discover the old UploadIDs and complete the pending uploads. The [S3 documentation](https://aws.amazon.com/blogs/aws-cloud-financial-management/discovering-and-deleting-incomplete-multipart-uploads-to-lower-amazon-s3-costs/) also has suggestions on discovering and deleting/completing dangling uploads in your buckets.

## Worker support

Fluent Bit 1.7 adds a new feature called `workers` which enables outputs to have dedicated threads. This `s3` plugin has partial support for workers. **The plugin can only support a single worker; enabling multiple workers will lead to errors/indeterminate behavior.**

Example:

```
[OUTPUT]
     Name s3
     Match *
     bucket your-bucket
     region us-east-1
     total_file_size 1M
     upload_timeout 1m
     use_put_object On
     workers 1
```

If you enable a single worker, you are enabling a dedicated thread for your S3 output. We recommend starting without workers, evaluating the performance, and then enabling a worker if needed. For most users, the plugin can provide sufficient throughput without workers.

## Usage with MinIO

[MinIO](https://min.io/) is a high-performance, S3 compatible object storage and you can build your app with S3 functionality without S3.

Assume you run [a MinIO server](https://docs.min.io/docs/minio-quickstart-guide.html) at localhost:9000, and create a bucket of `your-bucket` by referring [the client docs](https://docs.min.io/docs/minio-client-quickstart-guide).

Example:

```
[OUTPUT]
     Name s3
     Match *
     bucket your-bucket
     endpoint http://localhost:9000
```

Then, the records will be stored into the MinIO server.

## Getting Started

In order to send records into Amazon S3, you can run the plugin from the command line or through the configuration file.

### Command Line

The **s3** plugin, can read the parameters from the command line through the **-p** argument (property), e.g:

```
$ fluent-bit -i cpu -o s3 -p bucket=my-bucket -p region=us-west-2 -p -m '*' -f 1
```

### Configuration File

In your main configuration file append the following *Output* section:

```
[OUTPUT]
     Name s3
     Match *
     bucket your-bucket
     region us-east-1
     store_dir /home/ec2-user/buffer
     total_file_size 50M
     upload_timeout 10m
```

An example that using PutObject instead of multipart:

```
[OUTPUT]
     Name s3
     Match *
     bucket your-bucket
     region us-east-1
     store_dir /home/ec2-user/buffer
     use_put_object On
     total_file_size 10M
     upload_timeout 10m
```

## AWS for Fluent Bit

Amazon distributes a container image with Fluent Bit and this plugins.

### GitHub

[github.com/aws/aws-for-fluent-bit](https://github.com/aws/aws-for-fluent-bit)

### Amazon ECR Public Gallery

[aws-for-fluent-bit](https://gallery.ecr.aws/aws-observability/aws-for-fluent-bit)

Our images are available in Amazon ECR Public Gallery. You can download images with different tags by following command:

```
docker pull public.ecr.aws/aws-observability/aws-for-fluent-bit:<tag>
```

For example, you can pull the image with latest version by:

```
docker pull public.ecr.aws/aws-observability/aws-for-fluent-bit:latest
```

If you see errors for image pull limits, try log into public ECR with your AWS credentials:

```
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws
```

You can check the [Amazon ECR Public official doc](https://docs.aws.amazon.com/AmazonECR/latest/public/get-set-up-for-amazon-ecr.html) for more details.

### Docker Hub

[amazon/aws-for-fluent-bit](https://hub.docker.com/r/amazon/aws-for-fluent-bit/tags)

### Amazon ECR

You can use our SSM Public Parameters to find the Amazon ECR image URI in your region:

```
aws ssm get-parameters-by-path --path /aws/service/aws-for-fluent-bit/
```

For more see [the AWS for Fluent Bit github repo](https://github.com/aws/aws-for-fluent-bit#public-images).

## Advanced usage

### Use Apache Arrow for in-memory data processing

Starting from Fluent Bit v1.8, the Amazon S3 plugin includes the support for [Apache Arrow](https://arrow.apache.org/). The support is currently not enabled by default, as it depends on a shared version of `libarrow` as the prerequisite.

To use this feature, `FLB_ARROW` must be turned on at compile time:

```
$ cd build/
$ cmake -DFLB_ARROW=On ..
$ cmake --build .
```

Once compiled, Fluent Bit can upload incoming data to S3 in Apache Arrow format. For example:

```
[INPUT]
  Name cpu

[OUTPUT]
  Name s3
  Bucket your-bucket-name
  total_file_size 1M
  use_put_object On
  upload_timeout 60s
  Compression arrow
```

As shown in this example, setting `Compression` to `arrow` makes Fluent Bit to convert payload into Apache Arrow format.

The stored data is very easy to load, analyze and process using popular data processing tools (such as Python pandas, Apache Spark and Tensorflow). The following code uses `pyarrow` to analyze the uploaded data:

```
>>> import pyarrow.feather as feather
>>> import pyarrow.fs as fs
>>>
>>> s3 = fs.S3FileSystem()
>>> file = s3.open_input_file("my-bucket/fluent-bit-logs/cpu.0/2021/04/27/09/36/15-object969o67ZF")
>>> df = feather.read_feather(file)
>>> print(df.head())
                          date  cpu_p  user_p  system_p  cpu0.p_cpu  cpu0.p_user  cpu0.p_system
0  2021-04-27T09:33:53.539346Z    1.0     1.0       0.0         1.0          1.0            0.0
1  2021-04-27T09:33:54.539330Z    0.0     0.0       0.0         0.0          0.0            0.0
2  2021-04-27T09:33:55.539305Z    1.0     0.0       1.0         1.0          0.0            1.0
3  2021-04-27T09:33:56.539430Z    0.0     0.0       0.0         0.0          0.0            0.0
4  2021-04-27T09:33:57.539803Z    0.0     0.0       0.0         0.0          0.0            0.0
```
