Networking

Fluent Bit implements a unified networking interface that is exposed to components such as plugins. This interface abstracts the complexity of general I/O and is fully configurable.

A common use case is a component or plugin that needs to connect to a service to send and receive data. Although this sounds simple, many factors can complicate it, such as unresponsive services, network latency, or other connectivity errors. The networking interface aims to abstract and simplify network I/O handling, minimize risks, and optimize performance.

Concepts

TCP Connect Timeout

Most of the time, creating a new TCP connection to a remote server is straightforward and takes a few milliseconds. But there are cases where DNS resolution, a slow network, or an incomplete TLS handshake can cause long delays or leave the connection in an incomplete state.

The net.connect_timeout property sets the maximum time to wait for a connection to be established. Note that this value already includes the TLS handshake.
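As a minimal sketch, the timeout could be set on an output as shown below. The tcp output, host, and port are placeholders borrowed from the example at the end of this page, and the value of 5 seconds is only illustrative:

```
[OUTPUT]
    name                 tcp
    match                *
    host                 127.0.0.1
    port                 9090
    # Illustrative value: stop waiting if the connection
    # (TLS handshake included) is not established within 5 seconds
    net.connect_timeout  5
```

A lower value fails faster against unresponsive services, while a higher value tolerates slow DNS or slow handshakes.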

TCP Source Address

In environments with multiple network interfaces, you might want to choose which interface is used for the data that flows through the network.

The net.source_address property specifies which network address must be used for a TCP connection and its data flow.
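For illustration, a host with two interfaces might pin outgoing traffic to one of them as sketched below. Both IP addresses are hypothetical placeholders from a documentation-only range and must be replaced with a local address of the chosen interface and the real destination:

```
[OUTPUT]
    name                tcp
    match               *
    # Hypothetical remote endpoint
    host                192.0.2.50
    port                9090
    # Hypothetical local address assigned to the interface to send from
    net.source_address  192.0.2.10
```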

Connection Keepalive

TCP is a connection-oriented channel. In most cases, a TCP connection is used to deliver data to and receive data from a remote endpoint. This connection can be created and then destroyed once it is no longer needed. That approach has pros and cons; this section covers the opposite case: keeping the connection open.

Connection Keepalive refers to the ability of the client (Fluent Bit in this case) to keep the TCP connection open persistently: once the connection is created and used, it is recycled instead of closed. This improves performance because the communication channel is already established when new data needs to be sent.

Any component that uses TCP channels, such as HTTP or TLS, can take advantage of this feature. To configure it, use the net.keepalive property.
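A minimal sketch of enabling the feature on an output follows. The tcp output and its host and port are placeholders borrowed from the example at the end of this page:

```
[OUTPUT]
    name           tcp
    match          *
    host           127.0.0.1
    port           9090
    # Reuse the established connection instead of closing it after each use
    net.keepalive  on
```

Setting the property to off would instead make each connection short-lived, which is the create-and-destroy approach described above.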

Connection Keepalive Idle Timeout

If keepalive is enabled on a connection, there might be scenarios where the connection goes unused for long periods of time. An idle keepalive connection is not helpful, so it is recommended to keep connections alive only while they are being used.

To control how long a keepalive connection can be idle, use the net.keepalive_idle_timeout configuration property.
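As a sketch, the idle limit is set alongside net.keepalive. The 30-second value is an arbitrary illustration, and the tcp output, host, and port are placeholders:

```
[OUTPUT]
    name                        tcp
    match                       *
    host                        127.0.0.1
    port                        9090
    net.keepalive               on
    # Illustrative value: close a keepalive connection after 30 idle seconds
    net.keepalive_idle_timeout  30
```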

TCP Keepalive

An open TCP connection to a remote server can be silently dropped by intermediate network equipment (for example, routers) if it stays quiet for too long. What counts as too long depends on manufacturers and configurations outside of Fluent Bit's control.

If you're using the Connection Keepalive feature but not achieving the desired connectivity rates, try setting net.tcp_keepalive to on. This configures the socket to periodically send keepalive probes when the connection is silent. These probes travel all the way to the server, so the equipment in between considers the connection active. The server is expected to acknowledge each probe, which allows Fluent Bit to detect a broken connection when probes go unanswered.
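A minimal sketch that combines both features is shown below: net.keepalive keeps the connection open for reuse, while net.tcp_keepalive asks the socket to probe it when it is silent. The tcp output, host, and port are placeholders:

```
[OUTPUT]
    name               tcp
    match              *
    host               127.0.0.1
    port               9090
    # Keep the connection open for reuse (Connection Keepalive)
    net.keepalive      on
    # Send TCP-level probes while the connection is silent (TCP Keepalive)
    net.tcp_keepalive  on
```

With only this setting, the probe timing is left to the operating system defaults.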

TCP Keepalive Time

If TCP keepalive is used, net.tcp_keepalive_time overrides the OS default for how long to wait after the last data packet is sent before TCP keepalive probing starts.

TCP Keepalive Interval

If TCP keepalive is used, net.tcp_keepalive_interval overrides the OS default for the interval between successive probes when the first one is not acknowledged.

TCP Keepalive Probes

If TCP keepalive is used, net.tcp_keepalive_probes overrides the OS default for the number of unacknowledged probes allowed before a connection is considered dead.
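The three TCP keepalive tuning properties are typically set together. The sketch below uses arbitrary illustrative values, assumed here to be expressed in seconds for the two time-based properties, and placeholder host and port:

```
[OUTPUT]
    name                        tcp
    match                       *
    host                        127.0.0.1
    port                        9090
    net.keepalive               on
    net.tcp_keepalive           on
    # Illustrative: start probing after 60 seconds without data
    net.tcp_keepalive_time      60
    # Illustrative: wait 10 seconds between unacknowledged probes
    net.tcp_keepalive_interval  10
    # Illustrative: consider the connection dead after 3 unacknowledged probes
    net.tcp_keepalive_probes    3
```

Under these assumptions, a peer that stops responding would be detected after roughly 60 + 3 × 10 = 90 seconds of silence. The exact timing depends on the operating system's TCP implementation.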

TCP Keepalive Recycling

If a TCP connection is keepalive enabled and has very high traffic, the connection may never be killed. In a situation where the remote endpoint is load-balanced in some way, this may lead to an unequal distribution of traffic. Setting net.keepalive_max_recycle causes keepalive connections to be recycled after a number of messages are sent over that connection. Once this limit is reached, the connection is terminated gracefully, and a new connection will be created for subsequent messages.
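As a sketch, recycling could be configured as follows when the destination sits behind a load balancer. The limit of 1000 is an arbitrary illustration, and the host is a hypothetical placeholder:

```
[OUTPUT]
    name                       tcp
    match                      *
    # Hypothetical load-balanced endpoint
    host                       192.0.2.50
    port                       9090
    net.keepalive              on
    # Illustrative value: recycle the connection once this limit is reached
    net.keepalive_max_recycle  1000
```

A smaller limit spreads traffic across backends more evenly at the cost of more frequent connection setup.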

Configuration Options

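For plugins that rely on networking I/O, the properties described in the sections above can be used to optimize performance or adjust to different configuration needs: net.connect_timeout, net.source_address, net.keepalive, net.keepalive_idle_timeout, net.keepalive_max_recycle, net.tcp_keepalive, net.tcp_keepalive_time, net.tcp_keepalive_interval, and net.tcp_keepalive_probes.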

Example

As an example, we will send 5 random messages through a TCP output connection. On the remote side, we will use the nc (netcat) utility to view the data.

Put the following configuration snippet in a file called fluent-bit.conf:

[SERVICE]
    flush     1
    log_level info

[INPUT]
    name      random
    samples   5

[OUTPUT]
    name      tcp
    match     *
    host      127.0.0.1
    port      9090
    format    json_lines
    # Networking Setup
    net.connect_timeout         5
    net.source_address          127.0.0.1
    net.keepalive               on
    net.keepalive_idle_timeout  10

In another terminal, start nc and make it listen for messages on TCP port 9090:

$ nc -l 9090

Now start Fluent Bit with the configuration file written above and you will see the data flowing to netcat:

$ nc -l 9090
{"date":1587769732.572266,"rand_value":9704012962543047466}
{"date":1587769733.572354,"rand_value":7609018546050096989}
{"date":1587769734.572388,"rand_value":17035865539257638950}
{"date":1587769735.572419,"rand_value":17086151440182975160}
{"date":1587769736.572277,"rand_value":527581343064950185}

If the net.keepalive option is not enabled, Fluent Bit closes the TCP connection and netcat quits. With the option enabled, as in this example, we can see how the keepalive connection works.

After the 5 records arrive, the connection stays idle, and after 10 seconds it is closed because of net.keepalive_idle_timeout.
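To observe the contrast, one could rerun the example with keepalive disabled. This is a sketch of the modified [OUTPUT] section only; the [SERVICE] and [INPUT] sections stay as they are in the configuration above:

```
[OUTPUT]
    name      tcp
    match     *
    host      127.0.0.1
    port      9090
    format    json_lines
    # Networking Setup
    net.connect_timeout  5
    net.source_address   127.0.0.1
    # Close the connection after use instead of keeping it open
    net.keepalive        off
```

With this variant, netcat is expected to quit as soon as Fluent Bit closes the connection, instead of staying open until the idle timeout expires.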
