# Grep

The *Grep Filter* plugin allows you to match or exclude specific records based on regular expression patterns for values or nested values.

## Configuration Parameters

The plugin supports the following configuration parameters:

| Key         | Value Format | Description                                                                                                                                                                                                                                                                                             |
| ----------- | ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Regex       | KEY REGEX    | Keep records in which the content of KEY matches the regular expression.                                                                                                                                                                                                                                |
| Exclude     | KEY REGEX    | Exclude records in which the content of KEY matches the regular expression.                                                                                                                                                                                                                             |
| Logical\_Op | Operation    | Specifies which logical operator to use. Allowed values are `AND`, `OR`, and `legacy`. Default is `legacy` for backward compatibility. In `legacy` mode, the behaviour is either AND or OR, depending on whether the `grep` is including (uses AND) or excluding (uses OR). Available in Fluent Bit 2.1 and later. |
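
The table above can be read as a keep/drop decision. The following Python sketch models that decision as the table describes it. It is a hypothetical model, not Fluent Bit's C implementation: the rule tuples and the `keep_record` helper are invented for illustration, keys are top-level only, and Python's `re` stands in for Fluent Bit's regex engine, whose syntax may differ in edge cases.

```python
import re

# Hypothetical model of the documented Grep semantics; not Fluent Bit's C code.
# Python's `re` stands in for Fluent Bit's regex engine.
# Each rule is a tuple: ("regex" | "exclude", key, pattern).


def rule_matches(record, key, pattern):
    """A rule matches when the top-level key exists and its value contains the pattern."""
    if key not in record:
        return False
    return re.search(pattern, str(record[key])) is not None


def keep_record(record, rules, logical_op="legacy"):
    """Return True to keep the record, False to drop it."""
    if not rules:
        return True
    regex_rules = [r for r in rules if r[0] == "regex"]
    exclude_rules = [r for r in rules if r[0] == "exclude"]

    if logical_op.lower() == "legacy":
        # Including (Regex) rules combine with AND; excluding rules combine with OR.
        all_included = all(rule_matches(record, k, p) for _, k, p in regex_rules)
        any_excluded = any(rule_matches(record, k, p) for _, k, p in exclude_rules)
        return all_included and not any_excluded

    op = logical_op.upper()
    if op not in ("AND", "OR"):
        raise ValueError(f"unknown Logical_Op: {logical_op}")
    if regex_rules and exclude_rules:
        # With AND/OR, mixing Regex and Exclude is a configuration error.
        raise ValueError("Regex and Exclude cannot be mixed when Logical_Op is set")

    combine = all if op == "AND" else any
    matched = combine(rule_matches(record, k, p) for _, k, p in rules)
    # Regex rules keep matching records; Exclude rules drop them.
    return matched if regex_rules else not matched


rules = [("regex", "log", "aa"), ("regex", "log", "b$")]
print(keep_record({"log": "aab"}, rules))        # True: both rules match
print(keep_record({"log": "aaa"}, rules))        # False: legacy uses AND for Regex
print(keep_record({"log": "aaa"}, rules, "OR"))  # True: one match is enough
```

Under these assumptions, switching `Logical_Op` only changes how the per-rule results are combined; the rule type still decides whether a combined match keeps or drops the record.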

### Record Accessor Enabled

This plugin enables the [Record Accessor](https://docs.fluentbit.io/manual/2.2/administration/configuring-fluent-bit/classic-mode/record-accessor) feature for specifying the KEY. Use the *record accessor* when you want to match against nested values.

## Getting Started

To start filtering records, you can run the filter from the command line or through the configuration file. The following example assumes that you have a file called `lines.txt` with the following content:

```
{"log": "aaa"}
{"log": "aab"}
{"log": "bbb"}
{"log": "ccc"}
{"log": "ddd"}
{"log": "eee"}
{"log": "fff"}
{"log": "ggg"}
```

### Command Line

> Note: when using the command line, pay close attention to quoting the regular expressions properly. Using a configuration file is recommended.

The following command loads the *tail* plugin and reads the content of the `lines.txt` file. Then the *grep* filter applies a regular expression rule over the *log* field (created by the tail plugin) and only *passes* the records whose field value matches `aa`. The pattern is not anchored, so it matches values that contain *aa* anywhere:

```
$ bin/fluent-bit -i tail -p 'path=lines.txt' -F grep -p 'regex=log aa' -m '*' -o stdout
```
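
To make the expected result concrete, this Python sketch replays the `regex=log aa` rule over the sample records. It assumes each line is parsed as JSON (as the configuration-file example does with `parser json`) and uses Python's `re.search` as a stand-in for Fluent Bit's regex engine; it is an illustration, not Fluent Bit code.

```python
import json
import re

# Sample content of lines.txt, one JSON record per line (copied from above).
LINES = """\
{"log": "aaa"}
{"log": "aab"}
{"log": "bbb"}
{"log": "ccc"}
{"log": "ddd"}
{"log": "eee"}
{"log": "fff"}
{"log": "ggg"}"""

# Stand-in for `regex=log aa`: re.search looks for the pattern anywhere
# in the value; it is not anchored to the start.
pattern = re.compile(r"aa")

records = [json.loads(line) for line in LINES.splitlines()]
kept = [rec for rec in records if pattern.search(rec.get("log", ""))]

for rec in kept:
    print(rec)  # prints {'log': 'aaa'} then {'log': 'aab'}
```

To keep only values that start with `aa`, anchor the pattern as `^aa`. For this sample both patterns keep the same two records.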

### Configuration File

{% tabs %}
{% tab title="fluent-bit.conf" %}

```python
[SERVICE]
    parsers_file /path/to/parsers.conf

[INPUT]
    name   tail
    path   lines.txt
    parser json

[FILTER]
    name   grep
    match  *
    regex  log aa

[OUTPUT]
    name   stdout
    match  *
```

{% endtab %}

{% tab title="fluent-bit.yaml" %}

```yaml
service:
    parsers_file: /path/to/parsers.conf
pipeline:
    inputs:
        - name: tail
          path: lines.txt
          parser: json
    filters:
        - name: grep
          match: '*'
          regex: log aa
    outputs:
        - name: stdout
          match: '*'

```

{% endtab %}
{% endtabs %}

The filter allows you to use multiple rules, which are applied in order. You can have as many *Regex* and *Exclude* entries as required.
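
The following Python sketch shows how several rules combine in the default `legacy` mode, where every *Regex* rule must match and any matching *Exclude* rule drops the record. It is a hypothetical model rather than Fluent Bit code: the rules `Regex log ^a` and `Exclude log b$` are invented for illustration, and Python's `re` stands in for Fluent Bit's regex engine.

```python
import re

SAMPLE = ["aaa", "aab", "bbb", "ccc", "ddd", "eee", "fff", "ggg"]

# Hypothetical rules, equivalent to these classic-mode entries:
#   Regex   log ^a
#   Exclude log b$
RULES = [("regex", re.compile(r"^a")), ("exclude", re.compile(r"b$"))]


def passes(value):
    """Apply the rules in order; stop at the first rule that drops the record."""
    for kind, pattern in RULES:
        matched = pattern.search(value) is not None
        if kind == "regex" and not matched:
            return False  # a Regex rule that does not match drops the record
        if kind == "exclude" and matched:
            return False  # an Exclude rule that matches drops the record
    return True


kept = [v for v in SAMPLE if passes(v)]
print(kept)  # ['aaa']
```

Here `aab` passes the *Regex* rule but is then dropped by the *Exclude* rule, so only `aaa` survives.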

### Nested fields example

If you want to match or exclude records based on nested values, you can use a [Record Accessor](https://docs.fluentbit.io/manual/2.2/administration/configuring-fluent-bit/classic-mode/record-accessor) format as the KEY name. Consider the following record example:

```json
{
    "log": "something",
    "kubernetes": {
        "pod_name": "myapp-0",
        "namespace_name": "default",
        "pod_id": "216cd7ae-1c7e-11e8-bb40-000c298df552",
        "labels": {
            "app": "myapp"
        },
        "host": "minikube",
        "container_name": "myapp",
        "docker_id": "370face382c7603fdd309d8c6aaaf434fd98b92421ce"
    }
}
```

If you want to exclude records that match a given nested field (for example `kubernetes.labels.app`), you can use the following rule:

{% tabs %}
{% tab title="fluent-bit.conf" %}

```python
[FILTER]
    Name    grep
    Match   *
    Exclude $kubernetes['labels']['app'] myapp
```

{% endtab %}

{% tab title="fluent-bit.yaml" %}

```yaml
    filters:
        - name: grep
          match: '*'
          exclude: $kubernetes['labels']['app'] myapp
```

{% endtab %}
{% endtabs %}
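
To illustrate how a record accessor pattern reaches a nested value, here is a simplified Python sketch. The `resolve` and `excluded` helpers are hypothetical and only handle the `$key['sub']['sub']` form used above; the real record accessor syntax supports more (see the linked documentation). The record is a trimmed copy of the example, and Python's `re` stands in for Fluent Bit's regex engine.

```python
import re

# Hypothetical, simplified parser for patterns such as $kubernetes['labels']['app'].
_ACCESSOR = re.compile(r"^\$(\w+)((?:\['[^']*'\])*)$")
_SUBKEY = re.compile(r"\['([^']*)'\]")


def resolve(record, accessor):
    """Walk nested dicts; return None when any key along the path is missing."""
    m = _ACCESSOR.match(accessor)
    if m is None:
        raise ValueError(f"unsupported accessor: {accessor}")
    path = [m.group(1)] + _SUBKEY.findall(m.group(2))
    value = record
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def excluded(record, accessor, pattern):
    """Exclude rule: drop the record when the resolved value matches the pattern."""
    value = resolve(record, accessor)
    return value is not None and re.search(pattern, str(value)) is not None


record = {"log": "something",
          "kubernetes": {"pod_name": "myapp-0", "labels": {"app": "myapp"}}}
print(resolve(record, "$kubernetes['labels']['app']"))           # myapp
print(excluded(record, "$kubernetes['labels']['app']", "myapp"))  # True
```

In this model a record without `kubernetes.labels.app` resolves to `None` and is not excluded by this rule.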

### Excluding records with missing or invalid fields

Your processing pipeline may need to drop records that are missing certain keys.

A simple way to do this is to use `Regex` with a pattern that matches anything: a record that is missing the key fails this check and is excluded.

Here is an example that checks for a specific valid value for the key as well:

{% tabs %}
{% tab title="fluent-bit.conf" %}

```
# Use Grep to verify the contents of the iot_timestamp value.
# If the iot_timestamp key does not exist, this rule fails
# and the record is excluded.
[FILTER]
    Name                     grep
    Alias                    filter-iots-grep
    Match                    iots_thread.*
    Regex                    iot_timestamp ^\d{4}-\d{2}-\d{2}
```

{% endtab %}

{% tab title="fluent-bit.yaml" %}

```yaml
    filters:
        - name: grep
          alias: filter-iots-grep
          match: iots_thread.*
          regex: iot_timestamp ^\d{4}-\d{2}-\d{2}
```

{% endtab %}
{% endtabs %}

The specified key `iot_timestamp` must match the expected expression. If it does not match, or if the key is missing or empty, the record is excluded.
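
As an illustration, this Python sketch applies the same check to a few hypothetical records (invented for this example) and shows which ones the rule keeps. It models the rule's outcome only; Python's `re` stands in for Fluent Bit's regex engine.

```python
import re

# Stand-in for the rule: Regex iot_timestamp ^\d{4}-\d{2}-\d{2}
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}")


def keep(record):
    """Keep only records whose iot_timestamp exists and starts with YYYY-MM-DD."""
    value = record.get("iot_timestamp")
    if value is None:
        return False  # missing key: the Regex rule cannot match
    return TIMESTAMP.search(str(value)) is not None


# Hypothetical records, for illustration only.
records = [
    {"iot_timestamp": "2024-05-01T10:00:00Z"},  # valid prefix: kept
    {"iot_timestamp": ""},                      # empty: excluded
    {"iot_timestamp": "05/01/2024"},            # wrong format: excluded
    {"sensor": "t1"},                           # missing key: excluded
]
print([keep(r) for r in records])  # [True, False, False, False]
```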

### Multiple conditions

If you want to set multiple `Regex` or `Exclude` rules, you can use the `Logical_Op` property to combine them with logical conjunction (`AND`) or disjunction (`OR`).

Note: If `Logical_Op` is set, setting both `Regex` and `Exclude` results in an error.

{% tabs %}
{% tab title="fluent-bit.conf" %}

```python
[INPUT]
    Name dummy
    Dummy {"endpoint":"localhost", "value":"something"}
    Tag dummy

[FILTER]
    Name grep
    Match *
    Logical_Op or
    Regex value something
    Regex value error

[OUTPUT]
    Name stdout
```

{% endtab %}

{% tab title="fluent-bit.yaml" %}

```yaml
pipeline:
    inputs:
        - name: dummy
          dummy: '{"endpoint":"localhost", "value":"something"}'
          tag: dummy
    filters:
        - name: grep
          match: '*'
          logical_op: or
          regex:
            - value something
            - value error
    outputs:
        - name: stdout
```

{% endtab %}
{% endtabs %}

The output will be:

```
Fluent Bit v2.0.9
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2023/01/22 09:46:49] [ info] [fluent bit] version=2.0.9, commit=16eae10786, pid=33268
[2023/01/22 09:46:49] [ info] [storage] ver=1.2.0, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2023/01/22 09:46:49] [ info] [cmetrics] version=0.5.8
[2023/01/22 09:46:49] [ info] [ctraces ] version=0.2.7
[2023/01/22 09:46:49] [ info] [input:dummy:dummy.0] initializing
[2023/01/22 09:46:49] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
[2023/01/22 09:46:49] [ info] [filter:grep:grep.0] OR mode
[2023/01/22 09:46:49] [ info] [sp] stream processor started
[2023/01/22 09:46:49] [ info] [output:stdout:stdout.0] worker #0 started
[0] dummy: [1674348410.558341857, {"endpoint"=>"localhost", "value"=>"something"}]
[0] dummy: [1674348411.546425499, {"endpoint"=>"localhost", "value"=>"something"}]
```
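
To see why the dummy record is kept, this Python sketch evaluates the two `Regex` rules from the example against that record and combines the results with OR, then with AND for contrast. It is an illustrative model, not Fluent Bit code; Python's `re` stands in for Fluent Bit's regex engine.

```python
import re

# The record generated by the dummy input in the example above.
record = {"endpoint": "localhost", "value": "something"}

# Patterns from the two Regex rules on the "value" key.
patterns = ["something", "error"]
matches = [re.search(p, record["value"]) is not None for p in patterns]

print(matches)       # [True, False]
print(any(matches))  # True: with Logical_Op OR the record is kept
print(all(matches))  # False: with Logical_Op AND it would be dropped
```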
