Distributed Cloud AIP Agent Performance

The F5 Distributed Cloud App Infrastructure Protection (AIP) Agent is engineered to provide maximum functionality with the lowest possible resource consumption and system impact. We’re continuously optimizing the Agent’s performance and resource consumption to ensure as small a footprint as possible.

Agent Performance Overview

Our Agent’s components are designed to be as efficient with system resources as possible. We include additional, dynamic behaviors that allow us to adapt to various system load and configuration profiles.

For instance, on multi-core systems, the Agent will avoid binding to core 0, which is typically where network and file I/O are bound on Linux systems, to minimize the chance of introducing blocking.
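The core-selection idea above can be sketched in a few lines. This is a minimal illustration, not the Agent's actual implementation: the function names `cores_avoiding_zero` and `pin_current_process` are hypothetical, and the fallback to core 0 on single-core systems is an assumption.

```python
import os

def cores_avoiding_zero(available):
    """Return the cores to bind to, skipping core 0 when another core exists."""
    preferred = set(available) - {0}
    # Assumption: on a single-core system there is no alternative, so keep core 0.
    return preferred or set(available)

def pin_current_process():
    """Linux-only: restrict the calling process to the preferred cores."""
    allowed = os.sched_getaffinity(0)          # pid 0 means "this process"
    os.sched_setaffinity(0, cores_avoiding_zero(allowed))
    return os.sched_getaffinity(0)
```

Calling `pin_current_process()` on a four-core Linux host would leave the process eligible to run on cores 1 through 3 only. The selection logic is kept separate from the system call so it can be checked without changing any process affinity.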

We use adaptive CPU profiling to throttle our activities when load levels are high, for instance when the kernel audit framework begins to emit a large number of events at high frequency. You can set configuration parameters to adjust these levels to allow better tuning for your particular system.
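As a rough illustration of throttling against a CPU cap, the sketch below sleeps after each unit of work so that the busy fraction stays at or below the cap. It is not the Agent's algorithm: the names `pause_for_cap` and `process_with_cap` are hypothetical, and the 0.40 default only mirrors the 40% default cap mentioned later in this article.

```python
import time

def pause_for_cap(busy_seconds, cpu_cap):
    """Seconds to sleep after `busy_seconds` of work so that
    busy / (busy + sleep) stays at or below `cpu_cap`."""
    if not 0 < cpu_cap <= 1:
        raise ValueError("cpu_cap must be in (0, 1]")
    return max(0.0, busy_seconds * (1.0 - cpu_cap) / cpu_cap)

def process_with_cap(batches, handle, cpu_cap=0.40):
    """Handle each batch, then sleep long enough to respect the cap."""
    for batch in batches:
        start = time.process_time()            # CPU time used by this process
        handle(batch)
        busy = time.process_time() - start
        time.sleep(pause_for_cap(busy, cpu_cap))
```

With a cap of 0.40, a batch that consumes 0.4 seconds of CPU time is followed by roughly 0.6 seconds of sleep, so the process uses about 40% of one core over that one-second window.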

Processes, CPU, and Memory Usage

The Distributed Cloud AIP Agent is a composite entity, made up of several components that you can view in your process viewer of choice:

  • cloudsight-main: the main process supervisor and command-and-control component
  • cloudsight-worker: worker process spawned by cloudsight-main to manage inputs from various sensors
  • tsauditd: our replacement for auditd that consumes, processes, and transforms raw kernel audit events with high performance, low latency, and minimal compute resource usage
  • tsfim: sensor to perform targeted Filesystem Integrity Monitoring (FIM) based on Rules created in the Cloud Security Platform
  • tscontainersd: sensor to gather events from Docker containers

Depending on your feature plan level and configuration, you will see some or all of these processes executing on your server at any given time.

When evaluating our Agent’s CPU and memory consumption profile, keep the following scenarios in mind.

  • High number of kernel audit messages: Certain workloads, especially those with extremely high rates of forking or execve’ing of subprocesses, will generate an increased number of kernel audit messages. If these subprocesses are short-lived, as with a forking process manager for instance, this can generate many events per second. Our Agent will attempt to decode, process, and transform all of these events and keep up with the output rate of the audit framework, up to the CPU limits we’ve placed (by default, we cap at 40% CPU utilization of the core we’re running on).

    In addition to increased CPU load, you may notice increased memory load as our internal caches and batching of events will grow to accommodate the increased flow of events.
  • Broad FIM rules: Customers can create FIM rules in the Cloud Security Platform, which are then delivered to Agents. We place very few restrictions on these watches, so there exists a potential for a customer to create a FIM rule to monitor an extremely busy filesystem location or one that may be too broad in scope. We leverage built-in Linux filesystem APIs (inotify and fanotify) and each file, directory, or combination will generate additional events, consuming CPU and memory resources.
  • Agent connectivity issues: When the Agent disconnects, it drops events and does not send them to the platform.
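To gauge how broad a FIM rule might be before creating it, one rough check is to count the directories under the target path. inotify is not recursive, so monitoring a whole tree needs one watch per directory. The sketch below is an estimate under that assumption, not a description of how tsfim registers watches. The helper names are hypothetical.

```python
import os

def count_directories(root):
    """Count `root` plus every directory beneath it (symlinks are not followed)."""
    return sum(1 for _dirpath, _dirs, _files in os.walk(root))

def inotify_watch_limit(path="/proc/sys/fs/inotify/max_user_watches"):
    """Return the per-user inotify watch limit, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None
```

Comparing `count_directories("/some/path")` with `inotify_watch_limit()` gives a quick sense of whether a rule covers a handful of directories or a sizable share of the system's watch budget.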

Note

If you want to constrain Agent memory and CPU utilization, you can configure the Agent to use Self-Control to further conserve resources. For more information, see Self-Control Agent Resource Utilization.

When reviewing our memory consumption, it’s important to focus on the resident (RES) memory of our Agent’s components and not their virtual memory usage. cloudsight-main and cloudsight-worker are the components that will generally see increases in memory usage as event counts increase. Those processes will attempt to garbage collect, resulting in some periods of increased memory consumption followed by a steep drop back to more “normal” levels.
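On Linux, resident and virtual sizes for a process can be read from /proc/<pid>/status, where VmRSS is the resident set and VmSize is the virtual size, both reported in kB. The following sketch shows one way to compare the two. The function names are hypothetical and this is not a tool shipped with the Agent.

```python
def parse_memory_kb(status_text):
    """Extract VmRSS (resident) and VmSize (virtual), in kB,
    from the text of /proc/<pid>/status."""
    result = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmRSS", "VmSize"):
            result[key] = int(value.split()[0])   # value looks like "  7890 kB"
    return result

def memory_kb(pid):
    """Read and parse the memory fields for a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_memory_kb(f.read())
```

For example, `memory_kb(pid)` with the process ID of cloudsight-main returns both figures side by side. VmRSS corresponds to the RES column in top and is the one to track over time.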

The Distributed Cloud AIP platform can give you insight into process execution behaviors at two levels. At a global level, the Events view provides aggregate information on all processes executing across your fleet. At a local, server level, Servers > Server Details shows detailed process execution information, so you can see at a granular level which processes are generating the most events. With that information, we can offer additional tuning to reduce the event flow (for example, by filtering out events that are unimportant).

Agent Performance at Distributed Cloud AIP

At Distributed Cloud AIP, we run the current production version of the Agent on our production infrastructure — every system we run to provide the Distributed Cloud AIP platform is running our Agent. Our development environment runs a mixture of production and development Agents. For the purposes of this document, we’ll be using statistics from our production environment to describe our Agent’s performance. We do no special tuning of the Agent for our platform and run the exact same code as customers.
