Reducing the Data Volume

Since Splunk is licensed by daily indexed data volume, it is in every customer’s interest to keep the data volume generated by uberAgent as small as possible. uberAgent supports this with two predefined configurations and many options for fine-tuning.

Choose Between Detail and Data Volume

Start by choosing either the default configuration, which provides full detail and high resolution, or the configuration optimized for data volume, which differs from the default in the following ways:

  • Process & application performance: information is collected only on the 10-15 most active processes in terms of CPU, RAM, disk and network utilization. The processes included in the data collection are determined dynamically for every collection interval. One could say uberAgent “follows” the active processes.
  • Collection interval: performance data is collected every 120 s instead of every 30 s.

Take Stock

Before modifying the configuration, find out how much data the default settings generate per endpoint. The easiest way to do that is to look it up in uberAgent’s Data Volume dashboard.

Reduce the Data Volume per Endpoint

Once you know the currently generated data volume, you should have an idea of how much it needs to be reduced. Start with the endpoint configuration.

Through uberAgent’s configuration you can do three things to reduce the data volume:

Reduce the Frequency

By default, uberAgent collects performance data every 30 seconds. You can cut the volume nearly in half by changing the collection interval to one minute (any other value is possible, too).
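The effect of a longer interval can be estimated with a few lines of arithmetic. The following Python sketch assumes a simple model that is not part of uberAgent: a fraction of the daily volume (`periodic_share`, a hypothetical parameter) scales inversely with the collection interval, while the remainder is unaffected by it. That is one plausible reason why doubling the interval saves nearly, but not exactly, half. The function name and all input values are illustrative only; use the figures from your own Data Volume dashboard.

```python
def estimate_daily_volume_mb(baseline_mb, periodic_share,
                             old_interval_s, new_interval_s):
    """Estimate the daily volume per endpoint after an interval change.

    baseline_mb    -- measured daily volume at old_interval_s
    periodic_share -- assumed fraction (0..1) of that volume that scales
                      with the collection interval (hypothetical parameter)
    """
    if not 0.0 <= periodic_share <= 1.0:
        raise ValueError("periodic_share must be between 0 and 1")
    if old_interval_s <= 0 or new_interval_s <= 0:
        raise ValueError("intervals must be positive")

    periodic_mb = baseline_mb * periodic_share   # scales with the interval
    fixed_mb = baseline_mb - periodic_mb         # independent of the interval
    return fixed_mb + periodic_mb * old_interval_s / new_interval_s


# Made-up example: 20 MB per day at 30 s, 90 % of it interval-driven.
estimate = estimate_daily_volume_mb(20, 0.9, old_interval_s=30, new_interval_s=60)
print(f"Estimated volume at 60 s: {estimate:.1f} MB per endpoint and day")
```

With these made-up inputs the estimate is roughly 11 MB instead of 20 MB. Treat the result as a planning aid and verify it in the Data Volume dashboard after changing the configuration.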

You can fine-tune the data collection by adding timers. The collection interval can be set per timer. Move each metric to the timer with the desired interval to balance accuracy and data volume. While optimizing, focus on the metrics that generate the highest data volume (the Data Volume dashboard shows you which those are).

Remove Metrics

By default, all metrics are enabled. If you do not need the information collected by some of them, turn them off by removing them from the configuration.

Special Treatment for ProcessDetail

As you can see in the Data Volume dashboard, the ProcessDetail metric generates by far the highest data volume. Consider replacing ProcessDetailFull with ProcessDetailTop5. Once you do that, uberAgent only collects performance data for processes with the highest activity. This may lead to a dramatic reduction in data volume.

Alternatively, you can filter the processes for which detailed performance data is collected. The configuration section ProcessDetailFull_Filter makes it possible to whitelist or blacklist processes (whitelisting overrides blacklisting). If you are only interested in specific processes, whitelist them. If, on the other hand, you want to see everything except data from certain processes, blacklist those.
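The interplay of the two lists can be illustrated with a small Python sketch. It models the decision logic only and is not uberAgent's implementation or configuration syntax. It assumes that "whitelisting overrides blacklisting" means a non-empty whitelist makes the blacklist irrelevant, and it compares exact process names case-insensitively as a simplification. The function `is_collected` and the process names are hypothetical.

```python
def is_collected(process_name, whitelist=(), blacklist=()):
    """Return True if detailed data would be collected for the process.

    Assumed semantics (not confirmed by the product documentation):
    - non-empty whitelist: only whitelisted processes are collected,
      the blacklist is ignored
    - empty whitelist: everything except blacklisted processes is collected
    """
    name = process_name.lower()
    allowed = {p.lower() for p in whitelist}
    denied = {p.lower() for p in blacklist}

    if allowed:
        # Whitelist wins: the blacklist is not consulted at all.
        return name in allowed
    return name not in denied


# Only interested in specific processes: whitelist them.
print(is_collected("chrome.exe", whitelist=["chrome.exe", "outlook.exe"]))
# Everything except certain processes: blacklist those.
print(is_collected("svchost.exe", blacklist=["svchost.exe"]))
```

Under these assumptions, a process that appears in both lists is still collected, because the whitelist takes precedence.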

Reduce the Number of Endpoints

If the data volume is still too high after optimizing the configuration as recommended above, you need to reduce the number of endpoints that send data to Splunk. To do that, stop and disable the uberAgent system service on selected endpoints.
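To decide how many endpoints can keep sending data, divide the license volume that is available for uberAgent by the measured volume per endpoint. The Python sketch below shows that calculation. The function name, the parameter for other data sources, and all numbers are hypothetical; substitute your own license size and the per-endpoint value from the Data Volume dashboard.

```python
import math


def max_endpoints(license_mb_per_day, per_endpoint_mb_per_day,
                  other_sources_mb_per_day=0.0):
    """Return how many endpoints fit into the daily license volume.

    other_sources_mb_per_day is the share of the license already used
    by data that does not come from uberAgent (assumed to be known).
    """
    if per_endpoint_mb_per_day <= 0:
        raise ValueError("per_endpoint_mb_per_day must be positive")

    available_mb = license_mb_per_day - other_sources_mb_per_day
    if available_mb <= 0:
        return 0
    # Round down: a partially covered endpoint would exceed the license.
    return math.floor(available_mb / per_endpoint_mb_per_day)


# Made-up example: 10 GB license, 2 GB used by other sources,
# 11 MB per endpoint and day.
print(max_endpoints(10240, 11, other_sources_mb_per_day=2048))
```

Leaving some headroom below the computed number is advisable, since the volume per endpoint varies from day to day.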

Questions?

Do you have questions that were not answered here? Please ask us. We are happy to help!