Reducing the Data Volume
Since Splunk is licensed by daily indexed data volume, it is in every customer’s interest to keep the data volume generated by uberAgent as small as possible. uberAgent comes prepared for that by offering two default configurations and many ways for fine-tuning.
Start by choosing either the default configuration, which provides full detail and high resolution, or the configuration optimized for data volume, which differs from the default in the following ways:
- Process & application performance: information is collected only on the 10-15 most active processes in terms of CPU, RAM, disk, and network utilization. The processes included in the data collection are determined dynamically for every collection interval. One could say uberAgent “follows” the active processes.
- Collection interval of 120 s instead of 30 s.
See this document for instructions on how to switch between the two configurations.
Before modifying the configuration, find out how much data is generated per endpoint by the default settings. The easiest way to do that is to have uberAgent tell you in the Data Volume dashboard.
Once you know the current data volume, you can estimate how much it needs to be reduced. Start with the endpoint configuration.
Through uberAgent’s configuration you can do three things to reduce the data volume:
- By default, uberAgent collects performance data every 30 seconds. You can cut the volume nearly in half by increasing the collection interval to one minute (any other value is possible, too).
- You can fine-tune the data collection by adding timers. The collection interval can be set per timer. Move each metric to the timer with the desired interval to balance accuracy and data volume. While optimizing, focus on the metrics that generate the highest data volume (the Data Volume dashboard shows you which those are).
- By default, all metrics are enabled. If you do not need the information collected by some of them, turn them off by removing them from the configuration.
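The effect of these three options can be estimated with simple arithmetic before touching the configuration. The following is a minimal sketch, not uberAgent functionality: it assumes that timer-driven data volume scales inversely with the collection interval, and the timer names, the metric name `OtherMetrics`, and all byte counts are made-up placeholders. Use the per-metric figures from the Data Volume dashboard instead.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_bytes(metrics, timers):
    """Estimate the timer-driven data volume per endpoint and day.

    metrics: dict of metric name -> (timer name, bytes per collection)
    timers:  dict of timer name -> collection interval in seconds
    """
    total = 0.0
    for _name, (timer, bytes_per_collection) in metrics.items():
        collections_per_day = SECONDS_PER_DAY / timers[timer]
        total += collections_per_day * bytes_per_collection
    return total


# Hypothetical baseline: every metric on one 30 s timer.
baseline = daily_volume_bytes(
    {"ProcessDetailFull": ("default", 20_000), "OtherMetrics": ("default", 5_000)},
    {"default": 30},
)

# Option 1: double the interval of the single timer.
slower = daily_volume_bytes(
    {"ProcessDetailFull": ("default", 20_000), "OtherMetrics": ("default", 5_000)},
    {"default": 60},
)

# Option 2: move only the largest metric to a second, slower timer.
split = daily_volume_bytes(
    {"ProcessDetailFull": ("slow", 20_000), "OtherMetrics": ("default", 5_000)},
    {"default": 30, "slow": 120},
)

# Option 3: remove a metric that is not needed.
reduced = daily_volume_bytes(
    {"OtherMetrics": ("default", 5_000)},
    {"default": 30},
)

print(baseline, slower, split, reduced)
```

Because the sketch models only data that is collected on a timer, doubling the interval yields exactly half the volume. Treat the result as an upper bound on the saving and verify it in the Data Volume dashboard.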
As you can see in the Data Volume dashboard, the ProcessDetail metric generates by far the highest data volume. Consider replacing ProcessDetailFull with ProcessDetailTop5. Once you do that, uberAgent only collects performance data for the processes with the highest activity. This may lead to a dramatic reduction in data volume.
With ProcessDetailTop5, ProcessDetail data is collected only for the top 5 processes according to each of the following criteria:
- Process CPU usage
- Count of process I/O read/write operations
- Data volume of process I/O read/write operations
- Process consumed RAM
- Process generated network traffic
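To illustrate what a selection by several criteria means, here is a minimal sketch of one plausible reading: take the top 5 processes for each criterion and collect data for the union of those lists. This is an assumption for illustration, not uberAgent's actual implementation, and the field names (`cpu`, `io_ops`, `io_bytes`, `ram`, `net_bytes`) are hypothetical.

```python
from heapq import nlargest

# Hypothetical field names, one per criterion in the list above.
CRITERIA = ("cpu", "io_ops", "io_bytes", "ram", "net_bytes")


def select_top_processes(samples, n=5):
    """Return the names of all processes that rank in the top n
    for at least one criterion during one collection interval."""
    selected = set()
    for criterion in CRITERIA:
        top = nlargest(n, samples, key=lambda s: s[criterion])
        selected.update(s["name"] for s in top)
    return selected


# Synthetic samples: 30 processes; network traffic is ranked in the
# opposite order to all other criteria.
samples = [
    {"name": f"p{i}", "cpu": i, "io_ops": i, "io_bytes": i,
     "ram": i, "net_bytes": 29 - i}
    for i in range(30)
]
selected = select_top_processes(samples)
print(sorted(selected))
```

Under this reading, a process that leads in several criteria is counted once, so the number of processes per interval lies between 5 and 25 and changes as activity shifts.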
Alternatively, you can filter the processes for which detailed performance data is collected. The configuration section ProcessDetailFull_Filter makes it possible to put processes on an allowlist or denylist. The allowlist takes precedence over the denylist. If you are only interested in specific processes, put them on the allowlist. On the other hand, if you want to see everything except data from certain processes, put them on the denylist.
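The filter logic can be sketched as follows. This is one interpretation of "the allowlist takes precedence", namely that a non-empty allowlist alone decides and the denylist is then ignored. The function name, the exact-match comparison, and the case-insensitive handling are assumptions for illustration and do not reflect the actual syntax or matching rules of ProcessDetailFull_Filter.

```python
def is_collected(process_name, allowlist=(), denylist=()):
    """Decide whether detailed data is collected for a process.

    Assumed semantics: if an allowlist exists, only its entries are
    collected; otherwise everything except denylist entries is collected.
    """
    name = process_name.lower()
    allow = {p.lower() for p in allowlist}
    deny = {p.lower() for p in denylist}
    if allow:
        # Allowlist takes precedence: the denylist is not consulted.
        return name in allow
    return name not in deny


# Only interested in specific processes: use the allowlist.
print(is_collected("chrome.exe", allowlist=["chrome.exe", "winword.exe"]))

# Everything except certain processes: use the denylist.
print(is_collected("svchost.exe", denylist=["svchost.exe"]))
```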
If the data volume is still too high after optimizing the configuration as recommended above, you need to reduce the number of endpoints that send data to Splunk. You can do that simply by stopping and disabling the uberAgent system service on select endpoints.
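How many endpoints can keep sending data follows from the daily license volume and the per-endpoint volume shown in the Data Volume dashboard. A minimal sketch of that calculation, with made-up example numbers and a hypothetical function name:

```python
def max_endpoints(daily_license_bytes, bytes_per_endpoint_per_day):
    """Largest number of endpoints whose combined daily volume
    stays within the given daily license volume."""
    if bytes_per_endpoint_per_day <= 0:
        raise ValueError("per-endpoint volume must be positive")
    return int(daily_license_bytes // bytes_per_endpoint_per_day)


# Hypothetical example: 10 GiB of daily volume available for uberAgent
# data, 20 MiB generated per endpoint and day.
GIB = 1024 ** 3
MIB = 1024 ** 2
print(max_endpoints(10 * GIB, 20 * MIB))
```

The estimate assumes every endpoint generates roughly the same volume. In practice, leave headroom for busier machines and for other data sources that share the same license.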