Reducing the Data Volume

Since Splunk is licensed by daily indexed data volume, it is in every customer’s interest to keep the data volume generated by uberAgent as small as possible. Luckily, uberAgent is heavily customizable and offers several ways to minimize what is sent to Splunk for indexing.

Take Stock

Before modifying the configuration, find out how much data is generated per endpoint with the default settings. The easiest way to do that is to check uberAgent’s Data Volume dashboard.

Reduce the Data Volume per Endpoint

Once you know the currently generated data volume, you should have an idea of how much it needs to be reduced. Start with the endpoint configuration.

Through uberAgent’s configuration file you can do three things to reduce the data volume:

Reduce the Frequency

By default, uberAgent collects performance data every 30 seconds. You can cut the volume nearly in half by changing the frequency to one minute (any other value is possible, too).

You can fine-tune the data collection by adding timers. The data collection frequency can be set per timer. Move each metric to the timer with the desired frequency to balance accuracy and data volume. While optimizing, focus on the metrics that generate the highest data volume (the Data Volume dashboard shows you which ones they are).
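To judge the effect of a frequency change before touching the configuration, a rough estimate can help. The following is a minimal sketch in Python, not part of uberAgent. It assumes that the volume of a timer-driven metric is inversely proportional to its collection interval. The function name, the metric names other than ProcessDetail, and all numbers are made up for illustration; take real values from the Data Volume dashboard.

```python
# Default collection interval in seconds, as stated above.
BASELINE_INTERVAL_S = 30

# Hypothetical per-endpoint daily volumes in MB, measured at the default interval.
# "OtherMetricA" and "OtherMetricB" are placeholders, not real metric names.
measured_mb_per_day = {
    "ProcessDetail": 12.0,
    "OtherMetricA": 3.0,
    "OtherMetricB": 1.0,
}


def estimate_daily_mb(measured, new_intervals_s, baseline_s=BASELINE_INTERVAL_S):
    """Estimate the per-endpoint daily volume after moving metrics to other timers.

    Assumption: a metric's volume scales with baseline_s / new_interval.
    Metrics missing from new_intervals_s stay at the baseline interval.
    """
    total = 0.0
    for metric, mb in measured.items():
        interval = new_intervals_s.get(metric, baseline_s)
        total += mb * baseline_s / interval
    return total


before = estimate_daily_mb(measured_mb_per_day, {})
after = estimate_daily_mb(measured_mb_per_day, {"ProcessDetail": 60})
print(f"before: {before:.1f} MB/day, after: {after:.1f} MB/day")
```

With these made-up numbers, moving only ProcessDetail to a one-minute timer lowers the estimate from 16.0 to 10.0 MB per day. Treat the result as a planning aid and verify it in the Data Volume dashboard after the change.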

Remove Metrics

By default, all metrics are enabled. If you do not need the information collected by some of them, turn them off by removing them from the configuration.

Special Treatment for ProcessDetail

As you can see in the Data Volume dashboard, the ProcessDetail metric generates by far the highest data volume. Consider replacing ProcessDetailFull with ProcessDetailTop5. Once you do that, uberAgent only collects performance data for processes with the highest activity. This may lead to a dramatic reduction in data volume.

Alternatively, you can filter the processes for which detailed performance data is collected. The configuration file section ProcessDetailFull_Filter makes it possible to whitelist or blacklist processes (whitelisting overrides blacklisting). If you are only interested in specific processes, whitelist them. If, on the other hand, you want to see everything except data from certain processes, blacklist those.
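The interplay of the two lists can be sketched in a few lines of Python. This is an illustration of one plausible reading of "whitelisting overrides blacklisting", not uberAgent's actual implementation: if a whitelist is present, only whitelisted processes are collected and the blacklist is ignored. The function name, the case-insensitive exact-name matching, and the process names are assumptions made for this example.

```python
def is_collected(process_name, whitelist=(), blacklist=()):
    """Decide whether detailed data would be collected for a process.

    Assumed semantics (illustrative only):
    - non-empty whitelist: collect only whitelisted processes, ignore the blacklist
    - empty whitelist: collect everything that is not blacklisted
    Names are compared case-insensitively, which is also an assumption.
    """
    name = process_name.lower()
    allowed = {p.lower() for p in whitelist}
    denied = {p.lower() for p in blacklist}
    if allowed:
        return name in allowed
    return name not in denied


# Only interested in specific processes: whitelist them.
print(is_collected("winword.exe", whitelist=["winword.exe", "excel.exe"]))

# Everything except certain processes: blacklist those.
print(is_collected("backup.exe", blacklist=["backup.exe"]))
```

The first call reports that winword.exe is collected, the second that backup.exe is not. Check the comments in the ProcessDetailFull_Filter section of your configuration file for the exact matching rules before relying on this model.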

Reduce the Number of Endpoints

If the data volume is still too high after optimizing the configuration as recommended above, you need to reduce the number of endpoints that send data to Splunk. The simplest way to do that is to stop and disable the uberAgent system service on selected endpoints.
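How many endpoints can keep sending data follows from simple arithmetic: divide the daily volume you are willing to spend on uberAgent by the measured volume per endpoint and round down. A minimal Python sketch, with a hypothetical function name and made-up numbers:

```python
import math


def max_endpoints(daily_budget_mb, mb_per_endpoint_per_day):
    """Return the largest number of endpoints that fits into the daily budget.

    Both arguments are in MB per day. The per-endpoint value should come from
    the Data Volume dashboard; the budget is whatever share of the Splunk
    license you reserve for uberAgent.
    """
    if daily_budget_mb < 0 or mb_per_endpoint_per_day <= 0:
        raise ValueError("budget must be >= 0 and per-endpoint volume must be > 0")
    return math.floor(daily_budget_mb / mb_per_endpoint_per_day)


# Hypothetical example: 5000 MB/day reserved, 16 MB/day per endpoint.
print(max_endpoints(5000, 16.0))
```

With these example values the result is 312 endpoints. Leave some headroom in practice, because the volume per endpoint varies from day to day.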

Questions?

Do you have questions that were not answered here? Please ask us. We are happy to help!