I have been having major reliability issues with Deep Security 8 for the last 2 weeks.
There is a lot of confusion out there about the recommended settings for a stable Deep Security 8 environment, and it doesn’t help that the vendor has no KB articles or updated documentation publicly available. It appears they would rather their customers test their products for them and keep their teething issues with ESXi 5.0 under wraps. See my last post here.
This brings me to my issues with setting the DSAFILTER_HEAP_MAX_SIZE on each ESXi host. There is a recommended workaround in KB1055625, “How to enable the Deep Security Virtual Appliance (DSVA) to support more than 25 virtual machines on ESX.” What most people don’t realise is that this workaround is recommended for Trend DS 7.0/7.5, not Trend DS 8.
The article states that the DSAFILTER_HEAP_MAX_SIZE should be roughly 1MB for each VM planned. This results in most customers setting a HEAP_MAX_SIZE value of around 40 – 50MB. That is about one tenth of what it should be for DS 8 according to Trend Engineering (512MB)!
Here is a quick summary:
I set a DSAFILTER_HEAP_MAX_SIZE on all my ESXi hosts with a value of 50MB. In theory according to the knowledge base article this should be sufficient for 40 VMs.
For Deep Security 8, Trend actually recommends not changing this value, as the dsafilter is supposed to dynamically adjust the HEAP_MAX_SIZE according to the number of VMs on the host. If you have set the value manually as per the KB article, OR you encounter symptoms such as VMs losing connectivity intermittently and then the hosts PSODing after 30 minutes – 2 hours, then it’s most likely an inadequate HEAP_MAX_SIZE and the filter driver is running out of memory.
Here are the calculations from Trend Engineering. They have used a calculation of 70 VMs but state “Even if you are not planning on running as many VMs it is still suggested to set the value to 512MB.”
We use 432 bytes per TCP connection, so for the mathematical calculation let’s round up to 512 bytes per TCP allocation.
If using the default value for the Maximum TCP Connections: <SystemSetting name="maxConnectionsTcp" value="10000" />
With this, 512 bytes * 10000 connections = roughly 5MB per VM, so we need 70 * 5MB = 350MB minimum of visor memory for the connection tables alone.
In addition to this we will need ~70MB to run the filter driver (FD), so it comes to 420MB.
We recommend bumping to 512MB and verifying the results, e.g. esxcfg-module -s DSAFILTER_HEAP_MAX_SIZE=536870912 dvfilter-dsa
To verify the setting, execute: % esxcfg-module -g dvfilter-dsa The setting will not take effect until the driver is reloaded. Reloading requires a reboot of ESX (the best option).
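The sizing arithmetic quoted above can be reproduced with a few lines of Python. This is only a sketch: the function names are mine, not Trend's, and it assumes decimal megabytes for the estimate and 1024-based megabytes for the module option (512 * 1024 * 1024 = 536870912, the value in the command above). Without the rounding to 5MB per VM, the 70 VM estimate comes out at roughly 428MB rather than 420MB, which is still comfortably under 512MB.

```python
# Figures quoted from Trend Engineering in the post.
BYTES_PER_TCP_CONNECTION = 512    # 432 bytes, rounded up
MAX_TCP_CONNECTIONS = 10_000      # default maxConnectionsTcp
FILTER_DRIVER_OVERHEAD_MB = 70    # approximate memory needed to run the filter driver

MB = 1_000_000                    # assumption: decimal MB for the estimate
MIB = 1024 * 1024                 # the module option is in bytes; 512MB = 512 * MIB


def estimate_heap_bytes(vm_count):
    """Connection tables for every VM plus the filter driver overhead (hypothetical helper)."""
    connection_tables = vm_count * BYTES_PER_TCP_CONNECTION * MAX_TCP_CONNECTIONS
    return connection_tables + FILTER_DRIVER_OVERHEAD_MB * MB


def heap_option_command(size_mb=512):
    """Build the esxcfg-module command line for a heap size given in MB (hypothetical helper)."""
    return f"esxcfg-module -s DSAFILTER_HEAP_MAX_SIZE={size_mb * MIB} dvfilter-dsa"


if __name__ == "__main__":
    needed = estimate_heap_bytes(70)
    print(f"Estimated minimum for 70 VMs: {needed / MB:.1f} MB")
    print(heap_option_command(512))
```

Plugging in your own VM count shows how quickly the old 1MB per VM rule of thumb falls short once the connection tables are taken into account.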
If you are running DS 8 and ESXi 5.0 and you have already implemented the MOD_TIMER=0 fix, the next step will be to manually set the DSAFILTER_HEAP_MAX_SIZE to 512MB.
This solved the issue for me and we haven’t had a single crash since.
About frikking time!
Michael Gioia from Trend contacted me to give me a more detailed analysis of why this issue occurs.
First of all, there is a Trend KB article out for this issue: http://esupport.trendmicro.com/solution/en-us/1060125.aspx
It sounds like there are a few unhappy customers out there. In Trend’s defence, the point Michael put across is that it is very difficult to ensure 100% compatibility between two products that are integrated in the kernel, such as the ESXi hypervisor and the Trend filter.
This anomaly is only supposed to occur in very rare circumstances, when the filter driver is under severe memory exhaustion (circa 4 bytes).
In theory this issue shouldn’t occur with many customers then, but to be honest my setup was not anything special or extreme, so I’d be surprised if more customers weren’t affected by this.
On the plus side, the developers should have released a fix in 7.5 SP4 and 8.0 SP1 which are both available now.
I still have to verify this – until I am confident the issue is resolved my HEAP_MAX_SIZE settings will remain!