Deep Security 8 and DSAFILTER_HEAP_MAX_SIZE


I have been having major reliability issues with Deep Security 8 for the last 2 weeks.

There is a lot of confusion out there with the recommended settings for a stable Deep Security 8 environment, which isn’t helped when the vendor doesn’t have any KB articles or updated documentation publicly available. It appears they would rather their customers test their products for them and keep their teething issues with ESXi 5.0 under wraps. See my last post here.

This brings me to my issues with setting the DSAFILTER_HEAP_MAX_SIZE on each ESXi host. There is a recommended workaround mentioned on  KB1055625 “How to enable the Deep Security Virtual Appliance (DSVA) to support more than 25 virtual machines on ESX.’ What most people don’t realise is this is a recommended workaround for Trend DS 7.0\7.5, not Trend DS 8.

The article that states that the DSAFILTER_HEAP_MAX_SIZE should be roughly 1MB per each VM planned. This results in most customers setting a value for the HEAP_MAX_SIZE of around 40 – 50MB. This is a 1/10th of what it should be according to Trend Engineering for DS 8 (512MB)!

Here is a quick summary:

I set a  DSAFILTER_HEAP_MAX_SIZE on all my ESXi hosts with  a value of 50MB. In theory according to the knowledge base article this should be sufficient for 40 VMs.

For Deep Security 8 Trend actually recommend not changing this value as the dsafilter is supposed to dynamically adjust the HEAP_MAX_SIZE according to the numbe rof VMs on the host. If you have set the value manually as per the KB article  OR you encounter symptoms such as VMs loses connectivity intermittingly and then the hosts PSODing after 30mins – 2 hours then its most likely an inadequate  HEAP MAX size and the filter driver is running out of memory.

Here are the calculcations from Trend Engeering. They have used a calculation of 70 VMs but state “Even if you are not planning on running as many VM’s it is still suggested to set the value to 512mb. “

—————————————————————————————————————————————–

We use 432 bytes per TCP connection so for the mathematical calculation lets roundup to 512 bytes per TCP allocation.

If using the default values for the Maximum TCP Connections. <SystemSetting name=”maxConnectionsTcp” value=”10000″ />

With this 512 bytes  * 10000 connection = 5MB per VM so we need to have 70*5MB = 350MB minimum memory for the visor memory for the connection tables only.

In addition to this we will need ~ 70 MB to run the FD so it comes down to 420 MB.

We recommend bumping to 512MB and verify the results.  e.g. esxcfg-module -s DSAFILTER_HEAP_MAX_SIZE=536870912 dvfilter-dsa

To verify the setting, execute: % esxcfg-module -g dvfilter-dsa The setting will not take effect until the driver is reloaded.  Reloading will either require a reboot (best option) of ESX.

———————————————————————————————————————————————-

If you are running DS 8 and ESXi 5.0 and you have already implemented the MOD_TIMER=0 fix, the next step will be to manually set the MAX_HEAP_SIZE to 512MB.

This solved the issue for me and we haven’t had a single crash since.

About frikking time!

———————————————————————————————————————————————-

Update 30/05/2012

Michael Gioia from Trend contacted me to give me a more detailed analysis of why this issue occurs.

First of all there is Trend KB article out for this issue — http://esupport.trendmicro.com/solution/en-us/1060125.aspx

It sounds like there are a few unhappy customers out there. In Trend’s defence and the point put across by Michael, is that it is very difficult to ensure 100% compatibility with  two products that are integrated in the kernel such as the ESXi hypervisor and the Trend filter.

This anomaly is only supposed to occur in very rare circumstances when the filter driver is under sever exhaustion of memory (circa 4 bytes).

In theory this issue shouldn’t occur with many customers then, but to be honest my setup was not anything special or extreme, so I’d be surprised if more customers weren’t affected by this.

On the plus side, the developers should have released a fix in 7.5 SP4 and 8.0 SP1 which are both available now.

I still have to verify this – until I am confident the issue is resolved my HEAP_MAX_SIZE settings will remain!

 

Advertisements

Known Issue! PSOD with Security 8 and ESXi 5.0


It looks like there is a known issue with Trend Deep Security 8 and ESXi 5.0 that causes PSOD. I cannot find a KB article yet, so am documenting it here. Hopefully this will help some people who don’t have a Trend support contract.

There is a known issue with ESXi 5.0 and Deep Security 8.0. A number of customer’s are experiencing ESXi system crashes – purple screen of death. By default the Deep Security Filter Driver will attempt to multiplex a single kernel timer across all virtual machines, to ensure they perform a maintenance task every 30 seconds.

This appears to be creating the instability issues and causing the system crashes as using a single timer across all VMs is complex to manage and implement.

The workaround is to disable this setting, so that the maintenance tasks execute without the timer. This occurs periodically anyway when the system processes packets, so there is no impact performing this change.

  1. SSH to ESXi. From the ESXi console, execute this command to find out the value that is configured for the Filter Driver heap memory size: Run % esxcfg-module -g dvfilter-dsa to see if you have modified the DSAFILTER_HEAP_MAX_SIZE
  2. If you have not configured the DSAFILTER_HEAP_MAX_SIZE value just set the DSAFILTER_MOD_TIMER_ENABLED to 0 with the following command: % esxcfg-module -s DSAFILTER_MOD_TIMER_ENABLED=0 dvfilter-dsa
  3. If you have configured the DSAFILTER_HEAP_MAX_SIZE value, use the following command to preserve your existing setting: % esxcfg-module -s “DSAFILTER_HEAP_MAX_SIZE= <value that you got from the last query> DSAFILTER_MOD_TIMER_ENABLED=0” dvfilter-dsa
  4. You should now see options = value set to DSAFILTER_MOD_TIMER_ENABLED=0 when you run % esxcfg-module -g dvfilter-dsa
  5. Reboot the ESXi server for the changes to take effect. Note: The setting will not take effect until the driver is reloaded. Reloading will require a reboot (best option) of ESXi or unloading/loading of the driver.