Trend Micro Deep Security and Citrix XenApp: The effect of Agentless AV on VSImax


I’ve been doing some benchmarking recently on the 2-socket, 6-core 3.3GHz B200 M2s used in our dedicated XenApp cluster (each ESXi host provides a total of 39.888GHz) to quantify the impact of AV protection on VSImax. (If you haven’t heard of LoginVSI before, it is a load testing tool for virtual desktop environments. VSImax is the maximum number of user workloads your environment can support before the user experience degrades (response times > 4 seconds), and it is a great benchmark because it can be used across different platforms.)

We use Trend Micro Deep Security 9.1 in our environment to provide agentless anti-malware protection for our XenApp VMs. The Deep Security Virtual Appliance provides real-time scanning via the vShield Endpoint API, using a custom XenApp policy that includes all the antivirus best practices for Citrix XenApp and Citrix PVS.

Test Summary:

  1. Testing Tool: LoginVSI 3.6 with Medium No Flash workload
  2. Citrix XenApp anti-malware policy: Real Time Scanning enabled with all the best practice directory, file and extension exclusions set, plus the recommendations to disable Network Directory Scan and to scan files only on write.
  3. Deep Security Virtual Appliance (DSVA): Deployed with the default settings: 2vCPU, 2GB RAM, no CPU reservation and a 2 GB memory reservation.

Shown below is a LoginVSI 150 user test with a medium (no Flash) workload on a single B200 M2 running 6x VMs with 4vCPU and 12GB RAM each with agentless protection disabled. The image below shows a VSImax score of 105, which is very similar to our current real user load per blade.

VSIMax with No AV

Shown below is the same 150 user test with a medium (No Flash) workload on a single B200 M2 running 6x VMs with 4vCPU and 12GB RAM each with agentless anti malware protection enabled. The image below shows a VSImax score of 101.

VSIMax with AV

The impact on VSImax with Deep Security agentless protection enabled is only 4 users per blade, a user penalty of just 3.8%. Shown below is the CPU MHz usage of the DSVA during the LoginVSI test. CPU usage peaks at 550MHz, which is about 1.4% of the total available on the host (39888MHz). An acceptable penalty to keep our security boys happy!

DSVA CPU MHz
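
For anyone who wants to repeat the sums with their own results, here is a minimal Python sketch of the arithmetic behind the figures above. The numbers are the ones quoted in this post; the variable and function names are just my own labels.

```python
# Figures quoted in the post above.
VSIMAX_NO_AV = 105       # VSImax with agentless protection disabled
VSIMAX_WITH_AV = 101     # VSImax with agentless protection enabled
DSVA_PEAK_MHZ = 550      # peak DSVA CPU usage during the test
HOST_TOTAL_MHZ = 39888   # total CPU capacity of one B200 M2 host


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


users_lost = VSIMAX_NO_AV - VSIMAX_WITH_AV
user_penalty = percent(users_lost, VSIMAX_NO_AV)
dsva_cpu_share = percent(DSVA_PEAK_MHZ, HOST_TOTAL_MHZ)

print(f"Users lost per blade: {users_lost}")                   # 4
print(f"User penalty: {user_penalty:.1f}%")                    # 3.8%
print(f"DSVA peak share of host CPU: {dsva_cpu_share:.1f}%")   # 1.4%
```

Swap in your own VSImax scores and host capacity to see whether the penalty is as small in your environment.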


Trend Micro Deep Security 8 \ vShield Endpoint EPSEC and UNC\SMB Scanning


‘Another day another dollar’…. no no that’s not quite right…

‘Another day another Deep Security issue.’

Next up is Deep Security and UNC\SMB Scanning. This isn’t exactly Deep Security’s fault as this is another limitation of the vShield EPSEC driver.

Executables that are accessed via an SMB share will loop and need to be manually killed. This is a known limitation of the EPSEC driver, as discussed in Trend Micro KB1059280.

In our case one of our applications was launching an executable from a UNC path which was crashing. We couldn’t figure out why but unmanaging the virtual machine fixed the problem.

It is relatively easy to fix this problem, but it does leave you exposed.

The EPSEC driver does not support excluding a particular server name (i.e. \\servername), a directory on the server, or a specific file, even if you know its name. The only way to fix this problem is to exclude all UNC path\SMB scanning by updating your security policy and adding the exclusion ‘\\’ to your Directory exclusion list.

Even unticking ‘Scan Network Drives’ from within Trend has no effect. This has been raised as an incident and I am yet to hear back from Trend Micro.
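
To illustrate why the blanket ‘\\’ exclusion leaves you exposed, here is a rough Python sketch. It assumes a simple prefix match, which is only my stand-in for illustration and not how Deep Security or the EPSEC driver actually evaluates exclusions. The server and share names are made up.

```python
def is_excluded(path: str, exclusions: list[str]) -> bool:
    """Hypothetical prefix match; NOT Deep Security's actual matching logic."""
    return any(path.lower().startswith(e.lower()) for e in exclusions)


# The blanket directory exclusion: two backslashes, i.e. every UNC path.
exclusions = ["\\\\"]

# Made-up example paths.
paths = [
    r"\\appserver\share\tool.exe",      # the application we wanted to fix
    r"\\otherserver\drop\unknown.exe",  # any other share is skipped too
    r"C:\Program Files\app\app.exe",    # local paths are still scanned
]

for p in paths:
    status = "excluded from scanning" if is_excluded(p, exclusions) else "scanned"
    print(f"{p}: {status}")
```

Because the exclusion cannot be narrowed to one server, every executable launched from any share goes unscanned, not just the one that was looping.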

I have been assured by VMware that this will be resolved in vShield 5.1, which is due for release in Q4 2012.

Deep Security 8 SP1 Upgrade


As you guys and girls may be aware, Trend DS 8 SP1 has been out since the 30th April.

DS 8 SP1 promises support for wildcard exclusions and also adds Linux support via an agent for on-demand scanning (no real-time scanning yet).

There is also the added benefit of fixing the HEAP_MAX_SIZE PSOD issue, but I am still waiting for confirmation on this.

We’ve been having a few ongoing issues with our Trend environment, mainly due to a lack of care and attention since I installed 7.5 SP1 and upgraded to DS 8. Trend is also not the easiest beast to get up and running correctly, and a lot of this is down to the documentation. The install guide (Getting Started?) is too simplistic and the Best Practice documentation is confidential (go figure!), so I would definitely recommend professional services if you are thinking about buying Trend DS. And on the plus side, you get someone to blame if anything goes wrong!

I thought the release of 8 SP1 would be a good opportunity to get the Trend boys onsite to blow away the existing DSM + database and install DS 8.0 SP1 from scratch.

Bear in mind this was a live cluster, so we effectively split the cluster in half and kept one half on DS 8 (with all the live VMs) and the other half was upgraded to DS 8 SP1.

We deployed a new VM, installed DSM 8 SP1 on a new database, prepared the ESXi hosts and deployed the new virtual appliances. Once the infrastructure was configured, the existing virtual machines were vmotioned onto the DS 8 SP1 hosts that were managed with the new 8 SP1 DSM.

This was a little tricky as you effectively had two DSMs in operation on a single cluster, which is not recommended for long! The key to managing the VMs was to change the view to sort by host, so you could easily ignore all the unmanaged VMs on the half of the hosts that were not prepared.

Once the VMs were vMotioned across, we waited 5 minutes for their config to update (to ensure they no longer thought they were being protected by a DS 8 appliance) and then activated them on the new DS 8 SP1 virtual appliances on the new DSM.

After all the VMs were activated we could upgrade the remaining ESXi hosts and re-enable DRS to spread the VMs back across the cluster.

All in all it was a painless upgrade with no downtime and on the plus side Trend is looking much better.

If you have been through a few iterations of Trend DS and you’re having issues with high maintenance, VMs being unprotected, appliances going offline, etc., I recommend this approach to clear out your infrastructure and database and start off fresh.

Yes, you have to reconfigure your alerting and security profiles, but it’s a small price to pay for a healthy, stable environment.

DS 8 SP1 — well recommended!

— UPDATE 11/06/2012 —

I have had confirmation from Trend that the HEAP_MAX_SIZE issue has been resolved in DS 8 SP1, but for now I’ve left the HEAP_MAX_SIZE variable set on all my ESXi hosts, as it is still unclear to me whether this setting is still needed.

 

Deep Security 8 and DSAFILTER_HEAP_MAX_SIZE


I have been having major reliability issues with Deep Security 8 for the last 2 weeks.

There is a lot of confusion out there with the recommended settings for a stable Deep Security 8 environment, which isn’t helped when the vendor doesn’t have any KB articles or updated documentation publicly available. It appears they would rather their customers test their products for them and keep their teething issues with ESXi 5.0 under wraps. See my last post here.

This brings me to my issues with setting the DSAFILTER_HEAP_MAX_SIZE on each ESXi host. There is a recommended workaround in KB1055625, ‘How to enable the Deep Security Virtual Appliance (DSVA) to support more than 25 virtual machines on ESX’. What most people don’t realise is that this is a recommended workaround for Trend DS 7.0\7.5, not Trend DS 8.

The article states that the DSAFILTER_HEAP_MAX_SIZE should be roughly 1MB for each VM planned. This results in most customers setting a HEAP_MAX_SIZE of around 40 to 50MB. This is a tenth of what it should be for DS 8 according to Trend Engineering (512MB)!

Here is a quick summary:

I set a DSAFILTER_HEAP_MAX_SIZE of 50MB on all my ESXi hosts. In theory, according to the knowledge base article, this should be sufficient for 40 VMs.

For Deep Security 8, Trend actually recommends not changing this value, as the dsafilter is supposed to dynamically adjust the HEAP_MAX_SIZE according to the number of VMs on the host. If you have set the value manually as per the KB article, or you encounter symptoms such as VMs losing connectivity intermittently and the hosts then PSODing after 30 minutes to 2 hours, then it’s most likely an inadequate HEAP_MAX_SIZE, with the filter driver running out of memory.

Here are the calculations from Trend Engineering. They have used a figure of 70 VMs but state: “Even if you are not planning on running as many VM’s it is still suggested to set the value to 512mb.”

—————————————————————————————————————————————–

We use 432 bytes per TCP connection, so for the mathematical calculation let’s round up to 512 bytes per TCP allocation.

If using the default value for the Maximum TCP Connections: <SystemSetting name="maxConnectionsTcp" value="10000" />

With this, 512 bytes * 10000 connections = 5MB per VM, so we need 70 * 5MB = 350MB minimum of visor memory for the connection tables alone.

In addition to this we will need ~70MB to run the FD (filter driver), so it comes to 420MB.

We recommend bumping to 512MB and verifying the results, e.g.: esxcfg-module -s DSAFILTER_HEAP_MAX_SIZE=536870912 dvfilter-dsa

To verify the setting, execute: esxcfg-module -g dvfilter-dsa

The setting will not take effect until the driver is reloaded. Reloading is best done by rebooting ESX.

———————————————————————————————————————————————-
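
The sizing arithmetic quoted above can be sketched in a few lines of Python. This is only my reading of the quoted calculation, not an official Trend sizing tool. The constants come from the quote, and the function names are my own.

```python
import math

MB = 1024 * 1024

BYTES_PER_TCP_CONNECTION = 512   # 432 bytes in practice, rounded up in the quote
MAX_TCP_CONNECTIONS = 10_000     # default maxConnectionsTcp value
FILTER_DRIVER_MB = 70            # approximate memory needed to run the filter driver
RECOMMENDED_HEAP_MB = 512        # value Trend Engineering suggests setting


def estimated_heap_mb(vm_count: int) -> int:
    """Estimate the heap needed for vm_count VMs, following the quoted arithmetic."""
    # 512 bytes * 10000 connections is just under 5MB; the quote rounds it to 5MB per VM.
    per_vm_mb = math.ceil(BYTES_PER_TCP_CONNECTION * MAX_TCP_CONNECTIONS / MB)
    return vm_count * per_vm_mb + FILTER_DRIVER_MB


def heap_setting_bytes(heap_mb: int) -> int:
    """Convert a heap size in MB to the byte value passed to esxcfg-module."""
    return heap_mb * MB


for vms in (40, 70):
    print(f"{vms} VMs: ~{estimated_heap_mb(vms)}MB estimated")

print(f"DSAFILTER_HEAP_MAX_SIZE={heap_setting_bytes(RECOMMENDED_HEAP_MB)}")
```

By the same arithmetic, 40 VMs would need roughly 270MB, which is well above the 50MB I had set from the KB article, and 512MB works out to the 536870912 bytes used in the esxcfg-module command.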

If you are running DS 8 and ESXi 5.0 and you have already implemented the MOD_TIMER=0 fix, the next step will be to manually set the DSAFILTER_HEAP_MAX_SIZE to 512MB.

This solved the issue for me and we haven’t had a single crash since.

About frikking time!

———————————————————————————————————————————————-

Update 30/05/2012

Michael Gioia from Trend contacted me to give me a more detailed analysis of why this issue occurs.

First of all, there is now a Trend KB article for this issue: http://esupport.trendmicro.com/solution/en-us/1060125.aspx

It sounds like there are a few unhappy customers out there. In Trend’s defence, the point Michael put across is that it is very difficult to ensure 100% compatibility between two products that are integrated in the kernel, such as the ESXi hypervisor and the Trend filter.

This anomaly is only supposed to occur in very rare circumstances, when the filter driver is under severe memory exhaustion (circa 4 bytes).

In theory this issue shouldn’t occur with many customers then, but to be honest my setup was not anything special or extreme, so I’d be surprised if more customers weren’t affected by this.

On the plus side, the developers should have released a fix in 7.5 SP4 and 8.0 SP1 which are both available now.

I still have to verify this – until I am confident the issue is resolved my HEAP_MAX_SIZE settings will remain!