Forefront UAG doesn’t recognise Trend Micro Deep Security 8 as a compliant antivirus


I just noticed a new issue today with Microsoft’s Forefront UAG and Trend Micro Deep Security.

The UAG does not recognise the Trend Micro Deep Security agent as a compliant antivirus product, and therefore any clients using the Trend Micro Deep Security agent will not gain privileged session access to the UAG.

Interestingly enough, the UAG Forefront endpoint scanner does detect the Trend firewall component.

To confirm: this is from a physical desktop with the DS agent installed. The DS agent itself is providing the anti-malware protection (not a Deep Security Virtual Appliance), so the UAG should be able to detect it.

I can understand virtual servers or desktops not being recognised, as there would be no way for the UAG to verify whether the client has AV services running on it.

What I have done is follow the instructions here to try and customise the endpoint components detection script.

Thankfully the detection script DETECTION.VBS already covers Trend Micro OfficeScan, so I have added a new check, ‘DetectTrendMicroDeepSecurityAntiVirus’, to validate whether Trend Micro Deep Security is installed and running; determining whether it is up to date is beyond me.

I have escalated to Trend Engineering to see if they can assist.

 

Trend DS 8 not detected in UAG Endpoint Detection

 


UAG and FIPS Compliance – How to implement 3DES in your SSLF environment


During the hardening of our DMZ domain we had applied the recommended Windows Server 2008 R2 SSLF (Specialized Security Limited Functionality) templates. What most people will know about the joys of implementing hardening policies is that you are bound to break every single application, and the UAG is no exception.

If you apply the local security policy setting (as you should) “System cryptography: Use FIPS compliant algorithms for encryption, hashing, and signing” you will break the UAG. The problem is documented here. By enabling this security policy you are informing applications that they should only use cryptographic algorithms that are FIPS 140 compliant and in compliance with FIPS approved modes of operation.

In my case I could hit my login page fine but as soon as I got authenticated and passed through to the portal in my trunk I saw an error message:

An unhandled exception was generated during the execution of the current web request. Information regarding the origin and location of the exception can be identified using the exception stack trace below.
[HttpException (0x80004005): Unable to validate data.]
   System.Web.Configuration.MachineKeySection.EncryptOrDecryptData(Boolean fEncrypt, Byte[] buf, Byte[] modifier, Int32 start, Int32 length, IVType ivType, Boolean useValidationSymAlgo, Boolean signData)  +4308283

Yep, it was a complete mystery to me too.

The reason it happens is because ASP.NET 2.0 uses the RijndaelManaged implementation of the AES algorithm when it processes view state data. The RijndaelManaged implementation has not been certified by the National Institute of Standards and Technology (NIST) as compliant with the Federal Information Processing Standard (FIPS). Therefore, the AES algorithm is not part of the Windows Platform FIPS validated cryptographic algorithms and web pages are not served correctly.

The workaround is to configure ASP.NET to use 3DES instead of RijndaelManaged AES and is documented here:

  1. In a text editor such as Notepad, open the application-level Web.config file. In my case this was D:\Program Files\Microsoft Unified Access Gateway\von\PortalHomePage\web.config
  2. In the Web.config file, locate the <system.web> section.
  3. Add the following <machineKey> element to the <system.web> section (note: straight quotes, or the XML will be invalid): <machineKey validationKey="AutoGenerate,IsolateApps" decryptionKey="AutoGenerate,IsolateApps" validation="3DES" decryption="3DES"/>
  4. Save the Web.config file.
  5. Restart the Microsoft Internet Information Services (IIS) service. To do this, run the following command at a command prompt: iisreset
  6. If you have additional UAG servers in an array, run iisreset on the other UAG servers.
  7. Test the connection to your trunk. If you can log in successfully and hit your portal page, continue to the next step.
  8. Enable the “System cryptography: Use FIPS compliant algorithms for encryption, hashing, and signing” GPO setting.
  9. Run gpupdate /force on all servers in your array.
  10. Re-test the connection to your trunk. If you can log in successfully and hit your portal page, you are done.
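For reference, here is how the finished section looks with the override in place. This is only a sketch of the relevant fragment; the surrounding settings are whatever your existing Web.config already contains:

```xml
<system.web>
  <!-- Force ASP.NET to use the FIPS-approved 3DES algorithm instead of
       the non-validated RijndaelManaged AES implementation -->
  <machineKey validationKey="AutoGenerate,IsolateApps"
              decryptionKey="AutoGenerate,IsolateApps"
              validation="3DES"
              decryption="3DES" />
  <!-- ...the rest of your existing <system.web> settings... -->
</system.web>
```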

Microsoft never make it easy, eh!

UAG Array and External Load Balancer – Trunk cannot be activated due to the following: Invalid Internal IP address. Please choose a different IP.


This is a quick post to document an issue with the UAG if you have an array and are using an external load balancer, and therefore do not have the Forefront UAG integrated load balancing enabled.

What I initially tried to do was use the same IP address for my HTTP redirect trunk as my HTTPS trunk: I had an HTTPS trunk ‘Trunk1’ already configured listening on public interface 192.168.0.1, and I was trying to configure the UAG to redirect HTTP traffic listening on the same IP address, 192.168.0.1.

Not asking a lot, I thought? Unfortunately this configuration cannot be activated if you are using a UAG array and an external load balancer, and you will get the error message ‘Trunk cannot be activated due to the following: Invalid Internal IP address. Please choose a different IP.’

You have to configure separate IP addresses for your HTTP trunks, even if they are only redirecting traffic to your HTTPS trunks.

I ended up adding 192.168.0.11 to my public interface network adapter (don’t add another network adapter, just add an IP address on the existing adapter) and reconfigured my HTTP trunk to listen on 192.168.0.11 and redirect all traffic to Trunk1 on 192.168.0.1.

As most enterprises will be using an external load balancer, this issue is likely to come up in your environment.

This caveat is documented at the bottom of this TechNet article.

ESXi Hosts Not Responding APD – PowerPath PPVE 5.7 SP3 Build 173 and SRM 5


I’ve recently uncovered an issue with SRM 5 and the latest released version of PowerPath for ESXi – PPVE 5.7 SP3 Build 173 – where PowerPath is not handling detached devices properly after an SRM failover.

This is a known issue with SRM and PowerPath documented in VMware KB2016510 – ‘SRM TestFailover cleanup operation times out and fails with the error: Cannot detach SCSI LUN. Operation timed out: 900 seconds.’

This wasn’t the exact operation we had been performing. We had been undertaking Planned Migrations in the week preceding the incident rather than Test Failovers. Also there were no errors reported in SRM. In this post I wanted to document our symptoms so if you have a vBlock and SRM and you notice hosts becoming disconnected in vCenter; don’t panic… read on!

We had been running SRM 5 for a few months, but it seems we recently reached a tipping point after a period of extensive testing of SRM planned migrations, test failovers and clean-ups. While we didn’t have any errors with our cleanup operations as per the above VMware KB article, out of the blue our ESXi hosts started to drop out of vCenter.

As we performed Planned Migrations from the Recovery to the Protected Site and back again, SRM was unmounting and detaching LUNs and PowerPath was incorrectly detaching the devices. Over time this caused the ESXi hosts to stop responding within vCenter as they went into an APD (all paths down) state. First it was one host, and then the following week it was five. Thankfully the VMs were not affected, but the hosts were completely unresponsive through the DCUI and we found the only fix was to gracefully shut down the virtual machines via the OS and reboot the ESXi hosts. It was a real pain. Troubleshooting the issue was compounded because lockdown mode was enabled and SSH/ESXi Shell access was disabled.

The good news – this is a known issue with PowerPath/VE that EMC are aware of. It is detailed in EMC knowledgebase article emc284091, ‘Esxcfg-rescan hangs after unmapping LUN from ESX with Powerpath/VE’. The root cause as per emc284091: ‘This is a known issue with Powerpath/VE 5.7 where it is not handling detached devices properly. Detaching a device results to setting the device state to OFF and Powerpath/VE is not properly handling this state.’

We were advised by VMware not to perform any more SRM failovers until we had installed PowerPath 5.7 P01. Thankfully EMC will supply you with an early beta to resolve the issue, as P01 is only out in Q3 2012. We were supplied with PowerPath/VE 5.7 P01 b002 and this appears to have solved the problem.

If you want to try and identify the fault yourself, look out for the following error message – ‘PowerPath: EmcpEsxLogEvent:1260:Error:emcp:MpxEsxVolProbe: Device naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx probe already in progress’

There is also a SCSI sense code that you will normally find in the vmkernel.log but in our case we did not see it because I had to reboot the host to gather logs:

WARNING: NMP: nmpDeviceAttemptFailover:599:Retry world failover device “xxxxxxxxxxxxxxx” – issuing command 0x4125001e09c0

The above is the sense code to look for; the PPVE hot fix will now recognise it and respond accordingly.
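If you can get SSH or the ESXi Shell enabled, grep is the quickest way to check a host for these signatures. A minimal sketch – the sample file below is a stand-in for illustration; on a real ESXi 5 host you would point the grep at /var/log/vmkernel.log instead:

```shell
#!/bin/sh
# Stand-in log file containing the two messages described above
# (on a real host, grep /var/log/vmkernel.log instead).
cat > sample-vmkernel.log <<'EOF'
PowerPath: EmcpEsxLogEvent:1260:Error:emcp:MpxEsxVolProbe: Device naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx probe already in progress
WARNING: NMP: nmpDeviceAttemptFailover:599:Retry world failover device "xxxxxxxxxxxxxxx" - issuing command 0x4125001e09c0
EOF

# Count occurrences of the PowerPath probe error
grep -c 'probe already in progress' sample-vmkernel.log
```

A count greater than zero on a live host suggests you are hitting the same PowerPath/VE issue.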

Trend Deep Security Warning Message ‘Machine was unprotected during move from one esx host to another’


I wanted to post some more information on this Trend DS error message – ‘Machine was unprotected during move from one esx host to another’ as it seems to come up regularly.

The description of the error message is, ‘a virtual machine was moved to an ESX that does not have an activated Deep Security Virtual Appliance.’

In essence this warning message is saying that the ESXi host you vMotioned your VM to is not currently protecting the virtual machine.

This can be because there is no virtual appliance on the target ESXi host, or because the Trend virtual appliance is not offering anti-malware protection, is not activated, or is offline.

This error message will not show for unactivated virtual machines — A virtual machine has to be activated to generate this error message.

There is a known bug with this error message too – even though your VM is being protected by the appliance, the error message is always reported as an Agent error. Apparently Trend are working on this.

Back to the error message: When you receive this error message, what is the next step?

Trend is a complicated beast – an appliance can have issues for a number of reasons, and whether the fault lies with the appliance or one of its dependencies is what you need to figure out. It could be something as basic as the appliance dropping off the network, losing connectivity back to the DSM or to the vShield Endpoint VMkernel port, or possibly it’s no longer activated (not registered as a security appliance in vShield Manager).

If you get this warning message, open the virtual appliance that the VM is currently residing on and first ‘Clear Warnings/Errors’ to remove any old status/error messages, then run ‘Check Status’ to see if there are any new issues. If there are errors reported on the appliance, try to resolve them by following the patented ‘Trend DS Virtual Appliance Health Check’ below.

My main bugbear with Trend is that it is too complicated and does not report its current state accurately and concisely. When I run a Check Status I want to know exactly what is going on. It would be most useful to have a health check screen on the appliance where the health check tests I mention below are run sequentially in full view for the benefit of the administrator. Issues could be highlighted immediately, and it would give us confidence that the appliance and its dependencies are all configured correctly, rather than having to check all the different components individually.

For example, if you check the status of your appliance and it reports back that it is Managed and Online, you would expect it to be managed, online and offering anti-malware protection. In my testing, after I changed the vShield VMkernel IP address on my ESXi host from 169.254.1.1 to 169.254.1.2 so the appliance could not offer anti-malware protection, I ran a Check Status and the virtual appliance would still report that it was managed, online and offering anti-malware protection.

On the plus side, when I migrated a VM to the ESXi host with the misconfigured VMkernel port, the warning message that the VM is unprotected was still generated. What this shows is that this error message is symptomatic of an underlying issue with your virtual appliance or ESXi host. While the issue may not be immediately noticeable because the DSM reports that all is well, you should dig deeper by following the ‘Trend DS Virtual Appliance Health Check’ below.

Bottom line — you cannot fully trust the DSM when you notice this error message. The only way to verify for sure that the appliance is actually working is to drop the EICAR test file on the VM and confirm whether anti-malware protection is working.
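The EICAR file is the industry-standard, harmless test string that any AV engine should flag. A quick sketch for generating it on a test VM (use a disposable machine; a healthy DS appliance should detect or quarantine the file almost immediately):

```shell
#!/bin/sh
# Write the standard 68-byte EICAR antivirus test string to a file.
# Single quotes keep the $ and ! characters literal.
printf '%s' 'X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*' > eicar.com

# Sanity check: the file should be exactly 68 bytes.
wc -c < eicar.com
```

If the file survives on a VM that the DSM claims is protected, the appliance is not actually doing its job.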

‘Trend DS Virtual Appliance Health Check’:

  1. Synchronise your Virtual Center(s) in Trend DSM
  2. Confirm your credentials for vCenter and vShield are up to date
  3. Confirm filter driver is installed on ESXi host via Trend DSM
  4. Confirm vShield driver is installed on ESXi host via vShield Manager
  5. Confirm Trend Appliance is registered as Security VM with vShield Manager
  6. Confirm the appliance is in the correct VLAN
  7. Confirm the appliance network configuration is correct
  8. Confirm you can ping the Appliance from the DSM.
  9. Confirm the VMkernel IP address for vShield Endpoint is correct on ESXi host – 169.254.1.1

and if nothing works follow my last resort:

10. Deactivate and reactivate the appliance

And if that fails…. Follow the blocksandbytes ‘Triple D’ process:

11. Deactivate, Delete and Deploy the appliance.

When I’m being lazy and I know the config hasn’t changed, I will deactivate and reactivate the appliance immediately. What I find with Trend is that as long as your environment is static, Trend will continue to stay green; but if your environment is fairly dynamic – hosts being rebooted, VMs being built and vMotioned, SRM failovers and failbacks being performed, etc. – it struggles to keep up with environment changes.

Every week I have to try and figure out why virtual machines are unhappy and do not have anti-malware protection. Hopefully this will help others stay on top of Trend DS 8.

vBlock Tip: vSphere 5, SIOC, EMC FAST VP and Storage DRS


If you’ve got a vBlock it’s most likely you’ve got an EMC array with EMC FAST VP, and hopefully by now you’ve upgraded to vBlock Matrix 2.5.0 and you’re using vSphere 5.

If not, what are you waiting for? Oh yeah, there are still a few outstanding issues. (My advice: wait for the Storage vMotion issues to be resolved; it’s a real pain.)

I wanted to post some best practices and recommended settings for leveraging VMware’s Storage IO Control with EMC Fast VP and Storage DRS.

First a quick recap:

  • FAST VP is EMC’s sub LUN auto-tiering mechanism.
  • SIOC is VMware’s attempt to leverage the idea of DRS (distributed resource prioritisation) into the storage layer. SIOC  provides I/O performance monitoring and isolation of virtual machines in vSphere 5.
  • Storage DRS is a new feature in vSphere 5 which allows datastores to be pooled together as a single resource.

The bottom line: EMC FAST VP and SIOC are not only compatible but can work together harmoniously because they serve different purposes.

EMC FAST monitors data usage over an hourly period and only moves data once every 24 hours. Unlike SIOC, EMC FAST redistributes data based on 1 GB slice usage and lowers the response time of the busiest slices.

Compared to EMC FAST, SIOC uses a relatively short sampling window and is designed to quickly deal with short term IO contention crises. It can act quickly to throttle IO to limit guest latency during times of IO contention.

SIOC and EMC FAST perform complementary roles to monitor and improve storage performance, therefore they should both be leveraged in your environment.

And lastly, Storage DRS – should it be used? Yes, but in what capacity?

My recommendation is to leverage Storage DRS in Automatic mode for initial placement to balance VMs evenly across datastores. I would also enable SDRS to monitor free capacity to make VM relocation recommendations if datastores approach capacity. The default setting is 90% which should be adequate.

What should be disabled, though, is IO metrics – it is EMC’s recommendation that Storage DRS IO metrics be disabled when using FAST VP. This is because they would perform competing roles, potentially identifying similar relocations and causing inefficient use of storage system resources.

So there you have it. The best way to leverage these components in your vBlock.

Sources:

There is a great EMC document here which lists best practice with EMC VNX storage and vSphere, and an old but relevant article from Virtual Geek on SIOC and auto-tiering.

vBlock Tip: Set VMFS3.MaxHeapSizeMB to 256MB on all ESXi hosts


This is the sort of issue I came across this week that you would expect VCE to make a de facto standard in the vBlock.

Why? Because the performance hit is negligible (a slight increase in additional kernel memory of 64 MB), vBlock customers are likely to hit this ceiling, and it’s another setting that we then don’t have to worry about.

I started running into issues vMotioning two VMs. It turns out this is a known issue as per KB1004424.

I was told by VMware: ‘ESXi 5.0 Host(Source on which you are trying to power on the VM)  already has 18 virtual disks(.vmdk) greater than 256GB in size open and you are trying to power on a virtual machine with another virtual disk of greater than 256GB in size.’

The heap size is effectively the amount of VMDK storage that can be hosted, across all virtual machines, for a given host.

The default heap size is 80 MB, and each 1 MB of heap covers 256 GB of open VMDK storage. To calculate the amount of open VMDK storage available on the host, multiply the heap size by 256 GB – an 80 MB heap value extends to 20 TB of open VMDK storage on a host.

Increasing the size to 256 MB gives 256 × 256 GB – a 256 MB heap value extends to 64 TB of open VMDK storage on a host, which should be plenty.
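The arithmetic can be sanity-checked quickly, since each 1 MB of heap covers 256 GB of open VMDK storage:

```shell
#!/bin/sh
# Open VMDK capacity addressable by a given VMFS3 heap size:
# heap (MB) x 256 GB, converted to TB.
for heap_mb in 80 256; do
    echo "${heap_mb} MB heap -> $(( heap_mb * 256 / 1024 )) TB of open VMDK storage"
done
```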

I was provided the following instructions which I repeated on each host to fix:

  1. Log in to the vCenter Server or the ESXi host using the vSphere Client.
  2. Click on the configuration tab of the ESXi Host
  3. Click on Advanced Settings under Software
  4. Select VMFS3
  5. Change the value of VMFS3.MaxHeapSizeMB to 256
  6. Click on OK
  7. Reboot host

After rebooting each host the problem was solved. That was an easy one, for once!