Wednesday, December 23, 2015

Bish-bash with VSAN 6

We recently finished deploying a 10-node VSAN 6 cluster, and it's really heart-warming news for the business that we're able to save around 150 TB of enterprise-grade SAN while provisioning our first hyper-converged, production-ready VSAN. There were a couple of hiccups during provisioning and deployment, with a lot of bish-bash here and there with the vendors involved (HP and VMware), but in the end it's all hunky-dory.



Before jumping on the VSAN bandwagon, I did a small POC with VSAN 5.5 (a 4-node cluster) and went through a couple of VSAN HOLs (some are based on the old VSAN 5.5 and are now archived/removed), e.g. "HOL-SDC-1608 Virtual SAN 6 from A to Z", which gives you complete knowledge of how to handle a VSAN. There is one caveat, though: it's all virtual, so you won't get the real-world experience, or should I say the challenges, of implementing a real VSAN.

To begin with, we chose VSAN Ready Nodes from HP.
Bill of materials:

  • HP DL380 GEN9 8SFF CTO SERVER
  • HP DL380 GEN9 E5-2690V3 FIO KIT
  • HP DL380 GEN9 E5-2690V3 KIT
  • HP 32GB 2RX4 PC4-2133P-R KIT
  • HP DL380 GEN9 8SFF CAGE BAY2/BKPLN KIT
  • HP 1.6TB SAS SSD 12GB/S 2.5"
  • HP 1.2TB 6G SAS 10K 2.5IN DP ENT SC HDD
  • HP FLEXFABRIC 10GB 2P 556FLR-SFP+ ADPTR
  • HP SMART ARRAY P440AR/2G FIO CONTROLLER
  • HP SMART ARRAY P440/4G CONTROLLER
  • HP ETHERNET 10GB 2P 560SFP+ ADPTR
  • HP 2U SECURITY BEZEL KIT
  • HP 2U SFF EASY INSTALL RAIL KIT
  • HP 8GB MICRO SDHC FLASH MEDIA KIT
  • HP 800W FS PLAT HT PLG PWR SUPPLY KIT
  • HP 2U CMA FOR EASY INSTALL RAIL KIT
  • HP ONEVIEW INCL 3YR 24X7 SUPP FIO BUNDLE PHYS 1 SV
  • HP 3Y 4 HR 24X7 PROACTIVE CARE SVC
  • HP ONE VIEW W/ILO SUPP
  • HP PROLIANT DL380 GEN9 SUPPORT
  • COMPATIBLE 10GBASE-CU ENHANCED SMALL FORM-FACTOR PLUGGABLE (SFP+) TRANSCEIVER 3M

This way, we connected the controllers to the two disk cages at the front of each server and populated the disks.

We used 10G pNICs for both VSAN replication and VM traffic, with a pair of Nexus 5Ks as ToR switches.
Once the hardware was all assembled and set up, getting ESXi 6.0.1 (HP respin) running was a breeze. The VSAN binaries are baked into ESXi itself, so you don't need any specific configuration to enable VSAN. We installed ESXi on an SD card and followed the best practices shared on Cormac's blog.
Don't forget to enable HBA mode on the SCSI card(s) for pass-through. VSAN requires this to be enabled beforehand, or you won't see the disks.

We relied on the VSAN POC document for a higher success rate, and there is also a free session available from VMworld 2015, STO 4572.

We used "host profiles" to keep the configuration identical across hosts, and also used some PowerCLI automation to configure the multicast settings on all hosts.
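To illustrate the idea of keeping hosts identical, here is a minimal Python sketch of a drift check. It is not what host profiles or our PowerCLI scripts actually do; it assumes the per-host settings have already been exported into plain dictionaries. The function name `find_drift`, the setting names, the host names and all values are hypothetical.

```python
def find_drift(reference, hosts):
    """Return {host: {setting: (expected, actual)}} for every setting
    that differs from the reference configuration."""
    drift = {}
    for host, settings in hosts.items():
        diffs = {
            key: (expected, settings.get(key))
            for key, expected in reference.items()
            if settings.get(key) != expected
        }
        if diffs:
            drift[host] = diffs
    return drift


if __name__ == "__main__":
    # Hypothetical reference settings; names and values are made up.
    reference = {"vsan_vmknic": "vmk2", "mtu": 9000}
    # Hypothetical per-host exports.
    hosts = {
        "esx01": {"vsan_vmknic": "vmk2", "mtu": 9000},
        "esx02": {"vsan_vmknic": "vmk2", "mtu": 1500},
    }
    for host, diffs in find_drift(reference, hosts).items():
        for key, (expected, actual) in diffs.items():
            print(f"{host}: {key} expected {expected!r}, found {actual!r}")
```

A setting missing on a host is reported with `None` as the actual value, so it still shows up as drift.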

Once our VSAN cluster was up and running and we had migrated some test load onto it, one host showed a PSOD within 24 hours (the IML logs showed the error as POST Lockup 0x13), and then our real bish-bash started. We logged support cases with VMware and HP and shared the hardware logs (IML, ADU, offline diagnostics, etc.) and the VMware support bundle for analysis. I started digging into the issue myself and found that there is an HP advisory for this known issue, and the resolution was to update the firmware to 3.52. A little late, we heard from both vendors that we should update the HP SCSI card firmware to 3.52 and the HPSA (SCSI driver) to 6.0.0.114-1OEM. But neither the firmware nor the driver had been validated for VSAN.
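With 10 hosts to update, it helps to check which ones are still behind the recommended versions. Below is a minimal Python sketch of such a check, assuming you have already collected each host's controller firmware and HPSA driver version into a dictionary by whatever means you prefer. Only the two target versions (3.52 and 6.0.0.114-1OEM) come from the vendors' recommendation; the host names, the "old" versions and the function names are invented for illustration.

```python
import re

# Target versions recommended by the vendors for this issue.
REQUIRED = {"firmware": "3.52", "driver": "6.0.0.114-1OEM"}


def version_key(text):
    """Turn a version string into a tuple of integers for comparison.
    Non-numeric parts such as 'OEM' are ignored, so '6.0.0.114-1OEM'
    becomes (6, 0, 0, 114, 1). This is a simplification, not a vendor rule."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def hosts_needing_update(inventory, required=REQUIRED):
    """Return {host: [component, ...]} for components older than required."""
    outdated = {}
    for host, versions in inventory.items():
        behind = [
            component
            for component, minimum in required.items()
            if version_key(versions[component]) < version_key(minimum)
        ]
        if behind:
            outdated[host] = behind
    return outdated


if __name__ == "__main__":
    # Hypothetical inventory; host names and old versions are made up.
    inventory = {
        "esx01": {"firmware": "3.52", "driver": "6.0.0.114-1OEM"},
        "esx02": {"firmware": "2.40", "driver": "6.0.0.44-4OEM"},
    }
    for host, components in hosts_needing_update(inventory).items():
        print(host, "needs update:", ", ".join(components))
```

A check like this only tells you what is installed. Whether a version is supported for VSAN still has to be confirmed against the VSAN HCL.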

Note: when logging a support case with a vendor, specifically mention in bold that you're running VSAN; otherwise your case may land with someone who is not trained in or aware of VSAN. Only a few engineers have VSAN knowledge, training and troubleshooting skills.



VMware support assured us that they were proactively working on validating this new firmware/driver:
"I am aware that the engineering teams are working to certify this ASAP.
I have also seen other customers using the 3.52. fw successfully as HP recommended to do it. But from a Tech Support perspective, when a FW version is not on the VSAN HCL we can not give you a official recommendation to use it.

I would also recommend that you speak to a VSAN specialist in HP and see what they recommend."
Keeping our fingers crossed, we updated everything to the suggested versions and haven't seen the issue crop up again. Kudos to the VSAN health check plugin, which is very helpful for monitoring and diagnosing.



Note: use only the Web Client for VSAN-related configuration; you won't see much through the legacy C# client.

Please share, be social.

Saturday, December 12, 2015

Starting Daily Journal from today

Yeah! I know about the hiatus blog post, but sometimes things won't happen the way you always wanted. A lot of work and other life events have slowed me down and kept me away from my blog. But recently I've been reading a couple of very thought-provoking blog posts about keeping up with daily journaling and how to write better blogs, so it's time to put the knowledge gained from the blog posts below into practice.

http://lifehacker.com/kick-off-your-daily-journaling-habit-with-this-simple-t-1746951866

http://blogs.technet.com/b/wikininjas/archive/2015/10/13/wiki-life-english-tips.aspx

http://lifehacker.com/block-off-time-for-deep-work-to-make-headway-on-your-1745622623?trending_test_two_f&utm_expid=66866090-68.hhyw_lmCRuCTCg0I2RHHtw.5