
So you bought an EqualLogic SAN, now what… part two

… belay those orders for setting up the SAN stress testing. Let’s first get some alerts and the call-home functionality working. This email from EQL lays it out well:

*****

Using rinetd for management of a closed network
Solution Details 
Network Access for Management and Notification

Networks set up for iSCSI SANs often have limited connectivity to general “public” network. This poses significant problems when management and event notifications use standard TCP/IP protocols. One example of this is when trying to configure event notification via SMTP. If the iSCSI network does not have an SMTP server, and there is no gateway from the iSCSI network to the public network, significant events notifications cannot be delivered to the responsible persons.

Additionally, management of the EqualLogic array would have to be done from one of the servers directly connected to the iSCSI network.

One method of circumventing this problem is to use port redirection. This is a procedure that takes requests on a particular port and interface, and routes them to another node. One program that has been used successfully is rinetd, available from http://www.boutell.com/rinetd/ . This program can be used to route SMTP, HTTP, telnet, and SSH traffic to and from an array through a Windows system to one or more nodes on the public network.

The following lines are a configuration that will:
1. route all SMTP traffic received on interface 192.168.30.201 to the system at address 192.168.10.200.
2. Route all HTTP traffic received on interface 192.168.10.201 to the array at address 192.168.30.10.
3. Route all traffic received on any interface for ports 3002 and 3003 to the array at address 192.168.30.10.

192.168.30.201 25 192.168.10.200 25
192.168.10.201 80 192.168.30.10 80
0.0.0.0 3002 192.168.30.10 3002
0.0.0.0 3003 192.168.30.10 3003

Using two tools from the Microsoft Windows 2003 Resource Kit Tools, this program can be set up to run as a service. Srvany.exe allows the rinetd to be run as a service, and instsrv.exe does the actual installation. They are available at:

http://www.microsoft.com/downloads/details.aspx?displaylang=en&familyid=9d467a69-57ff-4ae7-96ee-b18c4790cffd

You may also find this related solution of use:
Network ports used by a PS Series group.

FYI – there is no equivalent tool in Windows 2008. The customer will have to set up a router.

*****

Instead of using srvany.exe, I got clued in to ServiceEx from this post: http://blog.ehuna.org/2009/10/an_easier_way_to_access_the_wi.html

So configure your rinetd. (Note: it looks like the Java portion of the Group Manager uses ports 3002 and 3003, hence their use in the example from EQL.)

Then use ServiceEx to turn it into a service. The cool thing about this is that you can now access group management from any PC as well! When I set up rinetd it didn’t want to bind to port 80 because, of course, the vCenter web server was already running there. So I just bound it to 8080.
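To make the port 8080 workaround concrete, here is a minimal Python sketch that writes out rinetd rules in the same "bind address, bind port, target address, target port" layout as the EQL example. The addresses are the ones from the EQL email, not from my network, and the `Rule` and `render_rinetd` names are made up for illustration.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """One rinetd forwarding rule (hypothetical helper type)."""
    bind_addr: str
    bind_port: int
    target_addr: str
    target_port: int


def render_rinetd(rules):
    """Render rules as rinetd config lines, one rule per line."""
    return "\n".join(
        f"{r.bind_addr} {r.bind_port} {r.target_addr} {r.target_port}"
        for r in rules
    ) + "\n"


# Example addresses taken from the EQL email; substitute your own.
RULES = [
    Rule("192.168.30.201", 25, "192.168.10.200", 25),   # SMTP from the array out to the mail server
    Rule("192.168.10.201", 8080, "192.168.30.10", 80),  # web UI, moved off port 80 (already taken on this box)
    Rule("0.0.0.0", 3002, "192.168.30.10", 3002),       # ports the EQL example forwards for the Java GUI
    Rule("0.0.0.0", 3003, "192.168.30.10", 3003),
]

if __name__ == "__main__":
    print(render_rinetd(RULES), end="")
```

With a rule like the second one, you would browse to port 8080 on the Windows box instead of port 80.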

Ok, so now you have a way to get your SMTP messages out, so go and enable the call-home functionality on the EqualLogic. If all goes well you should get a couple of email messages, and maybe a phone call letting you know they heard from you.

Ok, now let’s try stress testing the SAN. This is my crude and not very scientific method. First things first: create a virtual machine with two NICs (so we can use MPIO) and a server OS. Put those NICs on the same subnet as your SAN so they can talk to it (I like the VMXNET 3; I have no idea if this is good or bad for iSCSI, but it seems to work). Edit the NIC properties to enable jumbo frames. Download the HIT (Host Integration Tools) and install them. Of course this is yet another download that is called Setup.exe; maybe my whining will change this… probably not. Anyway, the HIT installs the EqualLogic Multipath I/O DSM (which integrates with Microsoft’s built-in MPIO) and the Microsoft iSCSI Initiator, and it adds a Dell EqualLogic MPIO tab to the iSCSI Initiator properties.

If you had an outside-facing NIC configured, you would want to exclude that NIC’s subnet from MPIO. To do so, go to Start, Programs, EqualLogic, and open the Remote Setup Wizard (I have no idea why this setting is here; a button on the new tab in the iSCSI initiator would make more sense, but that is just me). The third radio button is “Configure MPIO settings for this computer”; this is where you can exclude the desired subnet.

Ok, so on the SAN create 4 volumes. Use the iSCSI initiator to connect to the volumes (Using MPIO)

I’m not sure whether it matters if, on discovery, you hard-code the iSCSI initiator as the adapter and choose one of the NICs, and then do the same again for the other NIC.

Next, fire up Iometer. Use one worker, then Ctrl-click to select the four volumes. Set the # of Outstanding I/Os on this tab to 128. Create a new specification on the Access Specifications tab: move one slider to 100% Sequential and the other to 100% Read, and set the Transfer Request Size to 64 kilobytes. Don’t forget to press Add to move it to the Assigned Access Specifications. On Results Display, set the update frequency to five seconds (so you can see what it’s doing), and then on Test Setup choose how long you want to run it for. Also, don’t forget to save your settings.

Now that this is all set up, hit the green flag to see some action. What you should see in SAN HeadQuarters (as well as in Iometer) is over 100 MB/s of reads. To really stress test this sucker, clone your VM a couple of times, create four more volumes for each new VM, attach each new VM to its respective volumes, and then let it rip. I got it up to around 400 MB a second this way and then left it to run over the weekend. When I came back on Monday, the SAN had transferred, I think, somewhere around 70 terabytes in total.
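As a sanity check on that total, here is a quick back-of-the-envelope sketch in Python. It assumes a perfectly steady rate and decimal units (1 TB = 1,000,000 MB); the run lengths are guesses, since I didn't time the weekend exactly.

```python
def terabytes_transferred(mb_per_second: float, hours: float) -> float:
    """Total data moved at a sustained rate, in decimal terabytes."""
    seconds = hours * 3600
    return mb_per_second * seconds / 1_000_000  # MB -> TB


if __name__ == "__main__":
    # Assumed run lengths; the real weekend run was not timed.
    for hours in (48, 60):
        total = terabytes_transferred(400, hours)
        print(f"{hours} h at 400 MB/s: {total:.1f} TB")
```

At a steady 400 MB/s, 48 hours works out to roughly 69 TB, so the 70 terabyte figure is in the right ballpark for a weekend.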

Well if you still feel the need for punishment stay tuned for part three of this post.


Can’t connect to vmware data recovery virtual appliance

I couldn’t for the life of me connect to the data recovery virtual appliance. I tried all sorts of things, this forum has a lot of helpful hints…

http://communities.vmware.com/thread/208385?start=0&tstart=0

But this post gave me the answer,

I called into Tech Support and they have confirmed the following:

Vista & Windows 2008 (32bit/64bit) vSphere Client has issues with the Data Recovery Plugin. Sometimes the plugin doesn’t register, sometimes it does. When it does register, you will not be able to connect your Data Recovery appliance. It does however work on XP. Once you’ve registered the appliance from an XP client, and walked through the “Getting Started Wizard” you can access the appliance from the Vista/Windows 2008 vSphere Client without issues (you may need a restart of your vSphere client).

The tech confirmed with an internal VMware BUG ID # 412552.

So I installed the client and plugin on my XP box and, lo and behold, I could connect. After running through the Getting Started wizard I could then connect from my 2003 box that has vCenter installed… so strange, but I’m just glad it’s working. Oh yeah, they have a newer version out: 1.1.

So you bought an EqualLogic SAN, now what… part one

This post might be extremely long. I like to document things just in case I can spare other people the pain that I went through.

Ok, so you just bought an iSCSI SAN, right? So you should learn a bunch about iSCSI, I would think. If you are using iSCSI with VMware you NEED to read this.

http://www.vmware.com/pdf/vsphere4/r40/vsp_40_iscsi_san_cfg.pdf or my copy is here http://sites.google.com/site/mellerbeck/Home/vsp_40_iscsi_san_cfg.pdf?attredirects=0&d=1

If you are doing 3.5 then read this one http://www.vmware.com/pdf/vi3_35/esx_3/r35/vi3_35_25_iscsi_san_cfg.pdf

I would read this as well… http://virtualgeek.typepad.com/virtual_geek/2009/01/a-multivendor-post-to-help-our-mutual-iscsi-customers-using-vmware.html

So, did you read it? I mean go READ IT. Trust me! What I gleaned from those guides is this (copied from http://www.yellow-bricks.com/2008/07/21/queuedepth-and-whats-next/):

…. often overlooked is Disk.UseLunReset and/or Disk.UseDeviceReset. ESX defaults to Disk.UseLunReset=1 and Disk.UseDeviceReset=1. This means that when a SCSI bus is reset all SCSI reservations are cleared, not for a specific LUN but for the complete device. This is useful when one uses local storage, but within a VMware environment most companies utilize a SAN and you don’t want to disrupt the entire SAN when it’s not necessary. You can set this via the commandline, powershell and via VirtualCenter:

  1. VirtualCenter -> Configuration Tab -> Advanced Settings -> Disk -> Disk.UseLunReset=1 , Disk.UseDeviceReset=0
  2. Get-VMHost | Set-VMHostAdvancedConfiguration -Name Disk.UseDeviceReset -Value 0
  3. Commandline -> esxcfg-advcfg -s 1 /Disk/UseLunReset
    Commandline -> esxcfg-advcfg -s 0 /Disk/UseDeviceReset

The next thing I learned from the guide is that you probably want to change the Disk.MaxLUN parameter. It defaults to 255, but unless you are planning on having that many LUNs, setting it to, say, 50 is more reasonable and will make your ESX host boot quicker as well as scan LUNs quicker.

In the 3.5 guide it mentions removing the vmfs-2 module (not so in the 4 manual) but it is easy to do as laid out in this post. http://www.yellow-bricks.com/2009/03/13/disabling-the-vmfs-2-module-exploring-the-next-generation-of-esx/

Basically, run this command: esxcfg-module -d vmfs2

It also mentions somewhere (I couldn’t find it again) that if you only made changes to an iSCSI LUN then you only need to rescan that one (just right click the vmhba33 and rescan)

OK, on with the show

Get access to the Dell/EqualLogic support site. You will need it for firmware and software.

  • Create an EqualLogic Support Account

Our setup: three IBM 3850 M2 servers, each with 2 processors and 64 GB of RAM. Two NICs, four ports (EtherChannel bonded on the incoming Cisco). Two NICs, four ports for iSCSI traffic. A NIC for the service console, and another gigabit NIC for vMotion. The NICs are Intel PRO/1000 dual-port controllers.

What we bought: two PS6000XV arrays and two Dell PowerConnect 6224 Ethernet switches (we probably should have gone with the 6248 for more ports, but this is sufficient).

Two Stacking Modules for our Ethernet Switches. Everywhere I read recommends stacking the switches as the way to go.

A lot of cat 6 network cable of two colors. We refer to one switch as our Red switch and another as our Orange switch. This way we have a visual check of whether every system has redundant paths.

Ok, so the hardware arrived. Rack and stack the arrays along with the switches. Connect the stacking cables between the switches. How should you do that? Well, look here on page 32:

Manual for Dell Power Connect 6224

http://support.dell.com/support/edocs/network/pc62xx/en/UG/PDF/ug_en.pdf


I don’t know who names things over at Dell, but they seriously need to revisit their taxonomy.

Ok, so now everything should be cabled up. You want redundant paths from end to end.

Ok, configure the switch. But how should we configure it? Yet again, how Dell names things leaves me less than amused. Of course the guide that you need to configure your Dell PowerConnect 6224 for use with EqualLogic is named none other than… wait for it… Dell EqualLogic Configuration Guide… yup, that’s it.

Ok where is it? You can find it here http://sites.google.com/site/mellerbeck/Home/Dell_EqualLogic_Configuration_Guide.pdf?attredirects=0&d=1

So on page 22 (26) you can get the step-by-step commands to configure that switch. Can Dell make it any harder to find this stuff?

I got a hold of a couple of other manuals for this switch that are pretty useful.

http://sites.google.com/site/mellerbeck/Home/PowerConnectTuning-rev1.07.pdf?attredirects=0&d=1

I did everything but enable the Cut-Through Switch Forwarding.

And here is the manual if you need to configure LAG’s http://sites.google.com/site/mellerbeck/Home/Dell-PowerConnect-How_to_configure_LAG_LACP-1.0.pdf?attredirects=0&d=1

Here is a list of some manuals as well http://support.dell.com/support/edocs/network/pc62xx/en/index.htm

Double check the firmware on your switch; 2.2.0.3 seems to work. The PowerConnect 6224 firmware is hiding here: http://ftp.dell.com/firmware/

So from the Dell EqualLogic Configuration Guide I gleaned that stacking is strongly recommended. One of the main requirements is enabling flow control. On the 6224, flow control is a global setting. I set it and did not see it go out to all ports, so I disabled and enabled it again, and it went out to most ports (most would read active, a few inactive). After disabling and enabling it once more, it seems to have finally stuck. The next important point is no STP functionality on switch ports that connect end nodes (end nodes are your ESX boxes or the SAN), which in my case is pretty much all of them! So I disabled STP on everything. (Why disable STP? It can cause delays that can cause your failovers to not work!) If you must use STP, they recommend Rapid STP; how to turn that on is in the config guide. Then, finally, enable jumbo frames.

This is what my switch looks like. If you know it should be different, please let me know, since it’s hard to get straight answers with these switches for some reason.

Ok, so your switches are configured.

Next, turn on one of your SANs and run the setup wizard. (If you happen to have two, turn them on one at a time so you can name each one; otherwise you have to locate the serial number, which is impossible.) RAID 50 seems to be the de facto choice for most people. Add it to the group. Upgrade the firmware on your SAN (you can download it from the EqualLogic support site). Do it at the beginning and get it out of the way now 🙂

Read the release notes for your firmware, here it is for the 4.2.1 http://sites.google.com/site/mellerbeck/Home/110-6024-EN-R1_RNotes_V4.pdf?attredirects=0&d=1

The Supported Configuration Limits section is something to pay attention to! Page 4 (8).

What I gleaned from the release notes are a few registry edits that you want to make on anything attaching to the SAN.

Increase the value of the TimeOutValue parameter (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk\TimeOutValue) to at least 60 seconds (the default is 10 seconds). Make sure to set the type to DWORD and enter the value (60) in decimal. This will increase the timeout period for all disk I/Os for the disk class driver (in contrast to driver-specific Registry parameters that affect only your iSCSI initiator). You must reboot the server for the change to take effect.

If you install the VMware tools it will automatically make this change (I believe, it doesn’t hurt to check)

***

For any environment (clustered or not) using multi-path I/O, make the following changes:
–    Add or modify the UseCustomPathRecoveryInterval key, and set its value to 1.
–    Add or modify the PDORemovePeriod key, and set its value to 120.
–    Add or modify the PathRecoveryInterval key to 60, or half the value of the PDORemovePeriod key, if it exists.

HKLM\System\CurrentControlSet\Services\mpio\Parameters\UseCustomPathRecoveryInterval 1
HKLM\System\CurrentControlSet\Services\mpio\Parameters\PDORemovePeriod 120
HKLM\System\CurrentControlSet\Services\mpio\Parameters\PathRecoveryInterval 60
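If you would rather not click through regedit on every box, the values above could be collected into a .reg file. Here is a rough Python sketch that just builds the text of such a file; it does not touch the registry itself. The `render_reg` helper and the `SETTINGS` layout are my own invention, and the values are the ones quoted from the release notes, so double check them against the notes for your firmware before importing anything.

```python
REG_HEADER = "Windows Registry Editor Version 5.00"

# Key path -> {value name: decimal DWORD value}, as quoted from the release notes.
SETTINGS = {
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk": {
        "TimeOutValue": 60,
    },
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\mpio\Parameters": {
        "UseCustomPathRecoveryInterval": 1,
        "PDORemovePeriod": 120,
        "PathRecoveryInterval": 60,
    },
}


def render_reg(settings):
    """Build .reg file text; DWORDs are written as 8 hex digits."""
    lines = [REG_HEADER, ""]
    for key, values in settings.items():
        lines.append(f"[{key}]")
        for name, value in values.items():
            lines.append(f'"{name}"=dword:{value:08x}')
        lines.append("")  # blank line between keys
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_reg(SETTINGS))
```

Note that .reg files store DWORDs in hex, so the decimal 60 shows up as 0000003c and 120 as 00000078. A reboot is still needed afterwards, as the release notes say.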

These also look like they are added by a VMware Tools install.

This is also an interesting tip for Vista or Server 2008.

Accessing PS Series Groups Using iSCSI Initiators on Microsoft Vista or Windows Server 2008. To support group access from initiators running on these operating systems, you must enable ICMP echo requests for ICMPv4, or for ICMPv6, if using an IPv6 initiator.

Turn on the other SAN. Name it. Add it to the group. RAID 50.

Alrighty, now I’m really gonna make your eyes bleed. Configuring VMware vSphere Software iSCSI with Dell EqualLogic PS Series Storage.pdf

***** UPDATE they released a newer version of this guide (I have to wonder if I influenced it with this blog post) it lays out more clearly a one to one setup of virtual nics to physical nics for multipathing. You can find it here

http://sites.google.com/site/mellerbeck/Home/ConfiguringVMwarevSphereSoftwareiSCSIwithDellEqualLogicPSSeriesStorageTR1049.pdf?attredirects=0&d=1 or on the google.

Following that guide, we created a virtual switch with jumbo frames, created 8 vNICs (the maximum), assigned the network adapters, associated the VMkernel ports with the physical adapters, enabled iSCSI, and finally bound the VMkernel ports. At the end of the PDF there is a scripting section that will help automate this.

OK, on second thought, don’t create 8 virtual NICs. The EqualLogic has a limitation of 512 iSCSI connections, and every time I created a volume I was eating up 24 of them. So now I am reducing this to 4 virtual NICs to match the four physical ports, which probably makes more sense anyway. I did learn that each pool can have 512 connections, so maybe what I should have done was create separate pools. But once we added all of our space into the default pool this wasn’t an option any more. To learn more about storage pools you could read this http://sites.google.com/site/mellerbeck/Home/DeployingPoolsandTieredStorageinaPSSeriesSAN..pdf?attredirects=0&d=1
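Here is the connection math as a small Python sketch. It assumes every host opens one iSCSI connection per bound VMkernel port to every volume, which is my reading of the numbers (3 hosts × 8 ports = 24 connections per volume), not something taken from the EqualLogic docs. The function names are made up.

```python
def connections_per_volume(hosts: int, ports_per_host: int) -> int:
    """Assumed model: one connection per host per bound VMkernel port."""
    return hosts * ports_per_host


def max_volumes(pool_limit: int, hosts: int, ports_per_host: int) -> int:
    """How many volumes fit before the pool's connection limit is hit."""
    return pool_limit // connections_per_volume(hosts, ports_per_host)


if __name__ == "__main__":
    for ports in (8, 4):
        per_volume = connections_per_volume(3, ports)
        volumes = max_volumes(512, 3, ports)
        print(f"{ports} ports/host: {per_volume} connections per volume, "
              f"{volumes} volumes before hitting 512")
```

Under that assumption, dropping from 8 to 4 ports per host doubles the number of volumes you can attach, from 21 to 42, before running out of connections.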

As an interesting note these connections will show up as partial redundancy.

And the explanation seems to be

“In the case of your FC Storage VMWare knows that the end point is two different WWNs so it knows it has different paths.    In the case of iSCSI you are connect to just our Group IP so it doesn’t know if the path is Fully Redundant.   This seems to be the same with All iSCSI storage from what we can tell.”

So now let’s put the SAN through its paces. The first thing I think you should test is all of the redundant paths. So create a volume and attach to it. I put a VM on it and started it up, then started pulling connections from least dangerous to most dangerous. First I simulated a dead NIC by pulling its Ethernet cable. Next I pulled power to the slave switch. Then I pulled power to the master switch to see how long failover took and whether the VMs survived. Oh, you might want to make sure you install these patches on your vSphere boxes beforehand https://www.equallogic.com/enewsletter/technnote_0072009.html or else your volume might not stay connected. Finally, I pulled power to the master controller on the EqualLogic. All of these tests helped verify that the redundant paths were working as advertised.

Ok, now that that is verified, let’s start stress testing the infrastructure. But first, install Dell EqualLogic SAN HeadQuarters so we can get some visibility into the pounding we are giving the SAN. You download SAN HeadQuarters from the support site. Unfortunately it’s an EXE that is named Setup.exe; in fact, just about everything you download is named Setup.exe. Grrr… bad form, bad form. To use SAN HeadQuarters you will need to add a read-only SNMP community name. So in the Group Configuration of your EqualLogic, click the SNMP tab, and then click Add to add one. When you install SAN HeadQuarters, you will use that public SNMP community name.

So now you can see your SAN activity, here is a snapshot of my SAN being really bored.

If that hasn’t scarred you enough you can always read part two of this post.

https://michaelellerbeck.com/2009/11/30/so-you-bought-an-equallogic-san-now-what-part-two/

So, a couple of addenda. EqualLogic recommends (linked here: http://bit.ly/6Iw3ot):

Removing all services other than TCP/IP from the SAN interfaces and unchecking the “Register this connection’s address in DNS” box anyway, to reduce the possibility of any non-iSCSi traffic from traversing the SAN interface. Also, change the Binding Order of the network cards to have the SAN one last. This will cause the server to report the first bound interface (usually the public/LAN interface) to DNS first, making that the preferred interface for the server.

***

If you are installing SQL or Exchange on an iSCSI volume you want to set service dependencies http://bit.ly/5mDqWn

***

When choosing between basic and dynamic disk, only server 2008 is supported for iSCSI dynamic disk otherwise choose basic http://bit.ly/8vVjHY

***

This is a decent guide on server 2003 on iSCSI http://sites.google.com/site/mellerbeck/Home/DeployingMicrosoftWindowsServer2003inaniSCSISAN..pdf?attredirects=0&d=1

One important thing to take from the guide is aligning partitions even for the OS partition. I talk a lot more about that here

https://michaelellerbeck.com/2009/12/21/so-you-bought-an-equallogic-now-what-virtualize-exchange-edition/

***

I would also recommend doing this on all VMs (unless someone tells me otherwise??)

We have found that on Windows file servers, the system automatically updates the “Last Access Time” field of the directory entry for each file touched, which can prove to have a very relevant impact on snapshot and replication utilization.

To disable Last Access Time handling on an NTFS filesystem, add the following key to the Registry on your server:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem\NtfsDisableLastAccessUpdate
REG_DWORD
And set it to 1

It requires a reboot of the host to take effect.

***

AnywhereUSB drivers

AnywhereUSB Drivers v2.70

http://sites.google.com/site/mellerbeck/Home/anywhere_USB_WinXp_Vista.7z?attredirects=0&d=1

 

Finding usbd.sys for server 2008 enterprise for an AnywhereUSB

This was a pain. Basically mount the .iso

Download imagex.exe from here http://www.tipandtrick.net/2008/imagex-600118000-x86-and-x64-for-windows-server-2008-and-vista-sp1-standalone-download/

This explains what to do with it http://4sysops.com/archives/how-to-mount-a-wim-image-with-imagex-in-windows-vista/

Right click on wimfltr.inf and then click on “install”

Reboot.

Create a new folder; this will be your mount point. Mine was c:\test

Run the command imagex.exe /mount d:\sources\install.wim 5 c:\test

Now you can search for usbd.sys in the c:\test folder

Or you could just download it from here (this is the 64bit version for server2008  enterprise)

http://sites.google.com/site/mellerbeck/Home/usbd.zi?attredirects=0&d=1

rename to a .zip and unzip

Old Datastores still hanging on after storage motion on vsphere

I have been able to clear this by doing a vmotion off to another host.

vpxclient.exe runtime error vsphere vcenter import crashes

So I ran across a lovely error: vCenter would crash whenever I went to import a machine. After much googling I finally ran across this post

http://www.vmware.com/support/vsphere4/doc/vsp_vcc_41_rel_notes.html

This is the gist of it:

Microsoft Visual C++ Runtime Library error message is displayed during import or export of a machine
If a user environment has versions of VI Client 2.5 Update 1, Update 2, Update 3, or Update 4 that coexist with vSphere Client 4.0, and you install the vCenter Converter 4.1.0 client plug-in, when you start the vCenter Converter Import or Export wizard, the vSphere Client session is terminated abruptly. An OpenSSL DLL conflict between the VI Client versions and the vCenter Converter 4.1.0 client plug-in causes this problem.
Workaround: Go to the Launcher folder in the VI Client install directory, for example, C:\Program Files\VMware\Infrastructure\Virtual Infrastructure Client\Launcher, and delete the following DLL files from that location:

  • libeay32.dll
  • ssleay32.dll

I deleted these files from my windows\system32 folder (after verifying that the timestamp was old) and this did the trick.