Storage Spaces Direct Explained – Management & Operations


Management & Operations

Good day everyone. It's been a few weeks; I've been busy with work and such. Anyway, this post goes into how management and operations are done in S2D. Now, my biggest pet peeve is complex GUI management, and yet again Microsoft doesn't disappoint. It still takes a number of steps in different interfaces to bring up S2D. Check out Aidan Finn's blog post on disaggregated management from last year; it still rings true with the release of Windows Server 2016. It shouldn't be this complex IMO 🙁 That being said, let's move on to the details.


Microsoft is pushing everyone to use PowerShell as the primary management tool for Storage Spaces, but you can also manage it with a combination of Windows Failover Cluster Manager, SCVMM, and SCOM, as mentioned above. So if you are good at PowerShell, management is fairly simple. If not, you get the classic experience of switching between different management tools :( This is why everyone really needs to start their PowerShell training now, to survive as an architect in Microsoft land going forward ;)


There is a Health Service built into Windows Server 2016 that provides some decent system health and status information for Storage Spaces. I have only seen a few demos at Ignite 2016 and have not played with it yet, so I'll have to dig into it further and see how it stacks up in a future post.


S2D supports Cluster-Aware Updating, which integrates with the Windows Update Service. As with VSAN, because the storage stack runs in the kernel, an update has to live migrate the VMs off the host server, apply the update, reboot, and then migrate everything back. I'll note that this is only the case for the hyper-converged deployment model. In a converged model, where the VMs are on a separate compute tier, you can update the storage controllers one at a time fairly seamlessly without impacting the VMs.
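To make the drain, update, and fail-back cycle concrete, here is a toy Python model of it. This is only a sketch of the pattern, not how Cluster-Aware Updating is actually implemented; the `rolling_update` function and the node and VM names are made up for illustration.

```python
from typing import Dict, List


def rolling_update(placement: Dict[str, List[str]]) -> List[str]:
    """Simulate updating hosts one at a time (toy model, hypothetical names).

    `placement` maps a host name to the VMs running on it. The function
    returns a log of the steps taken and restores the placement at the end.
    """
    hosts = list(placement)
    if len(hosts) < 2:
        raise ValueError("need at least two hosts to drain one of them")

    log: List[str] = []
    for host in hosts:
        others = [h for h in hosts if h != host]
        moved = []  # (vm, temporary host) pairs, remembered for fail-back

        # 1. Drain: spread this host's VMs round-robin over the other hosts.
        for i, vm in enumerate(list(placement[host])):
            target = others[i % len(others)]
            placement[host].remove(vm)
            placement[target].append(vm)
            moved.append((vm, target))
            log.append(f"migrate {vm}: {host} -> {target}")

        # 2. The host is now empty, so it can be patched and rebooted.
        log.append(f"update+reboot {host}")

        # 3. Fail back: return every VM to its original host.
        for vm, target in moved:
            placement[target].remove(vm)
            placement[host].append(vm)
            log.append(f"migrate {vm}: {target} -> {host}")
    return log
```

Calling `rolling_update({"node1": ["vm-a", "vm-b"], "node2": ["vm-c"], "node3": []})` walks the three nodes in order. In the converged model, steps 1 and 3 simply have nothing to move.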


While I am not a big fan of the management experience, this could give rise to tools like 5nine if they decide to support S2D management. Next up: Application and Performance. Until next time, Rob.

Storage Spaces Direct Explained – Fault Tolerance and Multisite Replication



Fault Tolerance… What does it mean? Let me break it down simply. Pictured above is just a bad design, not fault tolerance. Having two or more of something is one factor, but how it's implemented is just as important. Fault Tolerance incorporates two very important principles: High Availability and Redundancy.

Now, if we had a few toilets side by side, kept only one open with the other two on standby, and could move the user automatically to another toilet during a failure, then technically it would be fault tolerant. Anyway, let's move on from toilets to the real world. 🙂


Simply, Fault Tolerance is the ability to continue non-stop when a hardware failure occurs. A fault-tolerant system is designed from the ground up for reliability by building multiples of all critical components, such as CPUs, memories, disks and power supplies into the same computer. In the event one component fails, another takes over without skipping a beat.

Many systems are designed to recover from a failure by detecting the failed component and switching to another computer system. These systems, although sometimes called fault tolerant, are more widely known as “high availability” systems, requiring that the software resubmits the job when the second system is available.

True fault-tolerant systems with redundant hardware are the most costly because the additional components add to the overall system cost. However, fault-tolerant systems provide the same processing capacity after a failure as before, whereas high availability systems often provide reduced capacity. OK, let's move on to fault tolerance in S2D.

Fault Tolerance in S2D

Storage Spaces Direct (S2D) uses 3-way mirroring and spreads those mirrors across 3 different servers in the cluster. S2D also supports full chassis and rack awareness and gives you the option to distribute data copies across these fault domains.
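As a rough illustration of what spreading mirror copies across fault domains means, here is a small Python sketch. It is a toy placement rule under my own assumptions, not S2D's actual allocation algorithm; `place_copies`, the rack names, and the server names are all hypothetical.

```python
from typing import Dict, List


def place_copies(
    chunk_id: int,
    servers_by_rack: Dict[str, List[str]],
    copies: int = 3,
) -> List[str]:
    """Pick one server per copy of a chunk (toy rule, not S2D's real one).

    If there are at least `copies` racks, every copy lands in a different
    rack. Otherwise we fall back to `copies` different servers.
    Assumes every rack in `servers_by_rack` has at least one server.
    """
    racks = sorted(servers_by_rack)

    if len(racks) >= copies:
        chosen = []
        for k in range(copies):
            # Consecutive rack indexes are distinct because copies <= len(racks).
            rack = racks[(chunk_id + k) % len(racks)]
            servers = servers_by_rack[rack]
            chosen.append(servers[chunk_id % len(servers)])
        return chosen

    # Not enough racks: settle for server-level fault domains.
    all_servers = [s for rack in racks for s in servers_by_rack[rack]]
    if len(all_servers) < copies:
        raise ValueError("not enough servers for the requested copy count")
    return [all_servers[(chunk_id + k) % len(all_servers)] for k in range(copies)]
```

With three racks of two servers each, every chunk ends up with one copy per rack, so losing a whole rack still leaves two copies. With a single rack, the same call degrades to three different servers.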

For disk failures, S2D also uses a self-healing approach. In basic terms, S2D takes the failed disk offline and rebuilds the data copy on another node in the cluster. Replacing the drive then adds capacity back into the system. This is an important point, as not all HCI vendors support self-healing. For example, on VSAN and some other vendors' products, disk failures take out entire vDisks.
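The self-healing idea can be sketched in a few lines of Python. Again, this is a simplified model under my own assumptions (it ignores capacity and rebuild throttling), and the `self_heal` function plus the disk, server, and chunk names are hypothetical.

```python
from typing import Dict, List


def self_heal(
    chunks: Dict[str, List[str]],
    disk_server: Dict[str, str],
    failed: str,
) -> Dict[str, List[str]]:
    """Return a new chunk -> disks map with copies on `failed` rebuilt.

    `chunks` maps a chunk name to the disks holding its copies and
    `disk_server` maps each disk to its server. A lost copy is rebuilt on
    a healthy disk whose server holds no surviving copy of that chunk.
    """
    healthy = [d for d in disk_server if d != failed]
    healed: Dict[str, List[str]] = {}

    for chunk, disks in chunks.items():
        if failed not in disks:
            healed[chunk] = list(disks)  # this chunk was not affected
            continue

        survivors = [d for d in disks if d != failed]
        used_servers = {disk_server[d] for d in survivors}
        candidates = [d for d in healthy if disk_server[d] not in used_servers]
        if not candidates:
            raise RuntimeError(f"no eligible disk to rebuild {chunk}")

        # The data itself would be copied from any of the survivors.
        healed[chunk] = survivors + [candidates[0]]
    return healed
```

The point of the model is that the mirror count is restored from the surviving copies before anyone swaps the drive; the replacement drive only adds capacity back.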


Multisite Replication

S2D uses Storage Replica (which ships with Windows Server 2016) for synchronous or asynchronous replication. It supports both stretched clusters and cluster-to-cluster DR. Because Storage Replica is part of Windows Server, it can also be used for other data replication needs outside of S2D.
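If the difference between synchronous and asynchronous replication is new to you, this toy Python model shows the trade-off. It is a conceptual sketch only and says nothing about how Storage Replica works internally; the `ReplicatedVolume` class is hypothetical.

```python
from typing import List


class ReplicatedVolume:
    """Toy model of one volume replicated to a second site."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.primary: List[bytes] = []    # blocks stored at the main site
        self.secondary: List[bytes] = []  # blocks stored at the DR site
        self._pending: List[bytes] = []   # written locally, not yet shipped

    def write(self, block: bytes) -> None:
        """Return once the write would be acknowledged to the application."""
        self.primary.append(block)
        if self.synchronous:
            # Sync: the remote site must have the block before we acknowledge.
            self.secondary.append(block)
        else:
            # Async: acknowledge now, ship the block to the remote site later.
            self._pending.append(block)

    def ship_pending(self) -> None:
        """Background transfer that catches the remote site up (async only)."""
        self.secondary.extend(self._pending)
        self._pending.clear()

    def unreplicated_writes(self) -> int:
        """Writes that would be lost if the main site failed right now."""
        return len(self.primary) - len(self.secondary)
```

In synchronous mode `unreplicated_writes()` is always 0, at the cost of every write waiting on the link between sites. In asynchronous mode writes return immediately, but anything not yet shipped is at risk during a site failure.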


OK… Next up: Storage QoS and Networking. Until next time, Rob.