NPP Training series – Data Structure on Nutanix with Hyper-V

Posted on June 5, 2015 by Robert Corradini

To continue NPP training series here is my next topic: Data Structure on Nutanix with Hyper-V
If you missed other parts of my series, check out links below:
Part 1 – NPP Training series – Nutanix Terminology
Part 2 – NPP Training series – Nutanix Terminology
Cluster Architecture with Hyper-V
To give credit due, most of the content was taken from Steve Poitras’s “Nutanix Bible” blog as his content is the most accurate and then I put a Hyper-V lean to it and have updated the graphics for Hyper-V.

Data Structure on Nutanix

The NDFS (Nutanix Distributed Filesystem) is composed of the following high-level structs:

Storage Pool

Key Role: Group of physical devices
Description: A storage pool is a group of physical storage devices including PCIe SSD (Solid State Drive), SSD, and HDD (Hard Disk Drive) devices for the cluster. The storage pool can span multiple Nutanix nodes and is expanded as the cluster scales. In most configurations only a single storage pool is leveraged.

Container

Key Role: Group of VMs/files
Description: A container is a logical segmentation of the Storage Pool and contains a group of VM (Virtual Machine) or files (vDisks). Some configuration options (eg. (RF) Resiliency Factor) are configured at the container level, however are applied at the individual VM/file level. Containers typically have a 1 to 1 mapping with a datastore (SMB Share(s)).

vDisk

Key Role: vDisk
Description: A vDisk is any file over 512KB on NDFS including VM hard disks. vDisks are composed of extents which are grouped and stored on disk as an extent group.

Below we show how these map between NDFS and the Hyper-V:
SP_structure Data Structure

Extent

Key Role: Logically contiguous data
Description: A extent is a 1MB piece of logically contiguous data which consists of n number of contiguous blocks (varies depending on guest OS block size). Extents are written/read/modified on a sub-extent basis (aka slice) for granularity and efficiency. An extent’s slice may be trimmed when moving into the cache depending on the amount of data being read/cached.

Extent Group

Key Role: Physically contiguous stored data
Description: A extent group is a 1MB or 4MB piece of physically contiguous stored data. This data is stored as a file on the storage device owned by the CVM (Controller Virtual Machine). Extents are dynamically distributed among extent groups to provide data striping across nodes/disks to improve performance.

Below we show how these structs relate between the various filesystems:
NDFS_DataLayout_Text Data Structure Here is another graphical representation of how these units are logically related:
NDFS_DataStructure3 Data Structure Next up, I/O Path Overview

Until next time, Rob…

NPP Training series – Cluster Components with Hyper-V

Posted on June 2, 2015 by Robert Corradini

To continue NPP training series here is my next topic: Cluster Components

If you missed other parts of my series, check out links below:
Part 1 – NPP Training series – Nutanix Terminology
Part 2 – NPP Training series – Nutanix Terminology
Cluster Architecture with Hyper-V
Data Structure on Nutanix with Hyper-V
I/O Path Overview

To give credit, most of the content was taken from Steve Poitras’s “Nutanix Bible” blog as his content is the most accurate and then I put a Hyper-V lean to it.

Cluster Components

The Nutanix platform is composed of the following high-level components:

NDFS_Cluster Components

Cassandra

Key Role: Distributed metadata store
Description: Cassandra stores and manages all of the cluster metadata in a distributed ring like manner based upon a heavily modified Apache Cassandra. The Paxos algorithm is utilized to enforce strict consistency. This service runs on every node in the cluster. Cassandra is accessed via an interface called Medusa.

Medusa

Key Role: Abstraction layer
Description: Medusa is the Nutanix abstraction layer that sits in front of the cluster’s distributed metadata database, which is managed by Cassandra..

Zookeeper

Key Role: Cluster configuration manager
Description: Zeus stores all of the cluster configuration including hosts, IPs, state, etc. and is based upon Apache Zookeeper. This service runs on three nodes in the cluster, one of which is elected as a leader. The leader receives all requests and forwards them to the peers. If the leader fails to respond a new leader is automatically elected. Zookeeper is accessed via an interface called Zeus.

Zeus

Key Role: Library interface
Description: Zeus is the Nutanix library interface that all other components use to access the cluster configuration, such as IP addresses. Currently implemented using Zookeeper, Zeus is responsible for critical, cluster-wide data such as cluster configuration and leadership locks.

Stargate

Key Role: Data I/O manager
Description: Stargate is responsible for all data management and I/O operations and is the main interface from Hyper-V (via SMB 3.0). This service runs on every node in the cluster in order to serve localized I/O.

Curator

Key Role: Map reduce cluster management and cleanup
Description: Curator is responsible for managing and distributing tasks throughout the cluster including disk balancing, proactive scrubbing, and many more items. Curator runs on every node and is controlled by an elected Curator Master who is responsible for the task and job delegation. There are two scan types for Curator, a full scan which occurs around every 6 hours and a partial scan which occurs every hour.

Prism

Key Role: UI and API
Description: Prism is the management gateway for component and administrators to configure and monitor the Nutanix cluster. This includes Ncli, the HTML5 UI and REST API. Prism runs on every node in the cluster and uses an elected leader like all components in the cluster.

prism1 Cluster Components prism2 Cluster Components

Genesis

Key Role: Cluster component & service manager
Description: Genesis is a process which runs on each node and is responsible for any services interactions (start/stop/etc.) as well as for the initial configuration. Genesis is a process which runs independently of the cluster and does not require the cluster to be configured/running. The only requirement for genesis to be running is that Zookeeper is up and running. The cluster_init and cluster_status pages are displayed by the genesis process.

Chronos

Key Role: Job and Task scheduler
Description: Chronos is responsible for taking the jobs and tasks resulting from a Curator scan and scheduling/throttling tasks among nodes. Chronos runs on every node and is controlled by an elected Chronos Master who is responsible for the task and job delegation and runs on the same node as the Curator Master.

Cerebro

Key Role: Replication/DR manager
Description: Cerebro is responsible for the replication and DR capabilities of DFS(Distributed Storage Fabric). This includes the scheduling of snapshots, the replication to remote sites, and the site migration/failover. Cerebro runs on every node in the Nutanix cluster and all nodes participate in replication to remote clusters/sites.

Pithos

Key Role: vDisk configuration manager
Description: Pithos is responsible for vDisk (DFS file) configuration data. Pithos runs on every node and is built on top of Cassandra.

Next up, Data Structures which comprises high level structs for Nutanix Distributed Filesystem

Until next time, Rob….

NPP Training series – Cluster Architecture with Hyper-V

Posted on May 28, 2015 by Robert Corradini

To continue NPP training series here is my next topic: Cluster Architecture

To give credit, some of this content was taken from Steve Poitras’s “Nutanix Bible” blog as his content is the most accurate and then I put a Hyper-V lean to it.

Cluster Architecture

The Nutanix solution is a converged storage + compute solution which leverages local components and creates a distributed platform for virtualization aka virtual computing platform. The solution is a bundled hardware + software appliance which houses 2 (6000/7000 series) or 4 nodes (1000/2000/3000/3050 series) in a 2U footprint. Each node runs an industry standard hypervisor (ESXi, KVM, Hyper-V currently) and the Nutanix Controller VM (CVM). The Nutanix CVM is what runs the Nutanix software and serves all of the I/O operations for the hypervisor and all VMs running on that host. For the Nutanix units running VMware vSphere, the SCSI controller, which manages the SSD and HDD devices, is directly passed to the CVM leveraging VM-Direct Path (Intel VT-d). In the case of Hyper-V the storage devices are passed through to the CVM. Below is an example of what a typical node logically looks like:

NDFS_NodeDetail2 Cluster Architecture

Together, a group of Nutanix Nodes forms a distributed platform called the Distributed Storage Fabric (DFS). DFS appears to the Hyper-V like any centralized storage array, however all of the I/Os are handled locally to provide the highest performance. More detail on how these nodes form a distributed system can be found below. Below is an example of how these Nutanix nodes form NDFS and then presented up to Hyper-V via SMB 3.0 Share(s):

DFS uses a software-defined, shared-nothing, scale-out approach to storage that eliminates the need for you to deploy a separate SAN along with its performance bottlenecks and scalability limitations. DFS leverages local SSD for fast VM performance and consolidates high capacity HDDs for cost-effective storage capacity.

The application data is intelligently placed in the appropriate storage tier, balancing storage performance and capacity needs. The environment’s noisy VMs on different hosts won’t impact the performance for any workloads—fulfilling key performance requirements for hybrid deployments.
Here are the key points with Hyper-V on Nutanix:

Hypervisor sees the Distributed Storage Fabric (DFS) as one or more SMB 3.0 file shares
Supports features like snapshots, dedupe, compression web-scale out, and disaster recovery
Locally shared storage is comprised of both flash and spinning disks
Variety of models (compute heavy, storage heavy, etc.)
Mix and match models within the same cluster
Pay as you grow – Start small and linearly scale your Microsoft infrastructure in minutes without the scalability shortcomings of traditional servers and storage.

Next up in the NPP Training series – Cluster Components

NPP Training series – Nutanix Terminology – Part 2

Posted on March 2, 2015 by Robert Corradini

I was asked “What does Web-Scale Mean?” at the end of my last post.

If you missed part 1, check link below:
Part 1 – NPP Training series – Nutanix Terminology

Continue reading →

NPP Training Series – Nutanix Terminology – Part 1

Posted on February 26, 2015 by Robert Corradini

I started with Nutanix about 1 month ago and concurrently began my Nutanix training which entails going through the NPP (Nutanix Platform Professional) training course. I will be posting a series of blog posts on my learning track with Nutanix and will eventually tie it all together with the Microsoft Stack. My track commenced about 2 weeks ago and I have keep detailed notes on my learning progress. My plan is to post every few days in a multi-part series.