NPP Training series – Drive Breakdown

To continue NPP training series here is my next topic:  Drive Breakdown
If you missed other parts of my series, check out links below:
Part 1 – NPP Training series – Nutanix Terminology
Part 2 – NPP Training series – Nutanix Terminology
Cluster Architecture with Hyper-V

Data Structure on Nutanix with Hyper-V
I/O Path Overview

To give credit, most of the content was taken from Steve Poitras’s “Nutanix Bible” blog as his content is the most accurate and then I put a Hyper-V lean to it.

Drive Breakdown

In this section I’ll cover how the various storage devices (SSD / HDD) are broken down, partitioned and utilized by the Nutanix platform. NOTE: All of the capacities used are in Base2 Gibibyte (GiB) instead of the Base10 Gigabyte (GB).  Formatting of the drives with a filesystem and associated overheads has also been taken into account.

SSD Devices

SSD devices store a few key items which are explained in greater detail above:

  • Nutanix Home (CVM core)
  • Cassandra (metadata storage) – MORE
  • OpLog (persistent write buffer) – MORE
  • Extent Store (persistent storage) – MORE

Below we show an example of the storage breakdown for a Nutanix node’s SSD(s):
NDFS_SSD_breakdown3 Drive Breakdown
NOTE: The sizing for OpLog is done dynamically as of release 4.0.1 which will allow the extent store portion to grow dynamically.  The values used are assuming a completely utilized OpLog.  Graphics and proportions aren’t drawn to scale.  When evaluating the Remaining GiB capacities do so from the top down.  For example the Remaining GiB to be used for the OpLog calculation would be after Nutanix Home and Cassandra have been subtracted from the formatted SSD capacity. Most models ship with 1 or 2 SSDs, however the same construct applies for models shipping with more SSD devices. For example, if we apply this to an example 3060 or 6060 node which has 2 x 400GB SSDs this would give us 100GiB of OpLog, 40GiB of Content Cache and ~440GiB of Extent Store SSD capacity per node.  Storage for Cassandra is a minimum reservation and may be larger depending on the quantity of data.
NDFS_SSD_3060_2 Drive Breakdown
For a 3061 node which has 2 x 800GB SSDs this would give us 100GiB of OpLog, 40GiB of Content Cache and ~1.1TiB of Extent Store SSD capacity per node.
NDFS_SSD_3061v2 Drive Breakdown

HDD Devices

Since HDD devices are primarily used for bulk storage, their breakdown is much simpler:

  • Curator Reservation (Curator storage) – MORE
  • Extent Store (persistent storage)

NDFS_HDD_breakdown Drive Breakdown
For example, if we apply this to an example 3060 node which has 4 x 1TB HDDs this would give us 80GiB reserved for Curator and ~3.4TiB of Extent Store HDD capacity per node.
NDFS_HDD_3060 Drive Breakdown
NOTE: the above values are accurate as of 4.0.1 and may vary by release.
Next up, I figured we would look at some of the cool software technologies that run on our CVM (Controller Virtual Machine), next up Elastic Dedupe Engine.

Until next time, Rob

Nutanix Community Edition – Public Beta – Now Available – Build Your Own Nutanix Test Lab for Free

nutanix-community-edition_w_500

Nutanix Community Edition

Another very exciting announcement was Nutanix Community Edition (CE) on June 9th, 2015 at our Inaugural .NEXT conference. So, what is it?…..Our website describes it the best “Community Edition is a 100% software solution enabling technology enthusiasts to easily evaluate the latest hyperconvergence technology at zero cost.”  In other words, you can use your own hardware to test out Nutanix.  Very cool.  This is great for building a lab and just gaining understanding of hyperconvergence hands on.
Nutanix is offering a hardware compatibility list (HCL) to users that includes the minimum requirements to run the software; essentially, any standard x86 server can be used….
And to quote our CEO and co-founder Dheeraj Pandey,
“From our very first software release in 2012, Nutanix has been dedicated to open architectures and technologies, offering unprecedented customer choice and flexibility,” “Community Edition is the next step in democratizing hyperconverged infrastructure technology, enabling anyone to experience the transformative benefits of our software. Only by eliminating the requirement for proprietary hardware and embracing off-the-shelf platforms can the next revolution of datacenter technologies be fully realized.”
As the name implies, the support for the CE will come from the community through Nutanix’s NEXT online portal. Users will be able to log in, ask questions and get answers from the community.
CE also allow you to also check our new hypervisor based on KVM and Acropolis. Check out Josh Odger’s Blog to learn more about Acropolis.
Join the beta…And don’t forget my NPP training series that helps you with all the concepts around hyperconvergence.
Currently, I am getting started with Nutanix CE installation and will be posting my experiences in a later blog post with how I build my Nutanix Lab @ Home. 🙂

Until next time….Rob

NPP Training series – I/O Path Overview

To continue NPP training series, here is my next topic:  I/O Path Overview
If you missed other parts of my series, check out links below:
Part 1 – NPP Training series – Nutanix Terminology
Part 2 – NPP Training series – Nutanix Terminology
Cluster Architecture with Hyper-V

Data Structure on Nutanix with Hyper-V

To give credit, most of the content was taken from Steve Poitras’s “Nutanix Bible” blog as his content is the most accurate and then I put a Hyper-V lean-to it.

IO Path Overview

The Nutanix IO path is composed of the following high-level components:
NDFS_IO_basev5 IO Path

OpLog

  • Key Role: Persistent write buffer
  • Description: The Oplog is similar to a filesystem journal and is built to handle bursty writes, coalesce them and then sequentially drain the data to the extent store.  Upon a write the OpLog is synchronously replicated to another n number of CVM’s OpLog before the write is acknowledged for data availability purposes.  All CVM OpLogs partake in the replication and are dynamically chosen based upon load.  The OpLog is stored on the SSD tier on the CVM to provide extremely fast write I/O performance, especially for random I/O workloads.  For sequential workloads the OpLog is bypassed and the writes go directly to the extent store.  If data is currently sitting in the OpLog and has not been drained, all read requests will be directly fulfilled from the OpLog until they have been drain where they would then be served by the extent store/content cache.  For containers where fingerprinting (aka Dedupe) has been enabled, all write I/Os will be fingerprinted using a hashing scheme allowing them to be deduped based upon fingerprint in the content cache.

Extent Store

  • Key Role: Persistent data storage
  • Description: The Extent Store is the persistent bulk storage of NDFS and spans SSD and HDD and is extensible to facilitate additional devices/tiers.  Data entering the extent store is either being A) drained from the OpLog or B) is sequential in nature and has bypassed the OpLog directly.  Nutanix ILM will determine tier placement dynamically based upon I/O patterns and will move data between tiers.

Content Cache

  • Key Role: Dynamic read cache
  • Description: The Content Cache (aka “Elastic Dedupe Engine”) is a deduped read cache which spans both the CVM’s memory and SSD.  Upon a read request of data not in the cache (or based upon a particular fingerprint) the data will be placed in to the single-touch pool of the content cache which completely sits in memory where it will use LRU until it is ejected from the cache.  Any subsequent read request will “move” (no data is actually moved, just cache metadata) the data into the memory portion of the multi-touch pool which consists of both memory and SSD.  From here there are two LRU cycles, one for the in-memory piece upon which eviction will move the data to the SSD section of the multi-touch pool where a new LRU counter is assigned.  Any read request for data in the multi-touch pool will cause the data to go to the peak of the multi-touch pool where it will be given a new LRU counter.  Fingerprinting is configured at the container level and can be configured via the UI.  By default fingerprinting is disabled.
  • Below we show a high-level overview of the Content Cache:

CC_Pools IO Path

Extent Cache

  • Key Role: In-memory read cache
  • Description: The Extent Cache is an in-memory read cache that is completely in the CVM’s memory.  This will store non-fingerprinted extents for containers where fingerprinting and dedupe disabled.

Drive Breakdown

In this section I’ll cover how the various storage devices (SSD / HDD) are broken down, partitioned and utilized by the Nutanix platform. NOTE: All of the capacities used are in Base2 Gibibyte (GiB) instead of the Base10 Gigabyte (GB).  Formatting of the drives with a filesystem and associated overheads has also been taken into account.

SSD Devices

SSD devices store a few key items which are explained in greater detail above:

  • Nutanix Home (CVM core)
  • Cassandra (metadata storage) – MORE
  • OpLog (persistent write buffer)
  • Extent Store (persistent storage)

Below we show an example of the storage breakdown for a Nutanix node’s SSD(s):
NDFS_SSD_breakdown3 IO PathNOTE: The sizing for OpLog is done dynamically as of release 4.0.1 which will allow the extent store portion to grow dynamically.  The values used are assuming a completely utilized OpLog.  Graphics and proportions aren’t drawn to scale.  When evaluating the Remaining GiB capacities do so from the top down.

For example the Remaining GiB to be used for the OpLog calculation would be after Nutanix Home and Cassandra have been subtracted from the formatted SSD capacity. Most models ship with 1 or 2 SSDs, however the same construct applies for models shipping with more SSD devices. For example, if we apply this to an example 3060 or 6060 node which has 2 x 400GB SSDs this would give us 100GiB of OpLog, 40GiB of Content Cache and ~440GiB of Extent Store SSD capacity per node.  Storage for Cassandra is a minimum reservation and may be larger depending on the quantity of data.
NDFS_SSD_3060_2 IO Path
For a 3061 node which has 2 x 800GB SSDs this would give us 100GiB of OpLog, 40GiB of Content Cache and ~1.1TiB of Extent Store SSD capacity per node.
NDFS_SSD_3061v2 IO Path

HDD Devices

Since HDD devices are primarily used for bulk storage, their breakdown is much simpler:

  • Curator Reservation (Curator storage) – MORE
  • Extent Store (persistent storage)

NDFS_HDD_breakdown IO Path
For example, if we apply this to an example 3060 node which has 4 x 1TB HDDs this would give us 80GiB reserved for Curator and ~3.4TiB of Extent Store HDD capacity per node.
NDFS_HDD_3060 IO PathFor a 6060 node which has 4 x 4TB HDDs this would give us 80GiB reserved for Curator and ~14TiB of Extent Store HDD capacity per node.
NDFS_HDD_6060 IO PathStatistics and technical specifications: opportunites-digitales.com/avis-expressvpn/
NOTE: the above values are accurate as of 4.0.1 and may vary by release.
Next up, Drive Breakdown on Nutanix

Until next time, Rob….

NPP Training series – Data Structure on Nutanix with Hyper-V

To continue NPP training series here is my next topic: Data Structure on Nutanix with Hyper-V
If you missed other parts of my series, check out links below:
Part 1 – NPP Training series – Nutanix Terminology
Part 2 – NPP Training series – Nutanix Terminology
Cluster Architecture with Hyper-V

To give credit due, most of the content was taken from Steve Poitras’s “Nutanix Bible” blog as his content is the most accurate and then I put a Hyper-V lean to it and have updated the graphics for Hyper-V.

Data Structure on Nutanix

The NDFS (Nutanix Distributed Filesystem) is composed of the following high-level structs:

Storage Pool

  • Key Role: Group of physical devices
  • Description: A storage pool is a group of physical storage devices including PCIe SSD (Solid State Drive), SSD, and HDD (Hard Disk Drive) devices for the cluster.  The storage pool can span multiple Nutanix nodes and is expanded as the cluster scales.  In most configurations only a single storage pool is leveraged.

Container

  • Key Role: Group of VMs/files
  • Description: A container is a logical segmentation of the Storage Pool and contains a group of VM (Virtual Machine) or files (vDisks).  Some configuration options (eg. (RF) Resiliency Factor) are configured at the container level, however are applied at the individual VM/file level.  Containers typically have a 1 to 1 mapping with a datastore (SMB Share(s)).

vDisk

  • Key Role: vDisk
  • Description: A vDisk is any file over 512KB on NDFS including VM hard disks.  vDisks are composed of extents which are grouped and stored on disk as an extent group.

Below we show how these map between NDFS and the Hyper-V:
SP_structure Data Structure

Extent

  • Key Role: Logically contiguous data
  • Description: A extent is a 1MB piece of logically contiguous data which consists of n number of contiguous blocks (varies depending on guest OS block size). Extents are written/read/modified on a sub-extent basis (aka slice) for granularity and efficiency.  An extent’s slice may be trimmed when moving into the cache depending on the amount of data being read/cached.

Extent Group

  • Key Role: Physically contiguous stored data
  • Description: A extent group is a 1MB or 4MB piece of physically contiguous stored data.  This data is stored as a file on the storage device owned by the CVM (Controller Virtual Machine).  Extents are dynamically distributed among extent groups to provide data striping across nodes/disks to improve performance.

Below we show how these structs relate between the various filesystems:
NDFS_DataLayout_Text Data StructureHere is another graphical representation of how these units are logically related:
NDFS_DataStructure3 Data StructureNext up, I/O Path Overview

Until next time, Rob…

NPP Training series – Cluster Components with Hyper-V

To continue NPP training series here is my next topic: Cluster Components

If you missed other parts of my series, check out links below:
Part 1 – NPP Training series – Nutanix Terminology
Part 2 – NPP Training series – Nutanix Terminology
Cluster Architecture with Hyper-V

Data Structure on Nutanix with Hyper-V
I/O Path Overview

To give credit, most of the content was taken from Steve Poitras’s “Nutanix Bible” blog as his content is the most accurate and then I put a Hyper-V lean to it.

Cluster Components

The Nutanix platform is composed of the following high-level components:

NDFS_Cluster Components

Cassandra

  • Key Role: Distributed metadata store
  • Description: Cassandra stores and manages all of the cluster metadata in a distributed ring like manner based upon a heavily modified Apache Cassandra.  The Paxos algorithm is utilized to enforce strict consistency.  This service runs on every node in the cluster.  Cassandra is accessed via an interface called Medusa.

Medusa

  • Key Role: Abstraction layer
  • Description: Medusa is the Nutanix abstraction layer that sits in front of the cluster’s distributed metadata database, which is managed by Cassandra..

Zookeeper

  • Key Role: Cluster configuration manager
  • Description: Zeus stores all of the cluster configuration including hosts, IPs, state, etc. and is based upon Apache Zookeeper.  This service runs on three nodes in the cluster, one of which is elected as a leader.  The leader receives all requests and forwards them to the peers.  If the leader fails to respond a new leader is automatically elected.   Zookeeper is accessed via an interface called Zeus.

Zeus

  • Key Role:  Library interface
  • Description: Zeus is the Nutanix library interface that all other components use to access the cluster configuration, such as IP addresses. Currently implemented using Zookeeper, Zeus is responsible for critical, cluster-wide data such as cluster configuration and leadership locks.

Stargate

  • Key Role: Data I/O manager
  • Description: Stargate is responsible for all data management and I/O operations and is the main interface from Hyper-V (via SMB 3.0).  This service runs on every node in the cluster in order to serve localized I/O.

Curator

  • Key Role: Map reduce cluster management and cleanup
  • Description: Curator is responsible for managing and distributing tasks throughout the cluster including disk balancing, proactive scrubbing, and many more items.  Curator runs on every node and is controlled by an elected Curator Master who is responsible for the task and job delegation.  There are two scan types for Curator, a full scan which occurs around every 6 hours and a partial scan which occurs every hour.

Prism

  • Key Role: UI and API
  • Description: Prism is the management gateway for component and administrators to configure and monitor the Nutanix cluster.  This includes Ncli, the HTML5 UI and REST API.  Prism runs on every node in the cluster and uses an elected leader like all components in the cluster.

prism1 Cluster Components prism2 Cluster Components

Genesis

  • Key Role: Cluster component & service manager
  • Description:  Genesis is a process which runs on each node and is responsible for any services interactions (start/stop/etc.) as well as for the initial configuration. Genesis is a process which runs independently of the cluster and does not require the cluster to be configured/running.  The only requirement for genesis to be running is that Zookeeper is up and running.  The cluster_init and cluster_status pages are displayed by the genesis process.

Chronos

  • Key Role: Job and Task scheduler
  • Description: Chronos is responsible for taking the jobs and tasks resulting from a Curator scan and scheduling/throttling tasks among nodes.  Chronos runs on every node and is controlled by an elected Chronos Master who is responsible for the task and job delegation and runs on the same node as the Curator Master.

Cerebro

  • Key Role: Replication/DR manager
  • Description: Cerebro is responsible for the replication and DR capabilities of DFS(Distributed Storage Fabric).  This includes the scheduling of snapshots, the replication to remote sites, and the site migration/failover.  Cerebro runs on every node in the Nutanix cluster and all nodes participate in replication to remote clusters/sites.

Pithos

  • Key Role: vDisk configuration manager
  • Description: Pithos is responsible for vDisk (DFS file) configuration data.  Pithos runs on every node and is built on top of Cassandra.

Next up, Data Structures which comprises high level structs for Nutanix Distributed Filesystem

Until next time, Rob….

NPP Training series – Cluster Architecture with Hyper-V

To continue NPP training series here is my next topic: Cluster Architecture

To give credit, some of this content was taken from Steve Poitras’s “Nutanix Bible” blog as his content is the most accurate and then I put a Hyper-V lean to it.

Cluster Architecture

The Nutanix solution is a converged storage + compute solution which leverages local components and creates a distributed platform for virtualization aka virtual computing platform. The solution is a bundled hardware + software appliance which houses 2 (6000/7000 series) or 4 nodes (1000/2000/3000/3050 series) in a 2U footprint. Each node runs an industry standard hypervisor (ESXi, KVM, Hyper-V currently) and the Nutanix Controller VM (CVM).  The Nutanix CVM is what runs the Nutanix software and serves all of the I/O operations for the hypervisor and all VMs running on that host.  For the Nutanix units running VMware vSphere, the SCSI controller, which manages the SSD and HDD devices, is directly passed to the CVM leveraging VM-Direct Path (Intel VT-d).  In the case of Hyper-V the storage devices are passed through to the CVM. Below is an example of what a typical node logically looks like:

NDFS_NodeDetail2 Cluster Architecture

Together, a group of Nutanix Nodes forms a distributed platform called the Distributed Storage Fabric (DFS).  DFS appears to the Hyper-V like any centralized storage array, however all of the I/Os are handled locally to provide the highest performance.  More detail on how these nodes form a distributed system can be found below. Below is an example of how these Nutanix nodes form NDFS and then presented up to Hyper-V via SMB 3.0 Share(s):

dsf_overview Cluster Architecture

DFS uses a software-defined, shared-nothing, scale-out approach to storage that eliminates the need for you to deploy a separate SAN along with its performance bottlenecks and scalability limitations. DFS leverages local SSD for fast VM performance and consolidates high capacity HDDs for cost-effective storage capacity.

The application data is intelligently placed in the appropriate storage tier, balancing storage performance and capacity needs. The environment’s noisy VMs on different hosts won’t impact the performance for any workloads—fulfilling key performance requirements for hybrid deployments.
Here are the key points with Hyper-V on Nutanix:

  • Hypervisor sees the Distributed Storage Fabric (DFS) as one or more SMB 3.0 file shares
  • Supports features like snapshots, dedupe, compression web-scale out, and disaster recovery
  • Locally shared storage is comprised of both flash and spinning disks
  • Variety of models (compute heavy, storage heavy, etc.)
  • Mix and match models within the same cluster
  • Pay as you grow – Start small and linearly scale your Microsoft infrastructure in minutes without the scalability shortcomings of traditional servers and storage.

Next up in the NPP Training series – Cluster Components