Comprehensive guide for applying best practices to get the optimal performance for Microsoft SQL deployment on VMware vSphere

 


It's important to ensure that your vSphere environment is properly set to host Microsoft SQL Server as it is one of the most widely deployed database platforms in the world, with many organizations having many instances deployed in their environments, so I thought it’s important to write an article to help you to apply best practices to get the best performance.

 

First and most important, you must have a full understanding of the hardware layout; with the advent of multi-core and multi-node and non-uniform memory access (NUMA) systems, understanding relationships among cores, logical CPUs, and physical CPUs will help to optimize your environment and to have the best performance; So I recommend you read my another article in Arabic for this subject: https://www.yazeed2.com/2021/07/VM%20best%20Practice.html
In case you want to read an official version from VMware, please visit this link: https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/sql-server-on-vmware-best-practices-guide.pdf


1) Choosing the editions of SQL Server deployment:

Ensure not to exceed the allowed configurations per SQL edition as below:

  • SQL Server Express Edition: Limited to a lesser of 1 socket or 4 cores
  • SQL Server Standard Edition: Limited to a lesser of 4 sockets or 24 cores; It includes the SQL Server Agent and Basic Availability Groups, (Basic Availability Groups are limited to only 2 nodes)
  • SQL Server Enterprise Edition: it removes the CPU and RAM constraints of Express and Standard; It is limited only by operating system limits, and it includes a more robust set of HADR options (like Availability Groups with more than 2 nodes) along with several features designed to improve uptime such as online index rebuilds

( Compute capacity limits by edition of SQL Server: https://learn.microsoft.com/en-us/sql/sql-server/compute-capacity-limits-by-edition-of-sql-server?view=sql-server-ver16 )

 

It’s recommended to use “Always-On Availability Group” feature for high availability as it does not require clustered disks between VMs; For “Failover Cluster” it has limitations and requires shared disk on the VM level, which means you have a single point of failure from storage side.

Note: Microsoft SQL Server Always-On Availability Group (AG) or Microsoft Exchange Database Availability Groups (DAG); Each Availability Group replica has its own copy of the database, which removes the single point of failure from storage but adds additional storage space requirements for each AG replica. VMware availability features, such as vSphere vMotion, vSphere HA, and vSphere DRS, are fully compatible with Always-On (AG), and with taking that in mind, vSphere HA is not a replacement for Always-On (AG), but it enhances its SLA by protecting the availability groups.


2) Physical Host Configuration Level:

It is better to have a dedicated cluster for SQL Server workloads due to the workload patterns in SQL Server; If a dedicated cluster is no option, try to have dedicated hosts and isolate VMs running SQL Server onto these hosts and ensure other workloads are not running on them.

Ensure the following configurations are enabled on the BIOS of the physical host:

  • Turbo Boost - Enabled
  • Hyper-threading - Enabled
  • NUMA - Enabled
  • VT-x/VT-d or similar function to enhance virtualization - Enabled
  • Power Management - Set to High Power
  • Disable all Power Management C-States (including C1E state) or any efficiency management function

Consult your hardware vendor if you need any assistance in the above configurations.

 

3) ESXi Configuration Level:

 

4) VM Configuration Level:

Modify the VM CPU, Memory, and Disk configurations during the VM provision

PVSCI Adapter

Virtual Disk Workload

0

System Databases

1

Data files - mdf/ndf

2

Transaction log files - ldf

3

TempDB - both data and log

        • If more disks are needed, spread the workload out evenly across the PVSCSI adapters based on the I/O demands of the database files that will be on those disks
          • Monitor file latency, I/Os, and queuing over time to measure, and if there is disk contention, consider rebalancing the workload across the 4 PVSCSI adaptors
            • Measure latency and I/Os on SQL server file latency
            • Measure ESXi queue management

            • Another optional configuration - Separating files that have different access patterns will help streamline I/O and optimize performance

Virtual Hardware

Volume Letter

Volume Label

Purpose

Virtual Hard disk 1

C:\

OS

Operating System

Virtual Hard disk 2

Z:\

Paging

Paging

Virtual Hard disk 3

D:\

Data

Databases

Virtual Hard disk 4

T:\

TempDB

TempDB

Virtual Hard disk 5

I:\

Indexes

Indexes

Virtual Hard disk 6

L:\

Logs

Transaction Logs

Virtual Hard disk 7

Q:\

Quorum

Used only for Availability Group enabled SQL servers

Virtual Hard disk 8

B:\

Backup

Backups, batch processing, full text catalogs

            • Place SQL Server data (system and user), transaction log, and backup files on separate datastores
            • For underlying storage, RAID 10 can provide the best performance and availability for user data, transaction log files, and TempDB
            • The aggregate IOPS demands of all VMs on the VMFS should not exceed the IOPS capability of the underlying physical disks

 

5) Windows Configuration Level:

  • Power Plan - Set to High Performance
    (Slow performance on Windows Server when using the Balanced power plan: https://learn.microsoft.com/en-us/troubleshoot/windows-server/performance/slow-performance-when-using-power-plan )
  • Enable the TCP/IP protocol - To allow communication with client systems
    ( Enable or disable a server network protocol: https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/enable-or-disable-a-server-network-protocol?view=sql-server-ver16 )
  • Page file size - configure as a fixed 8GB allocation
  • Windows Updates - before installation of SQL Server, ensure all updates are applied
  • Instant file initialization - Configure Local Security Policy to “Perform Volume Management” for the SQL Server Service account
  • Lock pages in memory - Configure Local Security Policy to “Lock Pages In Memory” for the SQL Server Service account
  • Volume format - format all SQL Server volumes with 64KB NTFS Allocation Units
  • Block Alignment - Partitions have been volume aligned by default, but ensure it’s appropriately configured
  • SQL Server folder structure - create the following folder structure for SQL Server; This structure is helpful with the granularity of snapshots; Additional volumes exposed as mount points can be attached if needed, and each volume must be on a separate virtual disk
    • C:\OS – do not store any files on the OS volume despite their small size
    • D:\DATA
    • L:\LOGS
    • S:\SYSTEM
    • T:\TempDB
  • Anti-Virus - ensure your AV system excludes MDF, NDF, LDF, SDF, and BAK

 

6) SQL Server Instance Configuration Level:

  • Max Memory - Tier 1 workloads should have MIN and MAX values:
    • Set the maximum amount of memory that the SQL instance has at its disposal. This setting is generally used to define boundaries for different instances and applications so that the overall memory of the server is not consumed, negatively affecting the performance of databases, applications, and even the operating system of the server
    • Sets the minimum amount of memory that the SQL instance has at its disposal. Set this too high, and the operating system won’t have enough resources to do background processing for anything other than SQL which can cause performance issues
    • If several SQL instances exist on the server, they’ll all be competing for the same resources. Setting the Max Server Memory option for each instance ensures that the instances continue to operate at peak performance. A good rule of thumb is to reserve 4-6GB of memory for the operating system. In an environment where the SQL server has 16GB of total memory assigned, assign no more than 10-12GB of memory for the SQL instances
      ( Server memory configuration options: https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/server-memory-server-configuration-options?view=sql-server-ver15 )


  • MAXDOP <= Maximum Number of Cores in a NUMA Node or <= 8
    By default, SQL Server will use all available vCPUs during query execution. That’s great for large queries, but it can cause performance problems and limit concurrency. Appropriately configuring MAXDoP depends on the SQL Server machine. There are three factors to look at when determining MAXDoP:
    • Asymmetric Multiprocessing (SMP)
    • Non-Uniform Memory Access (NUMA and vNUMA)
    • Hyper-Threading

Microsoft recommends the following when configuring MAXDoP, as it’s an advanced configuration option:

Server Configuration

Number of Processors

Guidance

Server with single NUMA node

Less than or equal to 8 logical processors

Keep MAXDOP at or below # of logical processors

Server with single NUMA node

Greater than 8 logical processors

Keep MAXDOP at 8

Server with multiple NUMA nodes

Less than or equal to 16 logical processors per NUMA node

Keep MAXDOP at or below # of logical processors per NUMA node

Server with multiple NUMA nodes

Greater than 16 logical processors per NUMA node

Keep MAXDOP at half the number of logical processors per NUMA node with a MAX value of 16

( Configure the max degree of parallelism (server configuration option): https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/configure-the-max-degree-of-parallelism-server-configuration-option?view=sql-server-ver15 )

 

7) Database Maintenance and Configuration Level:

  • Backups - Use a layered approach combining array-based snapshots and SQL Server native backups
    • Virtual Machine snapshots or backups are NOT database backups; snapshot usage should be limited and possibly avoided on production SQL Server
    • Do not use a single snapshot for more than 3 days, and consider planning the snapshots operations for non-peak hours as the creation or removal of a snapshot may take a long time and can potentially cause performance issues
    • Check the status of the Volume Shadow Copy Service (VSS) on the OS before taking a snapshot, and use the “Quiesce guest file system” option to ensure that a disk-consistent snapshot will be taken
    • Don’t take a snapshot if VMware Tools is not installed or not functional, and don’t use the “Snapshot the virtual machine’s memory” option as this may stun the VM
    • SAN snapshots are NOT database backups
    • Always use native SQL backups (full, differential, log) by 3rd party applications that are database aware
  • Index and Statistics Maintenance - Consider staggering maintenance jobs and use an intelligent maintenance framework
  • Data File Layout - For large and demanding databases, spread data out into file groups and data files that map back to the disk topology; Add additional disks if needed for the workload.
  • TempDB - Create one TempDB file per core up to 8 cores
    • Trace Flag 1118 – Full Extents Only: Microsoft advises that Trace Flag 1118 can help to reduce allocation contention in TempDB. Trace Flag 1118 instructs SQL Server to avoid mixed extents and use full extents; when Trace Flag 1118 is set, each newly allocated object in every database on that instance will get a private 64KB of data. Since TempDB is typically where most objects are created, it makes the most difference here.
    • Trace Flag 1117 – Grow All Files in a FileGroup Equally: Microsoft advises that Trace Flag 1117 can be helpful for TempDB, which is commonly configured with multiple data files. Trace Flag 1117 will change the behavior of file growth. If set, when one data file in a FileGroup grows, it will force other files in that FileGroup to grow as well. Trace Flag 1117 is recommended in the Fast Track Architecture Guide from Microsoft
      Whenever possible, avoid using Trace Flags, because it leaves less room for edge conditions
      ( Recommendations to reduce allocation contention in SQL Server TempDB database: https://learn.microsoft.com/en-US/troubleshoot/sql/database-engine/performance/recommendations-reduce-allocation-contention )

 

8) Optional tips:

  • Monitoring - Consider investing in SQL Server specific monitoring to capture workload and system metrics, like: VMware Aria Operations Enterprise
  • Enable Query Store - Consider investing in a tool for an effective way to extract performance data to find performance issues
  • Consider investing in automation and configuration management tool

 

References:

Comments

Popular posts from this blog

نظرة عامة للـ NSX-T

المرجع الشامل لأفضل الممارسات العالمية على مستوى الـ VM