Comprehensive guide for applying best practices to get the optimal performance for Microsoft SQL deployment on VMware vSphere
It's important to ensure that your vSphere environment is properly set to host Microsoft SQL Server as it is one of the most widely deployed database platforms in the world, with many organizations having many instances deployed in their environments, so I thought it’s important to write an article to help you to apply best practices to get the best performance.
First and most important, you must have a full understanding
of the hardware layout; with the advent of multi-core and multi-node and
non-uniform memory access (NUMA) systems, understanding relationships among
cores, logical CPUs, and physical CPUs will help to optimize your environment
and to have the best performance; So I recommend you read my another article in Arabic for this subject: https://www.yazeed2.com/2021/07/VM%20best%20Practice.html
In case you want to read an official version from VMware, please visit this
link: https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/sql-server-on-vmware-best-practices-guide.pdf
1) Choosing the editions of SQL Server deployment:
Ensure not to exceed the allowed configurations per SQL edition as below:
- SQL Server Express Edition: Limited to a lesser of 1 socket or 4 cores
- SQL Server Standard Edition: Limited to a lesser of 4 sockets or 24 cores; It includes the SQL Server Agent and Basic Availability Groups, (Basic Availability Groups are limited to only 2 nodes)
- SQL Server Enterprise Edition: it removes the CPU and RAM constraints of Express and Standard; It is limited only by operating system limits, and it includes a more robust set of HADR options (like Availability Groups with more than 2 nodes) along with several features designed to improve uptime such as online index rebuilds
( Compute capacity limits by edition of SQL Server: https://learn.microsoft.com/en-us/sql/sql-server/compute-capacity-limits-by-edition-of-sql-server?view=sql-server-ver16 )
It’s recommended to use “Always-On Availability Group” feature for high availability as it does not require clustered disks between VMs; For “Failover Cluster” it has limitations and requires shared disk on the VM level, which means you have a single point of failure from storage side.
Note: Microsoft SQL Server Always-On Availability Group (AG) or Microsoft Exchange Database Availability Groups (DAG); Each Availability Group replica has its own copy of the database, which removes the single point of failure from storage but adds additional storage space requirements for each AG replica. VMware availability features, such as vSphere vMotion, vSphere HA, and vSphere DRS, are fully compatible with Always-On (AG), and with taking that in mind, vSphere HA is not a replacement for Always-On (AG), but it enhances its SLA by protecting the availability groups.
2) Physical Host Configuration Level:
It is better to have a dedicated cluster for SQL Server workloads due to the workload patterns in SQL Server; If a dedicated cluster is no option, try to have dedicated hosts and isolate VMs running SQL Server onto these hosts and ensure other workloads are not running on them.
Ensure the following configurations are enabled on the BIOS of the physical host:
- Turbo Boost - Enabled
- Hyper-threading - Enabled
- NUMA - Enabled
- VT-x/VT-d or similar function to enhance virtualization - Enabled
- Power Management - Set to High Power
- Disable all Power Management C-States (including C1E state) or any efficiency management function
Consult your hardware vendor if you need any assistance in the above configurations.
3) ESXi Configuration Level:
- Power Management
     - Set to High Performance
 ( Host Power Management Policies: https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-resource-management/GUID-4D1A6F4A-8C99-47C1-A8E6-EF3865603F5B.html )
 
 
- DRS - Ensure that you
     understand the
     workload patterns,
     and consider
     scheduling DRS for off-peak hours to avoid performance
     penalties while moving a VM
 ( Distributed Resource Scheduler: https://www.vmware.com/sa/products/vsphere/drs-dpm.html )
4) VM Configuration Level:
Modify the VM CPU, Memory, and Disk configurations during the VM provision
- vCPU Allocation
- Pay attention to pCPUs:vCPUs ratio on your host
- Best practices are defined as follows:
- High-performance systems – total vCPU allocation not to exceed total pCPUs available in the cluster
- Lower tier systems
       – it’s ok to over-allocate vCPUs to pCPUs, and monitor
       for workload impact
 
 
- Cores per socket
- Always match the underlying hardware NUMA topology; For example, if you have a host with 2 sockets and 10 physical cores, don’t create your VMs with 4 sockets and 12 cores.
- SQL
      Server Standard edition can only use the lesser of 4 sockets and 24 cores,
      and it is
      not NUMA aware;
      It’s recommended to configure the VM with the allowed SQL configuration
      as per Microsoft guidelines.
 ( Compute capacity limits by edition of SQL Server: https://learn.microsoft.com/en-us/sql/sql-server/compute-capacity-limits-by-edition-of-sql-server?view=sql-server-ver16 )
- Once the server is up,
      check the NUMA topology in SQL Server in Server Property.
 ( View or change server properties (SQL Server): https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/view-or-change-server-properties-sql-server?view=sql-server-ver16 )
 
 
- CPU Hot Plug - Disabled
- This can have a significant
      performance impact.
 ( Disable CPU Hot Add: https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vm_admin.doc/GUID-285BB774-CE69-4477-9011-598FEF1E9ACB.html )
 
 
- Memory Configuration
- Avoid over-allocating memory ((Host Memory >= Sum of (VMs memory + overhead), and do not exceed the capacity of the physical host
- Consider setting memory
      reservations with
      caution for critical workloads to prevent ballooning; SQL Server does not
      behave well when the amount of memory assigned to the operating system
      changes; Do not use Dynamic Memory on a SQL Server at all.
 ( Allocate Memory Resources: https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-vm-administration/GUID-49D7217C-DB6C-41A6-86B3-7AFEB8BF575F.html )
 
 
- Disk Configuration
- Add
      4 PVSCSI disk adapters to the VM
 ( Configuring disks to use VMware Paravirtual SCSI (PVSCSI) controllers: https://kb.vmware.com/s/article/1010398 )
- Leave
      the primary
      virtual disk as LSI SAS to use it for the OS partition 
 ( Virtual Machine Controller Configuration in the VMware Host Client: https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-esxi-host-client/GUID-74E59205-E3CB-4D17-8F52-7E131916014D.html )
- It’s recommended to use VMDK
- Provision them
        as Eager Zero Thick
 ( Changing the thick or thin provisioning of a virtual disk: https://kb.vmware.com/s/article/2014832 )
- At provisioning time, scale the disk size to that needed for the VM’s workload
- Take into account VMDK and Datastore size limits
- For high-end workloads, use a 1:1 mapping of Data file to VMDK to Datastore
- Create and attach four additional Volumes
- Dedicate disks to your types of data: database files, transaction log files, system databases, and TempDB; If you’re doing backups to a local disk, add an additional dedicated disk for that too.
- Attach one disk to each PVSCSI adapter
- Example Virtual Disk to PVSCSI layout for an OLTP workload
| PVSCI Adapter | Virtual Disk Workload | 
| 
 | System Databases | 
| 
 | Data files - mdf/ndf | 
| 
 | Transaction log files - ldf | 
| 
 | TempDB - both data and log | 
- If more disks are needed, spread the workload out evenly across the PVSCSI adapters based on the I/O demands of the database files that will be on those disks
- Monitor file latency, I/Os, and queuing over time to measure, and if there is disk contention, consider rebalancing the workload across the 4 PVSCSI adaptors
- Measure latency and I/Os on SQL server file latency
- Measure ESXi queue
          management
 
 
- Another optional configuration - Separating files that have different access patterns will help streamline I/O and optimize performance
| Virtual Hardware | Volume Letter | Volume Label | Purpose | 
| Virtual Hard disk 1 | C:\ | OS | Operating System | 
| Virtual Hard disk 2 | Z:\ | Paging | Paging | 
| Virtual Hard disk 3 | D:\ | Data | Databases | 
| Virtual Hard disk 4 | T:\ | TempDB | TempDB | 
| Virtual Hard disk 5 | I:\ | Indexes | Indexes | 
| Virtual Hard disk 6 | L:\ | Logs | Transaction Logs | 
| Virtual Hard disk 7 | Q:\ | Quorum | Used only for Availability Group enabled SQL servers | 
| Virtual Hard disk 8 | B:\ | Backup | Backups, batch processing, full text catalogs | 
- Place SQL Server data (system and user), transaction log, and backup files on separate datastores
- For underlying storage, RAID 10 can provide the best performance and availability for user data, transaction log files, and TempDB
- The aggregate IOPS demands of all VMs on the VMFS should not
          exceed the IOPS capability of the underlying physical disks 
5) Windows Configuration Level:
- Power Plan - Set to High Performance
 (Slow performance on Windows Server when using the Balanced power plan: https://learn.microsoft.com/en-us/troubleshoot/windows-server/performance/slow-performance-when-using-power-plan )
- Enable the TCP/IP protocol - To allow communication
     with client systems
 ( Enable or disable a server network protocol: https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/enable-or-disable-a-server-network-protocol?view=sql-server-ver16 )
- Page file size - configure as a fixed 8GB allocation
- Windows Updates - before installation of SQL Server, ensure all updates are applied
- Instant file initialization - Configure Local Security Policy to “Perform Volume Management” for the SQL Server Service account
- Lock pages in memory - Configure Local Security Policy to “Lock Pages In Memory” for the SQL Server Service account
- Volume format - format all SQL Server volumes with 64KB NTFS Allocation Units
- Block Alignment - Partitions have been volume aligned by default, but ensure it’s appropriately configured
- SQL Server folder structure - create the following folder structure for SQL Server; This structure is helpful with the granularity of snapshots; Additional volumes exposed as mount points can be attached if needed, and each volume must be on a separate virtual disk
- C:\OS – do not store any files on the OS volume despite their small size
- D:\DATA
- L:\LOGS
- S:\SYSTEM
- T:\TempDB
- Anti-Virus - ensure your AV system excludes MDF, NDF, LDF, SDF, and BAK
6) SQL Server Instance Configuration Level:
- Max Memory - Tier 1 workloads should have MIN and MAX values:
- Set the maximum amount of memory that the SQL instance has at its disposal. This setting is generally used to define boundaries for different instances and applications so that the overall memory of the server is not consumed, negatively affecting the performance of databases, applications, and even the operating system of the server
- Sets the minimum amount of memory that the SQL instance has at its disposal. Set this too high, and the operating system won’t have enough resources to do background processing for anything other than SQL which can cause performance issues
- If several SQL instances exist on the server, they’ll
      all be competing for the same resources. Setting the Max Server Memory
      option for each instance ensures that the instances continue to operate
      at peak performance. A good rule of thumb is to reserve 4-6GB of memory
      for the operating system. In an environment where the SQL server has 16GB
      of total memory assigned, assign no more than 10-12GB of memory for the
      SQL instances
 ( Server memory configuration options: https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/server-memory-server-configuration-options?view=sql-server-ver15 )
 
 
- MAXDOP <= Maximum Number of Cores
     in a NUMA Node or <= 8
 By default, SQL Server will use all available vCPUs during query execution. That’s great for large queries, but it can cause performance problems and limit concurrency. Appropriately configuring MAXDoP depends on the SQL Server machine. There are three factors to look at when determining MAXDoP:
- Asymmetric Multiprocessing (SMP)
- Non-Uniform Memory Access (NUMA and vNUMA)
- Hyper-Threading
Microsoft recommends
the following when configuring MAXDoP, as it’s an advanced configuration
option:
| Server Configuration | Number of Processors | Guidance | 
| Server with single NUMA node | Less than or equal to 8 logical processors | Keep MAXDOP at or below # of logical processors | 
| Server with single NUMA node | Greater than 8 logical processors | Keep MAXDOP at 8 | 
| Server with multiple NUMA nodes | Less than or equal to 16 logical processors per NUMA node | Keep MAXDOP at or below # of logical processors per NUMA node | 
| Server with multiple NUMA nodes | Greater than 16 logical processors per NUMA node | Keep MAXDOP at half the number of logical processors per NUMA node with a MAX value of 16 | 
( Configure the max degree of parallelism (server configuration option): https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/configure-the-max-degree-of-parallelism-server-configuration-option?view=sql-server-ver15 )
- CTFP = 50 (start at 50 then measure
     again once you have workload running)
 ( Configure the cost threshold for parallelism (server configuration option)): https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/configure-the-cost-threshold-for-parallelism-server-configuration-option?view=sql-server-ver16 )
 
 
- Enable Optimize for Ad-hoc Workloads
 ( Optimize for ad hoc workloads (server configuration option)): https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/optimize-for-ad-hoc-workloads-server-configuration-option?view=sql-server-ver16 )
 
 
- Max Worker Threads
     - Setting this
     option will optimize performance when a large number of clients are
     connected to SQL Server
 Max Worker Threads is an advanced option and so should not be altered without approval from an experienced database administrator
 ( Configure the max worker threads (server configuration option): https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/configure-the-max-worker-threads-server-configuration-option?view=sql-server-ver15#:~:text=To%20configure%20the%20max%20worker%20threads%20option,value%20from%20128%20through%2032%2C767 )
 
 
- Index Create Memory Option
     - Setting this option controls the maximum amount of memory that is
     initially allocated for creating indexes.
 Index Create Memory is an advanced option and so should not be altered without approval from an experienced database administrator
 ( Configure the index create memory (server configuration option): https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/configure-the-index-create-memory-server-configuration-option?view=sql-server-ver15#:~:text=The%20index%20create%20memory%20option%20controls%20the%20maximum%20amount%20of,0%20(self%2Dconfiguring) )
 
 
- Min Memory Per Query
     - By default,
     the Min Memory Per Query option allocates >=1024 KB for each query run.
     Microsoft's best practice is to leave this setting at the default value of
     0, which allows SQL Server to dynamically manage the amount of memory
     allocated for index creation operations
 If SQL Server has more memory than it needs to run efficiently, the performance of some queries could be boosted if you increase this setting which can improve overall performance. If there is no free memory available, increasing this setting will decrease overall performance rather than increase it
 ( Configure the min memory per query (server configuration option): https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/configure-the-min-memory-per-query-server-configuration-option?view=sql-server-ver15#:~:text=The%20min%20memory%20per%20query%20option%20specifies%20the%20minimum%20amount,least%20that%20much%20total%20memory )
7) Database Maintenance and Configuration Level:
- Backups - Use a layered approach combining array-based snapshots and SQL Server native backups
- Virtual Machine snapshots or backups are NOT database backups; snapshot usage should be limited and possibly avoided on production SQL Server
- Do not use a single snapshot for more than 3 days, and consider planning the snapshots operations for non-peak hours as the creation or removal of a snapshot may take a long time and can potentially cause performance issues
- Check the status of the Volume Shadow Copy Service (VSS) on the OS before taking a snapshot, and use the “Quiesce guest file system” option to ensure that a disk-consistent snapshot will be taken
- Don’t take a snapshot if VMware Tools is not installed or not functional, and don’t use the “Snapshot the virtual machine’s memory” option as this may stun the VM
- SAN snapshots are NOT database backups
- Always use native SQL backups (full, differential, log) by 3rd party applications that are database aware
- Index and Statistics Maintenance - Consider staggering maintenance jobs and use an intelligent maintenance framework
- Data File Layout - For large and demanding databases, spread data out into file groups and data files that map back to the disk topology; Add additional disks if needed for the workload.
- TempDB - Create one TempDB file per core up to 8 cores
- Trace Flag 1118 – Full Extents Only: Microsoft advises that Trace Flag 1118 can help to reduce allocation contention in TempDB. Trace Flag 1118 instructs SQL Server to avoid mixed extents and use full extents; when Trace Flag 1118 is set, each newly allocated object in every database on that instance will get a private 64KB of data. Since TempDB is typically where most objects are created, it makes the most difference here.
- Trace
      Flag 1117 – Grow All Files in a FileGroup Equally: Microsoft advises
      that Trace Flag 1117 can be helpful for TempDB, which is commonly
      configured with multiple data files. Trace Flag 1117 will change the
      behavior of file growth. If set, when one data file in a FileGroup grows,
      it will force other files in that FileGroup to grow as well. Trace Flag
      1117 is recommended in the Fast Track Architecture Guide from Microsoft
 Whenever possible, avoid using Trace Flags, because it leaves less room for edge conditions
 ( Recommendations to reduce allocation contention in SQL Server TempDB database: https://learn.microsoft.com/en-US/troubleshoot/sql/database-engine/performance/recommendations-reduce-allocation-contention )
8) Optional tips:
- Monitoring - Consider investing in SQL Server specific monitoring to capture workload and system metrics, like: VMware Aria Operations Enterprise
- Enable Query Store - Consider investing in a tool for an effective way to extract performance data to find performance issues
- Consider investing in automation and configuration management tool
References:
- ( Architecting Microsoft SQL Server on VMware vSphere Best Practices Guide: https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/sql-server-on-vmware-best-practices-guide.pdf ) Please focus on the section from page 67 to page 76; it is important to understand it.
- ( Microsoft SQL Server 2019 on VMware vSphere 7: https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/performance/vsphere7-sql-server-perf.pdf )
- (
     Performance Characterization of Microsoft SQL Server on VMware vSphere: https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/performance/sql-server-vsphere65-perf.pdf
     )
 
 Have a nice day

 
 
 
Comments
Post a Comment