Monitoring Infrastructure Performance for SharePoint

17 Jul

In a decent-sized SharePoint farm there are *many* moving pieces: at least 5 servers, probably 10 NICs, SANs, SQL, MOSS, IIS, load balancers and Kerberos issues. It can be difficult to figure out what is causing a performance problem.

There are many good documents out there, especially the whitepaper on SharePoint Performance Optimization from MS IT, and simply familiarizing yourself with the common issues will help you identify many problems immediately.

That said, sometimes you can’t get your hands on the hardware and the problem isn’t obvious. Assuming you’ve verified the SharePoint configuration and scanned the ULS logs first, I typically ask an admin to pull these performance counters (Perfmon/WMI) for me while generating load. They are primarily on the SQL end, with the exception of NIC performance, which I check on every server. I focus on those first because roughly 75% of the time I find that the disk subsystem or the NIC/network is to blame.

Each one has a recommended range to look for; the ranges were pulled from a few MS documents. I hope they help you identify an infrastructure problem in the future.
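Once the admin hands back a counter log, the first step is usually to boil each counter down to an average over the load test. Here is a minimal Python sketch, assuming the log was saved as CSV with a timestamp in the first column and one column per counter path (the layout Perfmon-style CSV exports generally use). The server name `SQL01`, the sample values, and the function name `average_counters` are made up for illustration.

```python
import csv
import io
from statistics import mean


def average_counters(csv_text):
    """Return {counter_path: mean value} for a counter log in CSV form.

    Assumes column 0 is the timestamp and every other column is a counter.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = {name: [] for name in header[1:]}
    for row in reader:
        for name, cell in zip(header[1:], row[1:]):
            try:
                columns[name].append(float(cell))
            except ValueError:
                pass  # skip blank or non-numeric samples
    return {name: mean(values) for name, values in columns.items() if values}


# Hypothetical sample log; real exports will have many more rows and columns.
SAMPLE_LOG = r'''"(PDH-CSV 4.0)","\\SQL01\LogicalDisk(D:)\Avg. Disk sec/Read","\\SQL01\Memory\Pages/sec"
"07/17/2009 10:00:00.000","0.004","20"
"07/17/2009 10:00:05.000","0.012"," "
"07/17/2009 10:00:10.000","0.008","40"
'''

if __name__ == "__main__":
    for counter, value in average_counters(SAMPLE_LOG).items():
        print(f"{counter}: {value:.4f}")
```

Averages hide spikes, so it is worth looking at the raw samples as well before drawing conclusions.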

 

Counters to monitor on SQL Server:

 

Logical Disk: Avg. Disk sec/Read (Read Latency).

This counter indicates the time it takes the disk to retrieve data. On well-tuned I/O subsystems, ideal values are 1-5 ms for disks containing SQL logs, and 4-20 ms for disks containing data (ideally below 10 ms).

Logical Disk: Avg. Disk sec/Write (Write Latency).

This counter indicates the time it takes the disk to write data. On well-tuned I/O subsystems, ideal values are 1-5 ms for disks containing SQL logs, and 4-20 ms for disks containing data (ideally below 10 ms).

Logical Disk: Current Disk Queue Length.

For this counter, lower values are better. Values above 20 may indicate a bottleneck.
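The disk ranges above are easy to turn into a quick check. The sketch below is only an illustration: it assumes the latency counters report seconds (so they are converted to milliseconds before comparing), and the function names and result labels are my own, not anything from the MS documents.

```python
# Upper and lower bounds taken from the ranges listed above, in milliseconds.
LATENCY_RANGES_MS = {"log": (1.0, 5.0), "data": (4.0, 20.0)}
DATA_PREFERRED_MS = 10.0   # "ideally below 10 ms" for data disks
QUEUE_LENGTH_WARN = 20     # Current Disk Queue Length above this may be a bottleneck


def classify_latency(seconds, role):
    """Classify a disk sec/Read or sec/Write sample for a 'log' or 'data' disk."""
    ms = seconds * 1000.0
    _, high = LATENCY_RANGES_MS[role]
    if ms > high:
        return "high"
    if role == "data" and ms > DATA_PREFERRED_MS:
        return "acceptable"  # inside 4-20 ms but above the preferred 10 ms
    return "ok"


def queue_may_be_bottleneck(queue_length):
    """True when Current Disk Queue Length exceeds the guideline of 20."""
    return queue_length > QUEUE_LENGTH_WARN


if __name__ == "__main__":
    print(classify_latency(0.003, "log"))    # 3 ms on a log disk
    print(classify_latency(0.012, "data"))   # 12 ms on a data disk
    print(queue_may_be_bottleneck(25))
```

Treat the labels as a first pass only; a single slow sample matters much less than sustained latency under load.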

 

On Web Front ends and SQL:

 

Processor: % Processor Time: _Total.   

This counter should be kept between 50 percent and 75 percent.

System: Processor Queue Length: (N/A).

This should be below two times the number of CPUs.

Memory: Pages/sec: (N/A).

Monitor this counter to ensure that it remains below 100.

Network Interface: Output Queue Length.

Monitor under load, and watch for spikes.
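The same kind of check works for the web front end and SQL counters. This is a rough sketch under a few assumptions of mine: only CPU above 75 percent is flagged (running below 50 percent is not treated as a problem), and a network queue "spike" is arbitrarily defined as a sample more than three times the median, since no number is given above. The function names are hypothetical.

```python
from statistics import median


def check_host(cpu_percent, processor_queue, cpu_count, pages_per_sec):
    """Return a list of guideline violations for one server's averaged counters."""
    findings = []
    if cpu_percent > 75:
        findings.append("% Processor Time above 75 percent")
    if processor_queue >= 2 * cpu_count:
        findings.append("Processor Queue Length at or above 2x CPU count")
    if pages_per_sec >= 100:
        findings.append("Pages/sec at or above 100")
    return findings


def queue_spikes(samples, factor=3.0):
    """Return indices of samples above factor x median.

    The spike definition and the floor of 1.0 are assumptions for illustration.
    """
    threshold = max(median(samples) * factor, 1.0)
    return [i for i, value in enumerate(samples) if value > threshold]


if __name__ == "__main__":
    print(check_host(cpu_percent=82, processor_queue=9, cpu_count=4, pages_per_sec=150))
    print(queue_spikes([0, 1, 0, 2, 1, 14, 0]))
```

An empty list from `check_host` means the server is inside all three guidelines; it does not rule out problems elsewhere in the farm.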
