Secret Server Database Maintenance
Purpose
This topic, Secret Server Database Maintenance, shares high-level information on best and recommended practices for the Delinea product database hosted in Microsoft SQL Server. The primary focus is SQL Server hosted on a physical or virtual machine.
However, a customer can choose under specific architectures to host the database within cloud-based SQL Server Platform-as-a-Service (PaaS) offerings such as Azure SQL DB or AWS RDS.
Customers and partners may use this document for reference with any version of SQL Server. Differences between editions, and limitations within PaaS services, will be noted where they exist.
Intended Audience
SQL Server is a vast product with multiple components and configurations that Database Administrators can spend years learning to understand fully. This document is intended as a guide to get started; it is not a one-stop shop for all the answers. Any job role can use this document as a reference for Delinea application databases.
Cloud Providers
The following providers will be considered and noted in this document:
- Azure: Azure SQL DB
- AWS: SQL Server on AWS RDS
- GCP: Cloud SQL for SQL Server
Azure SQL includes SQL Server services on Azure VMs and Azure SQL Managed Instance (MI). This document considers those services to be in parity with SQL Server running on-premises in the area of database maintenance.
Cloud provider information is subject to change by the specific provider and is out of Delinea's control. This document will be maintained for significant changes, but we advise consulting the provider's documentation to verify updates to those services.
Required Maintenance
SQL Server can be installed without much difficulty. Going through the wizard, you can do "next, next, next" and have SQL Server running on any server or desktop (even IoT devices). However, the wizard installation does not set up maintenance on the application databases you add. The application may work well initially; it starts with a small amount of data in the database. As your user base or usage grows, so will the database.
Ensuring proper maintenance is set up for the application database(s) is essential for the application and the business. As the enterprise expands its use of Delinea products, the hosted database grows with data. Proper maintenance helps ensure that, as the amount of data grows, the database is maintained correctly for the best performance possible.
The following sections are the focus of this document. Links are provided as needed to give additional resources for expanding on or researching a given feature or concept of SQL Server.
Backups
If the application database is "lost" for whatever reason, how much data can you lose? This question needs to be answered by your backup plan and determines how you configure it within SQL Server. Backups in SQL Server primarily serve recoverability, but they also play a part in database space management.
This section will go over the available backup types and how they offer recoverability. Additionally, some tools and options for automating those backups will be provided as examples.
Database Integrity
One of the primary causes of data loss is hardware failure. Some hardware failures can cause SQL Server databases to become corrupt and unrecoverable without restoring a backup.
Detecting these scenarios will ensure your application and business continue operating or can recover safely.
An important note is that some database corruption can follow your database into its backups, so it is crucial to perform regular integrity checks.
Indexes and Statistics
Users are starting their day, and the help desk begins to get calls that the application is not responding, or pages are taking ages to load. On average, the performance issues with application databases come down to two objects: indexes and statistics. The query optimizer in SQL Server is the piece that ensures the queries from the application run as efficiently as possible. A central part of that work is the indexes and statistics of the tables referenced in the query. Ensuring proper maintenance is performed regularly will help the query optimizer with its goal of efficient query processing.
Database File Size
The files of a database themselves do not require direct maintenance. This section focuses on the configuration of database files and how some maintenance tasks mentioned previously can help you manage the physical file size.
The Database
This document is focused on maintenance for Delinea database(s), but before we get to that, we need to get on the same page with what makes up a database in SQL Server. At a high overview, databases contain the following:
- Collection of tables
- The tables have rows that include various columns
- Those columns store certain types of data: dates, strings, binary, numbers, etc.
- The data is persisted to database files on a file system
While data is persisted to the database files, it will exist within memory when applications reference and manipulate it. SQL Server manages that memory and how/when that data gets persisted to those database files.
Additional objects you find in a database that are directly associated with the tables for performance are the following items:
- Indexes
- Statistics (index and column)
Physical Files
There are two types of database files with SQL Server:
- Data files (*.mdf or *.ndf)
- Log files (*.ldf)
As mentioned previously, mechanisms within SQL Server handle persisting transactions that have written new data or modified current data into the data file. The log file stores a history of those transactions until the change has been persisted in the data file. If SQL Server crashes, that log file's job is to help get the database back to a consistent state.
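If you are unsure which physical files a database uses, the sys.master_files catalog view can show you. A minimal sketch, assuming the SecretServer database name used in the examples later in this document:

-- List the data and log files for the example database
SELECT DB_NAME(database_id) AS databaseName,
       name AS logicalName,
       physical_name,
       type_desc,                 -- ROWS (data file) or LOG (log file)
       size * 8 / 1024 AS sizeMB  -- size is stored in 8 KB pages
FROM sys.master_files
WHERE database_id = DB_ID('SecretServer');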
Recovery Model
A central area to understand with database maintenance is the recovery model of the database. The recovery model plays a role in how SQL Server manages transactions being written between the log file (the history of transactions) and the database's data file. Since transactions are the meat of an application database, the recovery model plays a significant role; for example, backups occur within the context of the recovery model. A database can only be set to one of three recovery models:
- Full
- Simple
- Bulk-logged
The default recovery model in SQL Server for new user databases is Full: any new database will be created in full recovery, although the model database (a system database in SQL Server) can be configured to change this default. The sections below provide a bulleted list of points to be aware of for each recovery model and how log file space is managed.
Full
- Can recover to a specific point in time (minimal data loss)
- Log backups are required
- All transactions are fully logged
- Manage log space via backup and maintenance
Simple
- Recovery only to the end of the last backup (data loss greater)
- Log backups are not allowed
- Some transactions will not be fully logged
- Automatic log space management, handled by SQL Server
- Many HA features do not support use with this model (Log shipping, Availability Groups, etc.)
Bulk Logged
- Recovery to the end of the last backup (point-in-time recovery not supported)
- Log backups are required
- Bulk operations are minimally logged
- Manage log space via backup and maintenance
Additional details on SQL Server recovery models can be found at Recovery models (SQL Server).
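To verify or change a database's recovery model, a short sketch (assuming the SecretServer database name used elsewhere in this document):

-- Check the current recovery model of every database on the instance
SELECT name, recovery_model_desc
FROM sys.databases;

-- Set the example database to the full recovery model
ALTER DATABASE [SecretServer] SET RECOVERY FULL;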
Cloud Providers
The database's physical files and the recovery model setting are areas of database management that the cloud provider maintains and controls. The databases are operating under the full recovery model for PaaS offerings of SQL Server.
Backups
Backups are always a best practice, even when you do not think you will need to restore from them. The unknown, after all, is unknown, and backups can always help plan around the unknown. Three types of backups can be taken in SQL Server.
Backup Types
Full
A full backup contains the full extent of data within the database, plus enough of the transaction log to recover the database to a consistent state. This backup is required as a minimum before the other backup types can be taken. During recovery, the full backup is the first backup type that has to be applied.
Differential
The differential backup contains all changes in the database since the last full backup. As a database grows, the time required for a full backup may extend, and it may not be reasonable to take full backups frequently. A differential can be used to maintain backup frequency within a shorter time window. Each subsequent differential backup grows based on the data changed since the last full backup.
Log
This backup contains changes logged in the database's transaction log file that are not yet persisted in the data files. The size of these backups will vary based on the activity that has occurred in the database.
Recoverability
A backup strategy or scheme should focus on the need for recoverability. A backup's primary purpose is to recover data. Delinea's database(s) recovery needs are purely based on the business requirements of a customer. A disaster recovery plan should account for and determine the backup scheme chosen based on those requirements.
Example
A retail company utilizes Secret Server and has labeled it a business-critical system. The product is only used by their users during regular operating hours of 7:00 AM to 5:00 PM Monday through Friday.
The business requirements for databases with systems labeled as business-critical have an RPO (recovery point objective) of 10 minutes. An example database configuration and backup scheme recommended in this situation that would meet the RPO would be as follows:
- The database recovery model is set to full.
- Backups are taken:
- Full: weekly, every Monday @ 4:00 AM
- Differential: twice daily, Monday - Friday @ 5:00 AM and 1:00 PM
- Log: daily, Monday - Friday every 10 minutes between 5:00 AM - 6:00 PM
On Wednesday at 3:45 PM, the disk hosting the database server for Secret Server goes offline. When the disk and server are brought back online, the database is corrupt and cannot be brought online. The following backup files are needed to meet the RPO, recovering the data to within 10 minutes of the 3:45 PM failure:
- Full backup from Monday
- Differential backup from 1:00 PM on Wednesday
- All log backups from Wednesday at 1:00 PM to 3:40 PM
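To illustrate how those files would be applied, the sketch below walks the restore sequence in T-SQL. The backup file names are hypothetical placeholders; your backup tooling determines the actual names.

-- 1. Restore Monday's full backup; NORECOVERY leaves the database ready for more restores
RESTORE DATABASE [SecretServer]
FROM DISK = '\\backups\SQLServer\dbserver1\SecretServer_FULL.bak' -- hypothetical file name
WITH NORECOVERY, REPLACE;

-- 2. Apply Wednesday's 1:00 PM differential
RESTORE DATABASE [SecretServer]
FROM DISK = '\\backups\SQLServer\dbserver1\SecretServer_DIFF.bak' -- hypothetical file name
WITH NORECOVERY;

-- 3. Apply each log backup taken after the differential, in order (one RESTORE LOG per file)
RESTORE LOG [SecretServer]
FROM DISK = '\\backups\SQLServer\dbserver1\SecretServer_LOG_1540.trn' -- hypothetical file name
WITH NORECOVERY;

-- 4. Bring the database online
RESTORE DATABASE [SecretServer] WITH RECOVERY;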
Backup Automation
Backup Types
Delinea products, such as Secret Server, offer the ability to schedule full backups daily. That option can be used if the recovery limitations around that schedule and backup type meet your business need. When you need to get more granular with the backup type and schedule, SQL Server does offer two options:
SQL Server Maintenance Plans
- These offer a "quick-and-dirty" method of getting backup jobs in place
- It can be difficult at times to troubleshoot issues with them.
- They do not offer much control; as databases are added or removed, you have to adjust the backup task manually.
SQL Agent Jobs
- These offer a more robust method
- Requires scripts to be written to perform the backup
- Some tooling is available with prewritten scripts (see more below)
Ola Hallengren's Backup Script
This script is a standard tool used by many in the SQL Server community. You can download the script and find full details at SQL Server Backup. The stored procedure offered gives you granular control over the backup type, the flags used in the backup command, cleanup of old backup files, and striping backups for performance improvements.
The scripts provided below can be used in a SQL Agent Job to be scheduled on the frequency required to meet your recovery needs.
Full Backup
EXECUTE dbo.DatabaseBackup @Databases = 'SecretServer',
@Directory = '\\backups\SQLServer\dbserver1', @BackupType = 'FULL',
@Verify = 'Y', @Compress = 'Y', @CheckSum = 'Y', @CleanupTime = 504,
@CleanupMode = 'AFTER_BACKUP'
The above command will take a full backup of the SecretServer database and write it to the network share \\backups\SQLServer\dbserver1. Backup verification and compression will be used, along with a checksum. If the backup is successful, full backups older than three weeks (504 hours) will be deleted from the network share.
Log Backup
EXECUTE dbo.DatabaseBackup @Databases = 'SecretServer',
@Directory = '\\backups\SQLServer\dbserver1', @BackupType = 'LOG',
@Verify = 'Y', @Compress = 'Y', @CheckSum = 'Y', @CleanupTime = 336,
@CleanupMode = 'AFTER_BACKUP'
The above command will take a log backup of the SecretServer database and write to the same network share. The same actions will be performed with backup verification, compression, and checksum. Log backups that are older than two weeks will be deleted.
Application Impact
Overall, backups will not impede the application from performing read and write operations. You may observe spikes in resource usage on the database server under certain circumstances while the full backup runs, but it should not be excessive.
It is advised to monitor performance during backup processing to ensure any performance issues or adjustments to schedules are addressed. As the database grows, it may require adjusting the backup scheme to meet the RPO without affecting the application's performance during business hours.
Cloud Providers
The cloud provider controls backups and their frequency. Retention controls vary by provider.
Azure SQL DB
- Azure SQL performs automatic backups (Automated backups in Azure SQL Database).
- Backup frequency is full every week, differential every 12-24 hours, and log backups every 5-10 minutes.
- Log backup frequency can also vary based on compute size and the amount of database activity.
- Backup storage is configurable, but the default configuration will replicate to another datacenter within that region.
- By default, backup retention allows point-in-time recovery within the last seven (7) days (configurable to a maximum of 35 days).
- Long-term backup retention is possible for up to 10 years.
AWS RDS - SQL Server
- Support for automated and manual backups
- "Backup window" is configured for the instance to include all databases, defaults to a 30- minute window
- Various situations cause the backups not to occur during the window. See AWS's Introduction to backups.
- Backup retention is configurable; the default depends on how the instance was created in RDS
GCP Cloud SQL
- Support for automated and manual backups
- The backup window is a 4-hour window and is not configurable
- Recommended to perform manual backups on a more frequent schedule
- For reference documentation, see Google's About Cloud SQL backups.
Database Integrity
Various types of corruption, logical or physical, can occur in databases. Corruption can be captured in and travel with database backups as well. Corruption in a backup causes issues when the need arises to implement a recovery plan and there is no viable backup to use.
Identifying database integrity issues early ensures the application's database stays online and backups are not affected.
Types of Corruption
Logical
This corruption involves consistency errors found in the page-level structure of the database. Where that page resides determines how much effort is required to correct the error and how much data can be recovered.
Physical
This level of corruption is critical and will affect the uptime of the database. Physical corruption tends to be a hardware-level issue in most situations. In most cases, a backup has to be restored to recover.
Physical Corruption Warnings
There are two errors that SQL Server will write to the Windows Application event log that are excellent precursors indicating physical corruption is possible.
Table: Corruption-Related Errors
| Error 824 | Error 825 |
| --- | --- |
| This event occurs when data is read by the application and SQL Server must retrieve it from the physical file (data file). A consistency check is done during this process, and if SQL Server detects an error, this event is generated. The event message will be similar to: "SQL Server detected a logical consistency based I/O error". You can find more details about this error at MSSQLSERVER_824. | This error is more critical and should raise alarms (if being monitored). When the application queries data and SQL Server has to retry multiple times to retrieve it from the physical file, this event can be triggered. The event message will be similar to: "A read of the file '%ls' at offset %#016I64x succeeded after failing %d time(s) with error: %ls." You can find more details about this error at MSSQLSERVER_825. |
DBCC CHECKDB
CHECKDB is the command used for running integrity checks on any SQL Server database, including the system databases. You can review the syntax for the command at DBCC CHECKDB (Transact-SQL).
Execution Performance
Running an integrity check against a database can require additional resources compared to other maintenance tasks. Some tweaks can be made, based on the version and edition of SQL Server, to improve the performance. In Standard Edition, the CHECKDB command is a single-threaded operation, meaning that as the database grows, the process will take longer to complete. In Enterprise Edition, the same command utilizes parallel processing. Starting with Enterprise Edition of SQL Server 2016, a MAXDOP option for the command allows control of how many CPUs are used.
Additional details: CHECKDB From Every Angle: How long will CHECKDB take to run?
Application Impact
Internally, when CHECKDB is executed against a database, it utilizes a snapshot of that database; this makes it an ONLINE operation. The snapshot allows the operation to run without directly affecting ongoing application activity against the database.
However, this task is an I/O intensive operation and should be done during minimal user activity in the application. While it won't directly affect the application, the overall high I/O activity may affect SQL Server's overall performance based on the resources the server is provided.
The CHECKDB command performs multiple operations, and those can be broken out (see the documentation) and run individually. Running those individual commands is an alternative that can lessen the performance impact. A production database may have CHECKDB run only the physical checks (WITH PHYSICAL_ONLY), which verifies the physical structure of the database for consistency. It is a low-overhead check and will still catch common hardware issues that could cause data loss.
It is still recommended to perform a FULL run of CHECKDB on your production servers to ensure no type of corruption is found. Running this process against a database restored on a different server is fine, but it will not find corruption that may exist on the production server; if you restore a backup and run CHECKDB against that restored database, it merely verifies that the backup you restored contains no corruption.
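For reference, a minimal sketch of those CHECKDB variants against the example SecretServer database:

-- Full logical and physical integrity check, suppressing informational messages
DBCC CHECKDB ('SecretServer') WITH NO_INFOMSGS;

-- Lower-overhead physical-structure check for busy production systems
DBCC CHECKDB ('SecretServer') WITH PHYSICAL_ONLY;

-- Enterprise Edition (SQL Server 2016+): cap the CPUs used by the check
DBCC CHECKDB ('SecretServer') WITH MAXDOP = 4;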
Integrity Check Automation
Ola Hallengren's DatabaseIntegrityCheck Script (stored procedure)
This script is a standard tool used by many in the SQL Server community. You can download the script and find full details at SQL Server Integrity Check. It supports the flags and options of CHECKDB appropriate to the version of SQL Server and is Availability Group aware.
The script below is commonly used for small databases, where integrity check is under an hour run time (total for all user databases).
EXECUTE [dbo].[DatabaseIntegrityCheck] @Databases = 'USER_DATABASES', @LogToTable = 'Y'
If the database is extensive in size, a dedicated job for each database is recommended. The script above can be updated for this purpose by replacing USER_DATABASES with the specific database's name.
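For example, a dedicated job for the Secret Server database could execute:

EXECUTE [dbo].[DatabaseIntegrityCheck] @Databases = 'SecretServer', @LogToTable = 'Y'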
Scheduling these jobs to run at least weekly is recommended.
Cloud Providers
Azure SQL DB
Azure utilizes various techniques to assist in managing data integrity. The Azure SQL engineering team provides more details at Data Integrity in Azure SQL Database. The article states that running CHECKDB is unnecessary; still, the "what if" can exist, and performing this maintenance is allowed on Azure SQL DB.
AWS RDS - SQL Server
The customer is responsible for running data integrity tests. The information above would be applied in the same way to the database hosted on AWS RDS SQL Server.
GCP Cloud SQL
The customer is responsible for running data integrity tests. The information above would be applied in the same way to the database hosted on GCP Cloud SQL for SQL Server.
Index and Statistics
The query optimizer thrives on good indexing and statistics in a database. An index strategy for a database helps ensure the application performs optimally at any size. Several chapters could be dedicated to indexing and statistics with SQL Server; it is a big topic for performance tuning. This section covers a few areas to provide an overview.
Index Types
Index type availability varies based on the version of SQL Server installed. The most common index types found in Delinea databases will be clustered and non-clustered indexes. If SQL Server 2016 SP1 or later is being utilized, then columnstore indexes may be found in the database.
The indexes and their types found in a database can be gathered using the following query:
SELECT o.name, i.name AS indexName, i.type_desc, o.type_desc, ps.row_count
FROM sys.indexes i
INNER JOIN sys.objects o ON i.object_id = o.object_id
INNER JOIN sys.dm_db_partition_stats AS ps
    ON i.object_id = ps.object_id AND i.index_id = ps.index_id
WHERE o.type_desc = 'USER_TABLE'
ORDER BY o.name
More details on index types can be found at AI Skills Challenge: Indexes. Additional details on index architecture can be found in the SQL Server and Azure SQL index architecture and design guide.
Statistical Objects
Statistics are the lifeblood of the Query Optimizer in SQL Server. These objects are associated with columns and indexes. When an index is added to a table, a statistics object is created for it. If columns on the table are referenced in queries but not included in an index, a column statistics object will be created. More details on statistics can be found at AI Skills Challenge: Statistics.
Fragmentation
Indexes are the object type that deals with fragmentation. Just as you might use the alphabet or numbers to order a list, fragmentation in a table is when the physical order of that list no longer matches the logical order. Additional details on fragmentation can be found at Optimize index maintenance to improve query performance and reduce resource consumption.
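A minimal sketch for measuring fragmentation with the sys.dm_db_index_physical_stats DMV, assuming the SecretServer example database:

USE [SecretServer];
-- Average fragmentation per index; LIMITED mode keeps the scan lightweight
SELECT OBJECT_NAME(ips.object_id) AS tableName,
       i.name AS indexName,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
INNER JOIN sys.indexes AS i
    ON ips.object_id = i.object_id AND ips.index_id = i.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;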
Maintenance
The maintenance for indexes and statistics is intentionally split because it is recommended to perform them separately. From a pure performance perspective, when the database is relatively small (~under 50 GB), maintenance of both object types is unlikely to take long, though this depends entirely on the resources available to SQL Server. This type of maintenance is I/O intensive and can cascade into being memory and CPU intensive as the data footprint grows.
Important: As the database grows, updating, adjusting, and monitoring this process are vital to getting the best performance out of SQL Server.
Indexes
Index maintenance addresses fragmentation. Two types of maintenance tasks can be done with indexes: rebuild and reorganization. Each one has pros and cons (a T-SQL sketch follows the lists):
Rebuild
Pros:
- Completely removes fragmentation, even on extensive tables/indexes
- Potential to reclaim space in the data file and physical storage
- Updates index statistics object in the same transaction/processing
Cons:
- Resource intensive (I/O)
- The transaction cannot be killed without rollback occurring (that is, if the transaction has run for 2 hours, killing it requires 2+ hours of rollback)
Reorg
Pros:
- Low resource usage
- The transaction can be killed without rollback
- It can pick up where it left off (to a degree), allowing for short bursts during small maintenance windows
- Table locking is only on the current data page
Cons:
- Will not update statistics
- Based on the maintenance window, sizable indexes may take time to clear fragmentation
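A minimal sketch of both commands, borrowing the tbSecretLog table from the pruning examples later in this document:

-- Rebuild: fully removes fragmentation and updates the index statistics
ALTER INDEX ALL ON dbo.tbSecretLog REBUILD;

-- Reorganize: lighter-weight operation that can be stopped safely mid-run
ALTER INDEX ALL ON dbo.tbSecretLog REORGANIZE;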
Statistics
The statistics objects exist for the columns and indexes on a table. They play a massive role with SQL Server's Query Optimizer; statistics are how SQL Server knows how much data exists in a table. As the table grows, those statistics can become stale or out-of-date. SQL Server will update statistics as they get marked out-of-date, but the method and schedule it uses are not always optimal.
An example of how statistics work in SQL Server about data in a database can be found at Explaining SQL Server Statistics with Playing Cards.
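A minimal sketch of a manual statistics update, again using tbSecretLog as the example table:

-- Update all statistics on one table, reading every row for maximum accuracy
UPDATE STATISTICS dbo.tbSecretLog WITH FULLSCAN;

-- Or update statistics database-wide using the default sampling
EXEC sp_updatestats;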
Application Impact
Maintenance performed on indexes and statistics is an I/O intensive process. For fragmentation to be cleaned up, the data has to be both read and written. If statistics have to be updated, the data has to be read; depending on the sample rate, this can cause a higher workload as data size grows.
When this maintenance is scheduled and implemented, it is recommended to monitor the process and periodically review the timing and workload it is causing on the SQL Server environment. There can be times when large tables need to be taken out of the regular schedule and have a dedicated schedule during a longer maintenance window or downtime for the application.
Index and Statistics Automation
Ola Hallengren's IndexOptimize (stored procedure)
This script is a standard tool used by many in the SQL Server community. You can download the script and find full details around this tool here. This procedure combines the ability to handle both indexes and statistics within one script, or you can create separate jobs when scheduling requires it.
The following T-SQL script can be used as a base template to manage both index reorg and statistics update in a single job (small environments):
EXECUTE [dbo].[IndexOptimize]
    @Databases = 'SecretServer',
    @FragmentationLow = 'INDEX_REORGANIZE',
    @FragmentationMedium = 'INDEX_REORGANIZE',
    @FragmentationHigh = 'INDEX_REORGANIZE',
    @UpdateStatistics = 'ALL',
    @OnlyModifiedStatistics = 'Y',
    @StatisticsSample = 80,
    @MSShippedObjects = 'N',
    @TimeLimit = 28800, -- time limit for work is 8 hours
    @LogToTable = 'Y'
Considerations on this command:
- This command's sampling rate is set to 80%; it is essential to monitor if this sampling rate is sufficient for the application's processing and workload.
- The TimeLimit of 8 hours means that when the limit is reached, no further index reorganization or statistics update commands will be issued; any currently executing command will still run to completion.
- An additional parameter not specified, accepting the default value, is StatisticsModificationLevel. Please see the documentation link above for more details on this, if it needs to be used.
Cloud Providers
Azure SQL DB
Index and statistics maintenance is the responsibility of the customer.
AWS RDS - SQL Server
Index and statistics maintenance is the responsibility of the customer.
GCP Cloud SQL
Index and statistics maintenance is the responsibility of the customer.
Pruning Secret Server Log Data
Secret Server has processes that run as background processes, such as Discovery or Remote Password Changing, that frequently log the occurring activity. The majority of Secret Server features have a setting to control the data retention called Days to Keep Operational Logs. The default for this configuration is 30 days.
In addition, the Database Size Management data retention feature gives you a central location to control the maximum record age for particular Audit and Log/History tables. A number of those tables are included in this section as well.
The data is stored in log tables within the database. These tables are commonly the cause of the database's (logical) size growth. If your environment heavily utilizes a large number of Secret Server features, you can benefit from an external process, such as a SQL Server Agent job, that prunes this data more frequently.
The following sections provide the query that can be used as a starting point for pruning each table as needed. It is recommended to adjust these based on each feature's usage and test before deploying to production instances.
Initial Purge Processing
Before implementing the following example queries on a scheduled basis, you may find that an initial purge of log data is required in the tables mentioned below. To trim the tables to their initial desired size, large amounts of data may require deletion. In SQL Server, this can be done using a batch method: delete a set amount of rows, then the next amount, until no more rows qualify for deletion.
Deleting in batches is done by using TOP (<number>) with the DELETE statement. The template below can be wrapped around the DELETE statement of the pruning queries in the following sections:
DECLARE @BatchSize INT = 4999
WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize)
    /* place log script here, after the DELETE */
    -- Availability Group configuration in use: declare @waitTime and uncomment the next line
    --WAITFOR TIME @waitTime
    IF @@ROWCOUNT < @BatchSize BREAK
END
An example for the tbSystemLog table would look like this:
USE [SecretServer]
SET NOCOUNT ON;

DECLARE @MaxLogLength INT = 500 -- record count to keep
DECLARE @BatchSize INT = 4999

WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize) FROM tbSystemLog
    WHERE tbSystemLog.SystemLogId NOT IN (
        SELECT TOP(@MaxLogLength) SystemLogId FROM tbSystemLog
        ORDER BY tbSystemLog.LogDate DESC, tbSystemLog.SystemLogId DESC
    )
    IF @@ROWCOUNT < @BatchSize BREAK
END
Example Pruning Code
tbStatusMessage Log Table
The bulk of the activity logging in Secret Server is written to this table.
Thread name examples: SecretItemHashUpdater, ActiveDirectoryMonitor, LegacyAgentMonitor, NodeClusteringMonitor, ConnectWiseMonitor, SSHProxyServer, ExpiredSecretMonitor, RPCHeartBeatMonitor, DiscoveryMonitor, ComputerScanMonitor, SecretComputerMatcherMonitor
Pruning code example:
USE [SecretServer]
SET NOCOUNT ON;

DECLARE @daysToKeep INT = 30 -- days of log to keep
DECLARE @threadName varchar(250) = '' -- provide the thread name to clean
DECLARE @stopDate DATETIME = DATEADD(DAY, -@daysToKeep, GETDATE())

DELETE FROM tbStatusMessage
WHERE ThreadName = @threadName AND CreatedOn < @stopDate
One consideration is to use the above as a template and control each ThreadName in a separate Agent job, as suits your environment; for example, DiscoveryMonitor may accumulate more logs than RPCHeartBeatMonitor due to feature usage.
Do not remove the ThreadName filter and prune purely on CreatedOn. Log volume varies by thread in Secret Server, and deleting indiscriminately may affect the ability to troubleshoot.
tbComputerScanLastStatus Log Table
Pruning code example:
USE [SecretServer];
SET NOCOUNT ON;

DECLARE @daysToKeep INT = 30 -- days of log to keep
DECLARE @stopDate DATETIME = DATEADD(DAY, -@daysToKeep, GETDATE())

DELETE FROM tbComputerScanLastStatus
WHERE LastScanDate < @stopDate
AND NOT EXISTS (
    SELECT NULL AS [empty]
    FROM tbComputer AS c
    WHERE c.ComputerId = tbComputerScanLastStatus.ComputerId
)
tbComputerScanLog Log Table
Pruning code example:
USE [SecretServer]
SET NOCOUNT ON;

DECLARE @MaxLogLength INT = 1000 -- record count to keep

/* IF EXISTS - SQL Server 2016+ only */
DROP TABLE IF EXISTS #TempComputerScanLog

;WITH CslRowNumbersByComputer AS (
    SELECT ComputerScanLogId,
        ROW_NUMBER() OVER (PARTITION BY ComputerId ORDER BY ScanDate DESC) AS RowNumber
    FROM tbComputerScanLog
)
SELECT ComputerScanLogId
INTO #TempComputerScanLog
FROM CslRowNumbersByComputer
WHERE RowNumber <= @MaxLogLength

DELETE FROM tbComputerScanLog
WHERE ComputerScanLogId NOT IN (SELECT ComputerScanLogId FROM #TempComputerScanLog)
GO
tbEventPipelineActivity Log Table
Pruning code example:
USE [SecretServer]
SET NOCOUNT ON;

DECLARE @daysToKeep INT = 15 -- days of log to keep

DELETE epa
FROM tbEventPipelineActivity AS epa
JOIN tbEventPipelinePolicyRun pr ON epa.EventPipelinePolicyRunId = pr.EventPipelinePolicyRunId
WHERE pr.QueuedDateTime < DATEADD(DAY, -@daysToKeep, GETDATE())
tbEventPipelinePolicyRun Log Table
Pruning code example:
USE [SecretServer]
SET NOCOUNT ON;

DECLARE @daysToKeep INT = 15 -- days of log to keep

DELETE FROM tbEventPipelinePolicyRun
WHERE QueuedDateTime < DATEADD(DAY, -@daysToKeep, GETDATE())
tbSecretViewTracker Log Table
Pruning code example:
USE [SecretServer]
SET NOCOUNT ON;

DECLARE @daysToKeep INT = 1 -- days of log to keep

DELETE FROM tbSecretViewTracker
WHERE RecordedOn < DATEADD(DAY, -@daysToKeep, GETDATE())
GO
tbOAuthExpiration Log Table
Pruning code example:
USE [SecretServer]
SET NOCOUNT ON;

DECLARE @daysToKeep INT = 1 -- days of log to keep

DELETE FROM tbOAuthExpiration
WHERE ExpirationDate <= DATEADD(DAY, -@daysToKeep, GETDATE())
tbSystemLog Log Table
This is the main log table for Secret Server. Normally, you would use this data for troubleshooting, but, if needed, you can prune it too. Pruning code example:
USE [SecretServer]
SET NOCOUNT ON;

DECLARE @MaxLogLength INT = 500 -- record count to keep

DELETE FROM tbSystemLog
WHERE tbSystemLog.SystemLogId NOT IN (
    SELECT TOP(@MaxLogLength) SystemLogId FROM tbSystemLog
    ORDER BY tbSystemLog.LogDate DESC, tbSystemLog.SystemLogId DESC
)
tbSecretLog Log Table
Pruning code example:
USE [SecretServer];
SET NOCOUNT ON;

DECLARE @MaxDelete INT = 500
DECLARE @MaxLogLength INT = 1000 -- record count to keep on non-success

DROP TABLE IF EXISTS #TempSecretNonSuccessLog

;WITH SecretLogRowNumbers AS (
    SELECT TOP(@MaxLogLength) SecretLogId
    FROM tbSecretLog sl WITH (NOLOCK)
    JOIN tbSecret s WITH (NOLOCK) ON s.SecretId = sl.SecretId AND s.Active = 1
    WHERE sl.Status <> 'Success'
)
SELECT SecretLogId
INTO #TempSecretNonSuccessLog
FROM SecretLogRowNumbers

DROP TABLE IF EXISTS #TempSecretSuccessLog

;WITH SecretLogSuccessRows AS (
    SELECT SecretLogId,
        ROW_NUMBER() OVER (PARTITION BY sl.SecretId ORDER BY sl.DateRecorded DESC) AS rowNum
    FROM tbSecretLog sl WITH (NOLOCK)
    JOIN tbSecret s WITH (NOLOCK) ON s.SecretId = sl.SecretId AND s.Active = 1
    WHERE Status = 'Success'
)
SELECT SecretLogId, rowNum
INTO #TempSecretSuccessLog
FROM SecretLogSuccessRows
WHERE rowNum = 1

DELETE TOP(@MaxDelete) FROM tbSecretLog
WHERE SecretLogId NOT IN (
    SELECT SecretLogId FROM #TempSecretNonSuccessLog
    UNION
    SELECT SecretLogId FROM #TempSecretSuccessLog
)
Availability Group Considerations
This section discusses specific considerations when performing database maintenance with Availability Group (AG) databases in SQL Server.
Backups
AGs allow backups to be offloaded to secondary replicas based on how the Backup Preference is configured. See Availability Group Properties: New Availability Group (Backup Preferences Page) for details. On secondary replicas, full backups must be copy-only and differential backups are not supported; log backups can be taken from secondaries, while any backup type can be taken from the primary replica. As the database grows, offloading that activity can be a consideration.
When utilizing the DatabaseBackup script with frequent log backups (e.g., every 5 minutes), the cleanup of those log backups may need to be done in a different job. When a large number of backup files has to be cleaned up, based on the configured CleanupTime, you may observe excessive waiting periods. If this begins to delay the 5-minute log backups, it is recommended to move that cleanup to a separate job.
Transaction Log
SQL Server manages the log file of a database joined to an AG differently than that of a standard database. In an AG configuration where synchronous replicas are in use, a log transaction is not marked inactive until it has been written to the log files of all database replicas.
Large transactions can affect this timing/latency. Maintenance such as index rebuilds can generate large numbers of small transactions. In synchronous mode, each of those transactions will not complete until it has been replayed on the log of every secondary replica.
A common practice is to adjust those secondary replicas' availability mode to asynchronous commit during maintenance windows. See Change availability mode of a replica within an Always On availability group for details. This can ensure the activity runs more efficiently on the primary replica and completes within limited maintenance windows. Once the redo log catches up on the secondaries, they can be set back to synchronous-commit mode.
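A sketch of that mode change; the availability group name [SecretServerAG] and replica name 'SQLNODE2' are hypothetical placeholders for your environment:

-- Run on the primary before the maintenance window: switch the secondary to asynchronous commit
ALTER AVAILABILITY GROUP [SecretServerAG]   -- hypothetical AG name
MODIFY REPLICA ON 'SQLNODE2'                -- hypothetical replica name
WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);

-- After the redo queue catches up: switch back to synchronous commit
ALTER AVAILABILITY GROUP [SecretServerAG]
MODIFY REPLICA ON 'SQLNODE2'
WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);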
Shrinking Log Files
Scheduled or frequent shrinking of SQL Server database files is not recommended. In situations where the log management scripts above were used for an initial pruning of a large amount of data, you can shrink the data file to reclaim storage space, and there may be other occasions when a shrink is warranted. However, if proper maintenance is already in place, routinely shrinking database files can cause more performance issues as time goes on.
When you need to shrink the data files, be aware that doing so requires exclusive locks on the database tables and indexes. Extra care is needed to ensure no connections are present to the database so the operation can complete without interruption. Monitoring the progress while it runs is recommended, especially if this has to be done during business hours or when the application cannot be brought down. A data file shrink is also an I/O and CPU-intensive operation.
Shrinking the log file is less intrusive: it will not cause locking and generally is not a CPU-intensive operation. A log file shrink can only reclaim the inactive portion of the log file. Steps such as taking a full backup and then a log backup can, at times, help switch active portions to inactive so you can shrink the log file.
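A minimal sketch of a log file shrink; the logical file name SecretServer_log is an assumption, so confirm it with sys.database_files first:

USE [SecretServer];
-- Find the log file's logical name (type_desc = LOG) and current size
SELECT name, type_desc, size * 8 / 1024 AS sizeMB
FROM sys.database_files;

-- Shrink the log file to a 1 GB target (the value is in MB)
DBCC SHRINKFILE (N'SecretServer_log', 1024); -- assumed logical name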
Shrinking Availability Group Log Files
Database log file growth can happen on the primary replica and even on secondaries. When it becomes excessive, it can be a cause for concern. If the log file on the primary is growing and attempts to shrink it are not working, check your secondaries' performance.
Secondary replicas in asynchronous-commit mode still need the transactions from the primary replayed to keep the data synced. When those secondary replicas fall behind, the log cannot clear on the primary, and continuous log growth will be observed. Troubleshooting will be required on those secondaries to determine why they are not keeping up.
One area to check if you see continuous log growth on the primary is ensuring all the secondaries have data movement enabled. If SQL Server paused data movement, you could see log file growth on the primary. Once you resume data movement, it is a waiting game until the secondaries are caught up. Once that happens, perform a full backup on the primary, and then the log file can be shrunk.
You can view the logical structure of the log file using DBCC LOGINFO. This command outputs a record for each virtual log file that SQL Server has created. The status column of that output will show 2 (active) or 0 (inactive). Active records can mean several things: a backup has not happened yet, a transaction is active, or replication is being waited on.
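A quick sketch of inspecting the virtual log files of the example database:

USE [SecretServer];
DBCC LOGINFO; -- one row per virtual log file; Status 2 = active, 0 = inactive

On SQL Server 2016 SP2 and later, the documented sys.dm_db_log_info DMV returns similar information.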