Posts tagged Recovery
This month's T-SQL Tuesday is about automation, and I thought I'd write about extending existing automation. One of my favorite scripts for automation is Ola Hallengren's backup and maintenance solution. Ola's scripts are a fantastic way to automate highly configurable backups and maintenance on your SQL Server instances. If you're not using them, you should seriously consider looking into why not.
This solution serves as an outstanding base, but like anything else it can be useful to tweak things a bit. Extending the initial automation provided by his scripts is what this post is all about.
Specifically, I've modified Ola's scripts to generate the files needed to restore all of the databases that have been backed up with his solution. The main goal is the ability to easily restore the whole server in the case of a disaster, though you could just as easily pull out a single database and restore only that. This script is currently written only for LiteSpeed, since that's what I use for backups. However, it could easily be changed to support native backups or any of the other backup products that Ola's scripts can be configured for. Perhaps I'll work on those in the future if it would be useful.
The idea is that every time you take a backup the backup job will create a .sql file on the server filesystem in the backup directory that can be used to restore to the point of the backups that were just taken.
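To make that concrete, here is roughly what the generated .sql file looks like for one database. The database name, paths, and file names below are illustrative, not actual output; the real script is built from your CommandLog entries:

```sql
-- Hypothetical output of GenerateRestoreScript for one database:
-- the most recent full backup is restored WITH NORECOVERY (and REPLACE),
-- each subsequent log backup WITH NORECOVERY, and the last one WITH RECOVERY.
EXECUTE [master].dbo.xp_restore_database @database = N'MyDB',
    @filename = N'H:\SERVERNAME\MyDB\FULL\MyDB_FULL.bak',
    @filenumber = 1, @with = N'NORECOVERY', @with = N'REPLACE';
EXECUTE [master].dbo.xp_restore_log @database = N'MyDB',
    @filename = N'H:\SERVERNAME\MyDB\LOG\MyDB_LOG_1.trn',
    @filenumber = 1, @with = N'NORECOVERY';
EXECUTE [master].dbo.xp_restore_log @database = N'MyDB',
    @filename = N'H:\SERVERNAME\MyDB\LOG\MyDB_LOG_2.trn',
    @filenumber = 1, @with = N'RECOVERY';
```

Running that file against the instance brings the database back to the point of the last log backup.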
This solution includes three pieces: an additional stored procedure, an additional step in both of the backup jobs to execute that stored procedure, and lastly a step in the cleanup job to remove restore scripts that have aged off the filesystem.
A couple of notes of caution:
As with anything you find on the internet, please use at your own risk in a development/test system and proceed with caution.
This script makes several assumptions, including:
- That you've installed Ola's procedures into the master database
- That you're using LiteSpeed
- That logging to the CommandLog table is enabled
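A quick way to sanity-check those assumptions before wiring up the job steps (this assumes the default object names from Ola's installation script):

```sql
-- Verify the prerequisites: CommandLog exists in master and is
-- actually capturing LiteSpeed backup commands.
SELECT TOP (5) ID, DatabaseName, CommandType, StartTime
FROM [master].[dbo].[CommandLog]
WHERE CommandType IN ('xp_backup_database', 'xp_backup_log')
ORDER BY ID DESC;
```

If this returns no rows, either logging isn't enabled or the backups aren't running through LiteSpeed, and the generated restore scripts will be empty.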
The stored procedure is relatively simple and accepts a single parameter, @type. Passing "LOG" (the default) generates the script as of the last log backup taken; any other value (I happen to use "FULL") generates the script based on the last full backup.
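Called directly, usage looks like this:

```sql
-- Script restores up to the most recent transaction log backup
-- (@type defaults to 'LOG'):
EXEC [dbo].[GenerateRestoreScript];

-- Script restores up to the most recent full backup only:
EXEC [dbo].[GenerateRestoreScript] @type = 'FULL';
```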
CREATE PROCEDURE [dbo].[GenerateRestoreScript] (@type NCHAR(30) = 'LOG')
AS
DECLARE @ID INT
DECLARE @DB NVARCHAR(128)

SET NOCOUNT ON

SELECT @ID = MAX(database_id) FROM sys.databases

IF @type = 'LOG'
    SET @type = 'xp_backup_log'
ELSE
    SET @type = ''

-- These lines are intentionally NOT commented out in the generated script,
-- as a precaution (running the script blindly generates an error)
SELECT 'ALERT-ALERT-ALERT-ALERT-ALERT-ALERT-ALERT-ALERT-ALERT-ALERT-ALERT-ALERT-ALERT-'
SELECT 'PLEASE BACKUP THE TAIL OF THE LOG'
SELECT 'OTHERWISE THIS COULD BECOME A RGE (GOOGLE THE ACRONYM!)'
SELECT 'IF YOU ARE OK REPLACING THE DB AND LOSING DATA IN THE TAIL LOG RUN THIS SCRIPT WITHOUT THESE COMMENTS'
SELECT 'RAISERROR(N''ARE YOU SURE YOU WANT TO DO THIS?'', 25,1) WITH LOG;'
SELECT '--------'

WHILE @ID > 2
BEGIN
    SELECT @DB = name FROM sys.databases WHERE database_id = @ID
    SELECT @ID = @ID - 1

    SELECT '----' + @DB + '-----------------------------------------------------'

    SELECT 'EXECUTE ' + REPLACE(Command, '_backup_', '_restore_')
         + ', @filenumber = 1, @with = N'''
         + CASE WHEN rn <> 1 THEN 'NO' ELSE '' END + 'RECOVERY'''
         + CASE WHEN CommandType = 'xp_backup_database'
                THEN ', @with = N''REPLACE'';'
                ELSE ';'
           END
    FROM (
        SELECT SUBSTRING(LEFT(Command, CHARINDEX(''', @with =', Command)),
                         CHARINDEX('[master]', Command), LEN(Command)) AS Command,
               ROW_NUMBER() OVER (ORDER BY cl.ID DESC) AS rn,
               CommandType
        FROM [master].[dbo].[CommandLog] cl
        WHERE cl.DatabaseName = @DB
          AND (cl.CommandType = 'xp_backup_database' OR cl.CommandType = @type)
          AND cl.ID >= (SELECT MAX(ID)
                        FROM CommandLog c
                        WHERE CommandType IN ('xp_backup_database')
                          AND cl.DatabaseName = c.DatabaseName)
    ) AS rntab
    ORDER BY rn DESC
END
To execute the stored procedure, this needs to be added as an additional CmdExec job step to the full backup job (make sure to change the directory where you want the .sql files stored; H:\SERVERNAME below):
sqlcmd -l 30 -E -S $(ESCAPE_SQUOTE(SRVR)) -d master -y 0 -b -Q "EXEC [dbo].[GenerateRestoreScript] 'FULL'" -o"H:\SERVERNAME\DRFULL_$(ESCAPE_SQUOTE(STRTDT))_$(ESCAPE_SQUOTE(STRTTM))_RESTORE.sql" -w50000
To execute the stored procedure, this needs to be added as an additional CmdExec job step to the transaction log backup job (make sure to change the directory where you want the .sql files stored; H:\SERVERNAME below):
sqlcmd -E -S $(ESCAPE_SQUOTE(SRVR)) -d master -y 0 -b -Q "EXEC [dbo].[GenerateRestoreScript]" -o"H:\SERVERNAME\DRLOG_$(ESCAPE_SQUOTE(STRTDT))_$(ESCAPE_SQUOTE(STRTTM))_RESTORE.sql" -w50000
This CmdExec job step needs to be added to the output file cleanup job to clean up old .sql files (make sure to change the directory where the .sql files are stored; H:\SERVERNAME below).
Note: as currently configured this keeps the files from the past three days, but the actual files kept depend on when the cleanup job is scheduled.
cmd /q /c "For /F "tokens=1 delims=" %v In ('ForFiles /P "H:\SERVERNAME" /m *RESTORE.sql /d -3 2^>^&1') do if EXIST "H:\SERVERNAME"\%v echo del "H:\SERVERNAME"\%v& del "H:\SERVERNAME"\%v"
I have these steps scripted into Ola's original solution .sql file so the folder names are set properly and job creation is completely automated. I'll leave that part of extending the automation to you, dear reader, as homework.
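If you want a head start on that homework, an extra step can be appended to an existing job with msdb.dbo.sp_add_jobstep. The job name below is Ola's default for the full user database backup job, and the H:\SERVERNAME path is a placeholder; both will need adjusting for your environment:

```sql
-- Sketch: append the restore-script step to the full backup job.
-- Job name and output path are assumptions; change them to match
-- your server before running.
EXEC msdb.dbo.sp_add_jobstep
    @job_name  = N'DatabaseBackup - USER_DATABASES - FULL',
    @step_name = N'Generate restore script',
    @subsystem = N'CmdExec',
    @command   = N'sqlcmd -l 30 -E -S $(ESCAPE_SQUOTE(SRVR)) -d master -y 0 -b -Q "EXEC [dbo].[GenerateRestoreScript] ''FULL''" -o"H:\SERVERNAME\DRFULL_$(ESCAPE_SQUOTE(STRTDT))_$(ESCAPE_SQUOTE(STRTTM))_RESTORE.sql" -w50000';
```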
This corruption story begins like many others. Somebody in a server room far, far away decided to make a change to a VMware guest machine, and that little change rippled through our poor server like a Lady Gaga meat dress through the VMAs. Needless to say, it wasn't pretty. I may never know the full sequence of events, but it appeared that our guest server ran out of disk space on the OS drive and some form of recovery was done.
What we started with was a SQL Server 2005 SP3 server where one of the drives was apparently corrupted, so two SQL instances wouldn't start. They were both failing with the message:
Error: 9003, Severity: 20, State: 1.
The log scan number (23:5736:37) passed to log scan in database ‘master’ is not valid. This error may indicate data corruption or that the log file (.ldf) does not match the data file (.mdf). If this error occurred during replication, re-create the publication. Otherwise, restore from backup if the problem results in a failure during startup.
Using trace flag 3608 and the startup parameters -c -m, I set about doing a normal "disaster" recovery of our server.
After rebuilding the master database, everything came online successfully, and master was then restored from the previous backup. Once master was online, I started getting the very same error message about the model database:
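For reference, restoring master has to happen with the instance started in single-user mode, and the instance shuts itself down as soon as the restore completes. A minimal version of the command (the backup path is a placeholder, and this is the native syntax; a LiteSpeed restore of master works the same way conceptually):

```sql
-- Run while the instance is started with -m (single-user mode).
-- SQL Server stops itself when the restore of master finishes.
RESTORE DATABASE master
FROM DISK = N'H:\SERVERNAME\master\FULL\master_FULL.bak'
WITH REPLACE;
```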
Error: 9003, Severity: 20, State: 1.
The LSN (11:999:1) passed to log scan in database ‘model’ is invalid
This would prove to be a trying error! It took several iterations and quite some time to figure out exactly what was going on.
On this server, after the initial setup, we had moved the system databases from the install drive to separate drives for log and data. When master is rebuilt, the system databases wind up back in the default directories, but after recovering master, the databases are pointed back to their original locations.
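Moving a system database like model is done with ALTER DATABASE ... MODIFY FILE, and the new locations take effect at the next restart. That's why the restored master immediately pointed model back at our relocated paths. A sketch with illustrative file names and paths:

```sql
-- Point model's files at new locations; takes effect on restart.
-- Logical names and paths below are illustrative.
ALTER DATABASE model
MODIFY FILE (NAME = modeldev, FILENAME = N'E:\Data\model.mdf');
ALTER DATABASE model
MODIFY FILE (NAME = modellog, FILENAME = N'F:\Log\modellog.ldf');
```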
Once we got the server started, the log scan error message for model showed up, so I began what I thought would be a normal restore of the model database. Unfortunately, there was no way to restore model. During the restore command, I got alternating messages that the model database log file was corrupted:
Error: 3283, Severity: 16, State: 1.
The file “modellog” failed to initialize correctly. Examine the error logs for more detail
Error 3283 would be followed by:
The database 'model' is marked RESTORING and is in a state that does not allow recovery to be run.
After trying various iterations of deleting the existing model log and data files, copying in the newly created ones, and running restores, nothing was working. I began to think the disks were actually having problems, or that the backup was bad. After verifying both the backup and the disk configuration, I was left with only a Hail Mary: sp_detach_db.
After detaching model, I copied in the newly created model files (from the rebuild of master) and ran sp_attach_db on them. Once the model database was attached, the instance started successfully!
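The detach/attach dance looks roughly like this (sp_detach_db and sp_attach_db are the SQL Server 2005-era procedures; the file paths are illustrative):

```sql
-- Detach the broken model...
EXEC sp_detach_db @dbname = N'model';

-- ...then, after copying the freshly rebuilt model.mdf/modellog.ldf
-- into place, attach them.
EXEC sp_attach_db @dbname = N'model',
    @filename1 = N'E:\Data\model.mdf',
    @filename2 = N'F:\Log\modellog.ldf';
```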
After the instance started, model was restored from the same backup and the instance was restarted. Finally, once the instance came online, it was a standard restore of all the user databases.
I'm not sure what it was about the log scan error in model that caused the errors I saw, but both instances behaved exactly the same. I had to detach and reattach a blank model to make the other instance work as well.
After going through this, I went back and tried to reproduce the problems by intentionally corrupting model and its transaction log in various ways. Every corruption I could cause in model behaved as I expected, and a simple RESTORE statement worked. I'm still not sure WHY this happened, but hopefully it won't happen again, and if it does, there won't be so much testing needed to figure out how to get model online.