Fun with the overnight crew...

Dateline: Sunday morning, 2:30am (my week on call)...

Kevin3NF: {snort} thisiskevin
Caller: Hey man...sorry to wake you but we have some t-log backups failing for Acme Widgets
Kevin3NF: whatstheerrormessagein the job (waking up a bit, not out of bed)
Caller: I can't find the job in the SQL Agent job list
Kevin3NF: WhatIsTheExactAlertFromNetIQ? (slightly clearer now)
Caller: SQL Task failed: 'Backup Job 17' failed
Kevin3NF: The job list is alphabetical...did you sort by name?
Caller: Yeah...but theres like 2000 jobs...most have numbers and letters and stuff
Kevin3NF: Those are reporting services jobs...just ignore them and look for the Backup job
Caller: I'm looking for Job 17, but its not in the 'J's
Kevin3NF: Did you look under 'B', since the job's name starts with 'Backup...' ?
Caller: silence
Caller: I'm an idiot
Kevin3NF: Goodnight.
Caller: Click
Kevin3NF: snore.

Still makes me laugh...I went back to sleep knowing I would post this little 2 minute interaction :)



Why we maintain databases...

I was blissfully unaware of this issue until I came in this morning and saw the debris...

The players: Freakshow and Tiny (my co-workers), the Customer, and the application Vendor.

Customer: Everything is slow and my cluster keeps failing over! Must be SQL Server!
Freakshow: We didn't change anything on your servers sir...
Customer: You must have!
Freakshow: Could be bad indexes or stats...
Customer's Vendor: Oracle doesn't have this issue
Freakshow: Grrrr....
Tiny: Double Grrrrrr.....
Freakshow: You have 49 million rows, and are table scanning/locking...
Customer: We have 3 years of data
Vendor: You should have 90 days, and you are 3 years behind on versions of the application...
Freakshow: Let me update the stats....
Customer: Hey...its working!!!
Vendor: You know that 'purge' utility we provide?....it comes with fries :)
Freakshow: I'm going back to bed
Tiny: I'm going home (6 hours past end of the shift...)

Moral: Purge your data, keep your versions up, and maintain your indexes folks...


17803 errors all over the place....

We have close to 400 SQL Server instances to manage. Within the last month we are starting to see 17803 errors at least once a week on completely different servers, different customers, different hardware, etc. We used to get one maybe every 6 months. SQL 2000, SP4 + .2187 hotfix rollup on most.

Anyone else seeing this? I'm wondering if something at the O/S level is the commonality (assuming there is one).


Do what?

So....I've got this job that backs up the T-log every 15 minutes (SQL 2005). Works like a champ across all servers.

Out of the blue, it starts failing with a "No full backup detected", despite the fact that one occurred less than an hour ago. Its recorded in the msdb system tables and the file is right where the full backup job left it.

The customer had decided for reasons unknown to create a job that does this (every hour):

Use MyDB


I kid you not. And no...there is not a second file to EMPTY the log contents into.

For those of you new to the game, when you Truncate a T-log, the sequence is broken and SQL Server barfs on future log backups until you run a full.

Kudos to my co-worker (code name Freakshow) for catching this as I was headed to the ERRORLOG to see what was going on. He nailed in 10 seconds what would have taken me 7 minutes. And then he knew it was a new job while I was still reading the log. Scary smart, that dude.

No fries, but ketchup squirted all over that client.



Don't blink!

Customer: I want to free up space on my F drive
Kevin3NF: Shrink your 40gb log file to 5gb
Customer: Will that affect production?
Kevin3NF: No, it didn't.
Customer: Huh?
Kevin3NF: It's done. You want fries with that?

Sometimes they just pitch you a softball ;)


Performance on a clustered SQL 2005 instance

Customer sez: I've been getting timeouts all morning
Kevin3NF sez: CPUs are good, memory is fine (buffer cache hit ratio and page life expectancy...my two favorite memory counters for quick hit analysis..)
Kevin3NF: Uh-oh....why is the disk queue so high for drive N? Houston, we found the problem...need clearance for root cause.

Root cause: Someone decided to expand the volume on the SAN that included the data drive, T-log drive and backup drive while the instance was running, and at HIGH priority. The second it finished, everything went back to normal.

Sadly, I needed a Windows/Storage guy to tell me what was going on after I identified I/O as the resource bottleneck.



Cycling related:

Congrats to the trio of ladies from Austin, TX that were in the top 10 at the Women's National Crits this weekend!

2005 Log Shipping

ok...I'm behind the times and just started working with SQL Server Log Shipping instead of writing my own custom scripts (Backup, Copy, Restore...just ain't that hard...)

Messing with a dead LS database this weekend, I tried to script out the existing configuration from the Log Shipping GUI. Has anyone else seen this thing drop all of the information on the destination server when it created the script?

I didn't bother to Google it and just re-created from scratch.

It bothers me that you have to drop and re-create to "re-initialize" a log shipped database.

No fries for me :(


LazyWriter: warning, no free buffers found.

SQL 2005, 9.0.3042

Got this in my ERRORLOG, along with a ton a memory messages not worth repeating here.

8GB RAM in the box, SQL Server set to dynamically allocate, and using 6.3 gb.

SQL was not allowing connections, giving the misleading "remote connections are not allowed/TCP" messages (translated: "something bad happened")

Solution (as seen in other places, but not many)...hard reboot of the server, bring up SQL, configure max memory to 6GB (anything other than dynamic, but leave some for the O/S).

No fries...took 30 minutes to get to the resolution. Dessert maybe :)




People...if you are going to have a 200gb database....please have a backup drive bigger than 250GB in case you want to also back up your system databases, T-logs, etc.


Rant over.

You don't get any fries with that one...



Team Discovery Channel wins the Tour of Georgia!

Sorry folks...I just haven't done much of interest in the SQL Server world lately....pretty much just the same old same old :)



Go Levi!

Levi Leipheimer wins the Amgen Tour of California!


Enterprise Manager hang

My customer's SQL 2000 .818 box has started timing out when he tries to use Enterprise manager to backup or restore a database.

After giving him the command to run in Query Analyzer, I took a look at his backup tables in MSDB and found roughly 3.8 million records in each...dating back to mid 2005.

I'll be running sp_delete_backuphistory tonight. :)

If you don't want to use the cursor driven proc provided by MSFT, click here for Tara Kizer's custom script:

Sorry for the long delay between postings...its been slow :)



Slow day at the office...

IT Alphabet...



Comment of the day from the newsgroups:

Until you test a backup by restoring, you don't have recovery plan, you have a recovery hope.

-- Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP
Database in Read-Only after detach/attach

I got a "middle of the night panic call" this morning. My customer was moving 2 user databases to a new drive on the same server.

After detach and copy, he re-attached successfully. But, the databases are in read-only mode and cannot be changed (Error 5105, Device activation error. The physical file name '%.*ls' may be incorrect.)

SQL 2000, post Sp3, Windows 2003, sp1.

Additional errors:
2007-01-16 03:06:41.02 spid61 Starting up database 'MyDatabase'.
2007-01-16 03:06:41.02 spid61 udopen: Operating system error 5(Access is denied.) during the creation/opening of physical device F:\SQLData\MyDatabase_Data.mdf.
2007-01-16 03:06:41.02 spid61 FCB::Open failed: Could not open device F:\SQLData\MyDatbase_Data.mdf for virtual device number (VDN) 1.
2007-01-16 03:06:41.02 spid61 udopen: Operating system error 5(Access is denied.) during the creation/opening of physical device F:\SQLData\MyDatabase_Log.ldf.
2007-01-16 03:06:41.02 spid61 FCB::Open failed: Could not open device F:\SQLData\MyDatabase_Log.ldf for virtual device number (VDN) 2.

Turned out that the SQL Server startup account did not have the correct security permissions on the folder or the files. Whoops. :)



New Year's resolutions kill Log Shipping!

ok...funniest issue I've run across in a while :)

We have some custom log shipping script that was working just fine until the end of 2006. Starting around January 2, the restore process couldn't restore fast enough to keep the standby server current.

I looked at the T-Log files for the last 2 weeks and saw that the T-log backups between 3 and 5 am were 2 to 3 times the size they used to be.

Why? Because the customer here is a national Fitness center business that had a huge influx of memberships and activities :)

People trying to lose weight broke log shipping.

The fix? Run the restore process more often (it was only running off hours).

You get fries AND a low-fat shake with this one...

Autogrow continued:

re: 12/19/2006 SQL 2005 Autogrow issue

I have confirmed with Microsoft that the bug will be fixed in SP2, and that there is no workaround other than monitoring.

I have written a script for my customer that will check the 'growth' and 'status' values in the sysfiles table of each database on their system. If the growth number is over x and the status is over 'y'...we have a problem.

The best way to avoid this whole mess:


If you rely on Autogrow, you are not properly administering your database. Autogrow is a failsafe mechanism for when you get unexpected growth, and nothing more.