Top Ten SQL Server Flameouts


This blog post is a precursor to a presentation I’ll be giving at SQL Saturday in NYC on May 30th, 2015.  For those of you who will be attending the conference and this session, be warned that this blog post contains spoilers!

I began working for kCura in 2009 as an Application Support Specialist.  I was hired in a senior role, and over the next few months I worked with six other application specialists, back when training was baptism by fire and our clients’ SQL databases were just starting to grow at a very rapid rate.

Our key challenges then were using the right tools, understanding how to read them, and working together with our clients.  There were some very long conversations, and some very, very deep dives into performance tuning across all aspects of the platform.  When SQL experienced an “event” of some sort, such as a shutdown, a stall, or blocking, simply “getting through it” was never enough.  We wanted to know what caused it.  We had a burning desire to find the root cause.

Over the years, a few recurring themes surfaced.  Of course, we had a couple of oddballs – things you may never see, or may believe can’t possibly have happened.  You are entitled to your own opinion; I am not here to argue points of fact or the historical record.  I am here to tell a story, a story of ten SQL Server flameouts.

Firstly, you may ask, “What’s a flameout?”  A flameout is a non-technical term for the loss of “flame” in a jet engine, and it can be caused by any number of things: a failed fuel pump, a fire, a bird strike, etc.  It is analogous to a SQL Server “crash,” where the server loses its ability to perform its primary function, which is to run and complete queries against a database.  This is not a blog post about top ten jet engine flameouts, though.  It’s about SQL.  The bottom line is that if a fighter jet, with just one engine, has a flameout, it has lost all propulsion. The same could be said for a SQL Server.

Fatal Error 211 (flameout 1 of 10)

This is about corruption, and the sudden discovery of it.  This blog post was a direct result of an assertion that the corruption was caused by a log drive becoming full.

This problem did not happen in the logs.  Data on disk can become corrupt just sitting there; data on drives can become corrupt even when they are not under power.

A corrupt log file isn’t the issue, either.  There is a trick where you can rebuild a missing log, and that is the topic of a later flameout.

There were other questions about what caused this:  Could a corrupt document or load file cause it? In the application that loads data to SQL, data is stringently typed, and there is no character or combination thereof that could have caused this.  Documents also don’t get loaded into Relativity, just the metadata from them, which goes into SQL via typed variables.  As for why the corruption didn’t cause trouble until recently: there was a restart.

The most probable theory of why corruption appears after a restart is that the (clean) system table was cached in RAM and was different from the (corrupted or missing) one on disk. When the server restarted, the instant SQL tried to look at that bad table, it choked.  SQL always looks to RAM for the most recent data, and once a table is cached, it will never go back to disk for it unless it is changed.  Data on disk NEVER changes before it changes in RAM, so SQL would never have any reason to go back and check the table on disk.  In other words, SQL looks for changes in RAM and writes them to disk, not vice versa.  It knows from the transaction log where the most recent data is and whether or not it has been written to disk.  This relationship is at the foundation of something called ACID: atomicity, consistency, isolation, and durability of transactions.

This also explains why the backup was bad.  A backup always gets the most recent data, and it gets it from disk, and since SQL had no reason to believe the system allocation pages had changed….

The corruption may also have been noticed (suddenly) if something changed the system allocation pages in memory during the week since the last CHECKDB.  As soon as SQL went to commit that change to the corrupt spot on disk that DBCC was checking, the roof caved in.

At this time, there are no known unpatched defects in SQL Server 2008 that cause corruption.  Certainly, a defect in SQL could be the root cause, but the more likely instigator is the physics of electromagnetism and entropy.
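The takeaway: go looking for corruption before a restart finds it for you.  A minimal sketch, with [YourDatabase] as a placeholder name:

/* Weekly (or nightly) consistency check. */
DBCC CHECKDB ([YourDatabase]) WITH NO_INFOMSGS, ALL_ERRORMSGS;

/* Have SQL checksum every page it writes, so bit rot is caught on read. */
ALTER DATABASE [YourDatabase] SET PAGE_VERIFY CHECKSUM;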

RAM – maxed out (2 of 10)

We’ve all done it – we’ve all been guilty of thinking that some setting or another on our SQL Server is a “set-it-and-forget-it” (SIFI) setting.  In fact, we have a 60+ page guide dedicated to optimizing SQL for running Relativity.  This guide serves as a good starting point.  Key words are “starting” and “point.”  Most settings in SQL will need adjustment based on differing workloads, even in the same application. If you don’t know why you are locking pages in memory, and you don’t know why you are raising your minimum memory value, then you shouldn’t do it.  You should get help. Professional help.

A couple of key things to know (a quick way to see these numbers from inside SQL follows the list):

  1. There are three RAM counters presented on the Performance tab in Task Manager (Windows Server 2008):
    A. Cached RAM = RAM that was in use, isn’t in use now, and is available for reuse.
    B. Available RAM, which is roughly cached + free.
    C. Free RAM.
  2. Total RAM = A + B + C.
  3. In Windows Server 2012, the counters are: In Use RAM, Available RAM, Committed RAM, and Cached RAM.
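You can also see SQL Server’s own view of these numbers without opening Task Manager; a minimal sketch using sys.dm_os_sys_memory (available in SQL Server 2008 and later):

SELECT total_physical_memory_kb / 1024 AS total_mb,
       available_physical_memory_kb / 1024 AS available_mb,
       system_cache_kb / 1024 AS system_cache_mb,
       system_memory_state_desc  /* e.g. "Available physical memory is low" */
FROM sys.dm_os_sys_memory;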

You shouldn’t necessarily be overly concerned about Free RAM hovering at or around zero, unless of course this is a new thing that is not normal for your system. When Available RAM continuously hovers between 0 and 100 MB, you have a problem that you need to fix.  Unless, of course, this is normal for your system and you are so awesome you can walk right up to that line and flirt with it as though she were Superman’s wife and you have a bucket of kryptonite in your left hand and a crowbar in your right.

Without exception, when I have seen Available RAM at 0 and had a sluggish, non-responsive system on my hands, this was the problem.  Most of the time, my first action is to lower max memory in SQL first and ask questions later.  Why?  Because I need a system that responds to my actions in order to figure out what happened.  SQL Server is usually very responsive to this action, and the system becomes almost immediately usable. Then, with the client and other support representatives from kCura, we would go on our scavenger hunt and figure out what had happened.  It can be anything from a bloated file cache (a user dragged and dropped a large file onto the desktop of the SQL server, which over time erodes the RAM available to SQL Server) to open web browsers; in general, the server was being treated as though it were a desktop machine.
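For reference, lowering max memory on the fly looks like this; the 49152 MB (48 GB) value is purely illustrative, not a recommendation:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
/* Value is in MB; pick a number that leaves the OS room to breathe. */
EXEC sp_configure 'max server memory (MB)', 49152;
RECONFIGURE;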

By the way, it’s a good security AND performance practice to prevent Internet access from your SQL server.  Furthermore, you should not even have SSMS installed on it.  If you can run Core, so much the better, and congratulations to you: it is a great practice.

Conclusively, keeping an eye on your max memory setting, and knowing who and what are stealing precious RAM from SQL, is a top ten skill in the DBA toolkit. Knowing whether you should even set min memory, and whether you should use lock pages in memory, is an advanced topic.  I have my own opinions about it, some of which are unpopular; if you can find me after my SQL Saturday session, I will happily discuss them with you.

Stuck in Recovery (3 of 10) 

Hands-down, this is one of my all-time favorites. All Relativity databases (with a couple of exceptions) ship, by default, in the FULL recovery model. Sometimes it becomes necessary to restart SQL (I know, right?), and if we don’t have the healthiest of log files, restarting can become a most painful process.  You can read about log file maintenance here (link to the Relativity technical support document about log file maintenance), a document we created that has been peer reviewed by Brent Ozar and Mike Walsh.

If you are not doing anything special to maintain your log files, you are missing out on some of the sweetest performance benefits you can have.  Aside from decreasing restart times, and not having your database get stuck in recovery for two days (yes, this can happen), you can boost performance by removing LOG WRITE WAITS – completely.  Yes.  You can COMPLETELY remove LOG WRITE WAITS.  This is not a WAIT type that should ever be in even your top 100 WAITS, unless the WAITS were created by some maintenance that you did. (You of course can’t COMPLETELY remove them.)

This link takes you to a SQL script that is useful in counting how many VLFs you have.
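If the link ever rots, a minimal VLF-count sketch using DBCC LOGINFO follows.  The column list matches SQL Server 2008; SQL Server 2012 and later add a leading RecoveryUnitId column:

CREATE TABLE #vlfs
(
    FileId int,
    FileSize bigint,
    StartOffset bigint,
    FSeqNo bigint,
    [Status] int,
    Parity int,
    CreateLSN numeric(38, 0)
);
INSERT INTO #vlfs EXEC ('DBCC LOGINFO');
SELECT COUNT(*) AS vlf_count FROM #vlfs;  /* thousands here = painful recovery */
DROP TABLE #vlfs;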

And here is a post by Jonathan Kehayias that explains log management more thoroughly.  If you plan to grow your log file organically (that is, just set it at some size and then forget about it) and you expect the log file to grow, then you should consider setting the growth increment to be large, such as 512 MB or 1 GB. If this becomes a problem and you begin to see LOG WRITE WAITS, then you should attempt to predict how big your largest read/write table with a non-sequential primary clustered index will become, and force-grow your log file somewhat larger than that.
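A force-grow, as a sketch; the database name, logical file name, and target sizes are all hypothetical and depend entirely on your own prediction.  Growing in steps of 8 GB or less keeps the individual VLFs a manageable size on this era of SQL Server:

ALTER DATABASE [YourDatabase] MODIFY FILE (NAME = N'YourDatabase_log', SIZE = 8192MB);
ALTER DATABASE [YourDatabase] MODIFY FILE (NAME = N'YourDatabase_log', SIZE = 16384MB);
ALTER DATABASE [YourDatabase] MODIFY FILE (NAME = N'YourDatabase_log', SIZE = 24576MB);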

“Why read/write, non-sequential?” you ask.  While it is true that there are a few outlier situations where a linear-key clustered index on a large table may need to be rebuilt, the log file operations that will consume the most space are mainly as follows (in no particular order; a fragmentation check sketch follows the list):

  1. Non-sequential clustered index rebuilds – the index is so fragmented that only a complete rebuild will do any good.
  2. Non-clustered index reorganizes and rebuilds.
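To see which indexes are fragmented enough to warrant that work, a minimal sketch; run it in the target database, and note that the 1,000-page filter is just an arbitrary floor to skip trivial indexes:

SELECT OBJECT_NAME(ips.[object_id]) AS table_name,
       i.name AS index_name,
       ips.index_type_desc,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.[object_id] = ips.[object_id]
 AND i.index_id = ips.index_id
WHERE ips.page_count > 1000
ORDER BY ips.avg_fragmentation_in_percent DESC;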

There are probably other things that will chew up a lot of log space, but as a good DBA you should know what they are and be prepared for when they will happen.  You must be able to anticipate these things. The main reason for concern in Relativity about the type of table that may be rebuilt lies in our auditing structure.  Every single user action, every data load, every mass edit is logged in our audit record table.  While you may (in Relativity) have a 300 GB Document table, you may also have a 3 TB AuditRecord_PrimaryPartition table.  The Document table, which is read/write, may have some very large non-clustered indexes that need to be rebuilt, and on occasion it has needed a full rebuild due to the addition of computed column indexes.  The auditing table, conversely, has relatively small non-clustered indexes, and its primary key is monotonically increasing, so it is of little concern.  That table will never need to be rebuilt, and if it does, you will know about it; it is not something that will happen in the middle of the day or over a weekend maintenance window.  If a non-clustered index on this table has actually exceeded the size of the Document table, that would be an interesting thing; we would like to see it, and you would then adjust your behavior accordingly.

So, these are just the lessons we have learned, and they may or may not be applicable to all databases everywhere.  SIFI is not in the cards for any of us in this room; we must learn from experience and make wise decisions based on past performance.  This is the best we can do, and while I can share with you my experience, and the experiences of kCura, I cannot give you SIFI guidelines.  It would be the sign of an immature DBA to even ask for such a thing.

“Hey!… that’s my turbo button!?” (4 of 10)

If you call your software support representative and you have turned on Priority Boost on your SQL server, you may expect to be asked, quite firmly, to remove that setting.  If you refuse, you may find that your tech support company refuses to offer you any further assistance until the box has been unchecked and SQL has been restarted. This feature marks SQL Server threads with the highest priority; no other processes, including those of Windows, will get a higher priority.  Combine this with someone setting MAXDOP to 40 on a 40-core server, and you have just ghosted your SQL Server. Flame.  Out.
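Checking for, and turning off, the setting takes seconds; note the change only takes effect after a service restart:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
/* 0 = off. Requires a SQL Server service restart to take effect. */
EXEC sp_configure 'priority boost', 0;
RECONFIGURE;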

Paging (5 of 10)

This relates somewhat to flameout 2 of this series. Sometimes, if SQL comes under memory pressure, the OS will page SQL’s memory out to disk.  Once this memory is paged out, we have seen that it does not return to RAM, even after whatever circumstances caused the paging have been alleviated. If it is an option, I recommend you restart SQL.  Of course, you should know your VLF situation before you do that, because if you restart a SQL server and get stuck in RECOVERY for two days, you will have a very upset stakeholder on your hands.  This is one of two circumstances where I will recommend a SQL Server restart; the other is if a SAN gets unplugged in the night.  SQL will show the database as being there, but it won’t be.

Another way to get the cache cleared out might be to run

DBCC FREEPROCCACHE

DBCC DROPCLEANBUFFERS

but I have found that these two commands do not reliably dump ALL buffers.  You may still have something left in the paged buffer pool that will create significant impedance.  There are probably additional commands that would help here; one diagnostic sketch follows.
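This is a sketch, not gospel: sys.dm_os_process_memory (SQL Server 2008 and later) shows how much of SQL’s memory is actually resident.  A low memory_utilization_percentage after a paging event means the working set is still out on disk.

/* How much of SQL's committed memory is still resident? */
SELECT physical_memory_in_use_kb / 1024 AS physical_in_use_mb,
       locked_page_allocations_kb / 1024 AS locked_pages_mb,
       page_fault_count,
       memory_utilization_percentage,
       process_physical_memory_low,
       process_virtual_memory_low
FROM sys.dm_os_process_memory;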

Cache out (unsolved mysteries) (6 of 10)

On three separate occasions (yes, this is a more esoteric top ten item), we have come across a situation where the oldest item in the cache was about 10 minutes old, rolling forward. This means that as we sat and watched, even after 30 minutes of observation, we saw no significant aging.  All the settings were per best practices, and besides, there is no one setting that could cause ALL items in the cache to never age.

Here is where a DBA who faithfully follows a strict protocol of change control must be lauded.  Were it not for the DBA, who was able to tell us the ONLY change that had been made in recent memory, we would still be scratching our heads. I had been on site with this client just three weeks prior to this follow-up visit, and the problem of the vanishing cache did not exist then.

He had set min memory = max memory.  We all doubted that this could be the root cause (we still do), but when he set min memory back to 0, the problem stopped.  To this day, when we talk about it, we refuse to believe that this setting CAUSED it, but since then I have seen the same thing on two other servers. There may be some other environmental factor at play here, something to do with clustering or with another configuration setting; we don’t know.  For now, if you see this, change min memory, which you probably should not have set at all on a stand-alone instance of SQL where it is the only application.  Even were this not the case, there is no reason to set the two values exactly the same; there is no detrimental effect to leaving them a few MB or GB apart.
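If you want to watch for this symptom yourself, you can sample the age of the oldest plans in cache; a minimal sketch (creation_time is when each plan was compiled into cache):

/* Oldest plans first; if nothing here is more than a few minutes old, the cache is churning. */
SELECT TOP (10) creation_time, last_execution_time, execution_count
FROM sys.dm_exec_query_stats
ORDER BY creation_time ASC;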


Wait, did you just say NTFS? (7 of 10)

One time I received a ticket that involved optimization. Our client’s complaint was slow queries.  She was right: the query was slow, and it was a query that I already knew about, and I knew what index to apply to get it to run very fast, sub-100ms.  After optimization, though, the query would not go faster than 1800ms.  “This is still too slow.”  I sniffed around the system, ran some checks, looked at WAITS, and it became pretty apparent to me that we had a problem with storage.  I let her know, and a week later we were on a call with one of kCura’s storage and virtualization experts, the client, and me, the tuner.  I began asking my usual questions, and as the client’s storage expert answered them, a text message popped up.  Our storage expert said “He just said NFS.”

“What??” I replied via text. “I thought he said NTFS.  Who puts their SQL on NFS?”

Next, I asked, “Did you just say ‘NTFS’?” and the client replied, “No, I said ‘NFS’”

“Are you running SMB 3.0?”

“No.  Is this a problem?” he asked.

“Yes, it’s safe to say that we do not need to have any further discussions.  This is our root cause.”

Several weeks later, the client moved to a SQL server with dedicated storage.


It worked fine yesterday (8 of 10)

Inevitably, you get the call: “This query worked fine yesterday; today it is not working at all and won’t finish.”  Here are ten possible reasons, with a quick maintenance-health check after the list.  Can you think of any more?

  1. The root query fundamentally changed
  2. An index was dropped
  3. The size of the data being queried increased 50x overnight
  4. Storage was swapped out
  5. A setting was changed (Max memory set wrong, MAXDOP set to 40/40 cores, etc.)
  6. The user is confused – their query is brand new and has not been run before today (plan cache)
  7. The server was restarted, the cache is not warm
  8. There was a failover (we’re not home alone anymore)
  9. Maintenance plans have been failing. FOR WEEKS.
  10. We are fresh out of magic pixie dust
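For number 9, one query tells you whether full backups (a decent proxy for the health of the rest of your maintenance) have quietly been failing; a sketch against msdb:

/* Last full backup per database; NULL means never backed up. */
SELECT d.name AS database_name,
       MAX(b.backup_finish_date) AS last_full_backup
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
       ON b.database_name = d.name
      AND b.type = 'D'  /* 'D' = full database backup */
GROUP BY d.name
ORDER BY last_full_backup;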

Blue Screen (9 of 10)

I have very vivid memories of this issue. I started at kCura in 2009; up until that point, I had been a paid consultant, where the affected users I served were either employees of the billed party or partners in a law firm that would be paying my bill.  Hands down, across the board, they would rather buy a new server than have me spend 20 to 30 hours (or more) digging into a root cause.  kCura has a different approach: we don’t bill our clients by the hour to help them solve their technical challenges.  We just help, and we keep helping until we either have an answer, or everyone agrees that an answer is not possible.

We received a call one afternoon about a server that had become unresponsive.  During the course of the call, the SQL server blue screened.  What followed was a very demanding client insisting that we explain how and why this happened; they wanted to know what caused it.  Review of the error log showed that SQL Server had in fact shut down, and we had a Windows dump file that we were able to send to Microsoft for analysis.  In the error log, we also had a portion of a query.

Relativity allows users to create and run custom search queries against sometimes large data sets.  You are probably all aware of Google’s limit of some 40 or so words in a search. At the time, Relativity had no such limit, and the search query that SQL had dropped into the error log was very large. So large, in fact, that it had been truncated when it was inserted into the error log.

This query became central to our investigation.  Two days later, I had completed the design and creation of a query that would search through all workspaces and pull back any saved search with a length greater than 10,000 characters.  It returned a large number of searches.  I increased the threshold to 20k.  Then 100k.  Then 400k.  Two searches came back, and one of them was an exact match for our suspect.  With this information, we were able to figure out who had run the query, and when it had been run.  We were able to train the user, and we also modified our application to allow a configurable upper limit on search submissions.
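I no longer have the original query, but the shape of the per-workspace check was roughly this.  The table and column names below are purely illustrative stand-ins, not Relativity’s actual schema:

/* Hypothetical schema: a saved-search table with the search text in one column. */
SELECT SearchID, SearchName, LEN(SearchText) AS search_length
FROM eddsdbo.SavedSearch
WHERE LEN(SearchText) > 400000  /* the threshold we eventually landed on */
ORDER BY search_length DESC;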

By listening to our clients, and by digging deep, we were able to make our application better and improve the customer experience.  The query I built to analyze searches across all workspaces was the beginning of a view and a recurring saved-search analysis framework; its development would span five years and result in the complete automation of search analysis.  This one thing, which would normally have been swept under the rug in most corporate cultures, became instrumental to development that would take place five years later.


Warning: Disk IO took longer than 15 seconds (10 of 10)

One thing is certain: where there is smoke, there is fire.  This error message in your log reinforces the lesson “Check the error log first.”  It is your first go-to when troubleshooting a SQL problem. Much of what can go wrong in SQL will be reported there, and if SQL Server can’t talk to its files for more than 15 seconds, it logs it in the error log.  We had the chance to dive deep into this error on a large VCE converged system that had been experiencing extensive issues.  Ultimately, we felt the fault lay in the FAST Cache: it was stalling and not responding.  During troubleshooting, the client suggested that the disk IO errors were not related, and that we should seek the root cause elsewhere.  This prompted us, with the client’s assistance, to develop a script that could detect IO taking longer than 1 second.  This involved a little custom scripting and analysis of SQL’s ring buffer.

After the script was completed and we ran it on a 2-second polling interval, we learned that not only was there an occasional I/O wait that took longer than 15 seconds, but there were 1-second-long burps happening almost constantly.

The script that ran this check was not difficult to write, but it fell to the doom of too many tabs open in Management Studio and was never saved.  According to Mike Walsh (Linchpin DBA), this information can also be teased out of Extended Events.
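The lost script is gone for good, but a rough reconstruction of the idea, polling virtual file stats instead of the ring buffer, might look like the following.  The 2-second interval matches what we used; everything else is an assumption:

/* Snapshot cumulative IO stall, wait 2 seconds, then report files whose
   stall grew by more than 1000 ms during the interval. */
IF OBJECT_ID('tempdb..#io') IS NOT NULL DROP TABLE #io;

SELECT database_id, file_id, io_stall
INTO #io
FROM sys.dm_io_virtual_file_stats(NULL, NULL);

WAITFOR DELAY '00:00:02';

SELECT DB_NAME(v.database_id) AS database_name,
       v.file_id,
       v.io_stall - s.io_stall AS stall_ms_in_interval
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS v
JOIN #io AS s
  ON s.database_id = v.database_id
 AND s.file_id = v.file_id
WHERE v.io_stall - s.io_stall > 1000
ORDER BY stall_ms_in_interval DESC;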


Punch-cards like birding

May 4, 2015

Today, we had Rose-breasted Grosbeaks at our feeders.  This was super emotional for me.  Why?

In the spring of 2009, or maybe 2010, Ethan’s interest in birds took an aggressive twist. Until then, I somewhat thought it a quirk of his childhood. Something he would outgrow – a phase, or a passing interest. One day, when I came home, he was going on about having seen a Rose-breasted Grosbeak at the feeder. He described it to me, and it was a bird I had never seen.

Rose-breasted Grosbeak

I wrote it off as the ramblings of a 7-year-old child with an active imagination.  After all, prior to our having bought him an actual bird book published after 1940 (about thirty days before this), everything was a Chipping Sparrow.  Besides, I was pretty certain at the time that I would only see chickadees, cardinals, and robins in my yard, along with the yucky black birds that make a lot of noise.

The very next day, sitting in my family room, a bizarre looking bird landed on the feeder. I grabbed my camera (5D with a 200mm lens) and took some very grainy pictures through the window.

Interestingly, I had just finished reading Outliers.  I’d learned how Bill Gates became a great programmer because he was able to compile and debug code on the fly (as opposed to the tedium of punch cards, where a single mistake could take weeks to debug).  I had an idea: I would photograph all the crazy things that Ethan sees, and give him the same kind of debugging feedback. The old way of birding, where the birder sees the bird and then, either right there or later on, starts flipping through pages in a book and trying to work from a faulty memory, reminded me of how programmers used to have to work.  They would make a bunch of punch cards, put them in a stack (don’t drop them!), feed them through a computer, wait for it to compile, and then debug.  With the advent of on-the-fly compiling, you could program and compile much more quickly. I would do this for my son: he would see the bird (write the code), identify it (compile it), and then check it against the photo (debug).

I became hooked. I already loved photography, I am fortunate to have a great, successful job that allows me to travel and purchase equipment, and Ethan loved birds.  So I became a review tool he used for verification (birds in flight, distance shots, etc.), and I became his student. Through him, I’ve learned to become a better parent and a better photographer. I’ve learned that when you, the parent, fuel the fire that is in your child, when you fan the flames, it becomes a conflagration, a burning inferno of passion. All you have to do is create opportunity, and get them whatever they need to learn, and then the child becomes amazing. If you can do what they love with them, all the better!

Today, after a six-year absence, the Grosbeak came back (OK, maybe not the same one), and it brought three friends. We had FOUR Rose-breasted Grosbeaks on our feeder today. Yesterday, the find of the day was in Urbana at Busey Woods, when Ethan located a rare White-eyed Vireo who just happened to very much want his photo taken.  The RBG and WEV are in this set on Flickr: https://www.flickr.com/…/44211019@N…/sets/72157651937850019/

Personally, I am thinking that the White-eyed Vireo came to see us, because he hung around us and followed us for about 100 feet before flitting away up into the treetops. Maybe he’s friends with the GWW – I’m telling you, they talk about us behind our backs …. 😉
Scott Ellis (Ethan’s proud and slightly insane Dad and personal photographer)

What He Thinks (WHT), What She Thinks (WST)- The Elevator Extrapolation

Scenario:  A man exiting an elevator in an office building and a beautiful woman, entering the same elevator, silently mouths the word “hi” and smiles at him as they pass each other.


Part 1: The walk-by

WHT: She’s totally into me. Do I know her?  

WST: Isn’t that John from accounting?   I wonder if he remembers that time he sent my paycheck to the wrong address and it took me three weeks to get paid?  I was so pissed at him.  Did he grow a beard?  It’s kind of hot.  Should I say Hi?  I hate him.

*They walk past each other*

WST : Oh crap, this isn’t John at all.  Why oh why don’t I get my eyes checked? Oh crap, it’s that creep, Steve from IT….but I have to say hi, because he is in IT and you have to be nice to IT people so they will come running when you have a computer problem….maybe I should throw a little flirt in?  Just to…wait, no, then he will cyber-stalk me. Oh crap, my lips already started to move…oh SHIT!, why am I smiling! what have I done??  I hope he doesn’t recognize me…I hope he doesn’t recognize me…oh god, I can  feel his eyes drilling holes into my ass…gotta push the button. *turns around* yep, there it is.  Take a nice long look. *smiles awkwardly at him*

WHT: She’s totally into me. 

Elapsed time: 1 second


Part 2: As the doors close

WST: Creep. Thank God these elevator doors will close before he can take that total miscue and jump back in… Think happy thoughts… think happy thoughts… Kittens! *smiles*  Now let’s see, after work today I need to stop at the grocery store and pick up lettuce, tomato, green onions, hamburger… gosh, will these doors ever close?  Is he really still staring at me??  Oh, and I can’t forget to call mom, and hmm, I wonder if my blue shoes will go with that black dress I bought yesterday *contemplates 311-factorial arrangements of clothes and shoes*

*doors close*

Oh thank god, they finally closed!  These doors need a stalker button that closes the doors instantly….

Elapsed time: 3 seconds


Part 3, 20 minutes later: Afterthoughts.

WHT: She was totally into me.  I should have made some excuse and jumped back in the elevator….

WST: What should I have for lunch? 


What is eeCardinal?

eeCardinal is the beta version of a photography app that integrates with Flickr. Its purpose is to enable people to review photographs in an intuitive and user-friendly manner.  No further information about this app is available at this time.  It is described in some great detail in the book

Mobile Devices: Tools and Technologies – CRC Press Book

If you would like to know more, you must purchase the book and return here.  Further information will not be available until the book is released. If you are a TestFlight user, you can reach me via email with any questions.

eeCardinal is built on Glimpse360 technology, which is also produced by Scorellis.  eeCardinal may not be the final name of the product.

eeCardinal TestFlight support

For support on eeCardinal, the namespace on flickr, or on Glimpse360 vector technology, contact scorellis through the self-same address at gmail.com.

Solicitations or inquiries regarding the app development, jobs, photography, etc, are not being accepted at this time.


Thurple Thursday Morning

This morning I awakened, and as I got dressed, I thought to myself, “It’s not waffle Wednesday; why am I putting on waffle? Too bad there’s not a material called ‘thurple.’”

“Hey, that would be a good name for my app.” Then I googled it and found very few results. This one stood out, because my app is about creating an identification space.

http://www.serc.si.edu/education/programs/java/activities/purple.aspx.

Then I said the word out loud. Then I decided it didn’t make sense. The search for a catchy name continues. For now, it is still eeCardinal.

Ten Commandments of MS SQL Database Administration

  1. Thou shall use no other databases before MS SQL.
  2. Thou shall not attempt to print thy database.
  3. Thou shall not assume thy databases are not corrupt.
  4. Thou shall perform DBCC and backups on the 7th day.
  5. Honor the query plan and statistics and keep it wholly in your mind at all times.
  6. Thou Shall not DELETE without a WHERE clause.
  7. Thou shall not commit thy transaction without isolation.
  8. Thou shall not unnecessarily transact with read isolation level uncommitted.
  9. Thou shall not say you have database security when you don’t know.
  10. Thou shall not transgress upon production bandwidth with maintenance task operations.

tile – the World’s Best Lost and FIND  

Scott Ellis is a technologist, an author, blogger, kalimbist, micro-blogger, and a photographer, in no particular order.  Flawlessly forgetful, he is an avid tile user.  For a day job, Scott works as the infrastructure architect for the e-discovery software company kCura.

by Scott R. Ellis

I’ve lamented for years that I can’t just type “Find: sneakers.shoes” into my computer and have it tell me where my sneakers are.  One time, after I had been coding for nearly 20 hours, I actually tried to grab my coffee cup with the mouse arrow.  Yeah, right?

I lose my stuff in my house.  Every day, I spend a minute here, a minute there, just looking for things like my keys. Mostly, it’s my shoes. Sometimes, my keys are in my shoes.  In the daily battle between my left brain and right brain, organization is typically the first victim.

Now, along comes tile. I’ve been seeing a lot of buzz, and with my inevitable forgetfulness and curiosity, I found their offering intriguing.  I decided to make a purchase, and give it a chance.

tile is a nifty little tag that I stick on just about anything to help me keep track of stuff.  It’s about 1 inch square by a quarter inch thick and I have them on everything from my kalimba to my camera to my shoes.  I even have one in each car.  I have placed another order for 8 more. The truth of the matter is this: If it were feasible, I would integrate this platform into everything.

Here are just a few, everyday uses for tile, most of which I will have cause to use:

  • How to find your bicycle in a row of 100 other bicycles.
  • Locate your coat in a coat room or in a pile of coats at a party.
  • Find your significant other at a party (hopefully not in the coat room…).
  • Find your car in a huge parking lot.
  • Find your kids’ missing gloves in the school lost and found. (All teachers and school administrators should have the Tile app!!!)
  • The kids (Have you got them all? You lovebird, you!)
  • The luggage from the hotel room.
  • The luggage on the carousel at the airport.
  • Finding your TAPE MEASURE!
  • Dog training clicker.

OK, I’ll stop on that last one.  You know and I know that I could go on all day long like this: 10,000 uses for tiles that have almost nothing to do with someone finding your lost [insert your item here] and saving the day.

What was clear to me, the first time I ever saw tile, is that tile is a search engine for the physical world. At long last, I no longer have to keep track of ANYTHING.  Well, anything that I can affix a tile to…

“WAIT!” you’re shouting (please stop shouting at me). You say, “I don’t want to stick this little white tag on everything I own! My shoes?  Are you KIDDING me??!”  (Really, stop yelling at me!)

Ok, check out this video.  Acrylic paint is some seriously amazing stuff.  It dries really really fast.  The label on the paint I used cites that it cures in about 21 days.  “Cures” is just another way of saying, “It is really, really dry now.  Too late to change your mind!” Bonus: you can mix your own colors, too, and it’s super easy.

https://www.youtube.com/watch?v=Tej38HVd9rM

I just painted mine yesterday, and I plan to wait the 21 days (I know! More waiting, right? Patience, grasshopper).  Once the paint has cured, I’ll put it on my key ring.  In a couple more months, we’ll have an answer as to how it is holding up.  How long will it last?

Speaking of the future, what’s in store for the future of tile?  I have no idea what they will actually do, but here are some suggestions (I hope they read this blog!!!!):

  • More clever designs, preferably by me!
  • Smaller sizes, different shapes.  Embed tile in everything!!
  • tile way stations – put little micro towers in busy locations, like train stations and airports, that will only store the location of LOST items that walk by.
  • tile relay points – I want a master tile that I can stick on a wall in every room of my house, one with more juice that will communicate with the little guys.
  • Triangulation – two or more tiles and a cell phone should be able to pinpoint the precise location of a tile.
  • The futuristic future: you don’t even think about it anymore.  Everything comes pre-tiled and registered to you when you buy it at the store.  It just shows up in your app. Every surface in your house is a charging pad; dropping your clothes on the floor recharges their integrated tiles.

Clearly, my impeccable forgetfulness is because I daydream too much. Yes, building a future like this will take A LOT of hard work.  Startups are NOT easy, and they rely heavily on user feedback.  I am more than OK with this.  Together, we will grow and build a future of tile ubiquity. Grab your tile, grab your inner child, and paint.  You can’t mess this up: it dries really fast, and you can just paint over your mistakes!

Measuring Efficiency of DBCC CheckDB with Computed Columns

by Scott Ellis and Erik Darling

Published on: Nov 28, 2014 @ 11:16

VERSION of Microsoft SQL : Microsoft SQL Server 2008 R2 (SP3) – 10.50.6000.34

Background

Recently we [a software vendor] added some persisted computed columns to our software database, which runs on MS SQL 2008 and 2012.  We (my friend Erik and I; this is not an officially sanctioned test) wanted to know how much longer DBCC CHECKDB would take, and if we could put a percentage on it.  See the last paragraph for the answer if you want to skip to the good part and avoid reading all these accursed words.

Here are the basics of the computed columns.  An index is subsequently created on each computed column; each index contains 3 keys, and the computed column is the leading key in all three.

Here is the computation in the table definition on each computed column:

 [redactedComputedColumnName] AS (CONVERT([binary](20), HASHBYTES('SHA1',
     CASE WHEN ISNULL([hashedField], '') = ''
          THEN CONVERT([nvarchar](30), [ID], 0)
          ELSE UPPER('S' + RTRIM([hashedField]))
     END), 0)) PERSISTED NOT NULL

The Rationale

Why did we want to do this test? It seeks to identify the percent difference in duration of a DBCC check with and without computed columns.  We sought to determine whether we could derive a percentage difference based on the size of the single, very large table that houses the computed columns, and then apply that as a multiplier across thousands of databases.

Previously, these columns were populated by a trigger, so adding computed columns has increased the duration of DBCC.  We did the right thing and removed the trigger because it was a never-ending source of trauma when inserting massive amounts of data, which is a common occurrence in this software.

The Experiment

To begin, we restored a 5 GB database* that is representative of the types of databases about which many people will be curious. We ran this test against a database attached to an 8-core, 50 GB RAM SQL Server 2008 R2 instance.  The table we examined is 3.398 GB.  The database server runs on an Intel Xeon E5 2.6 GHz virtual machine. Here is the restore code we used, if you are into this sort of thing:


 RESTORE DATABASE [DBCC_TEST5GB]
 FROM DISK = '\\Backups_SQL\RELO001\ERIK_COMP_TEST\FULL\v.NoComputedCols.bak'
 WITH BUFFERCOUNT = 500, MAXTRANSFERSIZE = 4194304, STATS
 
Restore Time: 670553 pages in 115.857 seconds (45.216 MB/sec)
Size of BAK file: 1.32GB (compressed backup)

We used a BUFFERCOUNT of 500 and a MAXTRANSFERSIZE of 4 MB because we wanted to beat the shit out of our IO. Use this code (really, anything you find here) at your own peril. The word “STATS” allows you to see how far along the restore has progressed.

Then we set STATISTICS TIME ON, because we wanted to know how long everything would take, in both real-world and CPU time. For monitoring, we opened a new window running Adam Machanic’s sp_WhoIsActive to see what was happening, and we piped its output to a table for further evaluation and verification as needed (a sketch of that follows the trace flags). Then we started the DBCC CHECKDB. The following trace flags were active:

DBCC TRACEON (2549, -1)
DBCC TRACEON (2562, -1)
DBCC TRACEON (1118, -1)
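For the curious, piping sp_WhoIsActive to a table uses its built-in schema generator; a sketch (the table name dbo.WhoIsActive_Log is our own invention):

DECLARE @schema varchar(max);
EXEC sp_WhoIsActive @return_schema = 1, @schema = @schema OUTPUT;
SET @schema = REPLACE(@schema, '<table_name>', 'dbo.WhoIsActive_Log');
EXEC (@schema);  /* creates the logging table */
EXEC sp_WhoIsActive @destination_table = 'dbo.WhoIsActive_Log';  /* one snapshot per call */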

You can look these up at:
http://blogs.msdn.com/b/saponsqlserver/archive/2011/12/22/faster-dbcc-checkdb-released-in-sql-2008-r2-sp1-traceflag-2562-amp-2549.aspx  and  http://sqlperformance.com/2012/11/io-subsystem/minimize-impact-of-checkdb

Baseline 1: DBCC against a 5 GB, no Computed Columns Database

We need to know how long a DBCC check takes on a “normal” database. To get to a version of the software that has computed columns, we upgraded from an earlier version.  First we ran the command against the previous version of the database:

 /* DBCC command for the v7.5 (no computed columns) database */
 DBCC CHECKDB ([DBCC_TEST5GB]) WITH NO_INFOMSGS, ALL_ERRORMSGS

 Duration: 103985 ms.
 CPU time: 71663 ms.

Baseline 2: DBCC against a 5 GB database with computed column indexes disabled.

The previous database was upgraded to a version with computed columns. To establish a foundation, we took a baseline with the computed column indexes disabled.  This was to look for possible differences introduced by the upgrade.

/* Disable indexes on computed columns */
ALTER INDEX [IX_1003669_RI] ON [EDDSDBO].[Document] DISABLE
ALTER INDEX [IX_1003671_RI] ON [EDDSDBO].[Document] DISABLE
ALTER INDEX [IX_1035456_RI] ON [EDDSDBO].[Document] DISABLE


Ran a DBCC command:

 /* DBCC command for disabled Computed Columns database */
 DBCC CHECKDB ([DBCC_TEST5GB_CC]) WITH NO_INFOMSGS, ALL_ERRORMSGS
 /* Run DBCC check without indexes */

Results:

Duration: 33003 ms.
CPU time: 70056 ms.

Experiment 1: DBCC with computed columns enabled against a 5 GB database

To get the database back to where it needed to be, we rebuilt the computed column indexes.

/* Rebuild indexes */
ALTER INDEX [IX_redacted_1] ON [EDDSDBO].[Document] REBUILD
ALTER INDEX [IX_redacted_2] ON [EDDSDBO].[Document] REBUILD
ALTER INDEX [IX_redacted_3] ON [EDDSDBO].[Document] REBUILD

Then we ran the DBCC:

 /* DBCC command for the enabled computed columns database */
 DBCC CHECKDB ([DBCC_TEST5GB_CC]) WITH NO_INFOMSGS, ALL_ERRORMSGS

Duration: 147264 ms.
CPU time: 81358 ms.

Baseline 3: 430 GB database with disabled computed column indexes.

Next, to establish our multiplier, we ran against a 428.48 GB database.  Having already established that CHECKDB ran with negligible differences pre- and post-upgrade, we ran it again with the same three computed columns, but with their indexes disabled.

The table with the computed columns sported a hefty 269.418 gigabytes.

/* Disable indexes on computed columns */
ALTER INDEX [IX_redacted_1] ON [redacted].[EDDSDBO].[Document] DISABLE
ALTER INDEX [IX_redacted_2] ON [redacted].[EDDSDBO].[Document] DISABLE
ALTER INDEX [IX_redacted_3] ON [redacted].[EDDSDBO].[Document] DISABLE
/* Run DBCC */
DBCC CHECKDB ([DBCC_TEST430GB]) WITH NO_INFOMSGS, ALL_ERRORMSGS
/* Indexes disabled */

SQL Server Execution Times:

Duration: 9705874 ms.
CPU Time: 4391135 ms.
This (9705874 ms) is 2.69 hours (161.7 minutes)

 /* Rebuild indexes */
ALTER INDEX [IX_redacted_1] ON [EDDSDBO].[Document] REBUILD WITH (ONLINE = ON)
ALTER INDEX [IX_redacted_2] ON [EDDSDBO].[Document] REBUILD WITH (ONLINE = ON)
ALTER INDEX [IX_redacted_3] ON [EDDSDBO].[Document] REBUILD WITH (ONLINE = ON)
Index Rebuild time: 00:01:59.003  (about 2 minutes)

Experiment 2: Run CheckDB on a large, production-grade data set with enabled computed column indexes

/* Run DBCC */
/* DBCC command for 8.2 database */
DBCC CHECKDB ([DBCC_TEST430GB]) WITH NO_INFOMSGS, ALL_ERRORMSGS

SQL Server Execution Times:

Duration: 10846102 ms.
CPU time: 4689156 ms.

Conclusions

To fully understand what is happening during a DBCC CHECKDB of database computed columns, check out this blog post from Paul Randal.

Paul’s post does indicate a much greater performance hit than what we saw.  This could be due to some confluence of the calculation involved, the number of computed columns, the size of the data, the index, the trace flags in use, and the underlying hardware.  Paul suggests that dropping and re-enabling the indexes is not a “palatable solution.”  I submitted an inquiry to him about this, asking why it is unpalatable (which quite literally means not very tasty), and he responded, “Because someone might have indexes that take a long time to rebuild, might not have online, might be enforcing a constraint…”

This got me thinking: since with this particular database software there is no constraint, many people running this product have Enterprise Edition, and many administrators have offloaded the DBCC process to a separate SQL server, disabling the indexes may be an option.  Certainly, in an offloaded situation, disabling them and leaving them disabled may leave only a mildly unpalatable aftertaste.

For those of you who are doing consistency checking in production, the only real issue you can experience is users actually attempting to access and use the database while the check is in progress and the indexes are disabled.  If your column has great potential to be used a lot, then we don’t recommend disabling this index in production.

Also realize that if this index is corrupted in your production system, you run the risk of a user running across it and then you have to rebuild it during production hours. The risk is low, but it is there.

Many infrastructure admins who run corruption checks on very large** databases run them in an offloaded way.  They restore the databases to cheaper hardware and let the checks take a day or two to run.  For those people, disabling these indexes is a perfectly viable solution.  If even the remote possibility exists that someone may hit those indexes WHILE they are disabled and WHILE you are doing a DBCC check, DON’T DO IT. This is not a viable option for a production system, and if you are caught doing it by this software infrastructure architect, I will scold you 🙂

Ultimately, whether you have Enterprise or not is irrelevant: you do far more damage to query capabilities during the one or two hours that the indexes are offline than you do in the two minutes it takes to rebuild them.   Moral of the story:

If you are offloading DBCC and want it to go faster, automate a computed column index disable.  If you are performing DBCC in prod instead: A) stop doing that; B) leave these indexes enabled.

The percent performance improvement, for these particular databases,  seems to be in the neighborhood of XX%- YY%.


*No, you can not have, ever, no matter how often you ask me for it

** By very large I mean any size database that causes you stress!


Missing African Grey Parrot – Evergreen Park Illinois

Hey – as a Chicago Bird Collision Monitor volunteer, I received this message today. Please RT and post to FB too!!! It is cold out, and this bird will be scared and alone!

Missing African Grey Parrot
I’m forwarding this heart-breaking message because CBCM might hear something about this bird that escaped.  The person who had it as a pet is a U of C grad student.  The fact that he has put up a huge reward suggests it really meant something to him.
–Dave

———- Forwarded message

Hi all,

A good friend of Ben and mine lost his African Grey parrot (see below). If you get a chance could you pass on the information to any local birding societies, collision monitors, etc. that you have contact information for? The parrot was like a child to him and his partner and the loss of the parrot has been utterly tragic for them.
Many thanks,
Aaron
Begin forwarded message:
I apologize for sending this to the entire listserv, but I really could use your help.  Yesterday I returned from SVP in Berlin to the news that my 3-year-old African Grey parrot, named Earl, had escaped from the woman who was boarding him on November 1st (eleven days ago).  She had been lying to me the entire time that he was fine, “so that it wouldn’t ruin my trip.”  I am distraught, and I am desperate, because he could be anywhere on the South Side of Chicago by now (he is fully flighted).  He was last seen in Evergreen Park, IL.  My only hope is that he flew to somebody for help before a hawk or the cold got him.
Please, if there is anyone you know who is a bird-watcher or who lives in the vicinity of Evergreen Park, please ask them to keep an eye open for Earl.  If you know anybody in any ornithological societies as well, please let them know.  Any sighting of a strange grey bird with a bright red tail, a little bit larger in size than a pigeon, is a good sighting.  I can’t do this alone, and it all feels so hopeless right now.
My sincerest gratitude,
Justin