Thoughts on interviewing technical candidates and running technical tests

(We’re hiring – https://www.basefarm.com/sv/jobb/Lediga-tjanster-Sverige/ Drop us a line if you’d like to come and try our technical test!)

We’ve been expanding quite a lot recently, which is a good thing of course, but also means that you often need to spend a fair amount of time doing recruitment and interviews. I find this side of the business extremely interesting, and it’s something I’ve always enjoyed being involved in throughout my career. There are many schools of thought surrounding this topic, but here are some points which I like to cover when I recruit technical people for support and operational roles.

I always try to do a technical test. This may sound like a given, but in my experience it’s not always the case. Technical tests mean different things to different people, but I like tests where candidates get the opportunity to demonstrate, in a real situation, that they know how to troubleshoot things. I only recruit operational and support staff, and I think this is a field where it is hard to test people on specific skills. Therefore I like to watch someone fix something that is broken. I also have every candidate for a role do the same technical test, so that I can compare and contrast.

Normally I tend to do my interviews like this:

1. CV reading – if it’s more than 2 pages I get bored and you’re unlikely to get through. This is a huge topic all of its own, but I’ll just say this: listing out every programming language, tool, script engine and piece of software you have ever been in the same room as does not impress me.

2. first interview – 1 hour usually and can be done by phone if appropriate. Are you generally appropriate for the role?

3. second interview – technical test (see more below)

4. third interview – meet managers and other team members (this could actually be more than one meeting)

5. make an offer

I am extremely passionate about recruiting the right people first time, and there are really no shortcuts in this process. You need to be prepared to invest the time up front or you’ll be paying it back for years with the wrong candidate on board. (I learnt this the painful way when I had candidates imposed on me much earlier in my career in the early 90s, and I resolved to do all I could to avoid this situation as soon as I reached a level of seniority where I was allowed to recruit for myself.)

There’s enough literature out there about the interviewing process as a whole, but here are some more thoughts on the technical test side of things.

I like my technical tests to follow this theme:

a) simple, closed theory questions about the main technical topics – these questions should have fixed answers. I think of these as textbook questions, which to me means that you can learn them parrot-fashion, so I don’t actually hold the results in that high a regard. A typical example might be:

What does ACID mean in terms of RDBMS?

b) more complex open questions about techniques or principles involved in the job. These should give the candidate the opportunity to give a full, lengthy answer demonstrating their knowledge of the topic. For example, I usually ask about 10 questions of the following nature:

A customer complains that their website is running slow. Explain how you would go about troubleshooting this problem.

c) a practical test – sit the candidate down in front of a computer and get them to attempt something which replicates what their day-to-day job might be. This can be painful at first, as you need to spend time creating a reproducible scenario that every candidate can be tested on, but it tends to be time extremely well spent.

I was challenged this week by the thought of writing a new technical test for candidates to our Windows team. The stuff I wrote is certainly not going to win any design awards, as the web pages it’s based upon look like this:

[Image: techtest]

(don’t you wish all webpages were this clean :-) – all my web development looks like this by the way)

Anyway, I’m not interviewing for people writing HTML, CSS or equivalents; I’m looking for operational people who can sort out why, when I click the above button, my website doesn’t work (amongst many other things).

The roles we are looking to fill here currently require quite broad experience, and you need to be a “jack of all trades” within the Windows world. At any time you might be asked to troubleshoot Windows OS, SQL Server, IIS, BizTalk or innumerable other components. We have deep specialists in all these areas, of course, who can help with the most serious escalated problems, but the TAM roles we are looking to fill at the moment are much broader, and you often need a working knowledge of all the components your customers are running.

My point is that this was a slightly more difficult test to write than some I’d done before: just how do you cover all of these areas? The answer is that you can’t really, so in the end I just tried to cover the basics and allow the candidate to prove that she knew her way around Windows troubleshooting generally, across some of the major components. I covered such topics as troubleshooting an IIS server which wouldn’t serve a page correctly in a .NET application, simple SQL Server administrative tasks, Windows ACLs and so forth.

After all, I’m looking for a candidate who displays the correct attitude to troubleshooting, and who takes a logical and methodical approach to the problems presented to them. Solving the actual problem within a short interview timescale is actually irrelevant (although obviously it doesn’t do any harm). The other good thing is that you get to watch people do the test, and I find you can often infer a lot about someone’s overall approach, especially if you get them to talk you through what they are thinking. It’s worth remembering that the test is a means to an end, and as such you could test someone on a completely separate piece of technology, just to see how they handle the troubleshooting process and being put on the spot.

So don’t be surprised if you come and interview for us and get given something to fix.

Graham

Employees at Basefarm are building the network for DreamHack

This weekend it is once again time for DreamHack – the world’s largest LAN-party. The event runs from 18th to 21st of June, and for 72 hours gamers, coders and hackers from all over the world will gather in Jönköping, Sweden to compete in e-sports and creative competitions, or just to hang out with like-minded people. Last winter the network hosted 12 757 unique MAC-addresses, which once again meant a new Guinness world record. The network is designed, implemented, operated and disassembled by the DreamHack Network Crew, which consists of 30 people; two of these are consultants at Basefarm.

The crew is made up of people from top IT companies in Sweden and students from IT universities, and our skills and backgrounds range from operations, development and engineering to electrical work. The core network is built using Cisco enterprise routers and switches and is connected to the Internet by 2×10 Gbit connections from Telia. The access layer consists of over 400 access switches where the visitors connect their computers. The entire area is also covered with wi-fi, and more than 20 internet streams cover the event live, including Swedish Public Service. The design process starts four months before the event, and the recurring topic in our meetings is how to improve the network since last time.

In my mind this Cisco equipment is the best foundation to build an enterprise network on, and by happy accident it is the exact same equipment we use in our datacenters at Basefarm. Building the network at DreamHack is therefore like disassembling, analyzing and rebuilding our entire datacenter core at Basefarm twice per year. After every event we sit down and analyze what we want to do better and what we want to learn for our next event.

For more information about the event, go to http://www.dreamhack.se

SQL Server TSM Restore appears to hang or takes much longer than expected

I’ve written previously about the dangers of VLF fragmentation, but the problems I’d experienced before were always related to log-based operations, i.e. the recovery phase after a crash, or database mirroring. Last week I saw a different issue when doing a full restore from IBM Tivoli Storage Manager (TSM).

At the start I can say the same thing that I always say when writing about this subject:

Pre-grow your log files (and your data files) in advance. Avoid repeated online auto-growth if at all possible.
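Before pre-growing anything it helps to know how large each file currently is and how it is set to grow. A minimal sketch, assuming your login is allowed to read sys.master_files; the column aliases are only illustrative:

```sql
-- Current size and auto-growth setting of every data and log file.
-- size and growth are stored in 8 KB pages, except that growth is a
-- percentage when is_percent_growth = 1.
SELECT DB_NAME(mf.database_id) AS [Database],
       mf.name AS [Logical File],
       mf.type_desc AS [Type],
       CONVERT(BIGINT, mf.size) * 8 / 1024 AS [Size MB],
       CASE WHEN mf.is_percent_growth = 1
            THEN CONVERT(VARCHAR(20), mf.growth) + ' %'
            ELSE CONVERT(VARCHAR(20), CONVERT(BIGINT, mf.growth) * 8 / 1024) + ' MB'
       END AS [Auto-growth]
FROM sys.master_files AS mf
ORDER BY [Database], mf.type_desc;
```

Log files that are small with a small or percentage-based growth increment are the ones most likely to have grown many times.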

That said, here’s an example of what happens when you don’t, and more importantly how to fix it.

The Symptoms

You’re trying to do a full database restore via TSM (GUI or command line)

The restore takes much longer than you expected

The progress bar in TSM GUI says that the restore is 100% complete, and yet it is still running

If you run a script similar to this one, it also says 100% complete, but the restore still runs:

-- Progress of any backup or restore that is currently running
SELECT r.session_id,
       r.command,
       CONVERT(NUMERIC(6,2), r.percent_complete) AS [Percent Complete],
       CONVERT(VARCHAR(20), DATEADD(ms, r.estimated_completion_time, GETDATE()), 20) AS [ETA Completion Time],
       CONVERT(NUMERIC(10,2), r.total_elapsed_time/1000.0/60.0) AS [Elapsed Min],
       CONVERT(NUMERIC(10,2), r.estimated_completion_time/1000.0/60.0) AS [ETA Min],
       CONVERT(NUMERIC(10,2), r.estimated_completion_time/1000.0/60.0/60.0) AS [ETA Hours],
       -- Text of the running statement (offsets are in bytes, the text is Unicode)
       CONVERT(VARCHAR(1000), (SELECT SUBSTRING(t.text, r.statement_start_offset/2 + 1,
           CASE WHEN r.statement_end_offset = -1 THEN 1000
                ELSE (r.statement_end_offset - r.statement_start_offset)/2 END)
           FROM sys.dm_exec_sql_text(r.sql_handle) AS t)) AS [Statement]
FROM sys.dm_exec_requests r
WHERE r.command IN ('RESTORE DATABASE', 'BACKUP DATABASE');

It can be several hours (or even days) in this state.

In the error log all you see are rows indicating that the restore has started:

Starting up database 'xxx'.

The explanation

It’s most likely that your database log file has become logically fragmented into many virtual log files (VLFs). “Many” means different things to different systems, but more than 1000 can definitely be a problem. In the problem I encountered last week there were 17000, which made a 25-minute restore take 3 hours longer than expected.

If you’re unfamiliar with the principles of VLFs you should read the following:

Transaction Log Physical Architecture and also Factors That Can Delay Log Truncation.

If you want to check any of your critical databases now to see whether you have this fragmentation, you can run the following:

DBCC LOGINFO ('xxx')

This is one of those commands that is officially undocumented, but that everyone actually uses! It’s been raised on the Connect site to have it moved into a DMV in the future:

https://connect.microsoft.com/SQLServer/feedback/details/322149/msft-mso-document-dbcc-loginfo-or-create-new-dmvs-to-view-vlf-info

I’ve run it innumerable times on production databases though (normally when fixing problems like this).

If the number of rows returned is greater than 1000 you might have a problem. To test whether you do, all you need to do is attempt to restore a backup of the database (you can do this on a test server) and see if you experience an unreasonable delay. If you do, then I would recommend you try to fix it.
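On newer versions of SQL Server the same count can be obtained without the undocumented command, because the sys.dm_db_log_info function returns one row per VLF. A minimal sketch, assuming SQL Server 2016 SP2 or later (the function does not exist on older versions) and treating 1000 as a rough rule of thumb rather than a hard limit:

```sql
-- VLF count per database, highest first.
-- Assumes SQL Server 2016 SP2 or later, where sys.dm_db_log_info exists.
SELECT d.name AS [Database],
       COUNT(*) AS [VLF Count]
FROM sys.databases AS d
CROSS APPLY sys.dm_db_log_info(d.database_id) AS li
GROUP BY d.name
HAVING COUNT(*) > 1000      -- rule-of-thumb threshold, adjust to taste
ORDER BY [VLF Count] DESC;
```

An empty result simply means that no database on the instance is over the threshold.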

The solution

You need to truncate and shrink the log back to a point where the fragmentation does not occur. The great people at SQL Skills have a very full article on this here:

http://www.sqlskills.com/blogs/kimberly/post/8-Steps-to-better-Transaction-Log-throughput.aspx

The (very easy) script to fix it is right at the end, but if you’re in this position and you’ve come this far, I’d recommend that you read the above article in full to understand why you got where you are.
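For orientation, the general shape of such a fix looks something like the sketch below. This is not a copy of the script from the article; the database name, logical log file name, backup path and target size are all hypothetical placeholders, and it assumes the database is in the full or bulk-logged recovery model and that you run it during a quiet period.

```sql
-- Hypothetical names: database 'xxx', logical log file 'xxx_log'.
-- The backup path and the 8 GB target size are placeholders too.
USE [xxx];

-- 1. Back up the log so that inactive VLFs can be cleared.
BACKUP LOG [xxx] TO DISK = N'X:\Backup\xxx_log.trn';

-- 2. Shrink the log file as far as it will go.
DBCC SHRINKFILE (N'xxx_log', TRUNCATEONLY);

-- 3. Check how many VLFs remain; repeat steps 1 and 2 if the count is still high.
DBCC LOGINFO ('xxx');

-- 4. Grow the log back to its working size in one step,
--    so that it is made up of a small number of large VLFs.
ALTER DATABASE [xxx]
MODIFY FILE (NAME = N'xxx_log', SIZE = 8GB);
```

The active portion of the log can stop the shrink from reaching the start of the file on the first attempt, which is why steps 1 and 2 may need repeating.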

Until the next time….

Graham

We’re hiring – https://www.basefarm.com/sv/jobb/Lediga-tjanster-Sverige/ Drop us a line if you’d like to come and work on interesting problems like this with us.