Back in January of 2016 we decided to completely transition out of our data centers and into the cloud (primarily Azure). We knew we had to do something - either make some big investments in new hardware or commit ourselves to migrating everything off our own gear. After looking at the costs and the opportunity to provide better capabilities in the cloud we committed to moving everything out.
On Sunday we finally shut down the last cluster of servers!
Would We Do It Again?
So far our costs in Azure are about 12% higher than we had budgeted. We can attribute virtually all of this overrun to SQL Azure - we’re having to run several customers in a Premium Elastic Pool, which is notably more expensive than the standard pools. While we’ve done a series of optimizations to Loupe, I think we’re at a point where we’re going to wait and see if the costs drop on SQL Azure before committing to doing what it’d take to get all of our users to fit into a tighter size. Our primary concern is that the application performs well for everyone; we’ll just have to adjust our expectations on what that costs to host.
Key to controlling our costs is that we have an Azure Enterprise Agreement. This offers some pretty stellar discounts along with ways to cash flow your expenses so I highly recommend the program. One little benefit is you can have Dev/Test subscriptions in Azure which run at an even more reduced cost for development (basically you aren’t paying any “Windows tax”). We’re using that for automating our testing and CI at a lower price point than we had planned.
We’ve also used this opportunity to re-examine every system we run and shut down a few that just weren’t carrying their own weight. In the end, performance is good and I’m OK with our costs so yes - I would do it again. Still, I love buying hardware and I’m going to miss that. Perhaps I’ll have to start building my own PCs again instead of buying off the rack.
Where’d Everything Go?
Like you, we have a ton of services large and small that it takes to run our company. Some of them have always been external - like our marketing email automation (presently Intercom with a side of Mail Chimp). Others we transitioned long ago (like support to FreshDesk). Still, when it came time to turn off the last of our systems, we had a number of key systems still running on our iron.
The Loupe Service itself was the first item out the door - it’s been running entirely in Azure for some time. Recently we added a European data center so you can now have your Loupe data stay outside of the US if you like. We’ll be rolling that out more broadly in the new year with a separate announcement and more details on how it all works.
Our internal Loupe instances we use to monitor Loupe and VistaDB were hosted on dedicated VMs so we could run CI builds on them for our own dogfooding. We’ve mainstreamed this with our Azure instances using a new ability we’ve set up so we can have Cloud-hosted Loupe customers running on different builds (fast ring/slow ring style).
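To make the fast ring/slow ring idea concrete, here is a minimal sketch of how a tenant might be mapped to a build. It is not how Loupe actually does it: the ring names, the build labels, and the tenant names are all made up for illustration.

```python
from typing import Iterable

# Hypothetical build labels for each ring; real deployments would use
# actual build numbers or deployment slots.
RING_BUILDS = {"fast": "latest-ci", "slow": "last-stable"}


def ring_for(tenant: str, fast_ring_tenants: Iterable[str]) -> str:
    """Return "fast" for tenants opted in to early builds, otherwise "slow"."""
    return "fast" if tenant in set(fast_ring_tenants) else "slow"


def build_for(tenant: str, fast_ring_tenants: Iterable[str]) -> str:
    """Look up the build label a tenant should be running."""
    return RING_BUILDS[ring_for(tenant, fast_ring_tenants)]


if __name__ == "__main__":
    # Hypothetical tenant names: internal dogfooding tenants ride the fast ring.
    fast = ["internal-loupe", "internal-vistadb"]
    for name in ["internal-loupe", "some-customer"]:
        print(name, "->", build_for(name, fast))
```

The point of the split is that internal tenants take every CI build, while everyone else stays on a build that has already survived dogfooding.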
Our newer web sites are now built with Jekyll and hosted by CloudCannon. While this is about as non-Microsoft a platform as you can get we do find it works well overall for marketing sites and is easy to keep fast on its feet. Our customer sites (such as my.gibraltarsoftware.com) have moved into Azure and run alongside our Loupe Service.
We’ve hosted our own corporate email on an Exchange server we maintained since the start of the company. No more, we now live on Office 365. I’m glad to see our Exchange server go - it worked nearly flawlessly for us but it’s a big risk to our company and something I think no small shop should take on.
We deployed SendGrid two years ago when we started getting into Azure to simplify sending emails within Azure (instead of routing them through our Exchange server) and that remains unchanged.
We’ve had dedicated build VMs for a long time. To provide Continuous Integration we’ve been using TeamCity and Visual Build Pro, but as part of this move we wanted to be as cloud-native as we could. That meant TeamCity unfortunately had to go in favor of something we didn’t have to run on a dedicated VM. Fortunately, along came Visual Studio Team Services. Porting over many of our internal builds was easy, but for VistaDB and Loupe we still need a dedicated VM so we can reference our licensed third-party controls (from DevExpress and a few others). Overall that’s working well, although we really miss having a single dashboard for all our builds like TeamCity provided instead of the forced segregation-by-project behavior of VSTS.
The thing we really wish we had is a way for VSTS to spin up the build server when it needs it and shut it down when it goes idle. Presently we’ve created a hack to let our team spin up the VMs, run builds, and then shut them down so we have good build performance but aren’t burning money 90% of the time when no builds are happening.
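The shape of that spin-up, build, shut-down hack can be sketched roughly as follows. This is not our actual script: `build_vm`, `run_builds`, and the `start`/`stop` callables are hypothetical names. In practice the callables might shell out to the Azure CLI's `az vm start` and `az vm deallocate`, since a VM has to be deallocated, not just stopped, before compute billing ends.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List, Sequence


@contextmanager
def build_vm(start: Callable[[], None], stop: Callable[[], None]) -> Iterator[None]:
    """Start the build VM, hand control to the caller, then always stop it."""
    start()
    try:
        yield
    finally:
        # Runs even when a build raises, so a failed build never leaves
        # the VM running (and billing) by accident.
        stop()


def run_builds(
    builds: Sequence[Callable[[], str]],
    start: Callable[[], None],
    stop: Callable[[], None],
) -> List[str]:
    """Run every build while the VM is up and return the results in order."""
    results = []
    with build_vm(start, stop):
        for build in builds:
            results.append(build())
    return results


if __name__ == "__main__":
    # Stand-in callables that only record what happened; real ones would
    # start and deallocate an actual VM and trigger real builds.
    log: List[str] = []
    run_builds(
        [lambda: log.append("build") or "ok"],
        start=lambda: log.append("start"),
        stop=lambda: log.append("stop"),
    )
    print(log)
```

Putting the shutdown in a `finally` block is the important design choice: the cost saving only holds if the VM reliably goes away after every run, including the failed ones.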
Work Item Tracking
Since the dawn of the company we’ve used FogBugz to track features, defects, and anything else that went into creating the software we ship. It seems like Fog Creek stopped updating the on-prem version years ago and we’re not keen on the cost of the managed version, so we’ve entirely replaced it with Visual Studio Team Services. So far we’re less than happy users of VSTS - at each turn it seems to be designed for massive teams, with little thought given to basic use cases, so we may move again to a better service.
Focus on Competitive Advantage
In the final analysis we got rid of our gear because there’s no competitive advantage to owning it: No one is going to pick Loupe over New Relic or Application Insights because we’re better at administering Exchange or installing patches on our web servers. So, any minute spent working through those details is time lost that we could be using to serve you better.
To that end we’ve eliminated almost all of our cloud virtual machines in favor of cloud services. For example, we’ve modified Loupe so we can use web apps and web jobs instead of Windows services and IIS sites, switched to Blob storage from file servers, and soon will shut down our last SQL VM when we complete the transition to Service Bus.
Curious about how it’s gone? Drop me a line and ask away!