Beanstalk is migrating to colocation
Since Beanstalk started, we’ve been through a long evolution of changes in our server infrastructure. The very first Beanstalk deployment was on a Media Temple grid server. With limited resources, that obviously did not last long. We then moved to SoftLayer, where a few physical servers ran our infrastructure. Back then, we did not really have the experience to manage servers, security, and performance on our own. Shortly after, a nice opportunity came up to host with Engine Yard as a partner. This was a perfect fit because they managed the infrastructure and left the application to us. They helped us grow during our first year and provided amazing support. As we grew, however, we realized that in order to maintain a reliable and fast service, we needed to be accountable for our own infrastructure.
Growing a product requires an understanding of the entire stack to really get what you want from it, and that is just what we did when we moved to Rackspace. It allowed us to choose our hardware, isolate services and troubleshoot on our own to understand our capacity and growth requirements. We’ve been happily hosting with Rackspace since July of 2009.
In the last month, however, we made a big decision: to move Beanstalk to colocation, a data center where we purchase the hardware and manage just about everything ourselves. I’m going to cover some of the reasons why we made this decision, along with some information on the hardware and provider we chose.
The main motivation for colocation was the freedom to choose the hardware and network that best fit our needs. As we continued to grow at Rackspace, options like additional storage bays, SSDs, and 10Gbit Ethernet were difficult to get and costly. At Beanstalk, we are mostly I/O bound, not only on our file servers, but also on our deployment and database servers. We hit a barrier a few months ago when we needed to expand our storage and did not have any good options. Rackspace tried very hard to make it work, but custom hardware and specific server configurations are not their core business. Our next obvious choice was to look into colocation.
This was not an easy decision. Until now, we’d never had to worry about hardware. Rackspace just installed it and we set up our OS and application. Deciding to move to colo meant we’d be responsible for building the servers, testing them, replacing parts, and putting up the cash to afford it all. A year ago I would never have done this, but since we hired Russ (our sysadmin) in December, the decision became a lot easier. He already has a lot of experience with colocation and hardware, and he helped us avoid a lot of mistakes along the way.
Choosing a colocation provider
As it turned out, deciding to move to colo was the easy part. Actually selecting a colocation provider was the hard part. A lot of questions came up:
- Do we host the servers in Philadelphia where we are located or somewhere else?
- How much bandwidth do we need?
- What networks, locations and providers have the lowest latency?
- What happens when hardware breaks and we need to replace it?
- How is security handled?
- What is the best way to migrate 20TB of customer data?
After a lot of research, we landed on Server Central in Chicago. I initially heard of them through the 37signals blog, when they moved from Rackspace to colo. Russ and I took a tour of the DuPont Fabros facility, where Server Central operates more than 40,000 square feet of raised-floor space and 5.5 megawatts of power. The data center is beyond impressive. We decided on this location and company for the following reasons:
- The connection goes through 350 E Cermak, the world’s largest data center and the building with the most fiber paths connecting the US.
- They manage their own IP transit network, nLayer.
- Easy access to cross-connect with other providers if needed.
- A very professional and responsive team to help us with issues or “remote hands” when we need anything.
- If ever needed, they also offer managed services, hardware and cloud services.
- And equally important, we met some of the core team, and they’re an awesome group of people.
I plan to do a follow-up post on the exact hardware we purchased, some of the problems we ran into, and the cost considerations (and savings!) compared to leased or managed hosting. For now, some highlights:
- We chose Super Micro equipment across the board.
- We use SSD storage in every server: 24TB of total raw disk space across 60 disks.
- The new ZFS file servers contain over 48TB of redundant storage.
- Mostly E5 (Sandy Bridge-EP) and E3 v2 (Ivy Bridge) processors.
- 10Gbit Ethernet dedicated to file server traffic.
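For readers curious what “redundant storage” means in ZFS terms, here is a minimal sketch of how a mirrored pool might be laid out. The pool name, device names, and mirror layout are made up for illustration; they are not our actual configuration.

```shell
# Hypothetical example: a ZFS pool of mirrored disk pairs.
# Each "mirror" group pairs two disks, so the pool survives the
# loss of one disk per pair; usable space is half the raw total.
zpool create tank \
  mirror /dev/sdb /dev/sdc \
  mirror /dev/sdd /dev/sde \
  mirror /dev/sdf /dev/sdg

# Enable lightweight compression across the pool.
zfs set compression=on tank

# Verify pool health and layout.
zpool status tank
```

Mirrored pairs trade capacity for fast resilvers and simple disk replacement, which matters when you are the one swapping the drives.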
What you can expect
Russ and I just returned from our second trip to the data center. This time it was all business: installing the servers and network for the new environment. Almost everything is installed and ready. Over the next couple of weeks we will test the environment and make sure it works as expected. We have a very detailed migration plan to reduce downtime and make it a smooth process for our customers. We’ll be updating everyone as we go with new server IPs for firewalls and any maintenance notices.
Our plan is to start getting test accounts moved to the new environment in a couple of weeks, then schedule a larger migration shortly after. Our internal deadline for the migration is scheduled for the end of the month.
Performance and reliability
In our initial file server performance tests, we are seeing more than double the speed on our Subversion benchmarks with a single thread; with multiple threads it will be even better. In addition, all services, aside from a few smaller machines, will run on bare-metal hardware instead of VMs. Introducing 10Gbit networking has done wonders as well.
While our current environment has redundancy in many places, this new environment takes it a bit further with multiple firewalls, load balancers, and switches, allowing us to keep running in the event of a failure.
Finally, and equally important, we now have more control than ever over our IP transit. Serving customers around the world with the lowest possible latency is a huge challenge, but having the flexibility to change routes and add providers as needed allows us to serve our customers better. Being centrally located in the US (Chicago) will also benefit many of our West Coast customers.
We’re looking for a group of customers who would like to be migrated to the new environment once it is tested. You can expect everything to be working normally, but we may need to tune some things as we go. Having customers located around the world who can help us stress the new environment will get us there much faster.
If you are interested, please contact support and include your account URL along with your location (city/country). Don’t worry, we can always migrate you back if needed.