Downtime, Automation Breakage, and VPS Link
As it turns out, some of my apps and services have been experiencing a lot of issues (or, in some cases, haven’t been working at all). I had no idea about this until I got a couple of emails from users complaining that things were broken. Here’s what was affected:
- External content downloads for iOS apps. This includes audio and video in various apps, including Veest and Quran.
- On-launch announcements for iOS apps.
- Map views for the JK Locations site.
- Landing pages for the Google Apps site.
- Help and support landing pages for iOS apps. I’m really sorry about this one; this ended up preventing people from letting me know of the breakage.
I use a service called VPS Link. They’re a hosting provider that rents out virtual private servers, and they’re relatively popular. Here’s what went wrong:
- I set up cron jobs to take care of server updates and rsync backups.
- I set up monitoring through VPS Link, which notifies me when my services go down or come back up. This is done with just ping and port monitoring on their end.
- Since updates were automated, and I’d get an alert email if anything went down, I tended to only check the server when I received one of these alerts.
- Something went wrong. I’m not sure exactly what, but it caused my server to start uploading a bunch of data. The log line I got from VPS Link said it was 2.81 GB, but I’m not sure over what period (minutes? hours?). I might have been hacked, or it might have been a backup job gone wrong. I’m still looking into this.
- VPS Link decided to shut down my server “temporarily” as it was adversely affecting their network. They did this by de-provisioning my node.
- When a node is not provisioned, the monitoring service doesn’t bother monitoring it. This means that I don’t get any service alerts at all. Which in turn means that I won’t know about this unless someone from VPS Link gets in touch with me.
- No one from VPS Link got in touch with me. No emails from customer service. Nothing.
- Because I don’t bother to check the system unless something goes wrong, and because I didn’t know that something had gone wrong, the server was offline for a long, long time.
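The root of the problem above is that silence was treated as good news. Here is a minimal sketch of the opposite policy, where a missing or old check-in counts as a failure. The 15-minute threshold and the function name are my own assumptions for illustration, not anything VPS Link provides.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold: how long the server may stay quiet before we worry.
MAX_SILENCE = timedelta(minutes=15)


def is_stale(last_heartbeat, now, max_silence=MAX_SILENCE):
    """Return True if the server has never checked in or has been quiet too long."""
    if last_heartbeat is None:
        # No data at all is treated as a failure, not as "nothing to report".
        return True
    return now - last_heartbeat > max_silence


# Example: a server that last checked in an hour ago is flagged.
now = datetime(2010, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_stale(now - timedelta(hours=1), now))
```

With a rule like this, a de-provisioned node would show up as stale instead of quietly dropping out of the monitoring.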
Why it Won’t Happen Again
I’ve learned a lot from this. First and foremost is about trust. I was too trusting with my automation, too trusting with third-party monitoring, and too trusting with VPS Link’s customer service. The old adage applies: “trust, but verify”. Constantly verify.
The steps I’m taking:
- Move the support landing page to a different, more stable hosting provider. Something like Google Apps, or even just a Facebook page.
- Automation is good, but sanity checks are better. Setting up a calendar entry to periodically remind me to check up on everything would have helped me get things sorted out faster.
- Redundancy. If not for the main server, then at least for the monitoring. I shouldn’t have trusted just VPS Link’s monitoring service. In addition to theirs, I should have also used a third-party system and rolled my own simple ping job.
- Pay more attention to my services.
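For the roll-my-own check mentioned in the list above, something along these lines might be enough. It is a rough sketch, not what I actually run: it only tests whether a TCP port accepts connections, the function names are made up for illustration, and the hosts and ports are whatever gets passed on the command line.

```python
import socket
import sys


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def parse_targets(args):
    """Turn 'host:port' strings into (host, port) tuples."""
    targets = []
    for arg in args:
        host, port = arg.rsplit(":", 1)
        targets.append((host, int(port)))
    return targets


def find_failures(targets, checker=port_is_open):
    """Return the (host, port) pairs that could not be reached."""
    return [(host, port) for host, port in targets if not checker(host, port)]


def format_alert(failures):
    """Build a short, human-readable alert message."""
    lines = ["Unreachable services:"]
    lines += [f"  {host}:{port}" for host, port in failures]
    return "\n".join(lines)


if __name__ == "__main__":
    # Targets come from the command line, e.g. example.com:80 example.com:443
    failed = find_failures(parse_targets(sys.argv[1:]))
    if failed:
        print(format_alert(failed))
        sys.exit(1)
```

The important part is that this runs from a machine other than the server being watched, for example from a cron job at home. Most cron setups can email any output a job prints, so printing only on failure doubles as the alert.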
Gruber’s Take on Google’s Openness
At Google, a company that prides itself on openness, some buildings were on “lockdown” to ensure that upcoming products don’t leak. … Bilton: “Google has started to realize that they have to protect upcoming products and adopting secrecy has become necessary within the organization.”
Rosenberg: “Open will win. It will win on the Internet and will then cascade across many walks of life: The future of government is transparency. The future of commerce is information symmetry. The future of culture is freedom. The future of science and medicine is collaboration. The future of entertainment is participation. Each of these futures depends on an open Internet.”
According to John Gruber, Google cannot be for an open internet and transparent government if they’re not open with their intellectual property. The comparison hints that Google’s lockdown on upcoming products runs counter to their openness of data, freedom of choice and collaboration, etc. This is tantamount to saying that “Google’s cafés are closed to the public, therefore Google is going against its openness principles” or even “I can’t go talk to Google’s CEO whenever I want, so Google is not open.”
You cannot compare secrecy of an upcoming product to Google’s attitude towards openness and freedom. Such comparisons are just as inconsequential as comparing Apple’s beautiful products to its messy organizational structure. Who cares?
By that logic, an open Google would be one that forces each team to broadcast to the public what they’re working on, allows free public access to Google’s buildings, and gives away all of Google’s data. Gruber’s comparison makes absolutely no sense.
Warren Buffett once said that the best businesses were economic castles protected by unbreachable moats. Now, Erick Schonfeld writes that if search is Google’s economic castle, Android is a moat, the Chrome browser is a moat, and Google Apps is a moat: all free products, subsidized by search profits, intended to protect the economic castle that is search.
‘Android, as well as Chrome and Chrome OS for that matter, are not “products” in the classic business sense. They have no plan to become their own “economic castles,”’ says Benchmark Capital VC Bill Gurley. ‘They are not trying to make a profit on Android or Chrome. They want to take any layer that lives between themselves and the consumer and make it free (or even less than free).’
So don’t measure the success of Google’s new businesses by how much revenue or profit they generate directly; measure it by how much they shore up Google’s core search business. ‘Google is … scorching the earth for 250 miles around the outside of the castle to ensure no one can approach it. And best I can tell, they are doing a damn good job of it.’ Hugh Pickens