You're browsing the 2004-2023 VATSIM Forums archive. All content is preserved in a read-only fashion.
For the latest forum posts, please visit https://forum.vatsim.net.

VATEUD server outages and migration



Svilen Vassilev
Posted

Dear all,

You might have noticed that for the last few months we've been plagued by server downtime pretty much every month. While these outages haven't been excessively long and haven't resulted in data loss, their recurrence has been undoubtedly annoying and has left us at the mercy of the service provider and their support response times. Our server hosting provider has occasionally notified us of server, connectivity and performance issues on their end and of their efforts to solve them permanently, but we have not witnessed a lasting improvement.

So partly because of this, and largely because of the growing size, complexity and diversity of the EUD web infrastructure, which currently spans a dozen or so applications built on different technologies and using separate deployment stacks, we've decided that the legacy server setup has run its course. It's time we migrated to a hardware foundation that gives us more flexibility, better reliability and improved performance to satisfy the increased uptime demands on our web applications. We have decided to move to the Digital Ocean cloud hosting platform and use their excellent scalable SSD-based VPSes with regular automated full-image backups, which, apart from the performance, maintainability and uptime benefits, give us a reliable exit strategy in case of an emergency.

We've designed and implemented a "separation of concerns" strategy, separating different technologies and deployment stacks across several servers to mitigate security and performance concerns. We've also isolated the domain and DNS hosting from the web-server infrastructure, so that the name records and email services stay unaffected in the event of a server outage.

This decision was made approximately a week ago, and the migration and setup of the new infrastructure started immediately afterwards. The entire process is scheduled to conclude on September 18th, when the old servers will be retired.

Our hope was that by identifying the issues and being proactive with our solutions we would avoid further outages and user frustration. Alas, that was not meant to be. Right now the legacy EUD server is down again, and once again we're at the mercy of our legacy provider to solve the issue. We're right in the middle of the migration, and it might be a small consolation that, thanks to the steps we've already taken, our emails and many applications, including the PTD, tasks, event websites and a couple of vACC sites we maintain, are still available despite the outage. The VATEUD forums, the main VATEUD website, the API and the TS server, however, are down. We do need the legacy infrastructure to remain up until we finish our "evacuation". Fortunately, this process will not take much longer.

I realize many of you probably don't care about the technical details I gave here. I thought, however, that after several incidents and in the middle of another one, we owe an upfront explanation and an apology to all the members who are trying to access our site and forum right now. So this is the story, as honest as it gets.

Please accept our apologies for this outage and for any further disruptions that might affect us until we finish the migration process on the 18th. We knew the problem, we saw it coming and we acted; however, we still got caught in this cycle of misfortune. Please bear with us until we pull ourselves out of this muck. I promise we will!

Thank you.

C1/P2 | vaccbih.info


Svilen Vassilev
Posted

Update:

The transition has been completed and all EUD web services are now running from the new servers.


  • 4 months later...
Svilen Vassilev
Posted

Due to unscheduled maintenance on some Amsterdam nodes of Digital Ocean, connectivity to the EUD2 server (forums, TS, main website) is currently disrupted, with increased latency and possible timeouts. The EUD1 server (API, PTD, tasks, event and vACC websites) is currently unaffected.

Hosting provider status updates can be monitored here: http://www.digitaloceanstatus.com/

Edit: Resolved as of 2037z
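
For anyone who wants to reason about which services a given server outage touches, the EUD1/EUD2 split listed in this post can be sketched as a small lookup. This is only an illustration: the dictionary and the function names are hypothetical and are not part of any actual EUD tooling; only the server names and service lists come from the post.

```python
# Hypothetical map of servers to hosted services, copied from the lists in this post.
SERVICES_BY_SERVER = {
    "EUD1": ["API", "PTD", "tasks", "event websites", "vACC websites"],
    "EUD2": ["forums", "TS", "main website"],
}


def affected_services(down_servers):
    """Return the services hosted on any of the servers that are down."""
    affected = []
    for server in down_servers:
        # Unknown server names simply contribute nothing.
        affected.extend(SERVICES_BY_SERVER.get(server, []))
    return affected


def unaffected_services(down_servers):
    """Return the services hosted on servers that are still up."""
    down = set(down_servers)
    remaining = []
    for server, services in SERVICES_BY_SERVER.items():
        if server not in down:
            remaining.extend(services)
    return remaining


if __name__ == "__main__":
    # The incident described in this post: only EUD2 is disrupted.
    print("Affected:", ", ".join(affected_services(["EUD2"])))
    print("Unaffected:", ", ".join(unaffected_services(["EUD2"])))
```

Keeping the stacks on separate servers is what makes such a lookup meaningful: a problem on one node only takes down the services listed under it.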
