Project Spectre
In 2017, the CasperDNS service (which powers CasperVend and CasperLet) has experienced an unusually large number of outages. The service has been down for a total of 27 hours in 165 days, giving an uptime of around 99.32%.
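For readers who want to check the arithmetic, here is a minimal sketch of how an uptime percentage follows from a downtime total. It takes the 27 hours and 165 days quoted above at face value and assumes the window is counted as full 24-hour days. The function names are illustrative only, and the 99.9% figure is a common industry benchmark used for comparison, not a target stated by this project.

```python
def uptime_percent(downtime_hours: float, window_days: float) -> float:
    """Return the percentage of the window during which the service was up."""
    window_hours = window_days * 24  # assumes full 24-hour days
    return 100.0 * (1.0 - downtime_hours / window_hours)


def downtime_budget_hours(target_percent: float, window_days: float) -> float:
    """Return the hours of downtime a given uptime target would allow."""
    window_hours = window_days * 24
    return window_hours * (1.0 - target_percent / 100.0)


if __name__ == "__main__":
    # Figures quoted in the text: 27 hours down over 165 days.
    print(f"Uptime: {uptime_percent(27, 165):.2f}%")  # prints "Uptime: 99.32%"

    # Illustrative benchmark only (not a stated project target).
    print(f"Budget at 99.9%: {downtime_budget_hours(99.9, 165):.2f} hours")
```

Over the same 165 days, a 99.9% benchmark would allow just under 4 hours of downtime, which gives a sense of the gap the project is trying to close.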
What this translates to, in real terms, is frustration for us and our users. It is clear that this cannot continue. Our customers expect better, and we expect better of ourselves.
While each individual outage has been analysed and addressed, we need to become more resilient to these kinds of failures.
What is Project Spectre?
Project Spectre is a modernisation project (effectively CasperDNS 3.0). It's a complete rework of how we handle in-world entities, and aims to provide significantly greater capacity and resiliency.
Modernising CasperVend and CasperLet is a gargantuan task. We have over 2 million in-world entities, and that number is growing all the time. Migrating the network to a new platform in a smooth and clean way for our merchants and their customers is tricky.
Project Spectre has been in planning since January, and given the high frequency of outages recently, we're going to accelerate deployment.
What's involved?
The project aims to accomplish the following:
- Invest in new NVMe hardware to replace our existing servers
- Upgrade our storage cluster. We're currently using NFS and dio file locks to share data between servers. Recent outages have shown that this setup is not resilient, so replacing it is a priority. We will migrate to our newly developed custom storage cluster.
- Isolate our in-world servers from the public-facing websites
- Launch CasperLet v2 and CasperVend v3, which use the new infrastructure
- Migrate all website management functions to CasperPanel
- Implement 100% redundancy and automatic failover on all systems
Timeline
- 2017-03-01 - Development of new software began. COMPLETE
- 2017-06-15 - New servers ordered to host our new in-world processing architecture. COMPLETE
- 2017-06-25 - New servers will be brought online and migration of CasperSafe to the new architecture will commence. (Servers were delivered 10 days late due to a shortage of NVMe components.) COMPLETE
- 2017-08-01 - CasperLet v2 will be launched into BETA, using our new architecture. (Postponed to this date due to storage migrations.)
- 2017-09-01 - CasperVend v3 will be launched into BETA, using our new architecture.
- 2017-10-01 - CasperLet v2 full release launch.
- 2017-11-01 - CasperVend v3 full release launch.
- 2018-01-01 - Old websites will be closed. An end-of-life timeframe for old versions will be announced.
Note that existing CasperLet and CasperVend scripts will continue to function, but the management will be migrated to CasperPanel.