Florida State University

Executive summary: 

FSU was looking for an AWS partner that could reimagine and rethink its legacy Disaster Recovery solution that was hosted in a physical data center. Enquizit was selected based on its strong cloud experience and stellar company reputation. Enquizit worked closely with FSU’s BCDR Program Director and FSU-ITS (Central IT) to implement CloudEndure DR, SkyMap™, and other AWS orchestration and automation tools to deliver an AWS cloud-based DR solution that met FSU’s DR objectives.

Business Challenge / Problem Detail:

In the wake of Hurricane Michael, FSU leadership called for a refresh of its existing Disaster Recovery (DR) solution. Hurricane Michael was a Category 5 storm that made landfall just west of Tallahassee, Florida, in October 2018, devastating the surrounding areas. FSU recognized at the time that the University required a high-confidence DR solution capable of supporting an actual disaster declaration with long-term continuity of operations. As a result, FSU leadership established guidelines for its new DR solution to meet or exceed a sub-1-hour Recovery Point Objective (RPO) and an 8-hour Recovery Time Objective (RTO).

 

Shortly after Hurricane Michael, FSU began to plan the transition of its traditional Business Continuity and Disaster Recovery (BCDR) approach to a more modern approach using AWS as its DR platform. The new BCDR approach was also going to help FSU accelerate its public cloud adoption efforts by implementing a sound AWS foundation in support of future cloud-based use cases at the University.

In early 2019, FSU conducted a competitive solicitation that sought to bring the vendor community forward with solutions closely aligned to mission objectives. Of the many technology partners and proposed vendor solutions reviewed by FSU, one company and its approach to solving the problem stood out as an overall excellent fit – Enquizit. FSU and Enquizit began working on the DR project in August of 2019 with a planned completion date in May of 2020. 

Hurricane Michael is just one example of the ongoing challenges Gulf states face with recurring severe storms. Hurricane Harvey in 2017 and Hurricane Katrina in 2005 are potent reminders of the type of devastation these storms can cause businesses and communities in the Gulf of Mexico region.

Even before Hurricane Michael in 2018, for example when the Category 1 storm Hurricane Hermine hit FSU directly in 2016, FSU recognized that its legacy DR site was incapable of meeting the organization’s scope and needs. FSU had therefore begun planning a next-generation disaster recovery solution; Hurricane Michael simply advanced the timeline for project initiation. Also in 2018, FSU completed a business impact analysis exercise, so a prioritized set of business application systems, including their RPO and RTO requirements, was well understood. The challenge became to survey the marketplace for solutions delivering the highest value in meeting FSU’s BCDR requirements.

Output / Our Solution:

Enquizit listened to the needs of FSU, presented design options, and delivered a very capable and cost-effective DR solution. IT disaster recovery was transitioned from the physical colocation hosting site in Atlanta to the Amazon Web Services (AWS) platform. 

 

As the many specific design elements of the DR project were being rationalized, it became evident that a Pilot Light approach would be essential to meeting FSU’s aggressive RTO and RPO requirements. Specifically, replicating file and database backups to the cloud and then instantiating services from those backups was not an option; continuous data replication was chosen as the data protection approach for FSU’s DR solution.
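The reasoning for ruling out backup-and-restore can be illustrated with a little arithmetic. The sketch below is only an illustration: the backup interval, copy time, and replication lag are assumed values rather than FSU measurements, and the function names are hypothetical. Only the sub-1-hour RPO target comes from the case study.

```python
from datetime import timedelta

# Target from the case study: RPO must be under one hour.
RPO_TARGET = timedelta(hours=1)


def worst_case_rpo_backup(interval: timedelta, transfer: timedelta) -> timedelta:
    """Worst-case data loss when recovering from periodic backups.

    If a disaster strikes just before the latest backup finishes copying to
    the cloud, the newest usable copy is the previous one, so everything
    written during one full interval plus the copy time is lost.
    """
    return interval + transfer


def worst_case_rpo_replication(lag: timedelta) -> timedelta:
    """With continuous replication, data loss is bounded by the replication lag."""
    return lag


def meets_target(rpo: timedelta, target: timedelta = RPO_TARGET) -> bool:
    """True when the worst-case data loss is strictly below the target."""
    return rpo < target


if __name__ == "__main__":
    # Assumed, illustrative numbers only.
    nightly = worst_case_rpo_backup(timedelta(hours=24), timedelta(hours=2))
    continuous = worst_case_rpo_replication(timedelta(seconds=30))
    print("nightly backup:", nightly, "meets target:", meets_target(nightly))
    print("continuous replication:", continuous, "meets target:", meets_target(continuous))
```

Under these assumed numbers, even an aggressive backup schedule would need an interval well under an hour to qualify, which is why a continuously replicated Pilot Light design fits the stated target better.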

 

 

Enquizit knew that CloudEndure DR (CEDR) was going to play a significant role in the overall solution design (in terms of the percentage of hosts covered). However, it was not clear whether CEDR was a complete solution that could handle all replication patterns required by FSU, including ~20TB of NAS files and a large Oracle database host platform running on Exadata with RAC. Understanding the requirements for all FSU replication patterns was vital to developing a DR solution with a fresh perspective, in a manner that was innovative, practical, cost-effective, and manageable.

 

Critical design elements that surfaced during the project were:

 

Preserve as much of the production systems' state as possible, such as the namespace, host interfaces, and the associated logical addressing

Preserve the critical dependent technology standards employed in the production environment, namely F5 LTM, Palo Alto firewalls, and identity (directory) services

Bring Your Own IP (BYOIP) for use in establishing a 'permanent presence' in AWS which facilitates a more streamlined integration with FSU's many third-party services

Use a Pilot Light approach to support the replication needs that extend beyond the systems targeted for AWS's CloudEndure DR

Use AWS Control Tower to establish the necessary guardrails for operational and security/compliance governance

Use Oracle Data Guard for the asynchronous replication of the databases

Host Oracle PeopleSoft application on EC2 to ensure maximum flexibility to accommodate the needs of a very demanding application, future and present

Reduce cost

Ability to fine-tune performance

Ability to use native Oracle features like Data Guard replication and remote listener functionality

Use CloudFormation for its robust orchestration and configuration management capabilities (for example, managing configuration drift through the exclusive use of StackSets)

Direct Connect (DX) private line service (4 x 1 Gbps) for reliable and low-latency network transport in conjunction with Direct Connect Gateway and Transit Hub

All volumes are encrypted using AWS KMS keys
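As one illustration of how a design rule from the list above could be checked automatically, the sketch below flags volumes that are not encrypted with a KMS key. It is a minimal example under stated assumptions: it operates on plain dictionaries shaped like the entries of the `Volumes` list in an EC2 DescribeVolumes response (`VolumeId`, `Encrypted`, `KmsKeyId`), it makes no AWS call, and the function name, volume IDs, and key ARN are hypothetical. It is not taken from the FSU implementation.

```python
from typing import Dict, Iterable, List


def unencrypted_volumes(volumes: Iterable[Dict]) -> List[str]:
    """Return the IDs of volumes that are not encrypted with a KMS key.

    A volume is flagged when its `Encrypted` field is missing or False,
    or when no `KmsKeyId` is recorded for it.
    """
    return [
        v["VolumeId"]
        for v in volumes
        if not v.get("Encrypted", False) or not v.get("KmsKeyId")
    ]


if __name__ == "__main__":
    # Hypothetical sample records; real input would come from the EC2 API.
    sample = [
        {
            "VolumeId": "vol-aaaa1111",
            "Encrypted": True,
            "KmsKeyId": "arn:aws:kms:us-east-1:111122223333:key/example",
        },
        {"VolumeId": "vol-bbbb2222", "Encrypted": False},
    ]
    print(unencrypted_volumes(sample))
```

A check like this could run after each DR test so that any volume launched without encryption is caught before the environment is declared ready.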

The DR solution delivered to FSU is representative of the following:

 

Multiple replication patterns merged through automation scripts in support of both DR Testing and DR Ready

200+ TB of total data replication footprint across all three replication patterns (file, database, CEDR)

A simplified DR runbook made possible by the use of (highly-reusable and parametrized) automation scripting

The client achieved its first-ever successful DR test, with near-zero RPO and a sub-4-hour RTO
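The idea of a runbook simplified through reusable, parametrized automation can be sketched as follows. This is a hypothetical illustration, not Enquizit's actual scripting: the step names, duration estimates, and the `mode` parameter (assumed here to distinguish a DR test from a DR-ready run) are all assumptions. Only the three replication patterns and the 8-hour RTO target come from the case study.

```python
from dataclasses import dataclass
from typing import Callable, List

RTO_TARGET_HOURS = 8.0  # RTO target stated in the case study


@dataclass
class Step:
    name: str
    pattern: str                   # "file", "database", or "cedr"
    est_hours: float               # assumed duration estimate
    action: Callable[[str], str]   # receives the mode, returns a log line


def make_action(verb: str, target: str) -> Callable[[str], str]:
    """Build a placeholder action; a real one would call automation tooling."""
    def action(mode: str) -> str:
        return f"[{mode}] {verb} {target}"
    return action


def run_runbook(steps: List[Step], mode: str) -> float:
    """Run every step in order with the same mode and return total estimated hours."""
    if mode not in ("test", "ready"):
        raise ValueError(f"unknown mode: {mode}")
    total = 0.0
    for step in steps:
        print(step.action(mode))
        total += step.est_hours
    return total


# Hypothetical steps, one per replication pattern named in the case study.
STEPS = [
    Step("promote standby", "database", 1.0, make_action("promote", "database standby")),
    Step("mount shares", "file", 0.5, make_action("mount", "replicated file shares")),
    Step("launch servers", "cedr", 1.5, make_action("launch", "replicated servers")),
]

if __name__ == "__main__":
    hours = run_runbook(STEPS, "test")
    print(f"estimated recovery time: {hours}h, within RTO: {hours <= RTO_TARGET_HOURS}")
```

Because each step takes the mode as a parameter, the same step list serves both scenarios, which is one plausible way a single short runbook can cover several replication patterns.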

Tooling (Tools and Technologies): 

AWS: Control Tower, Direct Connect, Bring-Your-Own-IP, CloudEndure DR, Lambda Functions, Step Functions, CloudFormation, EC2, EBS, S3. Other: Oracle Data Guard, RISC Networks, N2WS, Buurst SoftNAS, and Enquizit SkyMap™

Client:

Florida State University

Industry:

Higher Education, Public Sector

Prime & Partners:

AWS & Enquizit 

Core Partners:

AWS (including recently acquired CloudEndure-DR), Enquizit SkyMap™, Flexera Risc Networks, Palo Alto, F5 Networks, Buurst SoftNAS, Oracle PeopleSoft 

Goals and Benefits:

A comprehensive DR solution that accounts for multiple replication patterns (database, file, servers), minimizes the customer’s steady-state runtime cost, and greatly simplifies the DR runbook, while consistently delivering on aggressive RPO and RTO targets.