ENTERPRISE BACKUP AND RECOVERY POLICY
BACKUP AND RECOVERY POLICY
General Overview
One of the most critical functions any I.T. organization can undertake is ensuring that a structured and highly formalized data backup policy and supporting procedures are in place. After all, an organization without its data, or without the ability to retrieve and restore such data in a complete, accurate, and timely manner, faces serious issues as a viable entity. Backups are a must, especially considering today’s growing regulatory compliance mandates and the ever-increasing cyber security threats that businesses face on a daily basis. Yet even without compliance mandates, a well-thought-out, efficient, and reliable backup and recovery plan is a must for ensuring the confidentiality, integrity, and availability of SJHAHD’s critical data.
Table of Contents
1. Introduction
2. Overview
3. Purpose
4. Scope
5. Definitions
6. Tape Library Backup Method
7. Responsibility of SJHAHD (BST) on Tape Library and ECS Cloud Backup
8. Backup Retention Periods and Disposal Procedures
9. Backup Identification
10. Backup Types
11. Backup Storage Locations
12. User Backup Responsibilities
13. Encryption
14. Decommissioning Process
15. Backup design & diagram
1. Introduction
This Backup and Recovery Policy is intended to ensure that SJHAHD servers are regularly backed up to prevent loss of data. All information stored in electronic form is required to be backed up to keep it safe in the event of system failure, disasters, or attacks. Information required to re-create the environment and services must be backed up, including not only the data but also operating system software, application software, information about contacts required to acquire new equipment, organizational financial account information, and information about the business function, including procedures, policies, and processes. A copy of the backup and recovery procedures must be stored away from the primary site so that it is not destroyed in the event of a disaster.
2. Overview
This policy covers the infrastructure and procedures that are provided for organizational data backup and recovery. It is the responsibility of the Application Owners to determine the backup schedule, recovery point objective, and retention per application. Although they may seek guidance from the Backup & Storage Team (BST), it is the responsibility of the Application Owners to manage the data retention for their application(s). This Backup and Recovery Policy is an internal IT policy that defines the backup policy for the SJHAHD systems within the organization that are expected to have their data backed up. These systems are typically servers but are not necessarily limited to servers. Systems expected to be backed up include the file server, the NAS server, the web server, applications, and databases, as well as desktops. Compliance with the stated policy and supporting procedures helps ensure the safety and security of SJHAHD I.T. system resources and all supporting assets. Backups are a critical process for any organization, especially considering today’s growing regulatory compliance mandates and the ever-increasing cyber security threats that businesses face on a daily basis. Yet even without compliance mandates, a reliable backup and recovery plan is a must for ensuring the confidentiality, integrity, and availability of SJHAHD’s critical data.
3. Purpose
This policy and supporting procedures are designed to provide SJHAHD with a documented and formalized Data Backup and Recovery policy that is to be adhered to and utilized throughout the organization at all times. Compliance with the stated policy and supporting procedures helps ensure the safety and security of SJHAHD I.T. system resources and all supporting assets. This Backup and Recovery Policy is designed to protect data in the organization to be sure it is not lost and can be recovered in the event of an equipment failure, intentional destruction of data, or disaster.
4. Scope
This Backup and Recovery Policy applies to all equipment and data owned and operated by the organization. This policy is effective as of the issue date and does not expire unless superseded by another policy. “This document is a reference for cases where your Epic Caché production system needs to be restored from backup. It has an emphasis on the areas requiring special attention by senior Epic and SJHAHD support staff, including both project management considerations and technical steps. This document is available to ensure that this potentially dangerous process is completed as safely and effectively as possible. A companion Protocol 1 Checklist for Caché 2017.xls is available on Galaxy.”
Internal system resources are those owned, operated, maintained, and controlled by SJHAHD and include all network devices (firewalls, routers, switches, load balancers, and other network devices), servers (both physical and virtual, and the operating systems and applications that reside on them), and any other system resources and supporting assets deemed in scope.
External system resources are those owned, operated, maintained, and controlled by any entity other than the SJHAHD Infrastructure team, but which may impact the confidentiality, integrity, and availability of SJHAHD system resources and supporting assets. Examples include the PACS, LAB, and Pharmacy systems, which have a separate infrastructure managed by their own teams.
5. Definitions
1. Backup - The saving of files onto EMC Data Domain disk shelves or offline tape mass storage media for the purpose of preventing loss of data in the event of equipment failure or destruction.
2. Archive - The saving of old or unused files onto Elastic Cloud Storage via Data Domain, or onto other offline mass storage media, for the purpose of releasing online storage space.
3. Data Recovery – The purpose of backing up data is to store a copy of the data in the event of a disaster where data is lost or corrupted. Data recovery is the act of restoring data from the backup in order to return the data to the desired point in time.
4. Recovery Point Objective (RPO) – The maximum targeted period in which data might be lost from an IT service. The RPO determines the maximum age of the files that must be recovered from backup storage for normal operations to resume. It is expressed backward in time (that is, into the past) from the instant at which the failure occurs. For example, if the data of a highly transactional database is only good for 5 days, the RPO is 5 days, and the backup used for recovery must be no more than 5 days old.
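The RPO definition above can be illustrated with a short check. This is only a minimal sketch: the function name `meets_rpo` and the dates are hypothetical, and the 5-day figure is taken from the example in the definition.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup: datetime, failure_time: datetime, rpo: timedelta) -> bool:
    """Return True if the newest backup is no older than the RPO at failure time."""
    return failure_time - last_backup <= rpo


# Hypothetical values: a failure at noon on 10 March with a 5-day RPO.
failure = datetime(2024, 3, 10, 12, 0)
rpo = timedelta(days=5)

recent = datetime(2024, 3, 7, 23, 0)  # about 2.5 days before the failure
stale = datetime(2024, 3, 1, 23, 0)   # about 8.5 days before the failure

print(meets_rpo(recent, failure, rpo))  # True
print(meets_rpo(stale, failure, rpo))   # False
```

Measuring backward from the failure instant mirrors the wording of the definition: the RPO is met only when a backup exists inside that window.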
5.1 Policy Definitions
Non-Production Data and Security Policy
The following information outlines the non-production data policy and the security policy.
The storage and backup team will be responsible for all aspects of backing up servers supported by Johns Hopkins Aramco Healthcare. Test servers and generic operating systems will not be backed up unless requested by the server owner and approved by the computing manager. Such backups include daily incremental, weekly, and full monthly backups as defined by the service or application owner. This team will also be responsible for finding and restoring data when requested or required for disaster recovery purposes. SJHAHD (BST) is to ensure that the data backup and recovery policy adheres to the following mandated organizational security requirements set forth and approved by management: ITS - 09, CP-IT-113 and CP-IT-103.
5.2 Backup Environments
A critical component of any data backup and recovery initiative is to properly identify all environments, and the associated data, that require backup procedures. While critical environments, such as those relating to production, development, and staging or quality, require backups, it is the platforms and the supporting systems within these environments that are to be identified, with applicable backup procedures in place.
This includes, but is not limited to, the following systems and supporting systems:
Network devices, including configuration files, rulesets, and other critical data.
Critical servers, such as all production-facing services, DNS servers, email servers, FTP servers, and all other systems associated with such servers.
Servers (both virtual and physical stand-alone), including all operating systems and associated applications (e.g., databases, web server applications, etc.) for all Microsoft Windows, UNIX, Linux, and any other type of operating system.
5.3 Backup Utilities and Supporting Tools
All backup processes undertaken by SJHAHD are to utilize approved hardware, software, and other supporting tools for ensuring the confidentiality, integrity, and availability of SJHAHD’s entire backup platform. Backup utilities are to consist of, but are not limited to, the following:
Backup software: Networker and Avamar
Backup hardware: Data Domain DD4200, DD2500, DD6800, Elastic Cloud Storage
As for the backup processes performed, the following are considered acceptable by SJHAHD when conducting backup of all necessary data:
5.4 Backup Exceptions
Any exceptions to the type of backups and the default backup scheduling are to be approved by the authorized computing manager and the (BST) team lead, with a valid and justified reason. Additionally, such exceptions, which are ultimately changes to the backup process, are to be submitted with a formal change request and reviewed and approved by the change management committee at SJHAHD. Furthermore, changes to any of the tools and utilities used for the backup process also require a documented change request, initiated by designated personnel, among whom will be members of the backup team.
5.5 Backup Reporting Metrics
Backup reporting activities for all types of backup (i.e., Full, Differential, Incremental + Synthetic Full Every Friday) are to be monitored on a daily basis to ensure the success of the backup process itself. Specifically, all backups conducted are to generate reporting metrics, which authorized personnel are to review in a timely manner. Such reporting metrics include, but are not limited to, the following:
Emails confirming the current status and final results, such as success or failure, of the backup.
Generated reports confirming the current status and final results, such as success or failure, of the backup.
Portals that authorized employees can log into for reviewing and confirming the current status and final results, such as success or failure, of the backup.
Backups that are successful are to be recorded as such, while backup failures and exceptions are to be handled immediately, with all appropriate steps undertaken to ensure the timely backup of such data. Failures and exceptions are delivered via email reports or metrics from the backup utilities, notifying authorized employees of such issues. Depending on the nature, severity, and urgency of the backup itself and of the resolution required, a thorough analysis is to be undertaken to correct the issue in a timely manner and to help mitigate the issue in the future.
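The daily review described above can be sketched as a small summary over job results. This is an illustrative sketch only: `JobResult`, `summarize`, and the client names are hypothetical and do not reflect the output format of any particular backup utility.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class JobResult:
    client: str       # server or client name (hypothetical)
    backup_type: str  # e.g. "full", "differential", "incremental"
    status: str       # "success" or "failed"


def summarize(results):
    """Count outcomes and list the clients whose backups need follow-up."""
    counts = Counter(r.status for r in results)
    needs_follow_up = sorted({r.client for r in results if r.status != "success"})
    return {
        "total": len(results),
        "succeeded": counts["success"],
        "needs_follow_up": needs_follow_up,
    }


nightly = [
    JobResult("fileserver01", "incremental", "success"),
    JobResult("db01", "full", "failed"),
    JobResult("web01", "incremental", "success"),
]
report = summarize(nightly)
print(report)
```

Listing the clients that need follow-up separately from the counts keeps failures and exceptions visible, which is the part of the report that must be acted on immediately.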
5.6 Backup Storage and Security
Appropriate security measures are to be implemented for backups, including all necessary physical security controls, such as those related to the safety and security of the actual backup media, specifically disks, tapes, and any other medium containing backup data. This requires the use of a computer room or other designated facility that is secured and monitored at all times, whereby only authorized SJHAHD backup team members have physical access to the backups. Here, secured and monitored implies that the facility has in place the following physical and environmental security controls:
Constructed in a manner allowing for adequate protection of backups
Security alarms that are active during non-business hours, with alarm
Appropriate power protection devices for ensuring a continuous, balanced load of power to the facility where the backups reside.
Appropriate fire detection and suppression elements, along with fire extinguishers placed in mission critical areas.
Adequate closed-circuit monitoring and video surveillance as needed, both internally and externally, with all video kept for a minimum of {X} days for the purpose of meeting security best practices and various regulatory requirements.
5.7 Backup Schedule
Every month, a full backup of all backup groups shall be made so that a recent full backup of users’ data is always available.
5.8 Timing
Full backups are performed nightly, Sunday through Saturday. If, for maintenance reasons, backups cannot be taken on Friday, they shall be taken on Saturday or Sunday. Data Center Operations, Backup and Storage Group is responsible for ensuring the backups are performed as scheduled. Data Center Operations, Backup and Storage Group delegates specific backups to specific system administrators, and those administrators are responsible for carrying out that function, but the IT managers must ensure that the administrators perform and check backups in a timely manner.
5.9 Age of Backup
The date each backup was taken shall be recorded in the backup index, and the backup will be moved to Elastic Cloud Storage (ECS). Backups that have been in use for longer than six months shall be discarded and replaced with new data.
5.10 Saveset to Storage
There shall be a separate saveset group, or set of saveset groups, for each daily backup, including Sunday, Monday, Tuesday, Wednesday, Thursday, Friday and Saturday. There shall also be a separate saveset group, or set of saveset groups, for each Friday of the month, such as Friday1, Friday2, etc. Backups performed on Fridays or weekends shall be kept for one month and may be reused the next month on the applicable Friday. A monthly backup of all data should be kept for at least one year. Backups performed Monday through Thursday shall be kept for one week and reused on the same day of the following week. The browse policy for the data shall be 2 months, after which a new one is taken. The retention for all savesets shall be 1 year before they are recycled and new data is taken.
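The day-based retention rules above can be sketched as a function that returns the earliest date on which a backup may be reused. This is a rough illustration under stated assumptions: `expiry_date` is a hypothetical helper, "one month" is approximated as 30 days and "one year" as 365 days, and the overall 1-year saveset retention is not modelled.

```python
from datetime import date, timedelta


def expiry_date(backup_day: date, monthly: bool = False) -> date:
    """Return the earliest reuse date for a backup taken on backup_day."""
    if monthly:
        # Monthly backups are kept at least one year (assumed 365 days).
        return backup_day + timedelta(days=365)
    if backup_day.weekday() >= 4:  # Friday=4, Saturday=5, Sunday=6
        # Friday and weekend backups are kept one month (assumed 30 days).
        return backup_day + timedelta(days=30)
    # Monday through Thursday backups are kept one week.
    return backup_day + timedelta(days=7)


print(expiry_date(date(2024, 3, 4)))                # Monday  -> 2024-03-11
print(expiry_date(date(2024, 3, 1)))                # Friday  -> 2024-03-31
print(expiry_date(date(2024, 3, 1), monthly=True))  # Monthly -> 2025-03-01
```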
5.11 Testing Backup Integrity
The ability to restore data from backups shall be tested at least once per month, and quarterly for databases and other applications, to ensure that the data is valid. SJHAHD (BST) will schedule testing of backups to ensure viability twice a year.
5.12 Restoration
Users that need files restored must submit a request through ServiceNow. Include information about the file creation date, the name of the file, the last time it was changed, and the date and time it was deleted or destroyed.
Restores that require a tape from off-site storage will be started within 48 hours.
All other restores will be started within 2 business hours.
Restores over weekends/holidays will be performed the following business day, unless an urgent/high ticket is submitted.
6. Tape Library Backup Method
A tape library is a storage device that contains one or more tape drives. It uses slots to hold tape cartridges, a barcode reader to identify tape cartridges, and an automated method for loading tapes. SJHAHD will utilize the tape library method to back up critical business data to tape devices and offload that data to an offsite location. For the tape library support model, see below:
6.1 Transporting of Media
Transporting tape cartridges properly is vital for ensuring their safety and security at all times during movement. The following best practices are to be adhered to at all times, when applicable:
Cartridges are to be properly packed and stored to ensure their safety during movement, which means using approved cases and other protective devices.
Cartridges are to be kept away from extreme temperatures, both heat and cold, during movement.
Cartridges are never to be left alone or unsupervised during transportation.
Only approved SJHAHD backup team transport methods and vehicles are to be utilized.
Transport is to be as direct as possible, with no unnecessary stops or deviations from the intended route.
When necessary, transportation of media is also to include additional security precautions as required.
6.2 Tape Library Backup Requests and Retrieval
Backups are to be available in a timely manner for any restoration requests. Such requests require written approval by authorized SJHAHD backup team members detailing the request, along with all applicable information as necessary. A change request is to be opened for such requests and approved by the change request management team. The restore process is to be conducted by SJHAHD backup team members who have the technical knowledge to restore the data and to test that the restoration is complete and the integrity of the data has been preserved, along with conducting any user-acceptance and system testing. Lastly, the restored cartridges are to be promptly returned to the physically secured area for safe storage.
6.3 Tape Drive Cartridges
Tape cartridges shall be replaced weekly, and the cartridges shall be taken to a safe location monthly. SJHAHD needs to decide on the best location to store the tapes if tape drives are adopted as the long-term retention method.
6.4 Media Management and Quality Control
All backup media is to be clearly labeled, logged accordingly, and rotated as necessary to ensure all retention periods are adhered to, while also reusing existing media (i.e., tapes, disks, etc.) by writing over and copying as necessary for future backups. Additionally, media management practices for backups require that strict policies be in place for transporting media to and from the approved off-site facility being used by SJHAHD. As such, an authorized list is to be kept that includes only the selected SJHAHD backup team members allowed to transport and recall media, with no exceptions.
Either in manual form or electronic format, the following information is to be recorded regarding backups:
Name and unique identifying number of backup medium
Contents of the backup
Data classification of backup
Location of where it is being stored
Origination of backup – where the medium initially came from
If backups are being transported, the following is to be recorded:
Purpose
Name of individual requesting backup
Intended destination
Date of release
Date of return
Any other information deemed relevant
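The fields listed above could be captured in an electronic log along the following lines. This is a sketch only: the class and field names (`MediaRecord`, `TransportRecord`, `is_out`) and the sample values are hypothetical and simply mirror the items in the list.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class TransportRecord:
    purpose: str
    requested_by: str
    destination: str
    date_of_release: date
    date_of_return: Optional[date] = None  # None until the medium is back
    notes: str = ""                        # any other relevant information


@dataclass
class MediaRecord:
    name: str
    medium_id: str       # unique identifying number of the medium
    contents: str
    classification: str  # data classification of the backup
    location: str        # where the medium is stored
    origin: str          # where the medium initially came from
    transports: List[TransportRecord] = field(default_factory=list)

    def is_out(self) -> bool:
        """True if any transport has been released but not yet returned."""
        return any(t.date_of_return is None for t in self.transports)


tape = MediaRecord("Monthly full", "T-0001", "File server data",
                   "Confidential", "Computer room", "Tape library")
tape.transports.append(
    TransportRecord("Off-site vaulting", "Backup team member",
                    "Off-site vault", date(2024, 3, 1))
)
print(tape.is_out())  # True until a return date is recorded
```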
As for quality control initiatives, backup media are to be used only until a point well before the quality of the data may come into question, ultimately to avoid media failures. At any time, if the quality of the media becomes an issue, the data is to be immediately moved to another medium, with the compromised medium being disposed of in accordance with company policy.
6.5 Procedures regarding Target Media (e.g. Tape, Disk, and Cloud)
The (BST) team is responsible for maintenance and support
Full backups will be stored on- and off-site for 3 months
Tapes may be reused as they expire, if they are still viable
Weekly production backups, i.e. full and incremental weekly backups, will be stored on Cloud Tier for 1 month and then offloaded to ECS
Full backups of critical data, such as Epic-related data, will be moved to tape devices and taken off-site every 2 weeks
The data copied to tape will be stored for life, with a 9-year recycle period for the cartridges to allow for decommissioning of old tapes
Monthly production backups, i.e. full and incremental backups, will be stored on-site for 3 months
Full backups will be stored on-site using ECS for 5 years to allow for hardware refresh
Monthly Vault Full backups to tape will be stored off-site for 3 months and on-site for the remainder of the year
Non-production backups, i.e. backups of non-production environments, will be retained for 1 week and will not be sent off-site to tape except with management approval
Archive logs and incremental backups are kept for as long as the user needs to allow for database restores, and the data will be kept on-site using ECS
On-demand backups will be retained for the period specified by the person requesting the backup
7. Responsibility of SJHAHD (BST) on Tape Library and ECS Cloud Backup
The Data Center Operations, Backup and Storage Group Lead shall delegate a member of the group to perform regular backups. The delegated person shall develop a procedure for performing backups, testing backups, and testing the ability to restore data from backups on a monthly basis. At present, a backup request form is in place, which shall be completed before any new backup request is made and whenever a restore is required.
The following outlines Backup Software support – including policy configuration, restores, backups:
It is the responsibility of the SJHAHD - (BST) Team to make sure backups are running as scheduled
The SJHAHD - (BST) Team will verify that backup jobs have completed successfully, and will contact customers if problems occur
Customers will open a service ticket on ServiceNow and assign to SJHAHD - (BST) Team when server problems occur and (BST) team support is required
When a new server is added to the production environment, the administrator of the server will contact the SJHAHD - (BST) Team to have the server added to the backup system via ServiceNow Work Request
The customer and SJHAHD - (BST) Team will work together to find resolutions when problems occur
It is the responsibility of the SJHAHD - (BST) Team to install updates/upgrades of the backup software
The SJHAHD - (BST) Team will report any problems with the backup software to the customers, including the troubleshooting steps taken. If any errors are found, the SJHAHD - (BST) Team is responsible for contacting the vendor when necessary for troubleshooting. Troubleshooting these problems requires in-depth knowledge of operating systems and may require a system reboot.
The SJHAHD - (BST) Team will be responsible for the following:
Ordering cartridge media, cleaning tapes, and labels
Checking backup reports to ensure that they were completed without errors
Making sure the library has tape media available for backups and offsite storage
Packing monthly tapes and sending them to the vault in alignment with the end user engineer
Updating clients/servers to current version after upgrades when feasible with assistance from the customer if necessary
Managing relationships with storage vendors
Maintaining storage arrays
8. Backup Retention Periods and Disposal Procedures
Backup retention periods are those specifically identified for purposes of restore and recovery of SJHAHD data. Thus, it is the responsibility of the SJHAHD Backup and Storage team to ensure the applicable backup retention periods meet all necessary needs of the organization, while also promoting best practices. Conversely, data retention periods, such as those defined by contractual, legal, and regulatory compliance mandates, are governed by separate policies and procedures regarding data retention length and disposal of the actual data itself.
Additionally, please note that when referring to disposal procedures in the context of backups, this specifically applies to the physical devices used for storing such data, and not the actual data itself. Policies regarding disposal of data, meaning the actual information, are outlined in SJHAHD’s Data Retention and Disposal Policy. The approved disposal methods for the actual physical devices used for storing such data consist of the following:
Disintegration
Shredding (disk grinding device)
Incineration by a licensed incinerator
Pulverization
Please note that prior to physically destroying any of the actual devices used for storing data, all data must be electronically removed (i.e., wiped, formatted, etc.) as the primary layer of security before being destroyed.
9. Backup Identification
1. Data Center Operations, Backup and Storage Group is responsible for identifying all systems, vendor supplied programs including operating systems and application programs, IT policies, IT procedures, contact information for vendors and business partners and any other relevant information needed to rebuild the IT department from scratch in the event of a disaster. The business owners are responsible for identifying similar items required to rebuild their business function in the event of a disaster. The IT management working with the business owners must identify specific items relating to the business that must be backed up regularly and the frequency of backup. Any backups done on a slower schedule than documented in this policy must be agreed to in writing by the business owner and IT management.
2. Data Center Operations, Backup and Storage Group is responsible for creating procedures for transferring offsite the identified items required for business rebuild, and for ensuring they are transferred by delegated staff. Data Center Operations, Backup and Storage Group must ensure that procedures exist and are kept both onsite and offsite for the purpose of both file recovery and disaster recovery.
9.1 Data Backed Up
Data to be backed up include the following information:
1. User data stored on the hard drive.
2. System state data
3. The registry
4. Application software
5. Database files
Systems to be backed up include but are not limited to:
1. File server
2. Production web servers
3. Application servers
4. Production database servers
5. Domain controllers
6. Test database servers
7. Test web server
8. All applications
10. Backup Types
The following types of backup are to be used in the SJHAHD backup process.
Full – A full backup is simply a complete backup of all data. It is the most comprehensive and time-consuming type of backup, yet it ensures that a complete backup of everything has been undertaken.
Differential – A differential backup covers files that have changed since the last full backup was performed. It typically saves only the files that are different or new since the last full backup, but this can vary between backup platforms.
Incremental – An incremental backup is essentially a backup of all the files, or parts of files, that have changed since the previous backup was conducted, regardless of the type of that backup (full, differential, or incremental).
Additionally, backup activities for full, differential, and incremental backups are to take place on an as-needed basis, such as in the following manner:
Full: At a minimum, once a week
Differential: At a minimum, daily
Incremental: As necessary
Image Level Backup: This is a backup taken directly from vCenter. It captures the entire image of the operating system and backs it up to Data Domain. This enables the entire operating system to be restored directly from Data Domain in the event of a failure.
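The difference between the full, differential, and incremental backup types defined earlier in this section comes down to which reference point a file's modification time is compared against. The following is a simplified sketch, not the behaviour of any specific backup product: `select_files` is a hypothetical helper and times are plain day numbers.

```python
def select_files(mtimes, kind, last_full, last_backup):
    """Return the file names a backup of the given kind would capture.

    mtimes maps file name -> last-modified time; times are plain numbers here.
    """
    if kind == "full":
        return sorted(mtimes)      # everything, regardless of age
    if kind == "differential":
        cutoff = last_full         # changed since the last full backup
    elif kind == "incremental":
        cutoff = last_backup       # changed since the previous backup of any type
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(name for name, t in mtimes.items() if t > cutoff)


# Hypothetical timeline in days: full backup on day 0, incremental on day 2.
mtimes = {"a.txt": -1, "b.txt": 1, "c.txt": 3}
print(select_files(mtimes, "full", last_full=0, last_backup=2))          # ['a.txt', 'b.txt', 'c.txt']
print(select_files(mtimes, "differential", last_full=0, last_backup=2))  # ['b.txt', 'c.txt']
print(select_files(mtimes, "incremental", last_full=0, last_backup=2))   # ['c.txt']
```

The differential set grows until the next full backup, while each incremental set stays small; restoring from incrementals therefore needs the full backup plus every incremental taken since.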
11. Backup Storage Locations
SJHAHD management must determine and specify the location of the backed-up data for recovery in the event of a system loss or the loss of a room, and also in the event of a disaster in the Aramco Corporate Data Center where the data is currently located. The storage locations must be physically secure enough to protect the backup data, considering the sensitivity of the data stored on the Data Domain. The storage locations must have sufficient environmental controls to keep the backup data from degrading. This policy may contain descriptions of how various systems and types of systems, such as Windows or UNIX systems, are backed up.
12. User Backup Responsibilities
Users are responsible for either storing their data on a networked file server rather than on their local workstation, or making arrangements for their workstation to be backed up on a regular basis. The frequency of backup and whether the data can be stored on a workstation depend on the business criticality of the data for preserving the business function. Storage of critical or sensitive data in a location other than a networked server must be approved by management.
12.1 Other Backup Responsibilities
Management must decide how long they would like to keep image-level backups that are taken from vCenter.
The image should be stored offsite. Hardware requirements and specifications for all servers should be saved and stored offsite.
Backup inventory must be tracked using implemented procedures. The locations of backup media must be known by delegated and authorized IT staff.
Equipment used for restoration must be compatible with the backup media. This means that in the event of a disaster, backup reading equipment that is available must be capable of reading the backup media.
All data and information required to rebuild the business including source code for developed programs, procedures, policies, software, system documentation, program documentation, network documentation, and contact information must be stored offsite.
12.2 Out of Scope
Troubleshooting and investigating what caused data loss or data corruption on client servers
Lost production data as a result of end-user changes to an application or application data
Lost data that falls outside of the backup windows outlined in this policy
Rebooting or restarting clients/servers
Making sure clients/servers have been entered into DNS properly
13. Encryption
Data Domain Encryption provides inline encryption, which means that as data is being ingested, the stream is deduplicated, compressed, and encrypted using an encryption key before being written to the RAID group. Data Domain Encryption software uses RSA BSAFE cryptographic libraries, which are FIPS 140-2 validated. Encryption of data in flight encrypts data being transferred via DD Replicator software between two Data Domain systems. It uses OpenSSL AES 256-bit encryption to encapsulate the replicated data over the wire. The encryption encapsulation layer is removed as soon as the data lands on the destination Data Domain system. Data within the payload can also be encrypted via Data Domain encryption software. NFSv3 and NFSv4 support krb5i and krb5p for integrity and privacy, respectively. The client can use TLS to encrypt the session between the client and the Data Domain system.
14. Decommissioning Process
Once a server is decommissioned by the application owner, a ticket should be assigned to the backup and storage group to start the decommissioning process. The decommissioning process involves taking a new backup with a 3-month retention period, removing the server from its backup group, uninstalling the Networker agent from the server, and closing the ticket.