A) 24 x 7 Monitoring
The 24 x 7 Monitoring service is suitable for companies that have in-house
resources to respond to system events when they happen, but do not have the
resources to actively monitor systems 24 x 7 and call appropriate resources or
initiate actions in the event of an alert, outage or system failure.
With the 24 x 7 Monitoring service, Managed Uptime monitors various aspects
of the network (e.g. Web pages, Exchange functionality, database server health
and responsiveness, etc.). We set up a number of alerts based on your
requirements.
In the event of a system alert (e.g. loss of connectivity, device status change,
application failure, etc.), Managed Uptime initiates a call to you, the client,
following the pre-set escalation path laid out by you, and informs the
appropriate system engineer of the event and its severity.
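As a rough illustration of how a pre-set escalation path might be walked, the
sketch below calls each contact in order until one acknowledges the alert. The
`Contact` structure, the `notify` callback and the message format are
assumptions made for the example; they are not part of the Managed Uptime
service definition.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Contact:
    """One step in a client's escalation path (hypothetical structure)."""
    name: str
    phone: str


def escalate(path: Sequence[Contact], event: str, severity: str,
             notify: Callable[[Contact, str], bool]) -> Optional[Contact]:
    """Call each contact in order; return the first one who acknowledges.

    `notify` is a stand-in for placing the call and must return True
    when the contact acknowledges the alert.
    """
    message = f"[{severity}] {event}"
    for contact in path:
        if notify(contact, message):
            return contact
    return None  # nobody on the path acknowledged the alert
```

Returning `None` when the path is exhausted leaves the decision about what to do
next (for example, restarting from the top of the path) to the caller.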
Monitoring
The 24 x 7 monitoring of the client’s servers and network devices is done from
different monitoring stations to ensure service redundancy. Monitoring of the
client’s servers and/or network devices includes:
- Outage Monitoring for server failures and loss of network connectivity.
- Threshold Monitoring for violation of hardware performance thresholds.
- Process Monitoring for the active status of key processes on each server.
- Application Monitoring for application failures through end-to-end testing.
- Proactive system monitoring to identify potential problems and alert the
  appropriate resources to prevent major outages.
- (Optional Service) Backup Process Monitoring for uncompleted backup processes
  through the backup transaction logs.
All applicable services are continuously monitored. All alerts are noted and,
if applicable, escalated to the named client representative as defined in the
escalation procedure.
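To make the idea of Threshold Monitoring concrete, the following minimal sketch
compares one sample of metrics against configured limits and reports which ones
are violated. The metric names and limit values are hypothetical; in practice
they would be set from the client's requirements.

```python
from typing import List, Mapping

# Hypothetical limits, for illustration only.
THRESHOLDS: Mapping[str, float] = {"cpu_percent": 90.0, "disk_percent": 85.0}


def threshold_violations(sample: Mapping[str, float],
                         thresholds: Mapping[str, float] = THRESHOLDS) -> List[str]:
    """Return the sorted names of metrics whose value exceeds its limit.

    Metrics missing from the sample are treated as 0.0, so they never
    count as a violation in this sketch.
    """
    return sorted(
        name for name, limit in thresholds.items()
        if sample.get(name, 0.0) > limit
    )
```

A non-empty result would correspond to an alert that is noted and, if
applicable, escalated.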
Performance Reporting
Managed Uptime provides clients with a reporting website for the various aspects
of the network being monitored. This allows clients’ IT staff to easily access
real-time statistics on usage and system thresholds, as well as historical
data for trending over time.
B) 24 x 7 Response
As IT professionals are painfully aware, many system problems can be resolved
expeditiously if addressed early and the proper response/solution is implemented
without delay. However, too often such incidents occur after hours when all
support staff are away. Managed Uptime addresses this issue with the 24 x 7
Response service.
The 24 x 7 Response service supplements the 24 x 7 Monitoring of the client’s
systems and network as detailed in the previous section. With the 24 x 7
Response service, in the event of a system alert but before a call is made to
the client, Managed Uptime takes initial steps to assess the situation: opening
a log file on the machine, logging into the server in question, investigating
the event and responding appropriately. If all documented response actions have
been tried and the system alerts continue, the escalation path outlined is
implemented. All actions are logged on file for the client’s review.
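The respond-first, escalate-second flow described above can be sketched as
follows. This is only an illustration under stated assumptions: the
`alert_active` check and the named actions are hypothetical stand-ins for
whatever the documented response actions are for a given system.

```python
from typing import Callable, List, Sequence, Tuple


def respond(actions: Sequence[Tuple[str, Callable[[], None]]],
            alert_active: Callable[[], bool]) -> Tuple[bool, List[str]]:
    """Run documented actions in order until the alert clears.

    Returns (resolved, log). `log` names every action attempted, so it can
    be kept on file for review. When `resolved` is False, all documented
    actions were tried and the caller should start the escalation path.
    """
    log: List[str] = []
    for name, action in actions:
        if not alert_active():
            break  # alert already cleared; skip the remaining actions
        action()
        log.append(name)
    return (not alert_active()), log
```

Checking the alert before each action keeps the response as small as possible:
once an earlier step has cleared the alert, later steps are not run.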
System Response Manual
Working in conjunction with the client’s IT staff, Managed Uptime develops a
System Response Manual. This manual includes a list of all servers to be
monitored by Managed Uptime, their location and software platform. The Manual
also lists, for each server, the hardware profile along with the most common
problems experienced by that server and the correct response to these problems.
Upon receiving an alert, Managed Uptime determines the systems and dependent
systems affected, reviews the System Response Manual, logs into the server in
question, and performs the agreed pre-determined maintenance as required. These
actions could range from stopping and starting a service or services, to
deleting or moving specific files to alleviate space constraints. Typical
responses to system alerts include:
- Stop and start services on Unix or NT boxes.
- Review Event Viewer for problem identification.
- Move or delete files.
- Perform a series of pre-arranged actions in order to bring a specific service
  back online.
- Review dependent servers to determine if the problem is occurring on a
  secondary system.
- Terminate (‘kill’) deadlocked processes.
If the event is resolved at this stage, the event is logged in the daily report
log and also forwarded to the appropriate client representative(s). This log
includes the time of the alert, the affected systems, any resulting downtime
and a list of actions taken to resolve the problem. If Managed Uptime is unable
to resolve the problem following the procedures in the System Response Manual,
it escalates directly to the appropriate client system engineer. Upon contact
with the engineer, all preliminary data on the problem are conveyed, and
further action is taken at the direction of the engineer.
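For illustration, an entry in the daily report log could be modelled as a small
record holding the items listed above: alert time, affected systems, downtime
and actions taken. The class name, field names and summary format below are
assumptions for this sketch, not the actual log layout used by Managed Uptime.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class EventLogEntry:
    """One resolved event in the daily report log (hypothetical layout)."""
    alert_time: datetime
    affected_systems: List[str]
    downtime: timedelta
    actions_taken: List[str] = field(default_factory=list)

    def summary(self) -> str:
        """Render a one-line summary suitable for a plain-text report."""
        systems = ", ".join(self.affected_systems)
        minutes = int(self.downtime.total_seconds() // 60)
        return (f"{self.alert_time:%Y-%m-%d %H:%M} | {systems} | "
                f"downtime {minutes} min | {len(self.actions_taken)} action(s)")
```

The same record could also carry the preliminary data that is conveyed to the
client's engineer when an event has to be escalated.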
C) Advanced Consulting Services
Managed Uptime also provides Advanced Consulting Services for clients who
require advanced engineering help with their systems, including engineering
services for networking and for Unix or NT systems. These services can be
contracted on an hourly, weekly or monthly basis. Managed Uptime senior
engineers are all qualified experts in their specific disciplines and perform
advanced network architectural planning, implementation and troubleshooting of
any network component.