Network Management

Enterprise network management is the task of ensuring that networks and systems provide the required services, with the specified quality of service, to users and other systems. Most enterprise network management architectures use an agent-manager relationship, in which agents residing on managed network/system elements provide network/system management information, such as alerts or performance measurements, to the manager. The manager reacts to these messages by executing one or more actions, such as operator notification, event logging, system shutdown, or automatic attempts at system repair. Management entities also poll end stations, automatically or upon user request, to check the values of certain variables. Agents have information about the managed devices in which they reside and provide that information (proactively or reactively) to management entities within one or more enterprise management systems (EMSs) via a network management protocol. The term enterprise network management refers to the combined task of network and system management.
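
The agent-manager relationship described above can be sketched in Python as follows. This is a minimal illustration, not a real network management protocol: the Agent and Manager classes, the "link_down" notification kind, and the action functions are hypothetical names chosen for the example. Agents push notifications proactively, the manager runs every action registered for that notification, and the manager can also poll an agent reactively for a variable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical action signature: (element, notification kind) -> description of what was done.
Action = Callable[[str, str], str]


@dataclass
class Agent:
    """Hypothetical agent holding management variables for one managed element."""
    element: str
    variables: Dict[str, float] = field(default_factory=dict)

    def get(self, name: str) -> float:
        # Reactive path: answer a poll from the manager.
        return self.variables[name]


@dataclass
class Manager:
    """Hypothetical manager that maps notification kinds to one or more actions."""
    actions: Dict[str, List[Action]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def on_notification(self, element: str, kind: str) -> List[str]:
        # Proactive path: an agent pushes an alert; run every action registered for it.
        results = [action(element, kind) for action in self.actions.get(kind, [])]
        self.log.extend(results)
        return results

    def poll(self, agent: Agent, name: str) -> float:
        # Poll an agent for the current value of one of its variables.
        return agent.get(name)


def notify_operator(element: str, kind: str) -> str:
    return f"operator notified: {kind} on {element}"


def log_event(element: str, kind: str) -> str:
    return f"event logged: {kind} on {element}"


manager = Manager(actions={"link_down": [notify_operator, log_event]})
rtu = Agent("rtu-07", {"cpu_util": 0.42})
print(manager.on_notification("rtu-07", "link_down"))
print(manager.poll(rtu, "cpu_util"))
```

Keeping actions as a list per notification kind mirrors the text's point that the manager may execute one or more actions in response to a single message.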

Network Management Functions

The functions of an enterprise manager, facilitated by an enterprise management system (EMS), include:

  • Performance Management, which involves measuring various network/system performance metrics, analyzing the measurements to determine normal levels, and determining appropriate threshold values to ensure the required level of performance for each service. Examples of performance metrics include network/system throughput, user response times, and line utilization. Management entities continually monitor the values of the performance metrics, and an alert is generated and sent to the enterprise management system when a threshold is exceeded.
  • Configuration Management, which involves maintaining an inventory of network and system configuration information. This information is used to assure interoperability and to support problem detection. Examples of configuration information include device/system OS name and version, types and capacities of interfaces, types and versions of protocol stacks, type and version of network/system management SW, etc.
  • Accounting Management, which tracks usage per account, supports billing, and ensures that resources are available according to account requirements.
  • Fault Management, which detects, fixes, logs, and reports network/system problems. Fault management involves determining symptoms through measurements and monitoring, and isolating the problem.
  • Security Management, which controls access to network/system resources according to security guidelines. The security manager partitions network/system resources into authorized and unauthorized areas, and users are given access rights to one or more areas. Security managers identify sensitive network/system resources (including systems, files, and other entities) and determine which users can access these resources. The security manager monitors access points to sensitive network/system resources and logs inappropriate access.
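
The performance management steps listed above (measure, determine a normal level, set a threshold, alert when it is exceeded) can be sketched as below. Deriving the threshold as the mean plus k standard deviations of historical samples is an assumption made for illustration; the text does not prescribe a particular rule, and k=3 and the sample values are hypothetical.

```python
import statistics
from typing import List, Optional


def baseline_threshold(samples: List[float], k: float = 3.0) -> float:
    """Threshold = mean + k * population standard deviation (an assumed rule)."""
    mean = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)
    return mean + k * sigma


def check(metric: str, value: float, threshold: float) -> Optional[str]:
    # Return an alert message for the EMS when the threshold is exceeded, else None.
    if value > threshold:
        return f"ALERT {metric}={value} exceeds threshold {threshold:.1f}"
    return None


# Hypothetical historical user response times in milliseconds.
history_ms = [120, 130, 125, 135, 140, 128, 122]
limit = baseline_threshold(history_ms)
for observed in (131, 175):
    alert = check("response_time_ms", observed, limit)
    if alert:
        print(alert)
```

In practice the baseline would be recomputed periodically so that the "normal level" tracks gradual changes in load.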

Typically, network management refers to management of network/system resources such as routers, switches, hubs, customer premises equipment, and communication links. We extend the domain of network management to enterprise management, defined as the set of functions needed to manage the following resources:

  1. Network resources, as defined above,
  2. Systems – Computing resources such as substation automation systems, data concentrators, servers such as Market Interface Servers, applications such as data acquisition and control systems, and database management systems,
  3. Service and business functions such as RTP customer pricing service, security and operational policy servers,
  4. Power system devices such as IEDs and RTUs,
  5. Customer premises equipment such as digital meters and consumer portals, and
  6. Storage area networks.

Network Management Activities

Activity/Service Name [i] | Activities/Services Provided [ii]

Object management – Defining resources and attributes

The EMS needs to be aware of resources, such as routers, hubs, and computers, and of their attributes.

Defining, modifying and examining relationships

The EMS needs to be aware of the relationships between objects.

Setting, modifying and examining attribute values

Object attributes need to have values, for example, the number and types of ports per card.
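
A minimal sketch of the object-management activities above, assuming a simple in-memory model: each managed object has a kind, a dictionary of attribute values, and a containment relationship to child objects. The ManagedObject class and the object names are hypothetical and not taken from any particular EMS.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ManagedObject:
    """Hypothetical managed object: a resource, its attributes, and what it contains."""
    name: str
    kind: str                                   # e.g. "router", "card", "port"
    attributes: Dict[str, Any] = field(default_factory=dict)
    contains: List["ManagedObject"] = field(default_factory=list)

    def add(self, child: "ManagedObject") -> "ManagedObject":
        # Define a containment relationship (e.g. router contains card).
        self.contains.append(child)
        return child

    def set_attr(self, key: str, value: Any) -> None:
        # Set or modify an attribute value.
        self.attributes[key] = value

    def find(self, kind: str) -> List["ManagedObject"]:
        # Examine relationships: depth-first search of everything this object contains.
        found = [self] if self.kind == kind else []
        for child in self.contains:
            found.extend(child.find(kind))
        return found


router = ManagedObject("rtr-1", "router")
card = router.add(ManagedObject("rtr-1/card-1", "card"))
card.add(ManagedObject("rtr-1/card-1/p1", "port", {"type": "ethernet"}))
card.add(ManagedObject("rtr-1/card-1/p2", "port", {"type": "serial"}))
card.set_attr("port_count", len(card.find("port")))
print(card.attributes, [p.attributes["type"] for p in router.find("port")])
```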

Inventory Management

Inventory management (IM) is the task of maintaining the types and configurations of resources. The inventory information is required for SW and HW maintenance, fault determination and recovery, and capacity planning.

Network Discovery

Network discovery dynamically creates a representation of the network topology and of the device configurations. The data could be collected manually, which is tedious and often inaccurate for a large network, or through an EMS. Instances of the managed devices and their internal components are created, and the connections between them are recorded. Discovered components and device information include network cards, ports, interfaces, power supplies, MAC addresses, SW version, OS type, CPU type, IP addresses, etc.

Address Management

Address management includes allocating IP addresses to devices, determining subnets, keeping track of used and available IP addresses, and reusing unused addresses. This task reduces addressing complexity and waste of address space.
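
The address-management tasks (determining subnets, allocating addresses, tracking used and free addresses, and reusing released ones) can be sketched with Python's standard ipaddress module. The AddressPool class, the 10.20.0.0/24 site block, and the device names are hypothetical; this is an illustration, not an IPAM or DHCP implementation.

```python
import ipaddress
from typing import Dict


class AddressPool:
    """Hypothetical pool tracking used and free host addresses in one subnet."""

    def __init__(self, subnet: str) -> None:
        self.network = ipaddress.ip_network(subnet)
        # For IPv4, hosts() excludes the network and broadcast addresses.
        self.free = list(self.network.hosts())
        self.used: Dict[ipaddress.IPv4Address, str] = {}

    def allocate(self, device: str) -> ipaddress.IPv4Address:
        if not self.free:
            raise RuntimeError(f"{self.network} has no free addresses")
        addr = self.free.pop(0)          # lowest free address first
        self.used[addr] = device
        return addr

    def release(self, addr: ipaddress.IPv4Address) -> None:
        # Return the address to the free list so it can be reused.
        del self.used[addr]
        self.free.append(addr)
        self.free.sort()


site = ipaddress.ip_network("10.20.0.0/24")
subnets = list(site.subnets(new_prefix=26))   # four /26 subnets, 62 hosts each
pool = AddressPool(str(subnets[0]))
rtu = pool.allocate("rtu-1")
ied = pool.allocate("ied-1")
pool.release(rtu)
reused = pool.allocate("ied-2")               # the released address is handed out again
print(subnets, ied, reused)
```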

Name Management

Naming establishes a connection between a name and a device, its location, its type, etc., and helps identify devices, IP address mappings, etc. Naming conventions for network devices, from the device name down to individual interfaces, should be planned and implemented as part of the configuration standard. A well-defined naming convention makes it possible to obtain accurate information when troubleshooting. The naming convention for devices can use geographical location, building name, floor, and so forth. The interface naming convention can include the segment to which a port is connected, the name of the connecting hub, and so forth.
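
One way to enforce a naming convention is to generate and validate names programmatically. The convention below (site code, building, floor, role, and a two-digit index for devices; device, port, segment, and connecting hub for interfaces) is a hypothetical example built from the elements the text suggests, not a standard.

```python
import re
from typing import Dict

# Hypothetical device convention: SITE-BUILDING-F<floor>-<ROLE><two-digit index>,
# e.g. "DEN-SUB4-F2-RTR01".
DEVICE_NAME = re.compile(
    r"^(?P<site>[A-Z]{3})-(?P<building>[A-Z0-9]+)-F(?P<floor>\d+)-(?P<role>[A-Z]+)(?P<index>\d{2})$"
)


def make_device_name(site: str, building: str, floor: int, role: str, index: int) -> str:
    name = f"{site}-{building}-F{floor}-{role}{index:02d}".upper()
    if not DEVICE_NAME.match(name):
        raise ValueError(f"{name!r} violates the naming convention")
    return name


def parse_device_name(name: str) -> Dict[str, str]:
    # Recover location and type information from a name when troubleshooting.
    match = DEVICE_NAME.match(name)
    if match is None:
        raise ValueError(f"{name!r} violates the naming convention")
    return match.groupdict()


def make_interface_name(device: str, port: str, segment: str, hub: str) -> str:
    # Hypothetical interface convention: device.port.segment.connecting-hub
    return f"{device}.{port}.{segment}.{hub}"


rtr = make_device_name("den", "sub4", 2, "rtr", 1)
print(rtr, parse_device_name(rtr))
print(make_interface_name(rtr, "ge0-1", "SCADA", "HUB03"))
```

Validating at creation time keeps non-conforming names out of the inventory, so parse_device_name can be relied on later during troubleshooting.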

Routing management

Determine and configure routing tables. This includes configuring parameters for IP routing, Quality of Service, etc.

SW distribution and upgrade

This includes detection of SW releases, distribution of new releases, and testing for interoperability.

Setting & verifying user authorization

 

Scheduling, user/flow/packet prioritization

This allows specific treatment of users, flows, or packets, based on the features available on routers, switches, and computers, to meet QoS requirements or SLAs.

Resource dimensioning and allocation

Engineering the network elements for more efficient utilization and to assure that QoS requirements are met, for example, by sizing buffers.

Configuring for redundancies to assure reliability requirements

This task designs the network/systems to tolerate faults, for example, by providing alternative routing, redundant computing, etc.

Initializing and terminating network operations, device reset.

This task is to initialize or shutdown the network and systems.

Setting values for fault threshold, health check intervals, performance thresholds

This task requires an enterprise manager to set and configure threshold values for the purpose of alarm monitoring and performance monitoring.

Polling for faults, health check, running watch-dog timers, processing traps

This task defines the function of either receiving or polling for alarms.
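
The health-check and watchdog part of this task can be sketched as follows: every received trap, heartbeat, or poll response refreshes an element's last-seen time, and any element silent for longer than the configured interval is flagged. The HealthMonitor class and the 30-second timeout are hypothetical; timestamps are passed in explicitly so the logic can be tested without a real clock.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HealthMonitor:
    """Hypothetical watchdog over managed elements; times are in seconds."""
    timeout: float
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, element: str, now: float) -> None:
        # Called whenever a trap or heartbeat arrives, or a poll is answered.
        self.last_seen[element] = now

    def expired(self, now: float) -> List[str]:
        # Elements whose watchdog timer has run out (no contact within `timeout`).
        return sorted(e for e, t in self.last_seen.items() if now - t > self.timeout)


monitor = HealthMonitor(timeout=30.0)
monitor.record("rtu-1", now=0.0)
monitor.record("meter-gw", now=10.0)
monitor.record("rtu-1", now=25.0)      # rtu-1 answered a later poll
print(monitor.expired(now=45.0))       # meter-gw has been silent for 35 s
```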

Log control

 

Diagnostic testing, testing capacity and special conditions

Testing either to proactively detect the failure of a device, application, or element, or to locate faults.

Fault location

Determination of fault location through testing, alarm correlation, analysis, etc.
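
Alarm correlation for fault location can take many forms; a simple topology-based variant is sketched below as an assumption, not as the method the text prescribes. Given the element each element depends on upstream (a tree, with no cycles assumed), an alarmed element that has an alarmed element anywhere upstream is treated as a symptom, and the remaining alarmed elements are reported as candidate root causes. Element names are hypothetical.

```python
from typing import Dict, Optional, Set


def root_causes(alarmed: Set[str], upstream: Dict[str, str]) -> Set[str]:
    """Return alarmed elements with no alarmed element anywhere upstream of them."""
    causes = set()
    for element in alarmed:
        parent: Optional[str] = upstream.get(element)
        # Walk toward the core until we hit an alarmed element or run out of parents.
        while parent is not None and parent not in alarmed:
            parent = upstream.get(parent)
        if parent is None:
            causes.add(element)   # nothing upstream is alarmed: candidate root cause
    return causes


# Hypothetical topology: each key depends on the element it maps to.
upstream = {
    "switch-A": "router-core",
    "rtu-1": "switch-A",
    "rtu-2": "switch-A",
    "meter-gw": "router-core",
}
print(root_causes({"switch-A", "rtu-1", "rtu-2"}, upstream))
```

Suppressing the downstream alarms this way lets a single trouble ticket be issued for the failed switch instead of one per unreachable RTU.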

Fault data summarization

 

Reconfigure, reroute, remove

Activities to recover from fault conditions

Issue trouble ticket

Activity to document fault

Dispatch technician

 

Determining the set of key performance indicators

The task of determining which performance metrics to measure. Examples are delay, response time, packet loss, buffer overflow, etc.

Mapping SLA/user performance objectives into network/system performance objectives

Mapping higher-level service agreement objectives, such as response time, to network and system performance objectives, such as processing time on each CPU, transport time, priority settings, etc.
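
As a worked example of this mapping, an end-to-end response-time objective can be split into per-component budgets and compared with measurements. Splitting in proportion to fixed weights is an assumption for illustration; the component names, weights, measured values, and the 2000 ms objective are hypothetical.

```python
from typing import Dict, List


def split_budget(end_to_end_ms: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Allocate an end-to-end objective to components in proportion to their weights."""
    total = sum(weights.values())
    return {part: end_to_end_ms * w / total for part, w in weights.items()}


def over_budget(measured_ms: Dict[str, float], budget_ms: Dict[str, float]) -> List[str]:
    # Components whose measured contribution exceeds their share of the objective.
    return [part for part, ms in measured_ms.items() if ms > budget_ms[part]]


budget = split_budget(2000, {"client": 1, "transport": 2, "server_cpu": 4, "database": 3})
measured = {"client": 150, "transport": 520, "server_cpu": 700, "database": 450}
print(budget)
print(over_budget(measured, budget))
```

Here the end-to-end total (1820 ms) still meets the 2000 ms objective, but the per-component view shows which budget, transport in this case, is being consumed faster than planned.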

Continuous real-time performance monitoring, performance alarm generation

Alarms, statistics, history, and host/conversation groups are used to monitor and maintain network/system availability based on application-layer traffic. Performance metrics at the interface, device, and protocol levels are collected regularly to facilitate enterprise management, capacity planning, and rerouting functions. The EMSs typically collect, store, and present performance data from network devices and servers.

Examples of performance metrics collected are response time, jitter (delay variance), packet loss, input/output queuing time, input/output buffer overflow, transaction time, and occupancy (utilization) of resources.
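
A few of these metrics can be computed directly from probe timestamps, as in the sketch below. It assumes each probe packet carries a sequence number and that send and receive times (in milliseconds) are available on a common clock; jitter is computed as the variance of one-way delay, following the definition given above (other definitions, such as the RFC 3550 interarrival jitter, also exist). The function name and result keys are hypothetical.

```python
import statistics
from typing import Dict


def probe_metrics(sent: Dict[int, float], received: Dict[int, float]) -> Dict[str, float]:
    """sent/received map probe sequence numbers to timestamps in milliseconds.

    Assumes at least one probe was received; statistics.fmean raises on empty input.
    """
    delays = [received[seq] - sent[seq] for seq in sent if seq in received]
    lost = len(sent) - len(delays)
    return {
        "mean_delay_ms": statistics.fmean(delays),
        "jitter_ms2": statistics.pvariance(delays),   # delay variance, in ms squared
        "loss_pct": 100.0 * lost / len(sent),
    }


sent = {1: 0.0, 2: 20.0, 3: 40.0, 4: 60.0}
received = {1: 10.0, 2: 32.0, 4: 74.0}   # probe 3 was lost
print(probe_metrics(sent, received))
```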

Performance and statistical analysis of measured values, performance data summarization

Post analysis of measured performance indicators for capacity planning, traffic engineering, reconfigurations, etc.

Traffic management

Determine the traffic characteristics of each source and their resource requirements, and configure the network elements and systems to meet those requirements. User and application traffic profiling provides a detailed view of the traffic in the network. Some EMSs allow enterprise managers to analyze and troubleshoot networked applications such as Web traffic, NetWare, Notes, e-mail, database access, Network File System (NFS), etc.

Capacity planning

Determine traffic growth and plan for it. Capacity planning for the network/system can be done after gathering traffic statistics such as traffic volume, source and destination IP addresses, input and output interface numbers, TCP/UDP source and destination ports, source and destination administrative groups, etc.
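
Once per-period traffic statistics have been gathered, a simple growth projection can indicate when a link will need more capacity. The sketch below fits a least-squares straight line to monthly peak load and estimates the months remaining until a planning threshold is reached; linear growth, the 80% threshold, and the sample values are assumptions for illustration.

```python
from typing import List, Optional


def months_until_threshold(
    monthly_peak_mbps: List[float], capacity_mbps: float, threshold: float = 0.8
) -> Optional[float]:
    """Months after the last sample until the fitted trend reaches threshold * capacity."""
    n = len(monthly_peak_mbps)
    if n < 2:
        raise ValueError("need at least two monthly samples")
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(monthly_peak_mbps) / n
    # Ordinary least-squares slope and intercept.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, monthly_peak_mbps))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    target = threshold * capacity_mbps
    current = intercept + slope * (n - 1)   # fitted load at the last sample
    if current >= target:
        return 0.0
    if slope <= 0:
        return None   # no growth trend: threshold never reached under this model
    return (target - current) / slope


peaks = [40, 44, 48, 52, 56, 60]   # hypothetical monthly peaks on a 100 Mbit/s link
print(months_until_threshold(peaks, capacity_mbps=100))
```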

Establishing, maintaining and monitoring Service Level Agreements (SLAs)

A service level agreement (SLA) is established between a service provider and its customer on the expected performance level of network/system services. Examples of the performance metrics used in SLAs are guaranteed throughput, percentage of time the service is available, packet latency, percentage of packets delivered, outage reporting time, response time to denial-of-service attacks, service activation time, etc. Set parameters (routing, addressing, etc.) in devices to meet policy requirements, monitor operations according to the policy, and identify policy violations.
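
The availability metric in an SLA reduces to simple arithmetic. The sketch below computes measured availability over a reporting period and the downtime a given target allows; the 30-day period, the outage durations, and the 99.9% target are hypothetical values.

```python
from typing import List


def availability_pct(period_min: float, outages_min: List[float]) -> float:
    """Percentage of the period during which the service was available."""
    return 100.0 * (period_min - sum(outages_min)) / period_min


def allowed_downtime_min(period_min: float, target_pct: float) -> float:
    # Maximum outage time the target permits over the period.
    return period_min * (100.0 - target_pct) / 100.0


period = 30 * 24 * 60                 # 30-day month = 43,200 minutes
outages = [12.0, 25.0]                # recorded outages in minutes
measured = availability_pct(period, outages)
print(f"{measured:.4f}% available, {allowed_downtime_min(period, 99.9):.1f} min allowed at 99.9%")
print("SLA met" if measured >= 99.9 else "SLA violated")
```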

Authentication and Authorization

Authentication identifies users before they are allowed to access network/system resources. Authorization provides various levels of authority to users.

Accounting of Security Info

Collect and report security information used for billing and auditing, such as user identities, start and stop times, and executed commands. Accounting enables enterprise managers to track the services that users are accessing as well as the amount of network/system resources they are consuming.

Establish Access Control List

Control access to network/system resources so that unauthorized users are denied access.
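
An access control list is usually evaluated top to bottom, with the first matching rule deciding and an implicit deny at the end. The sketch below models that behavior with the standard ipaddress module; the Rule type, the addresses, and the use of TCP port 20000 (commonly used by DNP3) are hypothetical choices for illustration, not any vendor's ACL syntax.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical ACL entry."""
    action: str                 # "permit" or "deny"
    src: str                    # source prefix, e.g. "10.20.0.0/26"
    dst: str                    # destination prefix
    port: Optional[int] = None  # destination port; None matches any port


def evaluate(acl: List[Rule], src_ip: str, dst_ip: str, port: int) -> str:
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for rule in acl:
        if (src in ipaddress.ip_network(rule.src)
                and dst in ipaddress.ip_network(rule.dst)
                and (rule.port is None or rule.port == port)):
            return rule.action   # first matching rule decides
    return "deny"                # implicit deny when nothing matches


acl = [
    Rule("permit", "10.20.0.0/26", "10.30.0.10/32", 20000),  # control center subnet -> RTU
    Rule("deny", "0.0.0.0/0", "10.30.0.10/32"),             # everyone else -> RTU
]
print(evaluate(acl, "10.20.0.5", "10.30.0.10", 20000))
print(evaluate(acl, "192.168.1.5", "10.30.0.10", 20000))
```

Because the first match wins, more specific permit rules must be placed before broader deny rules.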

Policy Management, policy specification, translation and distribution

This activity involves collecting the various network/system-related policies and including them in the enterprise management activities. The policies include QoS, security, address allocation, and routing policies. A policy management tool can assist enterprise managers in obtaining high-level policies and translating them into low-level policies that are enforced by the network devices, or policy enforcement points. These tools use a policy repository, a database of the high- and low-level policies.

Accounting Management

Accounting management is the process used to measure network/system utilization parameters so that individual or group users on the network/system can be regulated appropriately for accounting or billing purposes. A usage-based accounting and billing system is an essential part of any service level agreement (SLA). It provides both a practical way of defining obligations under an SLA and clear consequences for behavior outside the terms of the SLA. The data can be collected via network management systems (NMSs). The probes that measure the statistics are placed on the edge or access routers at the point of entry to the network/system. They measure traffic flow (number of bytes and number of packets) for a specific source-destination pair, identified by IP addresses. This information can also be used to check for security violations.
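
Per source-destination accounting of the kind described above can be sketched as an aggregation over metered flow records. The record layout (src_ip, dst_ip, bytes, packets), the per-gigabyte rate, and the addresses are hypothetical; charging by source address is one possible billing rule among many.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

FlowRecord = Tuple[str, str, int, int]   # (src_ip, dst_ip, bytes, packets)


def aggregate(flows: Iterable[FlowRecord]) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Total bytes and packets per (source, destination) pair."""
    totals: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])
    for src, dst, nbytes, npkts in flows:
        totals[(src, dst)][0] += nbytes
        totals[(src, dst)][1] += npkts
    return {pair: (b, p) for pair, (b, p) in totals.items()}


def charges(totals: Dict[Tuple[str, str], Tuple[int, int]], rate_per_gb: float) -> Dict[str, float]:
    # Bill each source address for the bytes it sent (1 GB = 10**9 bytes here).
    bill: Dict[str, float] = defaultdict(float)
    for (src, _dst), (nbytes, _pkts) in totals.items():
        bill[src] += nbytes / 1e9 * rate_per_gb
    return dict(bill)


flows = [
    ("10.20.0.1", "10.30.0.10", 1_500_000_000, 1_000_000),
    ("10.20.0.1", "10.30.0.10", 500_000_000, 400_000),
    ("10.20.0.2", "10.30.0.10", 250_000_000, 200_000),
]
usage = aggregate(flows)
print(usage)
print(charges(usage, rate_per_gb=0.10))
```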

Specifying accounting information to be collected

 

Setting and modifying accounting limits

 

Defining accounting metrics

 

Implementing/activating metering functions

 

Controlling the storage of and access to accounting information

 

Monitoring usage

 

Regulating users and groups

 

Billing

 

Reporting

Report accounting information, configuration status, fault data, performance data, policy changes and violations.


[i] The Service Name corresponds to a Use Case which is associated with the main domain template use case using the <<includes>> relationship.

[ii] The Service Description corresponds to the documentation attribute of a Use Case.