Method For Dynamic Discovery of Control Plane Resources and Services

An apparatus comprising a processor configured to discover one or more peer processors associated with a network component in a dynamic manner by detecting an announcement message from a peer processor, wherein the announcement message is multicast from the peer processor when the peer processor is added or activated on the network component.

Description
CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims the benefit of U.S. Provisional Patent Application No. 61/321,066 filed Apr. 5, 2010 by Renwei Li, et al. and entitled “In-service Process Migration and Virtual Router Migration,” and U.S. Provisional Patent Application No. 61/324,610 filed Apr. 15, 2010 by Renwei Li, et al. and entitled “In-service Process Migration and Virtual Router Migration,” both of which are incorporated herein by reference as if reproduced in their entirety.

The present application is related to commonly assigned U.S. patent application Ser. No. ______ (Atty. Docket No. 4194-38700) filed even date herewith by Randall Stewart, et al. and entitled “Method for Dynamic Migration of Process or Service from One Control Plane Processor to Another” and to commonly assigned U.S. patent application Ser. No. ______ (Atty. Docket No. 4194-38800) filed even date herewith by Randall Stewart, et al. and entitled “Method for Dynamic On Demand Startup of a Process or Resource,” both of which are incorporated herein by reference as if reproduced in their entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

Not applicable.

REFERENCE TO A MICROFICHE APPENDIX

Not applicable.

BACKGROUND

Typically, routing and switching platforms comprise about one or about two control plane processors, e.g., route processor (RP) cards. In modern hardware, more processors are being added to “scale up” a router or switch to handle more networking traffic. For example, more than about three processors may be used in network components by adding or plugging in additional line processor (LP) cards. The quantity of processors used in the network components may further increase as the cost of processors (e.g., LP cards) decreases. Typically, adding an additional processor to a network component requires manually configuring and setting up the processes and services that run on each new processor. This may also require reconfiguring existing processors and sometimes removing some processes that were previously running on the existing processors. This methodology requires human intervention, e.g., to perform manual configuration, which can be prone to error, is not dynamic or flexible, and causes the configuration of network components and processes to be substantially fixed or static.

SUMMARY

In one embodiment, the disclosure includes an apparatus. The apparatus comprises a processor configured to discover one or more peer processors associated with a network component in a dynamic manner by detecting an announcement message from a peer processor, wherein the announcement message is multicast from the peer processor when the peer processor is added or activated on the network component.

In another embodiment, the disclosure includes a network component. The network component comprises a processor configured to multicast an announcement of its presence and properties to at least one peer processor associated with the network component, wherein the announcement is multicast from the processor automatically when the processor is added or activated on the network component.

In a third aspect, the disclosure includes a method. The method comprises receiving an announcement message from a peer processor; sending a service request message to the peer processor; recording a list of services for the peer processor if the peer processor returns a service reply message that comprises the list of services; marking the peer processor as active if the peer processor returns a service reply message; and marking the peer processor as inactive if the peer processor does not return a service reply message.

These and other features will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.

FIG. 1 is a schematic diagram of an embodiment of a plurality of associated processes.

FIG. 2 is a schematic diagram of an embodiment of a plurality of associated processors.

FIG. 3 is a schematic diagram of an embodiment of an announcement message.

FIG. 4 is a schematic diagram of an embodiment of a service list request.

FIG. 5 is a schematic diagram of an embodiment of a service list reply.

FIG. 6 is a flowchart of an embodiment of a dynamic discovery method.

FIG. 7 is a schematic diagram of an embodiment of a transmitter/receiver unit.

FIG. 8 is a schematic diagram of an embodiment of a general-purpose computer system.

DETAILED DESCRIPTION

It should be understood at the outset that although an illustrative implementation of one or more embodiments is provided below, the disclosed systems and/or methods may be implemented using any number of techniques, whether currently known or in existence. The disclosure should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.

Disclosed herein is a system and method for supporting the dynamic configuration of processes on a plurality of processors. The processors may be control plane processors, such as RP cards and/or LP cards, of which at least some may be added (or plugged), removed, or replaced in a network component. The method may support the dynamic configuration of one or more processes on one or more processors, e.g., when adding, removing, and/or changing one or more processors in the system. The processes may comprise a process manager (PM) that may be located in a processor and configured to start or initiate one or more services or processes on the processor. The PM may also be configured to dynamically discover other processors in the system, which may be added or reconfigured, and any processes, services, and/or resources running on the processors. The PM may determine which processors are available, what processes/services are running on which processors, including what duplicate services/processes are running on multiple processors, such as shared or distributed services. The PM may also determine and detect dynamically the aggregate load of the system, e.g., for a plurality of available processors, and if any of the processors are under utilized.

FIG. 1 illustrates an embodiment of a plurality of associated processes 100, which may be located on a processor 110. The processor 110 may be a static control plane processor, e.g., a RP card, or a removable control plane processor, e.g., a LP card, which may be located in a network component, e.g. a switch or a router, a data server, or any other network component. The processor 110 may be any computation element (CE) that is configured to implement or run one or more processes 100, which may handle services or functions related to the processor's operations. Specifically, the processes 100 may comprise a PM 112 that is configured to start up and manage other processes and/or services on the processor 110. The processes 100 may also comprise other processes that may run on the processor 110 and provide a plurality of functions/services, such as for implementing a border gateway protocol (BGP) 120, a routing information base (RIB) 122, an Intermediate System to Intermediate System (IS-IS) protocol 124, a virtual private network (VPN) 128, a Directory Service (DS) 116, a Command Line Interface (CLI) 126, a Configure Daemon (CNFD) 114, a Resource Manager (RSM+) 118, or combinations thereof.

The PM 112 may manage initial system startup of other processes and monitor the other running processes 114, 116, 118, 120, 122, 124, 126, and 128 on the processor 110. The PM 112 may monitor the other running processes 114, 116, 118, 120, 122, 124, 126, and 128 to verify that the processes are not failing, e.g., due to lack of resources, and are running appropriately. The PM 112 may also restart any failed process or kill any process, such as a run-away process, e.g., to save resources. Additionally, the PM 112 may be configured to communicate with other processes, e.g., other PMs, that run on other processors (not shown) using an Internal Router Capability Protocol (IRCP). IRCP allows existing software to automatically find new hardware elements and discover their capabilities. Furthermore, IRCP allows older entities to migrate load to new entities, thereby automatically using the new processing capacity, which simplifies both deployment and administration while providing a high degree of scalability and availability. The other processors may be similar to the processor 110 and may be located on the same network component as the processor 110, e.g., on the same switch or router.

The IRCP may be configured to allow the dynamic configuration of processes, including PMs, on one or more processors in a network component, e.g., for routing and/or switching operations. The IRCP may allow a network administrator to add a new CE or multiple CEs to a network component, e.g., to expand routing and switching capabilities. The IRCP may allow the software existing on the network component to automatically find new hardware elements and discover their capabilities. Further, the IRCP may allow older CEs to migrate load (e.g., processes and/or services) to the new CEs, e.g., in a dynamic or automatic manner, to make use of the new processing capacity due to adding the new CEs. As such, the IRCP may simplify network deployment and/or administration while providing an improved degree of scalability and availability, e.g., in comparison to current static configuration schemes.

FIG. 2 illustrates an embodiment of a plurality of associated processors 200, which may be located on a network component, such as a switch or a router. The processors 200 may comprise a first processor 210, a second processor 220, and a third processor 230, which may communicate with one another. The first processor 210, the second processor 220, and the third processor 230 may be CEs that include RP cards and/or LP cards. The first processor 210, the second processor 220, and the third processor 230 may comprise a first PM 212, a second PM 222, and a third PM 232, respectively, which may be configured similar to the PM 112.

The processors 200 may use the IRCP to dynamically learn about one another, request start services from one another, migrate services or part of services between one another, and/or distribute load information between them. The IRCP may be implemented on a separate process in each of the processors 200 or as part of other services. In an embodiment, the IRCP may be implemented by the first PM 212, the second PM 222, and the third PM 232. The first PM 212, the second PM 222, and the third PM 232 may use the IRCP to dynamically discover one another, any processes of the processors 200, and their capabilities. The first PM 212, the second PM 222, and the third PM 232 may also use the IRCP to exchange services, change loads between the processors 200, and migrate processes/services between the processors 200.

The IRCP may allow any of the first PM 212, the second PM 222, and the third PM 232 to announce its presence to other PMs using User Datagram Protocol (UDP) signaling over a multicast channel. The announcement may enable the processors 200 to dynamically discover each other and any additionally added processors, e.g., over time. For instance, the announcement may be signaled or sent automatically when the first PM 212, the second PM 222, or the third PM 232 is added or activated. The first PM 212, the second PM 222, and the third PM 232 may also use the IRCP to send and/or receive other information or messages, for instance using a Stream Control Transmission Protocol (SCTP). The messages may comprise a Payload Protocol Identifier of about 28, which may be assigned by the Internet Assigned Numbers Authority (IANA).

Any one of the first PM 212, the second PM 222, and the third PM 232 may be an IRCP sender or speaker that implements an announcement procedure. Accordingly, the IRCP sender or speaker may send an announcement message within a time window equal to a sum of a determined announcement period and an additional random jitter from about zero to about four seconds. The IRCP may send the announcement message to announce its presence to one or more IRCP receivers, e.g., one or more other PMs on the processors 200. The IRCP speaker may send a plurality of announcement messages within a plurality of corresponding subsequent time windows to one or more IRCP receivers. The announcement period may be determined by an IRCP.ANNOUNCEMENT timer that may be set by a network administrator or management system. The announcement message may comprise information that indicates the IRCP speaker, the load of the IRCP speaker, the processes on the IRCP speaker, and/or the availability of the IRCP speaker to participate in process migration.
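The announcement timing described above, a configured period plus zero to about four seconds of random jitter, can be sketched as follows. The period value and function name are illustrative; the document only states that the period comes from an IRCP.ANNOUNCEMENT timer set by the administrator.

```python
import random

# Assumed example value for the administrator-set IRCP.ANNOUNCEMENT timer.
IRCP_ANNOUNCEMENT = 30.0  # seconds

def next_announcement_delay(period: float = IRCP_ANNOUNCEMENT) -> float:
    """Return the delay before the next announcement: the configured
    announcement period plus a random jitter of zero to four seconds."""
    return period + random.uniform(0.0, 4.0)
```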

Each announcement period, the IRCP speaker or sender may determine if it is able to receive some load from its peers (e.g., any other PM on the processors 200). To determine whether the IRCP speaker is able to receive a load from its peers, the IRCP speaker or another process may maintain a plurality of history fields for time usages of load samples on the processor of the IRCP speaker. A load sample may correspond to a process/service or a part of a process/service. The history fields may be stored in a plurality of IRCP.HISTORY.SIZE records or files. Initially, the history fields may be set to about zero. For each load sample, a central processing unit (CPU) time usage may be recorded in a corresponding history field, e.g., both in terms of idle time and busy time. The values of the idle time and busy time may be combined to form a percentage of idle time versus total time.

The percentage of idle time may be compared against a low water mark value for CPU utilization. The low water mark may correspond to a value IRCP.CPU-LOW-WATER that may be set by the network administrator or the management system. If the percentage of idle time is greater than or equal to about the low water mark, then the corresponding CPU interval may indicate that the processor may accept a new load sample. However, a new load sample may be accepted if the IRCP.HISTORY.SIZE records for all the load samples on the processor indicate that the processor is available to receive or take on a new load sample. If a new load may be accepted, the IRCP sender may check the memory utilization of the processor. The memory utilization may correspond to a percentage of free memory available versus the total amount of memory. The memory utilization may be compared against a low water mark value for memory utilization. The low water mark may correspond to a value IRCP.MEM-LOW-WATER that may be set by the network administrator or the management system. If the percentage of memory available is greater than or about equal to the low water mark, then the processor may be available to take on more load samples, which may be indicated in the outgoing announcement message from the IRCP speaker.
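The two low-water-mark checks above can be sketched as one predicate. The threshold values and the `(idle, busy)` sample representation are assumptions; the document only specifies that every history record must clear the CPU low water mark before memory is checked.

```python
# Hypothetical thresholds standing in for IRCP.CPU-LOW-WATER and IRCP.MEM-LOW-WATER.
CPU_LOW_WATER = 20.0   # minimum percent idle time required
MEM_LOW_WATER = 15.0   # minimum percent free memory required

def can_take_more_load(history, free_mem, total_mem):
    """history: list of (idle_ticks, busy_ticks) samples (IRCP.HISTORY.SIZE
    entries). Every sample's idle percentage must clear the CPU low water
    mark, and free memory must clear the memory low water mark."""
    for idle, busy in history:
        total = idle + busy
        if total == 0 or 100.0 * idle / total < CPU_LOW_WATER:
            return False
    return 100.0 * free_mem / total_mem >= MEM_LOW_WATER
```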

An IRCP receiver, e.g., a PM on one of the processors 200, may receive the announcement message from the IRCP sender and perform a plurality of steps depending on whether the IRCP sender is a new added processor or an existing processor. The IRCP receiver may maintain a mapping of host identity fields in a received announcement message to peer control blocks for peer processors. When an announcement message is received, the IRCP receiver may search in the mapping to determine whether the announcement message corresponds to an existing or new peer, e.g., based on a host identity indicated in the message.

If the announcement message corresponds to a new peer processor, the IRCP receiver may not find a match in the mapping. In this case, the IRCP receiver may use a shared secret and a first about 36 bytes in the announcement message to generate a signature (SHA-1 signature) for the IRCP sender. The first about 36 bytes may be subsequent to the shared secret in the announcement message. The generated signature may comprise about 20 bytes. The generated signature may then be compared to a signature indicated in the announcement message. The two signatures may be compared, e.g., byte by byte, and if the two signatures do not match, then the announcement message may be discarded and the processing may be stopped.

If the two signatures match, then a new peer control block (PCB) may be created at the IRCP receiver. The PCB may comprise a plurality of relevant fields from the announcement message, which may be used to track the peer's capabilities and its availability. The PCB may also comprise the time of last update of the PCB, which may be set to the current time of updating. The PCB may also comprise the sequence number and the signature (SHA-1 signature) indicated in the announcement message. The signature may be added to the PCB to avoid recalculating the signature for future announcement messages for the same peer. The new PCB may then be entered into a host identification to peer PCB mapping table, which may be kept in a HASH table that is hashed on an about 64 bit host identification. Subsequently, the sequence number in the PCB may be updated to be about one less than the sequence number in the announcement message. The peer may also be marked as being active and needing a service update. Next, the sequence number stored in the PCB may be compared to the sequence number in the announcement message. If the sequence number in the announcement message is less than the sequence number in the PCB, then the message may be an old or outdated message and hence discarded. Otherwise, the sequence number in the PCB may be updated or replaced with the sequence number in the message and the processing proceeds.
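The peer control block and its host-identity mapping described above can be sketched as a small data structure. Field names are illustrative; the document specifies only that the PCB tracks capability fields, the cached signature, the sequence number, activity state, and the time of last update.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class PeerControlBlock:
    """Sketch of the per-peer state described above; names are illustrative."""
    host_identity: int          # 64-bit host identity from the announcement
    signature: bytes            # cached 20-byte SHA-1 digest
    sequence: int               # last accepted announcement sequence number
    active: bool = True
    needs_service_update: bool = True
    services: List[str] = field(default_factory=list)
    last_update: float = field(default_factory=time.time)

# Mapping of 64-bit host identity to PCB; the document keeps this in a hash
# table hashed on the host identification.
pcb_table: Dict[int, PeerControlBlock] = {}
```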

Alternatively, if the announcement message corresponds to an existing peer processor that is marked as active, then the signature hash inside the PCB of the processor may be compared to the hash in the announcement message. If the two hash values do not match, then the message may be ignored without further processing. If the two hash values match, then the sequence number stored in the PCB may be compared to the sequence number in the announcement message. If the sequence number in the announcement message is less than the sequence number in the PCB, then the message may be an old or outdated message and hence discarded. Otherwise, the sequence number in the PCB may be updated or replaced with the sequence number in the message and the processing proceeds.

If the announcement message corresponds to an existing peer processor that is marked as inactive, then a new signature may be calculated using the shared secret. The new signature may be compared to the signature in the announcement message. If the two signatures do not match, then the message may be discarded without further processing. Otherwise, the signature may be updated in the PCB of the peer processor. The sequence number may also be set to about one minus the sequence number in the announcement message. The peer may also be marked as being active and needing a service list update. Next, the sequence number stored in the PCB may be compared to the sequence number in the announcement message. If the sequence number in the announcement message is less than the sequence number in the PCB, then the message may be an old or outdated message and hence discarded. Otherwise, the sequence number in the PCB may be updated or replaced with the sequence number in the message and the processing proceeds. If the peer is marked as needing a service list update or exchange, then a service exchange procedure may be started, as described below. The relevant information about the peer may then be updated in the PCB, including availability, its CPU and memory characteristics, and last statistics. The peer's last announcement timestamp may then be updated.
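The sequence-number handling repeated in the cases above can be sketched as two helpers: priming the PCB sequence to one less than the announced value on (re)activation, and discarding announcements older than the stored value. The document states that 32-bit unsigned arithmetic is used; wraparound of the counter is not handled in this sketch.

```python
MASK32 = 0xFFFFFFFF  # the sequence number is a 32-bit unsigned value

def prime_sequence(ann_seq: int) -> int:
    """On (re)activation the PCB sequence is set to one less than the
    announced sequence, so the pending announcement is always accepted."""
    return (ann_seq - 1) & MASK32

def is_fresh(pcb_seq: int, ann_seq: int) -> bool:
    """An announcement whose sequence number is less than the stored one
    is old and discarded; otherwise the caller stores ann_seq and proceeds.
    Plain unsigned comparison; wraparound handling is left out here."""
    return ann_seq >= pcb_seq
```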

When one or more new peer processors discover each other or when a peer has been inactive for some time, a service exchange procedure may be needed. The service exchange may be started when a peer receives an announcement message. A first peer (e.g., PM, process, or processor) may send or multicast a service list request message, which may comprise an embedded IRCP announcement message. The embedded IRCP announcement message may allow a second peer (e.g., PM, process, or processor) that receives the announcement to build a PCB for the first peer before responding to the service list request message. After sending the request message, the first peer may not start a timer or other mechanism to track the peer. Instead, the first peer may rely on the SCTP layer and watch for a notification of failure. The SCTP layer may be monitored by an IRCP agent, which may be a process running on the peer. For instance, the IRCP agent may be the PM or a separate process on the peer. If an SCTP message is not received or a SCTP notification indicates a failed association or setup, then the second peer may be marked as inactive, e.g., by the IRCP agent. A plurality of exchanges, except the announcement message, may be sent via SCTP to the same port number used for the UDP announcement messages, which may be assigned the port number 5050 or any other port number. The SCTP messages may use the Payload Protocol Identifier field, which may be set to the IANA-assigned value for IRCP at about 28. The IRCP agent may monitor failure notifications on the SCTP layer. When a detected SCTP failure notification indicates that the underlying SCTP association has failed, the receiver of the notification, e.g., the IRCP agent, may mark the second peer as inactive.

Upon receiving a service list request, the second peer may process the embedded announcement message in the request as if it had just arrived via the multicast channel. The processing may comprise creating the first peer PCB and/or updating its current load statistics. When processing the announcement message as a normal announcement, if the second peer does not have a service list from the first peer, a new request for a service list may be sent to the first peer during the processing. The second peer may then look up the first peer PCB and if the first peer PCB is not found, then the second peer may stop processing the message. This scenario may occur if the signature in the service list request announcement portion does not match the calculated signature. The second peer may respond to the first peer service list request with a service list response message. The message may comprise the number of services and a set of American Standard Code for Information Interchange (ASCII) services each separated by a set of NULL (e.g., zero) characters.

Upon receiving the service list response message, the first peer may look up the second peer PCB. If the second peer PCB does not exist, the first peer may discard the message and stop the processing. If a list of services is sent with the second peer PCB, the first peer may discard the old list. The first peer may then parse the list of services and record the services and the total count within the second peer PCB. After the service list exchange procedure above is completed, both peers may have a complete map of the services that may be run or implemented by each other.
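A sketch of parsing the service list response described above, a count followed by ASCII service names separated by NULL characters. The exact wire framing of the count is an assumption; the document says only that the message carries the number of services and the NULL-separated names.

```python
def parse_service_list(count: int, payload: bytes) -> list:
    """Split the NULL-separated ASCII service names carried in a service
    list response and check them against the advertised count."""
    names = [s.decode("ascii") for s in payload.split(b"\x00") if s]
    if len(names) != count:
        raise ValueError("service count does not match list contents")
    return names
```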

FIG. 3 illustrates an embodiment of an announcement message 300, which may be sent from the first PM 212, the second PM 222, the third PM 232, or their corresponding processors for dynamic discovery. The announcement message 300 may comprise a message type field 302, a plurality of flags field 304, a length field 306, a host identity field 308, a release field 310, a version field 312, a number of CPUs field 314, a Hertz field 316, a CPU frequency field 318, a statistical (stat) hertz (Hz) clock field 320, a CPU stat use field 322, an announcement (ann) sequence number field 324, a total system memory field 326, an idle stat use field 328, a millisecond passed (millisec-passed) field 330, an available memory field 332, a can take more load field 334, a local software version number field 336, and a signature (SHA One or SHA-1) digest field 338.

The message type field 302 may comprise about 32 bits and have a value of about one that indicates the type of the announcement message 300. The flags field 304 may comprise about 16 bits, set to a value of about zero by the sender, and ignored by the receiver. The length field 306 may comprise about 16 bits and indicate the length of the announcement message 300, e.g., in bytes, which may be equal to about 120 bytes. The host identity field 308 may comprise about 64 bits and represent the identity of the sending host or peer. The host identity field 308 may have a value that is generated by selecting the lowest machine address of a plurality of sender's Ethernet interfaces. The release field 310 may comprise about 32 bits and indicate the release number of the sender's IRCP system, which may be equal to about one.

The version field 312 may comprise about 32 bits and represent the version number of the sender's IRCP system, which may be equal to about one. The number of CPUs field 314 may comprise about 32 bits and indicate the quantity of CPUs that the sender CE has in its CPU complex, such as in a multiple core processor. This value may be equal to about one for single core systems. The hertz field 316 may comprise about 32 bits and indicate the hertz value used by the sending CPU complex. The hertz value may be configurable based on the system. For example, the hertz value may be equal to about 1000 for some current systems. The CPU frequency field 318 may comprise about 32 bits and indicate the frequency of the sending CPU complex. The value in the CPU frequency field 318 may be divided by about 1,000,000. For example, the value may be equal to about 3,013 for an about 3,013,726,115 hertz CPU frequency. The stat Hz clock field 320 may comprise about 32 bits and indicate the statistical clock frequency associated with the sending CPU complex. This value may indicate how many CPU ticks are present in each of the idle and CPU statistical samplings.

The CPU stat use field 322 may comprise about 32 bits and indicate the amount of CPU that was used in stat Hz ticks during the sampling period between the last two announcements. The ann sequence number 324 may comprise about 32 bits and indicate the sequence number of the announcement message 300. The sequence number may be monotonically increased, e.g., incremented by about one, for each subsequently transmitted announcement message. The total system memory field 326 may comprise about 32 bits and indicate the amount of memory the sender CPU complex has in total. The amount of memory may be indicated in multiples of about 1,024 bytes. For example, an about one Gigabyte may be indicated by a value of about 1,048,576. The idle stat use field 328 may comprise about 32 bits and indicate the amount of stat Hz ticks during the sampling period for which the CPU was idle. The millisec-passed field 330 may comprise about 32 bits and indicate the number of milliseconds that passed since the last announcement was sent.

The available memory field 332 may comprise about 32 bits and indicate the amount of memory that is currently available in about 1,024 byte increments. For example, an about one Gigabyte may be indicated by a value of about 1,048,576. The can take more load field 334 may comprise about 32 bits and comprise a Boolean value that may be set to a non-zero value to indicate that the sender may be available to take on more workload. If the value is equal to about zero, then the sender may not be able to take on more workload. The local software version number field 336 may comprise about 32 bytes and comprise an ASCII string that indicates the local system's software version number (e.g., ‘release1-1.1’). Any unused bytes in this field may be set to about zero. The SHA-1 digest field 338 may comprise about 20 bytes and comprise the SHA-1 signature digest of the sender.
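The field widths enumerated above sum to 120 bytes, matching the stated length of the announcement message 300. A sketch of the layout in Python's `struct` notation, assuming big-endian network byte order and the field order of FIG. 3; the example values are illustrative:

```python
import struct

# 4+2+2+8 (type, flags, length, host identity) + 13*4 (fixed 32-bit fields)
# + 32 (software version string) + 20 (SHA-1 digest) = 120 bytes.
ANNOUNCEMENT_FMT = ">IHHQ13I32s20s"
assert struct.calcsize(ANNOUNCEMENT_FMT) == 120

msg = struct.pack(
    ANNOUNCEMENT_FMT,
    1,                   # message type: 1 for announcement
    0,                   # flags, set to zero by the sender
    120,                 # length in bytes
    0x0000AABBCCDDEEFF,  # host identity (lowest MAC, upper two bytes zero)
    1, 1,                # release, version
    4,                   # number of CPUs
    1000,                # hertz
    2791,                # CPU frequency in MHz
    127,                 # stat Hz clock
    6000,                # CPU stat use
    42,                  # announcement sequence number
    1048576,             # total system memory (1 GB in 1,024-byte units)
    1620,                # idle stat use
    60000,               # milliseconds since last announcement
    524288,              # available memory in 1,024-byte units
    1,                   # can take more load (non-zero = yes)
    b"release1-1.1",     # local software version, zero-padded to 32 bytes
    b"\x00" * 20,        # SHA-1 digest placeholder
)
assert len(msg) == 120
```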

During an announcement procedure, the IRCP sender or speaker (e.g., peer processor or corresponding PM) may configure the first three fields of the announcement message 300 (e.g., the message type field 302, the flags field 304, and the length field 306) that may represent the message header. The three fields may be configured to allow a receiver of the message to identify the incoming message. The message type in the message type field 302 may be set to about one. The flags in the flags field 304 may not be used and may be set to about zero. The length field 306 may be set to indicate a length value of about 120 bytes for the announcement message 300.

The host identity field 308 may be filled in with a unique host identity. This unique host identity may be a constant 64 bit value that does not change from the time the CE first comes online until it is shutdown. This number or value may be unique with respect to other CE's in the system. In most cases, the CE's Ethernet network interface cards may be examined and the lowest Media Access Control (MAC) address assigned to the CE may be used, e.g., prefixing the upper about two bytes with about zero. This may assure the CE of holding a unique and permanent identity (unless an interface card is removed). Other mechanisms, such as manual configuration or internal CPU unique identity may be used, as long as the generated host identity is unique.
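The MAC-based host identity construction above, the lowest MAC address among the CE's Ethernet interfaces with the upper two bytes prefixed with zero, can be sketched as follows. The function names are illustrative.

```python
def host_identity_from_mac(mac: bytes) -> int:
    """Form the 64-bit host identity by prefixing the six-byte MAC
    address with two zero bytes, as described above."""
    assert len(mac) == 6
    return int.from_bytes(b"\x00\x00" + mac, "big")

def host_identity(macs) -> int:
    """Select the numerically lowest MAC among the CE's Ethernet interfaces."""
    return min(host_identity_from_mac(m) for m in macs)
```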

The release field 310 may be filled in with a version number of about one. The version field 312 may be filled in with the same version number of about one. The number of CPUs field 314 may be filled with values from about one to about 128 for current processor technology, but may also comprise larger values for future systems. The hertz field 316 may be filled in with the system's hertz value, such as about 1,000 for current operating systems, which may be used to control the granularity of timers that the sending system generates. The CPU frequency field 318 may be filled in with a frequency value of each running CPU (in a CPU complex) in millions of hertz, e.g., in Megahertz. For example, the CPU frequency field 318 may be assigned a value of about 2,791 to indicate about 2.791 Gigahertz.

Typically, operating systems may not keep track of statistics on processes in terms of hertz. Instead, a sampling interval may be indicated in the stat Hz clock field 320. The stat Hz clock field 320 may be used to report CPU and idle usage. For example, if the stat Hz clock is about 127 and the value in the hertz field 316 is about 1000, then in a period of about 60,000 milliseconds, about 7,620 stat Hz clocks may be accounted for in the announcement for each CPU. The CPU stat use field 322 may be filled in with the total stat Hz intervals during which the CPUs were busy since the last announcement. The value in the CPU stat use field 322 may be the combined value of a plurality of CPUs or core processors in a CE.
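The worked example above follows from multiplying the stat Hz rate by the elapsed interval; a minimal sketch:

```python
def stat_ticks(stat_hz: int, millisec_passed: int) -> int:
    """Number of statistical sampling ticks accounted for per CPU over
    the reporting interval: stat Hz samples per second times elapsed seconds."""
    return stat_hz * millisec_passed // 1000
```

For the values in the text, 127 stat Hz over 60,000 milliseconds yields 7,620 ticks per CPU.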

The ann sequence number field 324 may be filled in with a unique 32 bit sequence number, which may be incremented after each announcement is generated. The 32 bit sequence number may use 32 bit unsigned arithmetic. A receiver may examine each arriving announcement message 300 and compare its sequence number with the sequence number in the last message. Since the announcement data may be multicast, the receiver may detect the same announcement on multiple interfaces, and thus process only one announcement. The total system memory field 326 may be filled in with the value of the total CE system memory in about one kilobyte increments.

The idle stat use field 328 may be filled in with the number of stat Hz ticks during which the CPUs were idle since the last announcement. The millisec-passed field 330 may be filled in with the total number of milliseconds that have passed since the last announcement was sent. The can take more load field 334 may be filled in with a value that indicates to the receiver whether the sender is capable of accepting migration load to its CE. A non-zero value may indicate that the sender is willing to accept load. A zero value may indicate that the sender is not willing to accept load. The local software version number field 336 may be filled in with a local software version number, which may be an ASCII string.
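From the CPU stat use, idle stat use, and can take more load fields, a receiver might derive a simple utilization figure. The arithmetic below is illustrative only; the specification defines the fields, not this computation:

```python
def cpu_utilization(cpu_stat_use, idle_stat_use):
    """Fraction of sampled stat Hz ticks during which the CPUs were busy."""
    total = cpu_stat_use + idle_stat_use
    return cpu_stat_use / total if total else 0.0

def willing_to_accept_load(can_take_more_load):
    """A non-zero can take more load field 334 means the sender is willing
    to accept migrated load; a zero value means it is not."""
    return can_take_more_load != 0

# If 1,905 of the 7,620 sampled ticks were busy, utilization is 25 percent.
assert cpu_utilization(1905, 5715) == 0.25
assert willing_to_accept_load(1) and not willing_to_accept_load(0)
```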

Each IRCP participant (e.g., peer processor or PM) may hold a shared secret that other peer IRCP senders may know. The shared secret may be used to create a SHA-1 signature for the first about 36 bytes of the IRCP announcement message. A sender may use the shared secret to create a static signature that may be placed in the SHA-1 digest field 338 in its outgoing message. A receiver may also store this signature and perform an initial computation when a new peer announcement arrives. An announcement from a peer with an invalid SHA-1 signature may be ignored, and any subsequent messages from such a peer sent using SCTP may also be ignored, until a valid SHA-1 signature is detected in a subsequent announcement message from that peer.
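A keyed SHA-1 signature over the first 36 bytes of the announcement could be built as follows. HMAC-SHA-1 is used here as one plausible keyed construction; the text specifies only the SHA-1 digest and the shared secret, not how the two are combined, so this is an assumption:

```python
import hashlib
import hmac

def sign_announcement(shared_secret: bytes, announcement: bytes) -> bytes:
    """Produce the 20 byte digest for the SHA-1 digest field 338 from the
    first 36 bytes of the announcement. The HMAC construction here is an
    assumption; only the SHA-1 digest length is given by the text."""
    return hmac.new(shared_secret, announcement[:36], hashlib.sha1).digest()

def verify_announcement(shared_secret: bytes, announcement: bytes,
                        received_digest: bytes) -> bool:
    """Constant-time check of a received announcement's signature."""
    expected = sign_announcement(shared_secret, announcement)
    return hmac.compare_digest(expected, received_digest)

secret = b"example-shared-secret"      # hypothetical shared secret
msg = bytes(120)                       # zeroed 120 byte announcement
sig = sign_announcement(secret, msg)
assert len(sig) == 20
assert verify_announcement(secret, msg, sig)
assert not verify_announcement(b"wrong-secret", msg, sig)
```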

FIG. 4 illustrates an embodiment of a service list request 400, which may be sent from the first PM 212, the second PM 222, the third PM 232, or corresponding processors for dynamic discovery. The service list request 400 may comprise a message type field 402, a plurality of flags field 404, a length field 406, an embedded announcement 408, and a host identity field 410. The message type field 402 may comprise about 32 bits and have a value of about two that indicates the type of the service list request 400. The flags field 404 may comprise about 16 bits, set to a value of about zero by the sender, and ignored by the receiver. The length field 406 may comprise about 16 bits and indicate the length of the service list request 400, e.g., in bytes, which may be equal to about 136 bytes. The embedded announcement 408 may comprise about 120 bytes that correspond to an announcement message, such as the announcement message 300. The announcement may be sent embedded in the service list request 400, e.g., if the receiver has not previously received the announcement message from the sender of the request. The receiver may use the embedded announcement 408 to generate a representation for the peer in its local peer database. The host identity field 410 may comprise about 64 bits and represent the identity of the sending host or peer. The host identity field 410 may have the lowest machine address of a plurality of sender's Ethernet interfaces.
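The service list request layout can be sketched with a fixed-size packing. Big-endian (network) byte order is assumed below, since the text does not state the byte order, and the helper name is illustrative:

```python
import struct

SERVICE_LIST_REQUEST = 2  # value of the message type field 402

def pack_service_list_request(announcement: bytes, host_id: int) -> bytes:
    """Pack a service list request 400: 32 bit type, 16 bit flags (zero),
    16 bit length, 120 byte embedded announcement 408, and 64 bit host
    identity 410, for 136 bytes in total."""
    if len(announcement) != 120:
        raise ValueError("embedded announcement must be 120 bytes")
    length = 4 + 2 + 2 + 120 + 8  # 136 bytes
    return struct.pack("!IHH120sQ", SERVICE_LIST_REQUEST, 0, length,
                       announcement, host_id)

req = pack_service_list_request(bytes(120), 0x0000AABBCCDDEEFF)
assert len(req) == 136
msg_type, flags, length = struct.unpack("!IHH", req[:8])
assert (msg_type, flags, length) == (2, 0, 136)
```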

FIG. 5 illustrates an embodiment of a service list reply 500, which may be sent from the first PM 212, the second PM 222, the third PM 232, or their corresponding processors for dynamic discovery. The service list reply 500 may comprise a message type field 502, a plurality of flags field 504, a length field 506, a host identity field 508, a number of services field 510, and a service names field 512. The message type field 502 may comprise about 32 bits and have a value of about three that indicates the type of the service list reply 500. The flags field 504 may comprise about 16 bits, set to a value of about zero by the sender, and ignored by the receiver. The length field 506 may comprise about 16 bits and indicate the length of the service list reply 500, e.g., in bytes, which may vary depending on the data in the service list reply 500. The host identity field 508 may comprise about 64 bits and represent the identity of the sending host or peer. The host identity field 508 may have the lowest machine address of a plurality of sender's Ethernet interfaces. The number of services field 510 may comprise about 32 bits and comprise the number of null terminated service names that the sender has placed in the service names field. The service names field 512 may comprise a list of services or processes that may be running on the sender of the service list reply 500. Each service in the list may be represented by an ASCII string that comprises the service name followed by a NULL (or zero) character. For example, two services, foo and bar, may be represented by about eight bytes as follows: "foo0bar0". The service names field 512 may have a variable size depending on the quantity of listed service names.
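The null terminated service names encoding can be illustrated directly. The helper names below are for illustration only:

```python
def encode_service_names(names):
    """Encode each service name as an ASCII string followed by a NULL."""
    return b"".join(name.encode("ascii") + b"\x00" for name in names)

def decode_service_names(data, count):
    """Split a service names field 512 back into its first `count` names."""
    return [part.decode("ascii") for part in data.split(b"\x00")[:count]]

# Two services, foo and bar, occupy eight bytes, as in the example above.
encoded = encode_service_names(["foo", "bar"])
assert encoded == b"foo\x00bar\x00"
assert len(encoded) == 8
assert decode_service_names(encoded, 2) == ["foo", "bar"]
```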

FIG. 6 illustrates an embodiment of a dynamic discovery method 600, which may be implemented by the first PM 212, the second PM 222, the third PM 232, or their corresponding processors for dynamic discovery. The method 600 may be implemented using the IRCP and used to discover a plurality of peers, e.g., peer CEs or peer PMs, such as in a multicast group. Specifically, a peer CE or PM may announce itself to other peers, dynamically discover other peers, and determine a plurality of services associated with the other peers. The method 600 may also allow the peer CE or PM to determine whether any other peer may be available to take on processes.

The method 600 may start at block 610, where a multicast group may be joined. For instance, a new CE (e.g., LP card) may be added to a network component (e.g., router) and hence may start up and join a multicast group of a plurality of CEs on the network component. The PM of the new CE may be initiated and may join the group using Internet Protocol (IP) multicast or other multicast protocols, e.g., at the second network layer (Layer 2). At block 612, announcements on the multicast channel may be listened to. The PM of the new CE may listen on the multicast channel (e.g., IP multicast channel) to any announcements sent from any of the other CEs in the group.
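Joining the multicast group at block 610 might look like the following using IP multicast sockets. The group address and port here are hypothetical, since the text does not fix them:

```python
import socket
import struct

IRCP_GROUP = "239.0.0.1"  # hypothetical multicast group address
IRCP_PORT = 5000          # hypothetical UDP port

def make_membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the 8 byte ip_mreq structure passed to IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton(interface))

def join_ircp_group() -> socket.socket:
    """Open a UDP socket, bind it, and join the IRCP multicast group so
    that announcements from peer CEs can be received (block 612)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", IRCP_PORT))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_membership_request(IRCP_GROUP))
    return sock
```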

At block 614, an announcement may be sent on the multicast channel. The PM may send an announcement message, such as the announcement message 300, on the multicast channel to the CEs in the group to announce its presence. The PM may send the announcement a plurality of times, e.g., periodically, on the multicast channel. The announcement may comprise information and various properties about the CE, including load, capabilities, CPU capacity, memory usage, availability to take on processes, and/or other information indicated in the announcement message 300.

At block 616, the method 600 may determine whether an announcement is received. If an announcement is received or detected, the method 600 may proceed to block 618. Otherwise, the method 600 may proceed to block 630. At block 618, a peer that sent the announcement may be marked as active. The peer may be a new peer or an activated peer that corresponds to one of the other CEs in the group. At block 620, information about the peer may be recorded and/or updated. The information may be received in an announcement message, such as the announcement message 300, from a peer CE. The information may comprise various properties about the peer CE, including load, capabilities, CPU capacity, memory usage, availability to take on processes, and/or other information indicated in the announcement message 300.

At block 622, a service request may be sent to the peer. A service request message, such as the service list request 400, may be sent to the peer CE that announced itself. The service request may be sent, e.g., to the PM of the peer CE, to request a list of available or running processes/services on the peer CE. At block 624, the method 600 may determine whether a service reply is received. If a service reply is received from the peer CE, then the method 600 may proceed to block 626. Otherwise, the method may proceed to block 640. At block 626, services and statistics of the peer may be recorded. The peer CE may send a reply in a service reply message, such as the service list reply 500, which may comprise a list of services running on the peer CE. The services indicated in the service reply may include duplicate services/processes that are running on multiple processors, such as shared or distributed services. The receiving peer may then record the list of services in the service reply with the statistics of the peer CE indicated in the received announcement message. Thus, the receiving peer may have knowledge about services that run or may run on the peer CE. The exchange of service request and service reply between the peers may be implemented using SCTP or any other reliable transport protocol, such as Transmission Control Protocol (TCP). The method may then return to block 612.

At block 630, the method 600 may determine whether the number of announcements sent to a peer exceeded a limit. If the number of announcements sent to the peer has reached a limit or threshold, e.g., about two or about three repeated announcements, without receiving a response in return, the method 600 may proceed to block 640. Otherwise, the method may return to block 612. At block 640, the peer may be marked as inactive, for instance if the peer does not respond to a repeated announcement message or a service request message. The method 600 may also comprise additional steps (not shown) to declare its running processes/services. For instance, the PM may receive a service request from a peer, and may send a service reply to the peer that comprises a list of running services/processes. These steps may comprise similar details as described for blocks 622 and 626.
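The bookkeeping behind blocks 616 through 640 can be sketched as a small peer table. The class and field names below are illustrative only, not the patented structures:

```python
ANNOUNCEMENT_LIMIT = 3  # repeated announcements before a peer is inactive

class Peer:
    """Per-peer state a PM might keep during dynamic discovery."""
    def __init__(self, host_id):
        self.host_id = host_id
        self.active = False
        self.stats = {}
        self.services = []
        self.unanswered = 0

class PeerTable:
    def __init__(self):
        self.peers = {}

    def on_announcement(self, host_id, stats):
        """Blocks 618-620: mark the sender active, record its statistics."""
        peer = self.peers.setdefault(host_id, Peer(host_id))
        peer.active = True
        peer.unanswered = 0
        peer.stats = dict(stats)
        return peer

    def on_service_reply(self, host_id, services):
        """Block 626: record the services running on the peer."""
        self.peers[host_id].services = list(services)

    def on_no_response(self, host_id):
        """Blocks 630-640: after too many unanswered announcements,
        mark the peer inactive."""
        peer = self.peers[host_id]
        peer.unanswered += 1
        if peer.unanswered >= ANNOUNCEMENT_LIMIT:
            peer.active = False

table = PeerTable()
peer = table.on_announcement(0xAA, {"can_take_more_load": 1})
table.on_service_reply(0xAA, ["foo", "bar"])
assert peer.active and peer.services == ["foo", "bar"]
for _ in range(ANNOUNCEMENT_LIMIT):
    table.on_no_response(0xAA)
assert not peer.active
```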

FIG. 7 illustrates an embodiment of a transmitter/receiver unit 700, which may be any device that transports packets through a network. For instance, the transmitter/receiver unit 700 may be located in a network component, such as a router or a switch. The transmitter/receiver unit 700 may comprise one or more ingress ports or units 710 for receiving packets, objects, or Type Length Values (TLVs) from other network components, logic circuitry 720 to determine which network components to send the packets to, and one or more egress ports or units 730 for transmitting frames to the other network components.

The network components and/or methods described above may be implemented on any general-purpose network component, such as a computer or network component with sufficient processing power, memory resources, and network throughput capability to handle the necessary workload placed upon it. FIG. 8 illustrates a typical, general-purpose network component 800 suitable for implementing one or more embodiments of the components disclosed herein. The network component 800 includes a processor 802 (which may be referred to as a central processor unit or CPU) that is in communication with memory devices including secondary storage 804, read only memory (ROM) 806, random access memory (RAM) 808, input/output (I/O) devices 810, and network connectivity devices 812. The processor 802 may be implemented as one or more CPU chips, or may be part of one or more application specific integrated circuits (ASICs).

The secondary storage 804 is typically comprised of one or more disk drives or tape drives and is used for non-volatile storage of data and as an overflow data storage device if RAM 808 is not large enough to hold all working data. Secondary storage 804 may be used to store programs that are loaded into RAM 808 when such programs are selected for execution. The ROM 806 is used to store instructions and perhaps data that are read during program execution. ROM 806 is a non-volatile memory device that typically has a small memory capacity relative to the larger memory capacity of secondary storage 804. The RAM 808 is used to store volatile data and perhaps to store instructions. Access to both ROM 806 and RAM 808 is typically faster than to secondary storage 804.

At least one embodiment is disclosed and variations, combinations, and/or modifications of the embodiment(s) and/or features of the embodiment(s) made by a person having ordinary skill in the art are within the scope of the disclosure. Alternative embodiments that result from combining, integrating, and/or omitting features of the embodiment(s) are also within the scope of the disclosure. Where numerical ranges or limitations are expressly stated, such express ranges or limitations should be understood to include iterative ranges or limitations of like magnitude falling within the expressly stated ranges or limitations (e.g., from about 1 to about 10 includes 2, 3, 4, etc.; greater than 0.10 includes 0.11, 0.12, 0.13, etc.). For example, whenever a numerical range with a lower limit, Rl, and an upper limit, Ru, is disclosed, any number falling within the range is specifically disclosed. In particular, the following numbers within the range are specifically disclosed: R=Rl+k*(Ru−Rl), wherein k is a variable ranging from 1 percent to 100 percent with a 1 percent increment, i.e., k is 1 percent, 2 percent, 3 percent, 4 percent, 5 percent, . . . , 50 percent, 51 percent, 52 percent, . . . , 95 percent, 96 percent, 97 percent, 98 percent, 99 percent, or 100 percent. Moreover, any numerical range defined by two R numbers as defined in the above is also specifically disclosed. Use of the term “optionally” with respect to any element of a claim means that the element is required, or alternatively, the element is not required, both alternatives being within the scope of the claim. Use of broader terms such as comprises, includes, and having should be understood to provide support for narrower terms such as consisting of, consisting essentially of, and comprised substantially of. 
Accordingly, the scope of protection is not limited by the description set out above but is defined by the claims that follow, that scope including all equivalents of the subject matter of the claims. Each and every claim is incorporated as further disclosure into the specification and the claims are embodiment(s) of the present disclosure. The discussion of a reference in the disclosure is not an admission that it is prior art, especially any reference that has a publication date after the priority date of this application. The disclosure of all patents, patent applications, and publications cited in the disclosure are hereby incorporated by reference, to the extent that they provide exemplary, procedural, or other details supplementary to the disclosure.

While several embodiments have been provided in the present disclosure, it should be understood that the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented.

In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present disclosure. Other items shown or discussed as coupled or directly coupled or communicating with each other may be indirectly coupled or communicating through some interface, device, or intermediate component whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and could be made without departing from the spirit and scope disclosed herein.

Claims

1. An apparatus comprising:

a processor configured to discover one or more peer processors associated with a network component in a dynamic manner by detecting an announcement message from a peer processor,
wherein the announcement message is multicast from the peer processor when the peer processor is added or activated on the network component.

2. The apparatus of claim 1, wherein the announcement message comprises properties about the peer processor and wherein the properties comprise at least one of load, capabilities, processing capacity, memory usage, and availability to take on processes.

3. The apparatus of claim 1, wherein the announcement message is configured and communicated according to an Internal Router Capability Protocol (IRCP) that allows the processor to perform at least one of communicate with the peer processor, detect a new peer processor, discover capabilities of the new peer processor, learn about at least one running service on at least one of the peer processor and the new peer processor, and migrate at least one of a load, a process, and a service to at least one of the peer processor and the new peer processor.

4. The apparatus of claim 3, wherein a process manager (PM) is configured to run on the processor, manage startup, monitor at least one other process on the processor, restart a failed process on the processor, and communicate using IRCP with a peer PM that runs on the peer processor.

5. The apparatus of claim 3, wherein the announcement message is about 120 bytes in size and comprises a message type of about one, a plurality of flags set to about zero, a length field that indicates the size of the announcement message, and a host identity that indicates the peer processor.

6. The apparatus of claim 3, wherein the announcement message comprises a host identity that indicates the processor and corresponds to a lowest machine address of a plurality of Ethernet interfaces, a version of the IRCP system, a number of central processing units (CPUs) in the processor, a hertz value used by the CPUs, a CPU frequency value divided by about 1,000,000, a statistical (stat) hertz (Hz) clock value that indicates a statistical clock frequency associated with the CPUs, a CPU stat use that indicates the amount of CPU that was used in stat Hz ticks during a sampling period between two subsequently sent announcements, an announcement (ann) sequence number that indicates a sequence number of the announcement message, a total system memory value that indicates in multiples of about 1,024 bytes an amount of memory the CPUs have, and an idle stat use that indicates an amount of stat Hz ticks during the sampling period for which a CPU was idle.

7. The apparatus of claim 3, wherein the announcement message comprises a millisecond-passed value that indicates a number of milliseconds that passed since a last announcement was sent, an available memory value that indicates an amount of memory that is currently available in about 1,024 byte increments, a can take more load indicator that is set to about a non-zero value to indicate that the processor is available to take on more workload, a local software version number that indicates a local system's software version number in American Standard Code for Information Interchange (ASCII) string format, and a signature digest that comprises about 20 bytes signature value.

8. The apparatus of claim 1, wherein the processor updates a peer control block (PCB) for the peer processor if the peer processor is an existing processor or creates a new PCB for the peer processor if the peer processor is newly added, and wherein the PCB indicates the peer processor's capabilities and availability.

9. The apparatus of claim 8, wherein the PCB is updated or created if a signature in the announcement matches a calculated signature.

10. The apparatus of claim 8, wherein the processor maintains a host identification to peer PCB mapping table that is hashed on an about 64 bit host identification.

11. A network component comprising:

a processor configured to multicast an announcement of its presence and properties to at least one peer processor associated with the network component,
wherein the announcement is multicast from the processor automatically when the processor is added or activated on the network component.

12. The network component of claim 11, wherein a process manager (PM) that runs on the processor signals the announcement to at least one peer PM on the at least one peer processor over a multicast channel using a User Datagram Protocol (UDP).

13. The network component of claim 11, wherein the announcement message is multicast at a plurality of subsequent time windows, wherein each of the subsequent time windows is about equal to a determined announcement period and an additional random jitter time, and wherein the random jitter time ranges from about zero seconds to about four seconds.

14. The network component of claim 13, wherein the announcement indicates whether the processor is available to take on load from the peer processors, and wherein the processor determines at each time window if the processor is available to take on load when a percentage of memory available at the processor is greater than or about equal to a determined low water mark value.

15. A method comprising:

receiving an announcement message from a peer processor;
sending a service request message to the peer processor;
recording a list of services for the peer processor if the peer processor returns a service reply message that comprises the list of services;
marking the peer processor as active if the peer processor returns a service reply message; and
marking the peer processor as inactive if the peer processor does not return a service reply message.

16. The method of claim 15 wherein the peer processor is a first peer processor and further comprising:

sending an announcement message to a multicast group of peer processors;
marking a second peer processor from the multicast group of peer processors as active if a service request message is received from the second peer processor;
sending a service reply message to the second peer processor if the service request message is received from the second peer processor; and
marking the second peer processor as inactive if a service request message is not received from the second peer processor after retransmitting the announcement message a plurality of times to the second peer processor.

17. The method of claim 15, wherein the service request message comprises a message type of about two, a plurality of flags set to about zero, a length field that indicates the size of the service request message, an embedded announcement message, and a host identity that indicates the identity of a sender of the announcement message.

18. The method of claim 15, wherein the service reply message comprises a message type of about three, a plurality of flags set to about zero, a length field that indicates the size of the service reply message, a number of services that run on a sender of the announcement message, and a list of service names that run on the sender.

19. The method of claim 15, wherein at least one of the service request message and the service reply message is transported using a Stream Control Transmission Protocol (SCTP).

20. The method of claim 19, wherein an Internal Router Capability Protocol (IRCP) agent monitors a SCTP layer for a notification failure for transporting the service reply message.

Patent History
Publication number: 20120136944
Type: Application
Filed: Apr 5, 2011
Publication Date: May 31, 2012
Applicant: FUTUREWEI TECHNOLOGIES, INC. (Plano, TX)
Inventors: Randall Stewart (Chapin, SC), Renwei Li (Fremont, CA), Xuesong Dong (Pleasonton, CA), Hongtao Yin (Fremont, CA), Huaimo Chen (Bolton, MA), Robert Tao (San Jose, CA), Yang Yu (San Ramon, CA), Weiqian Dai (San Jose, CA), Ming Li (Cupertino, CA)
Application Number: 13/080,172
Classifications
Current U.S. Class: Demand Based Messaging (709/206)
International Classification: G06F 15/16 (20060101);