SYSTEM AND METHOD FOR SUPPORTING GUARANTEED MULTI-POINT DELIVERY IN A DISTRIBUTED DATA GRID

Info

Publication number: 20140108532
Type: Application
Filed: Nov 7, 2012
Publication Date: Apr 17, 2014
Applicant: ORACLE INTERNATIONAL CORPORATION (Redwood Shores, CA)
Inventors: Robert H. Lee (San Carlos, CA), Gene Gleyzer (Lexington, MA)
Application Number: 13/671,369

Abstract

A system and method can support guaranteed multi-point message delivery in a distributed data grid. A messaging facility in the distributed data grid can receive an incoming message that is adaptive to be delivered to a plurality of nodes in the distributed data grid. The messaging facility can deliver the incoming message to the plurality of nodes according to an order in a list. Furthermore, a node in the plurality of nodes operates to skip a next node in the list to deliver the incoming message, when the next node is dead or unavailable.

Description

Description

CLAIM OF PRIORITY

This application claims priority on U.S. Provisional Patent Application No. 61/714,100, entitled “SYSTEM AND METHOD FOR SUPPORTING A DISTRIBUTED DATA GRID IN A MIDDLEWARE ENVIRONMENT,” by inventors Robert H. Lee, Gene Gleyzer, Charlie Helin, Mark Falco, Ballav Bihani and Jason Howes, filed Oct. 15, 2012, which application is herein incorporated by reference.

COPYRIGHT NOTICE

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

CROSS-REFERENCED APPLICATIONS

The current application hereby incorporates by reference the material in the following patent applications:

U.S. patent application Ser. No. ______, entitled “SYSTEM AND METHOD FOR PROVIDING PARTITION PERSISTENT STATE CONSISTENCY IN A DISTRIBUTED DATA GRID,” by inventors Robert H. Lee and Gene Gleyzer, filed ______ (Attorney Docket No.: ORACL-05359US0).

U.S. patent application Ser. No. ______, entitled “SYSTEM AND METHOD FOR PROVIDING TRANSIENT PARTITION CONSISTENCY IN A DISTRIBUTED DATA GRID,” by inventors Robert H. Lee and Gene Gleyzer, filed ______(Attorney Docket No.: ORACL-05359US1).

U.S. patent application Ser. No. ______, entitled “SYSTEM AND METHOD FOR SUPPORTING ASYNCHRONOUS MESSAGE PROCESSING IN A DISTRIBUTED DATA GRID,” by inventor Gene Gleyzer, filed ______ (Attorney Docket No.: ORACL-05360US0).

U.S. patent application Ser. No. ______, entitled “SYSTEM AND METHOD FOR SUPPORTING OUT-OF-ORDER MESSAGE PROCESSING IN A DISTRIBUTED DATA GRID,” by inventors Mark Falco and Gene Gleyzer, filed ______ (Attorney Docket No.: ORACL-05364US0).

1. Field of Invention

The present invention is generally related to computer systems, and is particularly related to a distributed data grid.

2. Background

Modern computing systems, particularly those employed by larger organizations and enterprises, continue to increase in size and complexity. Particularly, in areas such as Internet applications, there is an expectation that millions of users should be able to simultaneously access that application, which effectively leads to an exponential increase in the amount of content generated and consumed by users, and transactions involving that content. Such activity also results in a corresponding increase in the number of transaction calls to databases and metadata stores, which have a limited capacity to accommodate that demand.

This is the general area that embodiments of the invention are intended to address.

SUMMARY

Described herein are systems and methods that can support guaranteed multi-point message delivery in a distributed data grid. A messaging facility in the distributed data grid can receive an incoming message that is adaptive to be delivered to a plurality of nodes in the distributed data grid. The messaging facility can deliver the incoming message to the plurality of nodes according to an order in a list. Furthermore, a node in the plurality of nodes operates to skip a next node in the list to deliver the incoming message, when the next node is dead or unavailable.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is an illustration of a data grid cluster in accordance with various embodiments of the invention.

FIG. 2 is an illustration of supporting guaranteed multi-point message delivery in a distributed data grid in accordance with various embodiments of the invention.

FIG. 3 is an illustration of creating a chain request message in a distributed data grid in accordance with various embodiments of the invention.

FIG. 4 is an illustration of handling a chain request message in a distributed data grid in accordance with various embodiments of the invention.

FIG. 5 is an illustration of supporting consistent message delivery when a primary owner node of a partition becomes unavailable in a distributed data grid in accordance with various embodiments of the invention.

FIG. 6 illustrates an exemplary flow chart for supporting guaranteed multi-point message delivery in a distributed data grid in accordance with an embodiment of the invention.

DETAILED DESCRIPTION

Described herein are systems and methods that can support guaranteed multi-point message delivery in a distributed data grid.

In accordance with an embodiment, as referred to herein a “distributed data grid”, “data grid cluster”, or “data grid”, is a system comprising a plurality of computer servers which work together to manage information and related operations, such as computations, within a distributed or clustered environment. The data grid cluster can be used to manage application objects and data that are shared across the servers. Preferably, a data grid cluster should have low response time, high throughput, predictable scalability, continuous availability and information reliability. As a result of these capabilities, data grid clusters are well suited for use in computational intensive, stateful middle-tier applications. Some examples of data grid clusters, e.g., the Oracle Coherence data grid cluster, can store the information in-memory to achieve higher performance, and can employ redundancy in keeping copies of that information synchronized across multiple servers, thus ensuring resiliency of the system and the availability of the data in the event of server failure. For example, Coherence provides replicated and distributed (partitioned) data management and caching services on top of a reliable, highly scalable peer-to-peer clustering protocol.

An in-memory data grid can provide the data storage and management capabilities by distributing data over a number of servers working together. The data grid can be middleware that runs in the same tier as an application server or within an application server. It can provide management and processing of data and can also push the processing to where the data is located in the grid. In addition, the in-memory data grid can eliminate single points of failure by automatically and transparently failing over and redistributing its clustered data management services when a server becomes inoperative or is disconnected from the network. When a new server is added, or when a failed server is restarted, it can automatically join the cluster and services can be failed back over to it, transparently redistributing the cluster load. The data grid can also include network-level fault tolerance features and transparent soft re-start capability.

In accordance with an embodiment, the functionality of a data grid cluster is based on using different cluster services. The cluster services can include root cluster services, partitioned cache services, and proxy services. Within the data grid cluster, each cluster node can participate in a number of cluster services, both in terms of providing and consuming the cluster services. Each cluster service has a service name that uniquely identifies the service within the data grid cluster, and a service type, which defines what the cluster service can do. Other than the root cluster service running on each cluster node in the data grid cluster, there may be multiple named instances of each service type. The services can be either configured by the user, or provided by the data grid cluster as a default set of services.

FIG. 1 is an illustration of a data grid cluster in accordance with various embodiments of the invention. As shown in FIG. 1, a data grid cluster 100, e.g. an Oracle Coherence data grid, includes a plurality of cluster nodes 101-106 having various cluster services 111-116 running thereon. Additionally, a cache configuration file 110 can be used to configure the data grid cluster 100.

Guaranteed Multi-Point Message Delivery

In accordance with various embodiments of the invention, a distributed data grid can support guaranteed multi-point message delivery, which can provide consistency in delivering messages among different cluster nodes in the distributed data grid.

FIG. 2 is an illustration of supporting consistent message delivery in a distributed data grid in accordance with various embodiments of the invention. As shown in FIG. 2, a distributed data grid 200 can include a plurality of cluster nodes, such as node A-D 201-204.

A cluster node, e.g. node A 201, can be either the originator of an internal message, or a recipient of an internal message. Additionally, the cluster node A 201 can also be a recipient of an incoming message from a client 210. The cluster node A 201 can use a message facility 211 to configure and deliver a message to different cluster nodes in the distributed data grid 200, such as nodes B-D 202-204.

The cluster node A 201 can deliver the message to the recipient nodes B-D 202-204 in a particular order, e.g. based on a list from node B 202 to node C 203 then to node D 204. Here, the message facility 211 on the cluster node A 201 can be used to assign the order to the recipient nodes B-D 202-204 in the distributed data grid 200.

Furthermore, each recipient node B-D 202-204 in the list can keep track of how the message is to be delivered down the list. For example, a recipient node, e.g. node B 202, can detect that node C 203, which is the next node in the list, is dead or temporarily unavailable. Then, the node B 202 can skip the node C 203, and deliver the message directly to node D 204, which is the node next to the node C 203 on the list.

As shown in FIG. 2, the cluster node A 201 can receive an incoming message from a client 210. The incoming message can include a request from the client 210, and the client 210 may expect a response from the distributed data grid 200. Then, a response message can be sent back to the client in the reverse direction, e.g. from node D 204 to node C 203 then to node B 202 before reaching node A 201. Finally, the distributed data grid 200 can provide the response message to the client 210, after the incoming message is delivered to the recipient nodes B-D 202-204 and processed in the distributed data grid 200.

In accordance with various embodiments of the invention, the guaranteed multi-point message delivery feature can be used for managing partition backups. For example, a cluster node, e.g. the node A 201, can be the owner of a partition, while nodes B-D are backup nodes for the node A 201. Here, the partition can define that the value of a property x is equal to 1 (“x=1”), which can be maintained on both the primary owner node A 201 and each backup node B-D 202-204.

Then, the node A 201 can receive a message from the client 210, which changes the value of the property x to 2 (“x=2”). Thus, the cluster node A 201 may propagate the message, “x=2”, to each backup node B-D 202-204.

As shown in FIG. 2, the cluster node A 201 can deliver the message, “x=2”, to the recipient nodes B-D 202-204 in order. Furthermore, when the node B 202 detects that the node C 203 is dead or unavailable, the node B 202 can deliver the message, “x=2”, to the node D 204 directly, while skipping the node C 203. Thus, the distributed data grid 200 can maintain a consistent view that the value of x equals to 2.

On the other hand, an alternative approach is that the cluster node A 201 can deliver the message to each recipient node B-D 202-204 separately, or in parallel (as shown in dotted line in FIG. 2).

Unlike the guaranteed multi-point message delivery feature as described in the above, this alternative approach can be problematic in the scenario when a cluster node, e.g. node C 203 is dead or become unavailable. Since the delivery of the message, “x=2”, to the cluster node C 203 may not go through, the value of x on node C may remain to be 1 without notice.

Thus, using the alternative approach, the distributed data grid 200 may not be able to maintain a consistent view that the value of x equals to 2. This alternative approach can cause inconsistency, in terms of determining the value of x at a later time point, among the different cluster nodes A-D 201-204 in the distributed data grid 200.

Furthermore, such inconsistency may not be resolved until a full synchronization is performed in the distributed data grid 200. The full synchronization in the distributed data grid 200 can be costly, since it may require the distributed data grid 200 to stop providing services.

Additionally, the guaranteed multi-point message delivery feature can be used complimentarily with a partition versioning feature supported in the distributed data grid 200. For example, when a new node E 205 is added into the distributed data grid 200, the distributed data grid 200 can bring the state of the newly added node E 205 current, based on the partition versioning feature, so that the node E 205 can start receive new messages based on the guaranteed multi-point message delivery feature, as described above.

Additional descriptions of various embodiments of using partition versioning feature in a distributed data grid 200 are provided in U.S. patent application Ser. No. ______, entitled “SYSTEM AND METHOD FOR PROVIDING PARTITION PERSISTENT STATE CONSISTENCY IN A DISTRIBUTED DATA GRID”, filed ______, which application is herein incorporated by reference.

Furthermore, the guaranteed multi-point message delivery feature can be used complimentarily with a poll model that is supported in the distributed data grid 200 for processing incoming messages asynchronizingly.

Additional descriptions of various embodiments of supporting asynchronized message processing in a distributed data grid 200 are provided in U.S. patent application Ser. No. ______, entitled “SYSTEM AND METHOD FOR SUPPORTING ASYNCHRONIZED MESSAGE PROCESSING IN A DISTRIBUTED DATA GRID”, filed ______, which application is herein incorporated by reference.

FIG. 3 is an illustration of creating a chain request message in a distributed data grid in accordance with various embodiments of the invention. As shown in FIG. 3, a cluster node A 301 in the distributed data grid 300 can create a chain request message 320, e.g. using a messaging facility 311 on the cluster node A 301.

The chain request message 320 can be either initiated by the cluster node A 301, or be created based on an incoming message 310 received by the cluster node A 301. The chain request message 320 can include an internal data structure 321 that stores the information about a list of recipient nodes that the chain request message 320 will be delivered to.

As shown in FIG. 3, the cluster node A 301 can deliver the chain request message 320 to the different cluster nodes in the distributed data grid 300, e.g. nodes B-C 302-203, for further processing.

FIG. 4 is an illustration of handling a chain request message in a distributed data grid in accordance with various embodiments of the invention. As shown in FIG. 4, a cluster node, e.g. node B 402, in the distributed data grid 400 can receive a chain request message 420 that contains a list of recipient nodes in an internal data structure 421.

A messaging facility 412 in the cluster node B 402 can track the delivery of the chain request message 420 to the rest of the recipient nodes in the list 421. For example, when the cluster node B 402 detects that node C 403 is dead or is unavailable, the cluster node B 402 can access the internal data structure 421 for the list of recipient nodes and find out that node D 204 is the next node following node C 203. Therefore, the cluster node B 402 can deliver the chain request message 420 to node D 404 accordingly.

FIG. 5 is an illustration of supporting consistent message delivery when a primary owner node of a partition becomes unavailable in a distributed data grid in accordance with various embodiments of the invention. As shown in FIG. 5, a distributed data grid 500 can include a plurality of cluster nodes, such as node A-D 501-504.

Initially, a partition can define that the value of a property x is equal to 1 (“x=1”). The partition can be maintained on the primary owner node A 501 and each backup node B-D 502-504. Then, the node A 501 can receive a message from the client 510, which changes the value of the property x to 2 (“x=2”). Thus, the messaging facility 511 in the cluster node A 501 can propagate the message, “x=2”, to each backup node B-D 202-204.

As shown in FIG. 5, when the client 510 detects that the primary owner node A 501 is dead or unavailable, the message, “x=2”, can be either having already been delivered to the cluster node B 502, or undelivered.

When the message, “x=2”, has already been delivered to the cluster node B 502, the distributed data grid 500 can guarantee that the message is delivered to the rest of nodes C-D 503-504, and can ensure a consistent view that the value of x is equal to 2. On the other hand when the message, “x=2”, is undelivered before the primary owner node A 501 becomes unavailable, the distributed data grid 500 maintains the consistent view that the value of x is equal to 1. Thus, the distributed data grid 500 can ensure a consistent view of the value of x client 510 in either case.

Then, the distributed data grid 500 can provide a new primary owner node, e.g. cluster node E 505, which can continue maintaining the partition and handle subsequent incoming messages from the client 510.

Alternatively, the cluster node A 501 can deliver the message, “x=2”, to each recipient node B-D 502-504 separately, or in parallel (as shown in dotted line in FIG. 5). Unlike the guaranteed multi-point message delivery feature as described in the above, this alternative approach can be problematic in the scenario when the primary owner node A 501 is dead or become unavailable. Since the delivery of the message, “x=2”, to the different cluster nodes B-D 502-504 may not go through, the value of x on the cluster nodes B-D 502-504 is not guaranteed to be consistent.

FIG. 6 illustrates an exemplary flow chart for supporting guaranteed multi-point message delivery in a distributed data grid in accordance with an embodiment of the invention. As shown in FIG. 6, at step 601, a messaging facility on a cluster node in the distributed data grid can receive an incoming message that is adaptive to be delivered to a plurality of nodes in the distributed data grid. Then, at step 602, the messaging facility can be configured to deliver the incoming message to the plurality of nodes according to an order in a list. Furthermore, at step 603, a node in the plurality of nodes can skip a next node in the list for delivering the incoming message, when the next node is dead or unavailable.

The present invention may be conveniently implemented using one or more conventional general purpose or specialized digital computer, computing device, machine, or microprocessor, including one or more processors, memory and/or computer readable storage media programmed according to the teachings of the present disclosure. Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art.

In some embodiments, the present invention includes a computer program product which is a storage medium or computer readable medium (media) having instructions stored thereon/in which can be used to program a computer to perform any of the processes of the present invention. The storage medium can include, but is not limited to, any type of disk including floppy disks, optical discs, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.

The foregoing description of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to the practitioner skilled in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalence.

Claims

1. A method for supporting consistent message delivery in a distributed data grid operating on one or more microprocessors, comprising:

receiving an incoming message that is adaptive to be delivered to a plurality of nodes in the distributed data grid;

delivering the incoming message to the plurality of nodes according to an order in a list; and

allowing a node in the plurality of nodes to skip a next node in the list for delivering the incoming message, when the next node is dead or unavailable.

2. The method according to claim 1, further comprising:

allowing the incoming message to include a request from a client; and

providing a response to the client after the incoming message is delivered to the plurality of nodes in the distributed data grid.

3. The method according to claim 1, further comprising:

allowing the node to deliver the incoming message to a node that is next to the node that is dead or unavailable in the list.

4. The method according to claim 1, further comprising:

allowing each node in the plurality of nodes to keep track of the incoming message as the incoming message is delivered accordingly to the list.

5. The method according to claim 1, further comprising:

creating a chain request message that stores information about the list in an internal data structure based on the incoming message.

6. The method according to claim 5, further comprising:

allowing a node in the distributed data grid to obtain the information about the list from the internal data structure in the chain request message.

7. The method according to claim 1, further comprising:

allowing a first node in the list to be a primary owner of a partition and other nodes in the list of nodes to be backup nodes for the first node.

8. The method according to claim 7, further comprising:

propagating an update of the partition from the primary owner of the partition to the backup nodes.

9. The method according to claim 1, further comprising:

performing a full synchronization on the plurality of nodes in the distributed data grid.

10. The method according to claim 1, further comprising:

configuring a new node in the distributed data grid to receive the incoming message.

11. A system for supporting guaranteed multi-point delivery in a distributed data grid, comprising:

one or more microprocessors;

a messaging facility in the distributed data grid running on the one or more microprocessors, wherein the messaging facility operates to perform the steps of receiving an incoming message that is adaptive to be delivered to a plurality of nodes in the distributed data grid; delivering the incoming message to the plurality of nodes according to an order in a list; and allowing a node in the plurality of nodes to skip a next node in the list for delivering the incoming message, when the next node is dead or unavailable.

12. The system according to claim 11, wherein:

the incoming message includes a request from a client, and wherein a response is provided to the client after the incoming message is delivered to the plurality of nodes in the distributed data grid.

13. The system according to claim 11, wherein:

the node operates to deliver the incoming message to a node that is next to the node that is dead or unavailable in the list.

14. The system according to claim 11, wherein:

each node in the plurality of nodes operates to keep track of the incoming message as it is delivered accordingly to the list.

15. The system according to claim 11, wherein:

a first node in the plurality of nodes operates to create a chain request message that stores information about the list in an internal data structure based on the incoming message

16. The system according to claim 15, wherein:

a second node in the distributed data grid operates to obtain the information about the list from the internal data structure in the chain request message.

17. The system according to claim 11, wherein:

a first node in the list operates to be a primary owner of a partition and other nodes in the list of nodes operates to be backup nodes for the first node.

18. The system according to claim 17, wherein:

an update of the partition is propagated from the primary owner of the partition to the backup nodes.

19. The system according to claim 11, wherein:

a new node in the distributed data grid is confiured to receive the incoming message.

20. A non-transitory machine readable storage medium having instructions stored thereon that when executed cause a system to perform the steps of:

receiving an incoming message that is adaptive to be delivered to a plurality of nodes in the distributed data grid;

delivering the incoming message to the plurality of nodes according to an order in a list; and

allowing a node in the plurality of nodes to skip a next node in the list for delivering the incoming message, when the next node is dead or unavailable.