Patents by Inventor Pranay Koka
Pranay Koka has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 9948587Abstract: A method for data deduplication during execution of an application on a plurality of computing nodes, including: generating, by a first processor in a first computing node executing the application, a first message to process application data owned by a second computing node executing the application; receiving, by a first network interface (NI) of the first computing node, the first message; extracting, by the first NI, a first key from the first message; determining, by the first NI, the first key is not a duplicate; and placing, by the first NI and in response to the first key not being a duplicate, the first message on a network connecting the first computing node to the second computing node.Type: GrantFiled: August 8, 2014Date of Patent: April 17, 2018Assignee: Oracle International CorporationInventors: Herbert Dewitt Schwetman, Jr., Pranay Koka, Arslan Zulfiqar
-
Patent number: 9535842Abstract: Each computing node of a distributed computing system may implement a hardware mechanism at the network interface for message driven prefetching of application data. For example, a parallel data-intensive application that employs function shipping may distribute respective portions of a large data set to main memory on multiple computing nodes. The application may send messages to one of the computing nodes referencing data that is stored locally on the node. For each received message, the network interface on the recipient node may extract the reference, initiate the prefetching of referenced data into a local cache (e.g., an LLC), and then store the message for subsequent interpretation and processing by a local processor core. When the processor core retrieves a stored message for processing, the referenced data may already be in the LLC, avoiding a CPU stall while retrieving it from memory. The hardware mechanism may be configured via software.Type: GrantFiled: August 28, 2014Date of Patent: January 3, 2017Assignee: Oracle International CorporationInventors: Herbert D. Schwetman, Jr., Mohammad Arslan Zulfiqar, Pranay Koka
-
Patent number: 9489309Abstract: A system and method for providing a cache virtual partition to a data structure that includes receiving, at an address remapping device, a cache-check request including a memory address including bits, identifying, using a virtual partition table, the virtual partition by determining that the memory address falls within a data structure memory address range, obtaining a copy of virtual partition bits which include a portion of the bits, appending the copy of the virtual partition bits to the memory address, rewriting the virtual partition bits to obtain rewritten virtual partition bits corresponding to the virtual partition, and generating a remapped memory address by replacing the virtual partition bits with the rewritten virtual partition bits. The remapped memory address includes the copy of the virtual partition bits and rewritten virtual partition bits. The method also includes transmitting a remapped cache check request including the remapped memory address to the cache.Type: GrantFiled: October 31, 2014Date of Patent: November 8, 2016Assignee: Oracle International CorporationInventors: Pranay Koka, Herbert Dewitt Schwetman, Jr., Mohammad Zulfiqar, Jeff Diamond
-
Patent number: 9390016Abstract: The disclosed embodiments provide a system in which a processor chip accesses an off-chip cache via silicon photonic waveguides. The system includes a processor chip and a cache chip that are both coupled to a communications substrate. The cache chip comprises one or more cache banks that receive cache requests from a structure in the processor chip optically via a silicon photonic waveguide. More specifically, the silicon photonic waveguide is comprised of waveguides in the processor chip, the communications substrate, and the cache chip, and forms an optical channel that routes an optical signal directly from the structure to a cache bank in the cache chip via the communications substrate. Transmitting optical signals from the processor chip directly to cache banks on the cache chip facilitates reducing the wire latency of cache accesses and allowing each cache bank on the cache chip to be accessed with uniform latency.Type: GrantFiled: October 31, 2012Date of Patent: July 12, 2016Assignee: ORACLE INTERNATIONAL CORPORATIONInventors: Pranay Koka, Michael O. McCracken, Herbert D. Schwetman, Jr., Ronald Ho
-
Publication number: 20160124858Abstract: A system and method for providing a cache virtual partition to a data structure that includes receiving, at an address remapping device, a cache-check request including a memory address including bits, identifying, using a virtual partition table, the virtual partition by determining that the memory address falls within a data structure memory address range, obtaining a copy of virtual partition bits which include a portion of the bits, appending the copy of the virtual partition bits to the memory address, rewriting the virtual partition bits to obtain rewritten virtual partition bits corresponding to the virtual partition, and generating a remapped memory address by replacing the virtual partition bits with the rewritten virtual partition bits. The remapped memory address includes the copy of the virtual partition bits and rewritten virtual partition bits. The method also includes transmitting a remapped cache check request including the remapped memory address to the cache.Type: ApplicationFiled: October 31, 2014Publication date: May 5, 2016Inventors: Pranay Koka, Herbert Dewitt Schwetman, JR., Mohammad Zulfiqar, Jeff Diamond
-
Publication number: 20160062894Abstract: Each computing node of a distributed computing system may implement a hardware mechanism at the network interface for message driven prefetching of application data. For example, a parallel data-intensive application that employs function shipping may distribute respective portions of a large data set to main memory on multiple computing nodes. The application may send messages to one of the computing nodes referencing data that is stored locally on the node. For each received message, the network interface on the recipient node may extract the reference, initiate the prefetching of referenced data into a local cache (e.g., an LLC), and then store the message for subsequent interpretation and processing by a local processor core. When the processor core retrieves a stored message for processing, the referenced data may already be in the LLC, avoiding a CPU stall while retrieving it from memory. The hardware mechanism may be configured via software.Type: ApplicationFiled: August 28, 2014Publication date: March 3, 2016Applicant: ORACLE INTERNATIONAL CORPORATIONInventors: Herbert D. Schwetman, JR., Mohammad Arslan Zulfiqar, Pranay Koka
-
Publication number: 20160043977Abstract: A method for data deduplication during execution of an application on a plurality of computing nodes, including: generating, by a first processor in a first computing node executing the application, a first message to process application data owned by a second computing node executing the application; receiving, by a first network interface (NI) of the first computing node, the first message; extracting, by the first NI, a first key from the first message; determining, by the first NI, the first key is not a duplicate; and placing, by the first NI and in response to the first key not being a duplicate, the first message on a network connecting the first computing node to the second computing node.Type: ApplicationFiled: August 8, 2014Publication date: February 11, 2016Inventors: Herbert Dewitt Schwetman, JR., Pranay Koka, Arslan Zulfiqar
-
Patent number: 9235529Abstract: The disclosed embodiments provide a system that uses broadcast-based TLB sharing to reduce address-translation latency in a shared-memory multiprocessor system with two or more nodes that are connected by an optical interconnect. During operation, a first node receives a memory operation that includes a virtual address. Upon determining that one or more TLB levels of the first node will miss for the virtual address, the first node uses the optical interconnect to broadcast a TLB request to one or more additional nodes of the shared-memory multiprocessor in parallel with scheduling a speculative page-table walk for the virtual address. If the first node receives a TLB entry from another node of the shared-memory multiprocessor via the optical interconnect in response to the TLB request, the first node cancels the speculative page-table walk. Otherwise, if no response is received, the first node instead waits for the completion of the page-table walk.Type: GrantFiled: August 2, 2012Date of Patent: January 12, 2016Assignee: ORACLE INTERNATIONAL CORPORATIONInventors: Pranay Koka, David A. Munday, Michael O. McCracken, Herbert D. Schwetman, Jr.
-
Patent number: 9229163Abstract: In a multi-chip module (MCM), optical waveguides in a first plane convey modulated optical signals among integrated circuits (which are sometimes referred to as ‘chips’). Moreover, an optical-butterfly switch, optically coupled to the optical waveguides, dynamically allocates communication bandwidth among the integrated circuits. This optical-butterfly switch includes optical components in the first plane and a second plane, and optical couplers that couple the modulated optical signals to and from the first plane and the second plane. In this way, the MCM communicates the modulated optical signals among the integrated circuits without optical-waveguide crossings in a given plane.Type: GrantFiled: November 1, 2012Date of Patent: January 5, 2016Assignee: ORACLE INTERNATIONAL CORPORATIONInventors: Herbert D. Schwetman, Jr., Michael O. McCracken, Pranay Koka
-
Patent number: 9213649Abstract: The disclosed embodiments provide a system that performs distributed page-table lookups in a shared-memory multiprocessor system with two or more nodes, where each of these nodes includes a directory controller that manages a distinct portion of the system's address space. During operation, a first node receives a request for a page-table entry that is located at a physical address that is managed by the first node. The first node accesses its directory controller to retrieve the page-table entry, and then uses the page-table entry to calculate the physical address for a subsequent page-table entry. The first node determines the home node (e.g., the managing node) for this calculated physical address, and sends a request for the subsequent page-table entry to that home node.Type: GrantFiled: September 24, 2012Date of Patent: December 15, 2015Assignee: ORACLE INTERNATIONAL CORPORATIONInventors: Pranay Koka, David A. Munday, Michael O. McCracken, Herbert D. Schwetman, Jr.
-
Publication number: 20150301949Abstract: The disclosed embodiments provide a system that uses broadcast-based TLB sharing to reduce address-translation latency in a shared-memory multiprocessor system with two or more nodes that are connected by an optical interconnect. During operation, a first node receives a memory operation that includes a virtual address. Upon determining that one or more TLB levels of the first node will miss for the virtual address, the first node uses the optical interconnect to broadcast a TLB request to one or more additional nodes of the shared-memory multiprocessor in parallel with scheduling a speculative page-table walk for the virtual address. If the first node receives a TLB entry from another node of the shared-memory multiprocessor via the optical interconnect in response to the TLB request, the first node cancels the speculative page-table walk. Otherwise, if no response is received, the first node instead waits for the completion of the page-table walk.Type: ApplicationFiled: August 2, 2012Publication date: October 22, 2015Applicant: ORACLE INTERNATIONAL CORPORATIONInventors: Pranay Koka, David A. Munday, Michael O. McCracken, Herbert D. Schwetman, JR.
-
Patent number: 9081706Abstract: The disclosed embodiments provide techniques for reducing address-translation latency and the serialization latency of combined TLB and data cache misses in a coherent shared-memory system. For instance, the last-level TLB structures of two or more multiprocessor nodes can be configured to act together as either a distributed shared last-level TLB or a directory-based shared last-level TLB. Such TLB-sharing techniques increase the total amount of useful translations that are cached by the system, thereby reducing the number of page-table walks and improving performance. Furthermore, a coherent shared-memory system with a shared last-level TLB can be further configured to fuse TLB and cache misses such that some of the latency of data coherence operations is overlapped with address translation and data cache access latencies, thereby further improving the performance of memory operations.Type: GrantFiled: May 10, 2012Date of Patent: July 14, 2015Assignee: ORACLE INTERNATIONAL CORPORATIONInventors: Pranay Koka, Michael O. McCracken, Herbert D. Schwetman, Jr., David A. Munday
-
Patent number: 9020347Abstract: A network is described in which a base optical point-to-point (P2P) network can be reconfigured to a target network topology. This reconfigurable architecture customizes the network topology for different classes of applications to maximize throughput. In particular, the network can function efficiently at high-radix and low-radix traffic patterns. This capability is obtained using configurable electrical circuit switches at each node in the network. These configurable electrical circuit switches can be set so that incoming packets are directly routed to a specified output (either a local destination or an outgoing optical link) without: delay, contention, or buffers. In this way, predefined network topologies can be configured with improved node-to-node bandwidths when compared to the original P2P network by leveraging unused optical links. Furthermore, because the electrical circuit switches can be reconfigured, the network topology can be dynamically reconfigured to suit applications or application phases.Type: GrantFiled: September 9, 2013Date of Patent: April 28, 2015Assignee: Oracle International CorporationInventors: Pranay Koka, Herbert D. Schwetman, Jr.
-
Patent number: 9009446Abstract: The disclosed embodiments provide a system that uses broadcast-based TLB-sharing techniques to reduce address-translation latency in a shared-memory multiprocessor system with two or more nodes that are connected by an electrical interconnect. During operation, a first node receives a memory operation that includes a virtual address. Upon determining that one or more TLB levels of the first node will miss for the virtual address, the first node uses the electrical interconnect to broadcast a TLB request to one or more additional nodes of the shared-memory multiprocessor in parallel with scheduling a speculative page-table walk for the virtual address. If the first node receives a TLB entry from another node of the shared-memory multiprocessor via the electrical interconnect in response to the TLB request, the first node cancels the speculative page-table walk. Otherwise, if no response is received, the first node instead waits for the completion of the page-table walk.Type: GrantFiled: August 2, 2012Date of Patent: April 14, 2015Assignee: Oracle International CorporationInventors: Pranay Koka, David A. Munday, Michael O. McCracken, Herbert D. Schwetman, Jr.
-
Patent number: 9003163Abstract: The disclosed embodiments provide techniques for reducing address-translation latency and the serialization latency of combined TLB and data cache misses in a coherent shared-memory system. For instance, the last-level TLB structures of two or more multiprocessor nodes can be configured to act together as either a distributed shared last-level TLB or a directory-based shared last-level TLB. Such TLB-sharing techniques increase the total amount of useful translations that are cached by the system, thereby reducing the number of page-table walks and improving performance. Furthermore, a coherent shared-memory system with a shared last-level TLB can be further configured to fuse TLB and cache misses such that some of the latency of data coherence operations is overlapped with address translation and data cache access latencies, thereby further improving the performance of memory operations.Type: GrantFiled: June 12, 2012Date of Patent: April 7, 2015Assignee: Oracle International CorporationInventors: Pranay Koka, Michael O. McCracken, Herbert D. Schwetman, Jr., David A. Munday, Jose Renau Ardevol
-
Publication number: 20150071632Abstract: A network is described in which a base optical point-to-point (P2P) network can be reconfigured to a target network topology. This reconfigurable architecture customizes the network topology for different classes of applications to maximize throughput. In particular, the network can function efficiently at high-radix and low-radix traffic patterns. This capability is obtained using configurable electrical circuit switches at each node in the network. These configurable electrical circuit switches can be set so that incoming packets are directly routed to a specified output (either a local destination or an outgoing optical link) without: delay, contention, or buffers. In this way, predefined network topologies can be configured with improved node-to-node bandwidths when compared to the original P2P network by leveraging unused optical links. Furthermore, because the electrical circuit switches can be reconfigured, the network topology can be dynamically reconfigured to suit applications or application phases.Type: ApplicationFiled: September 9, 2013Publication date: March 12, 2015Applicant: Oracle International CorporationInventors: Pranay Koka, Herbert D. Schwetman, JR.
-
Patent number: 8976802Abstract: An arbitration technique for determining mappings for a switch is described. During a given arbitration decision cycle, an arbitration mechanism maintains, until expiration, a set of mappings from a subset of the input ports to a subset of the output ports of the switch. This set of mappings was determined during an arbitration decision cycle up to K cycles preceding the given arbitration decision cycle. Because the set of mappings are maintained, it is easier for the arbitration mechanism to determine mappings from a remainder of the input ports to the remainder of the output ports without collisions.Type: GrantFiled: March 15, 2013Date of Patent: March 10, 2015Assignee: Oracle International CorporationInventors: Pranay Koka, Herbert D. Schwetman, Jr., Syed Ali Raza Jafri
-
Patent number: 8909051Abstract: In a multi-chip module (MCM), integrated circuits are coupled by optical waveguides that convey optical signals. The optical waveguides provide dedicated point-to-point optical links between all pairs of the integrated circuits. Moreover, for a given point-to-point optical link between a given pair of integrated circuits, other integrated circuits in the integrated circuits steal access on the given point-to-point optical link when communicating information to one of the given pair of integrated circuits so that the given point-to-point optical link is shared by more than the given pair of integrated circuits. Furthermore, the integrated circuits recover errors in messages in the optical signals corrupted by collisions on the given point-to-point optical link using erasure coding. In this way, the MCM may provide an optical network with increased bandwidth relative to a point-to-point optical network.Type: GrantFiled: October 9, 2012Date of Patent: December 9, 2014Assignee: Oracle International CorporationInventors: Arslan Zulfiqar, Pranay Koka, Herbert D. Schwetman, Jr.
-
Publication number: 20140269751Abstract: An arbitration technique for determining mappings for a switch is described. During a given arbitration decision cycle, an arbitration mechanism maintains, until expiration, a set of mappings from a subset of the input ports to a subset of the output ports of the switch. This set of mappings was determined during an arbitration decision cycle up to K cycles preceding the given arbitration decision cycle. Because the set of mappings are maintained, it is easier for the arbitration mechanism to determine mappings from a remainder of the input ports to the remainder of the output ports without collisions.Type: ApplicationFiled: March 15, 2013Publication date: September 18, 2014Applicant: ORACLE INTERNATIONAL CORPORATIONInventors: Pranay Koka, Herbert D. Schwetman, JR., Syed Ali Raza Jafri
-
Patent number: 8824496Abstract: A method for arbitration in an arbitration domain. The method includes: receiving, by each node of a plurality of nodes in the arbitration domain, an arbitration request from each sending node of the plurality of nodes in the arbitration domain, where the plurality of nodes in the arbitration domain each use a shared data channel to send data to a set of receiving nodes; assigning, by each node in the arbitration domain, consecutive time slots to each sending node based on a plurality of priorities assigned to the plurality of nodes in the arbitration domain; for each time slot: sending, from the arbitration domain, a switch request to a receiving node designated by the sending node, where the receiving node is in the set of receiving nodes; and sending, by the sending node, data to the receiving node via the shared data channel during the time slot.Type: GrantFiled: October 30, 2009Date of Patent: September 2, 2014Assignee: Oracle America, Inc.Inventors: Pranay Koka, Michael Oliver McCracken, Herbert Dewitt Schwetman, Jr., Xuezhe Zheng, Ashok Krishnamoorthy