Patents by Inventor Deepak Limaye
Deepak Limaye has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11893413
Abstract: An embodiment of an apparatus includes a processing circuit and a system memory. The processing circuit may store a pending request in a buffer, the pending request corresponding to a transaction that includes a write request to the system memory. The processing circuit may also allocate an entry in a write table corresponding to the transaction. After sending the transaction to the system memory to be processed, the pending request in the buffer may be removed in response to the allocation of the write entry.
Type: Grant
Filed: January 6, 2021
Date of Patent: February 6, 2024
Assignee: Apple Inc.
Inventors: Michael D. Snyder, Ronald P. Hall, Deepak Limaye, Brett S. Feero, Rohit K. Gupta
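The abstract above describes holding a pending write request in a buffer only until a write-table entry has been allocated for its transaction, at which point the buffer slot can be freed early. A minimal Python sketch of that flow, with all class and method names invented for illustration (the patent describes hardware, not software):

```python
class WriteTracker:
    """Toy model: pending write requests wait in a buffer; allocating a
    write-table entry for a transaction retires the buffered request."""

    def __init__(self):
        self.buffer = {}       # txn_id -> request payload (pending requests)
        self.write_table = {}  # txn_id -> allocated write-table entry

    def enqueue(self, txn_id, payload):
        """Store a pending write request in the buffer."""
        self.buffer[txn_id] = payload

    def send_to_memory(self, txn_id):
        """Allocate a write-table entry for the transaction, then remove
        the pending request from the buffer in response to the allocation."""
        entry = {"payload": self.buffer[txn_id], "state": "sent"}
        self.write_table[txn_id] = entry   # allocation...
        del self.buffer[txn_id]            # ...frees the buffer slot early
        return entry
```

The point of the early removal is that the buffer slot becomes reusable as soon as tracking responsibility moves to the write table, rather than when the memory system finishes the write.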
-
Patent number: 11347514
Abstract: Techniques are disclosed relating to filtering access to a content-addressable memory (CAM). In some embodiments, a processor monitors for certain microarchitectural states and filters access to the CAM in states where there cannot be a match in the CAM or where matching entries will not be used even if there is a match. In some embodiments, toggle control circuitry prevents toggling of input lines when filtering CAM access, which may reduce dynamic power consumption. In some example embodiments, the CAM is used to access a load queue to validate that out-of-order execution for a set of instructions matches in-order execution, and situations where ordering should be checked are relatively rare.
Type: Grant
Filed: February 15, 2019
Date of Patent: May 31, 2022
Assignee: Apple Inc.
Inventors: Deepak Limaye, Brian R. Mestan, Gideon N. Levinsky
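As a rough illustration of the access filtering described above, the sketch below gates a toy CAM lookup behind a predicate that stands in for the monitored microarchitectural state; a filtered lookup skips the search entirely, which is the analogue of keeping the input lines from toggling. All names are invented for illustration:

```python
class FilteredCAM:
    """Toy content-addressable memory whose lookups are gated by a
    filter predicate, mimicking the CAM access filtering in the abstract."""

    def __init__(self):
        self.entries = set()
        self.searches = 0   # proxy for dynamic power spent toggling lines

    def insert(self, key):
        self.entries.add(key)

    def lookup(self, key, may_match):
        # Filter: skip the search in states where there cannot be a
        # match, or where a matching entry would not be used anyway.
        if not may_match:
            return False
        self.searches += 1
        return key in self.entries
```

Because ordering checks are rare in the load-queue use case the abstract mentions, most lookups would take the filtered (zero-cost) path.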
-
Publication number: 20220083369
Abstract: An embodiment of an apparatus includes a processing circuit and a system memory. The processing circuit may store a pending request in a buffer, the pending request corresponding to a transaction that includes a write request to the system memory. The processing circuit may also allocate an entry in a write table corresponding to the transaction. After sending the transaction to the system memory to be processed, the pending request in the buffer may be removed in response to the allocation of the write entry.
Type: Application
Filed: January 6, 2021
Publication date: March 17, 2022
Inventors: Michael D. Snyder, Ronald P. Hall, Deepak Limaye, Brett S. Feero, Rohit K. Gupta
-
Patent number: 11099990
Abstract: A system and method for efficiently forwarding cache misses to another level of the cache hierarchy. Logic in a cache controller receives a first non-cacheable load miss request and stores it in a miss queue. When the logic determines the target address of the first load miss request is within a target address range of an older pending second load miss request stored in the miss queue with an open merge window, the logic merges the two requests into a single merged miss request. Additional requests may be similarly merged. The logic issues the merged miss requests based on determining the merge window has closed. The logic further prevents any other load miss requests, which were not previously merged in the merged miss request before it was issued, from obtaining a copy of data from the returned fill data. Such prevention in a non-coherent memory computing system supports memory ordering.
Type: Grant
Filed: August 20, 2019
Date of Patent: August 24, 2021
Assignee: Apple Inc.
Inventors: Gideon N. Levinsky, Brian R. Mestan, Deepak Limaye, Mridul Agarwal
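A simplified model of the merge-window behavior described above: later load misses that fall in an older pending miss's address range are merged while its window is open; once the window closes the merged request issues, and any subsequent miss to the same range allocates a fresh entry instead of sharing the fill data. Class names, the line size, and the window mechanics are illustrative assumptions, not the patent's implementation:

```python
CACHE_LINE = 64  # bytes; illustrative line size standing in for the address range

class MergedMiss:
    def __init__(self, addr):
        self.base = addr - (addr % CACHE_LINE)
        self.reqs = [addr]
        self.window_open = True   # merge window starts open
        self.issued = False

    def covers(self, addr):
        return self.base <= addr < self.base + CACHE_LINE

class MissQueue:
    def __init__(self):
        self.pending = []

    def load_miss(self, addr):
        # Merge into an older pending miss only while its window is open.
        for m in self.pending:
            if m.window_open and m.covers(addr):
                m.reqs.append(addr)
                return m
        m = MergedMiss(addr)
        self.pending.append(m)
        return m

    def close_window_and_issue(self, m):
        m.window_open = False
        m.issued = True
        # Only the requests merged before issue may consume the fill data.
        return list(m.reqs)
```

Refusing to let post-issue misses snarf the returned fill data is what preserves memory ordering in the non-coherent case the abstract mentions.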
-
Publication number: 20210056024
Abstract: A system and method for efficiently forwarding cache misses to another level of the cache hierarchy. Logic in a cache controller receives a first non-cacheable load miss request and stores it in a miss queue. When the logic determines the target address of the first load miss request is within a target address range of an older pending second load miss request stored in the miss queue with an open merge window, the logic merges the two requests into a single merged miss request. Additional requests may be similarly merged. The logic issues the merged miss requests based on determining the merge window has closed. The logic further prevents any other load miss requests, which were not previously merged in the merged miss request before it was issued, from obtaining a copy of data from the returned fill data. Such prevention in a non-coherent memory computing system supports memory ordering.
Type: Application
Filed: August 20, 2019
Publication date: February 25, 2021
Inventors: Gideon N. Levinsky, Brian R. Mestan, Deepak Limaye, Mridul Agarwal
-
Publication number: 20200264888
Abstract: Techniques are disclosed relating to filtering access to a content-addressable memory (CAM). In some embodiments, a processor monitors for certain microarchitectural states and filters access to the CAM in states where there cannot be a match in the CAM or where matching entries will not be used even if there is a match. In some embodiments, toggle control circuitry prevents toggling of input lines when filtering CAM access, which may reduce dynamic power consumption. In some example embodiments, the CAM is used to access a load queue to validate that out-of-order execution for a set of instructions matches in-order execution, and situations where ordering should be checked are relatively rare.
Type: Application
Filed: February 15, 2019
Publication date: August 20, 2020
Inventors: Deepak Limaye, Brian R. Mestan, Gideon N. Levinsky
-
Publication number: 20190220417
Abstract: In an embodiment, a processor may include a register file including one or more sets of registers for one or more data types specified by the ISA implemented by the processor. The processor may have a processor mode in which the context is reduced, as compared to the full context. For example, for at least one of the data types, the registers included in the reduced context exclude one or more of the registers defined in the ISA for that data type. In an embodiment, one half or more of the registers for the data type may be excluded. When the processor is operating in a reduced context mode, the processor may detect instructions that use excluded registers, and may signal an exception for such instructions to prevent use of the excluded registers.
Type: Application
Filed: January 18, 2018
Publication date: July 18, 2019
Inventors: David J. Williamson, Deepak Limaye, James N. Hardage
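The reduced-context check above amounts to a simple guard on operand register numbers: in reduced mode, any instruction naming a register outside the retained half faults. A sketch, with the register counts and exception name chosen purely for illustration:

```python
FULL_REGS = 32       # registers the ISA defines for a data type (illustrative)
REDUCED_REGS = 16    # half retained in reduced-context mode (illustrative)

class ExcludedRegisterFault(Exception):
    """Stands in for the exception the processor signals."""

def check_operands(regs, reduced_mode):
    """Signal an exception for instructions that reference registers
    excluded from the reduced context; otherwise allow execution."""
    for r in regs:
        if reduced_mode and r >= REDUCED_REGS:
            raise ExcludedRegisterFault(f"r{r} excluded in reduced mode")
    return True
```

Faulting on excluded registers means the processor never has to save or restore them on a context switch while in the reduced mode, which is the payoff of shrinking the context.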
-
Patent number: 9767026
Abstract: In one embodiment, a conflict detection logic is configured to receive a plurality of memory requests from an arbiter of a coherent fabric of a system on a chip (SoC). The conflict detection logic includes snoop filter logic to downgrade a first snooped memory request for a first address to an unsnooped memory request when an indicator associated with the first address indicates that the coherent fabric has control of the first address. Other embodiments are described and claimed.
Type: Grant
Filed: March 15, 2013
Date of Patent: September 19, 2017
Assignee: Intel Corporation
Inventors: Jose S. Niell, Daniel F. Cutter, James D. Allen, Deepak Limaye, Shadi T. Khasawneh
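The downgrade rule above can be modeled in a few lines: if the per-address indicator says the coherent fabric already owns the line, a snooped request is rewritten as unsnooped, saving the snoop traffic. This is a toy sketch with invented names, not the patent's logic:

```python
def filter_request(req, fabric_owned):
    """Downgrade a snooped memory request to unsnooped when the
    indicator shows the coherent fabric has control of the address
    (so no snoop of other caches is needed)."""
    addr, snooped = req
    if snooped and fabric_owned.get(addr, False):
        return (addr, False)   # downgraded to unsnooped
    return req
```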
-
Patent number: 9535860
Abstract: In an embodiment, a shared memory fabric is configured to receive memory requests from multiple agents, where at least some of the requests have an associated deadline value to indicate a maximum latency prior to completion of the memory request. Responsive to the requests, the fabric is to arbitrate between the requests based at least in part on the deadline values. Other embodiments are described and claimed.
Type: Grant
Filed: January 17, 2013
Date of Patent: January 3, 2017
Assignee: Intel Corporation
Inventors: Daniel F. Cutter, Blaise Fanning, Ramadass Nagarajan, Jose S. Niell, Debra Bernstein, Deepak Limaye, Ioannis T. Schoinas, Ravishankar Iyer
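Deadline-based arbitration as described above reduces, in the simplest case, to granting the request with the nearest deadline first. A toy earliest-deadline arbiter (the class name and agent labels are illustrative; real fabric arbitration weighs other factors too, as "based at least in part" signals):

```python
import heapq

class DeadlineArbiter:
    """Grants memory requests in order of their deadline values, i.e.
    the maximum latency each may tolerate before completion."""

    def __init__(self):
        self._heap = []
        self._seq = 0   # tie-breaker preserves arrival order at equal deadlines

    def submit(self, deadline, req):
        heapq.heappush(self._heap, (deadline, self._seq, req))
        self._seq += 1

    def grant(self):
        deadline, _, req = heapq.heappop(self._heap)
        return req
```

Latency-critical agents (e.g. a display controller) submit small deadline values and win arbitration over bandwidth agents that can tolerate delay.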
-
Publication number: 20160092112
Abstract: An apparatus includes a controller and a compression unit. The controller includes logic to receive an input line of data from a data producer and divide the input line of data into a plurality of segments. Each segment corresponds to a compression context and to a multi-line data tile. The controller also includes logic to write a first segment of the input line to a first multi-line data tile, and to write a second segment of the input line to a second multi-line data tile upon reaching a boundary of the first multi-line data tile. The compression unit includes logic to apply a first compression context to the first multi-line data tile and a second compression context to the second multi-line data tile.
Type: Application
Filed: September 25, 2014
Publication date: March 31, 2016
Inventors: Hasmet Akgun, Premkishore Shivakumar, Deepak Limaye
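The segmentation described above can be sketched as slicing each input line at fixed tile boundaries, appending each slice to its own tile, and then compressing each tile with its own context. Tile width, function names, and the use of zlib as the "compression context" are illustrative assumptions:

```python
import zlib

TILE_WIDTH = 8   # bytes per segment / tile row; illustrative

def write_line(line, tiles):
    """Divide an input line into segments and append each segment to
    its own multi-line tile, switching tiles at each tile boundary."""
    for i in range(0, len(line), TILE_WIDTH):
        tile_idx = i // TILE_WIDTH
        while len(tiles) <= tile_idx:
            tiles.append(bytearray())
        tiles[tile_idx] += line[i:i + TILE_WIDTH]

def compress_tiles(tiles):
    # One compression context per tile (here each tile gets an
    # independent zlib stream as a stand-in).
    return [zlib.compress(bytes(t)) for t in tiles]
```

Keeping one context per tile lets each tile be decompressed independently, which matters when consumers fetch tiles rather than whole lines.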
-
Patent number: 8966230
Abstract: Methods and apparatus relating to dynamic selection of execution stage are described. In some embodiments, logic may determine whether to execute an instruction at one of a plurality of stages in a processor. In some embodiments, the plurality of stages are to at least correspond to an address generation stage or an execution stage of the instruction. Other embodiments are also described and claimed.
Type: Grant
Filed: September 30, 2009
Date of Patent: February 24, 2015
Assignee: Intel Corporation
Inventors: Deepak Limaye, Kulin N. Kothari, James D. Allen, James E. Phillips
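The abstract is terse, but the core decision is choosing, per instruction, between executing at the earlier address-generation stage or the later execution stage. One plausible (assumed, not taken from the patent) policy is to use the earlier stage when its resources and operands are available:

```python
AGU_STAGE, EX_STAGE = "address-generation", "execution"

def select_stage(operands_ready_early, agu_idle):
    """Illustrative policy: execute a simple operation at the (earlier)
    address-generation stage when its operands are ready in time and the
    AGU adder is free; otherwise fall back to the execution stage."""
    if operands_ready_early and agu_idle:
        return AGU_STAGE
    return EX_STAGE
```

Executing earlier in the pipeline shortens the effective latency seen by dependent instructions; the fallback keeps correctness when the early slot cannot be used.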
-
Publication number: 20140281197
Abstract: In one embodiment, a conflict detection logic is configured to receive a plurality of memory requests from an arbiter of a coherent fabric of a system on a chip (SoC). The conflict detection logic includes snoop filter logic to downgrade a first snooped memory request for a first address to an unsnooped memory request when an indicator associated with the first address indicates that the coherent fabric has control of the first address. Other embodiments are described and claimed.
Type: Application
Filed: March 15, 2013
Publication date: September 18, 2014
Inventors: Jose S. Niell, Daniel F. Cutter, James D. Allen, Deepak Limaye, Shadi T. Khasawneh
-
Publication number: 20140201471
Abstract: In an embodiment, a shared memory fabric is configured to receive memory requests from multiple agents, where at least some of the requests have an associated deadline value to indicate a maximum latency prior to completion of the memory request. Responsive to the requests, the fabric is to arbitrate between the requests based at least in part on the deadline values. Other embodiments are described and claimed.
Type: Application
Filed: January 17, 2013
Publication date: July 17, 2014
Inventors: Daniel F. Cutter, Blaise Fanning, Ramadass Nagarajan, Jose S. Niell, Debra Bernstein, Deepak Limaye, Ioannis T. Schoinas, Ravishankar Iyer
-
Patent number: 8533721
Abstract: A method and system to schedule out-of-order operations without the requirement to execute compare, ready, and pick logic in a single cycle. A lazy out-of-order scheduler splits each scheduling loop into two consecutive cycles. The scheduling loop includes a compare stage, a ready stage, and a pick stage. The compare stage and the ready stage are executed in the first of the two consecutive cycles and the pick stage is executed in the second. By splitting each scheduling loop into two consecutive cycles, selecting the oldest operation by default, and checking the readiness of the oldest operation, the scheduler relieves the system of tight timing requirements and avoids the need for power-hungry logic. Not every executed operation appears one extra cycle longer, and the lazy out-of-order scheduler retains most of the performance of a full out-of-order scheduler.
Type: Grant
Filed: March 26, 2010
Date of Patent: September 10, 2013
Assignee: Intel Corporation
Inventors: Stephen J. Robinson, Deepak Limaye
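A rough software analogue of the two-cycle split: the compare/ready stage snapshots which queued operations are ready, and the pick stage, one cycle later, selects the oldest operation from that snapshot. Names and details are invented; this models only the pipelining of the scheduling loop, not the hardware:

```python
class LazyScheduler:
    """Two-cycle scheduling loop: compare + ready in cycle 1, pick in
    cycle 2, defaulting to the oldest ready operation."""

    def __init__(self):
        self.queue = []            # operations kept in age order
        self._ready_snapshot = []  # result of the last compare/ready cycle

    def cycle1_compare_ready(self, ready_ops):
        # Compare/ready stage: record which queued ops are now ready.
        self._ready_snapshot = [op for op in self.queue if op in ready_ops]

    def cycle2_pick(self):
        # Pick stage: oldest ready op from the previous cycle's snapshot.
        if not self._ready_snapshot:
            return None
        op = self._ready_snapshot[0]
        self.queue.remove(op)
        self._ready_snapshot = []  # a fresh compare/ready precedes the next pick
        return op
```

Because the pick works from last cycle's readiness, the critical path per cycle is shorter, at the cost of a one-cycle lag that (per the abstract) is often hidden.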
-
Patent number: 8065501
Abstract: A processor is to comprise a central processing unit (CPU), an address generation unit (AGU), an index generation unit, and a translation look-aside buffer (TLB). The CPU of the processor is to generate signals to retrieve instructions from a memory. The AGU is to generate a final linear address and an initial linear address after receiving at least three input source values. An index generation unit coupled to the AGU is to generate a set-index value using the bits of at least the three input source values or the bits of the initial linear address even before the bits of the initial linear address are adjusted for carry. A TLB is to generate a physical address using the final linear address and an entry indexed by the set-index value.
Type: Grant
Filed: October 28, 2008
Date of Patent: November 22, 2011
Assignee: Intel Corporation
Inventors: Deepak Limaye, James Allen
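One way to read the abstract is that the set index is formed from the un-propagated sum of the three source operands, a full carry-propagate delay before the final linear address exists. The sketch below uses a 3:2 carry-save compressor as an assumed model of that early index; whether this matches the patented circuit is my interpretation, and a carry crossing into the index field makes the speculative index disagree with the final one:

```python
INDEX_BITS = 6      # TLB set-index width; illustrative
OFFSET_BITS = 12    # address bits below the index field; illustrative

def carry_save(a, b, c):
    """3:2 compress three address operands into sum and carry vectors,
    as an adder tree would before carry propagation."""
    s = a ^ b ^ c
    cy = ((a & b) | (a & c) | (b & c)) << 1
    return s, cy

def speculative_index(a, b, c):
    # Index from the un-propagated sum bits: available early, but wrong
    # whenever a carry would ripple into the index field.
    s, _ = carry_save(a, b, c)
    return (s >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)

def final_index(a, b, c):
    # Index from the fully adjusted (carry-propagated) linear address.
    return ((a + b + c) >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
```

Hardware using such an early index must detect the mismatch case and replay or repair the lookup; the sketch only exposes when the two disagree.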
-
Publication number: 20110239218
Abstract: A method and system to schedule out-of-order operations without the requirement to execute compare, ready, and pick logic in a single cycle. A lazy out-of-order scheduler splits each scheduling loop into two consecutive cycles. The scheduling loop includes a compare stage, a ready stage, and a pick stage. The compare stage and the ready stage are executed in the first of the two consecutive cycles and the pick stage is executed in the second. By splitting each scheduling loop into two consecutive cycles, selecting the oldest operation by default, and checking the readiness of the oldest operation, the scheduler relieves the system of tight timing requirements and avoids the need for power-hungry logic. Not every executed operation appears one extra cycle longer, and the lazy out-of-order scheduler retains most of the performance of a full out-of-order scheduler.
Type: Application
Filed: March 26, 2010
Publication date: September 29, 2011
Inventors: Stephen J. Robinson, Deepak Limaye
-
Publication number: 20110078486
Abstract: Methods and apparatus relating to dynamic selection of execution stage are described. In some embodiments, logic may determine whether to execute an instruction at one of a plurality of stages in a processor. In some embodiments, the plurality of stages are to at least correspond to an address generation stage or an execution stage of the instruction. Other embodiments are also described and claimed.
Type: Application
Filed: September 30, 2009
Publication date: March 31, 2011
Inventors: Deepak Limaye, Kulin N. Kothari, James D. Allen, James E. Phillips
-
Publication number: 20100228922
Abstract: A method and system to perform background evictions of cache memory lines. In one embodiment of the invention, when a processor of a system determines that the occupancy rate of its bus interface is between a low and a high threshold, the processor performs evictions of cache memory lines that are dirty. In another embodiment of the invention, the processor performs evictions of the dirty cache memory lines when a timer between each periodic clock interrupt of an operating system has expired. By performing background evictions of dirty cache memory lines, the number of dirty cache memory lines required to be evicted before the processor changes its state from a high power state to a low power state is reduced.
Type: Application
Filed: March 9, 2009
Publication date: September 9, 2010
Inventor: Deepak Limaye
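The occupancy-gated embodiment above can be modeled directly: dirty lines are written back only while bus occupancy sits between the two thresholds, so the dirty set is small by the time a low-power transition needs it flushed. Thresholds and the cache representation are illustrative assumptions:

```python
LOW, HIGH = 0.2, 0.8   # illustrative bus-occupancy thresholds

def background_evict(occupancy, cache):
    """Opportunistically write back dirty lines while the bus interface
    is moderately loaded (between the thresholds), shrinking the dirty
    set that must be flushed before entering a low-power state."""
    if not (LOW <= occupancy <= HIGH):
        return []                     # bus too idle-critical or saturated
    evicted = [line for line, dirty in cache.items() if dirty]
    for line in evicted:
        cache[line] = False           # written back; line is now clean
    return evicted
```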
-
Publication number: 20100106937
Abstract: A processor is to comprise a central processing unit (CPU), an address generation unit (AGU), an index generation unit, and a translation look-aside buffer (TLB). The CPU of the processor is to generate signals to retrieve instructions from a memory. The AGU is to generate a final linear address and an initial linear address after receiving at least three input source values. An index generation unit coupled to the AGU is to generate a set-index value using the bits of at least the three input source values or the bits of the initial linear address even before the bits of the initial linear address are adjusted for carry. A TLB is to generate a physical address using the final linear address and an entry indexed by the set-index value.
Type: Application
Filed: October 28, 2008
Publication date: April 29, 2010
Inventors: Deepak Limaye, James Allen
-
Patent number: 7424576
Abstract: Parallel cachelets are provided for a level of cache in a microprocessor. The cachelets may be independently addressable. The level of cache may accept multiple load requests in a single cycle and apply each to a respective cachelet. Depending upon the content stored in each cachelet, the cachelet may generate a hit/miss response to the respective load request. Load requests that hit their cachelets may be satisfied therefrom. Load requests that miss their cachelets may be referred to another level of cache.
Type: Grant
Filed: June 27, 2001
Date of Patent: September 9, 2008
Assignee: Intel Corporation
Inventors: Ryan N. Rakvic, John P. Shen, Deepak Limaye
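A toy model of the cachelet organization above: the level is split into independently addressable cachelets, several loads are accepted together and each routed to its cachelet, and each cachelet answers hit or miss, with misses referred onward. The cachelet count and line-interleaved mapping are illustrative assumptions:

```python
NUM_CACHELETS = 4   # illustrative

class CacheletLevel:
    """A cache level split into independently addressable cachelets;
    multiple loads can be serviced in one cycle, one per cachelet."""

    def __init__(self):
        self.cachelets = [dict() for _ in range(NUM_CACHELETS)]

    def _select(self, addr):
        return (addr // 64) % NUM_CACHELETS   # line-interleaved mapping

    def fill(self, addr, value):
        self.cachelets[self._select(addr)][addr] = value

    def load_many(self, addrs):
        """Apply each load to its respective cachelet; hits are
        satisfied here, misses are referred to the next cache level."""
        hits, misses = {}, []
        for addr in addrs:
            c = self.cachelets[self._select(addr)]
            if addr in c:
                hits[addr] = c[addr]
            else:
                misses.append(addr)
        return hits, misses
```

Since each cachelet has its own ports, loads that map to different cachelets proceed in parallel without a multi-ported monolithic array.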