Patents by Inventor Deepak Limaye
Deepak Limaye has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11893413
Abstract: An embodiment of an apparatus includes a processing circuit and a system memory. The processing circuit may store a pending request in a buffer, the pending request corresponding to a transaction that includes a write request to the system memory. The processing circuit may also allocate an entry in a write table corresponding to the transaction. After sending the transaction to the system memory to be processed, the pending request in the buffer may be removed in response to the allocation of the write entry.
Type: Grant
Filed: January 6, 2021
Date of Patent: February 6, 2024
Assignee: Apple Inc.
Inventors: Michael D. Snyder, Ronald P. Hall, Deepak Limaye, Brett S. Feero, Rohit K. Gupta
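The abstract above describes holding a pending write request in a buffer only until a write-table entry has been allocated for its transaction, at which point the buffer slot can be freed early. A minimal Python sketch of that flow, with all class and method names invented for illustration (the patent describes hardware, not software):

```python
class WriteTracker:
    """Toy model: pending write requests wait in a buffer; allocating a
    write-table entry for a transaction retires the buffered request."""

    def __init__(self):
        self.buffer = {}       # txn_id -> request payload (pending requests)
        self.write_table = {}  # txn_id -> allocated write-table entry

    def enqueue(self, txn_id, payload):
        """Store a pending write request in the buffer."""
        self.buffer[txn_id] = payload

    def send_to_memory(self, txn_id):
        """Allocate a write-table entry for the transaction, then remove
        the pending request from the buffer in response to the allocation."""
        entry = {"payload": self.buffer[txn_id], "state": "sent"}
        self.write_table[txn_id] = entry   # allocation...
        del self.buffer[txn_id]            # ...frees the buffer slot early
        return entry
```

The point of the early removal is that the buffer slot becomes reusable as soon as tracking responsibility moves to the write table, rather than when the memory system finishes the write.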
-
Patent number: 11347514
Abstract: Techniques are disclosed relating to filtering access to a content-addressable memory (CAM). In some embodiments, a processor monitors for certain microarchitectural states and filters access to the CAM in states where there cannot be a match in the CAM or where matching entries will not be used even if there is a match. In some embodiments, toggle control circuitry prevents toggling of input lines when filtering CAM access, which may reduce dynamic power consumption. In some example embodiments, the CAM is used to access a load queue to validate that out-of-order execution for a set of instructions matches in-order execution, and situations where ordering should be checked are relatively rare.
Type: Grant
Filed: February 15, 2019
Date of Patent: May 31, 2022
Assignee: Apple Inc.
Inventors: Deepak Limaye, Brian R. Mestan, Gideon N. Levinsky
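As a rough illustration of the access filtering described above, the sketch below gates a toy CAM lookup behind a predicate that stands in for the monitored microarchitectural state; a filtered lookup skips the search entirely, which is the analogue of keeping the input lines from toggling. All names are invented for illustration:

```python
class FilteredCAM:
    """Toy content-addressable memory whose lookups are gated by a
    filter predicate, mimicking the CAM access filtering in the abstract."""

    def __init__(self):
        self.entries = set()
        self.searches = 0   # proxy for dynamic power spent toggling lines

    def insert(self, key):
        self.entries.add(key)

    def lookup(self, key, may_match):
        # Filter: skip the search in states where there cannot be a
        # match, or where a matching entry would not be used anyway.
        if not may_match:
            return False
        self.searches += 1
        return key in self.entries
```

Because ordering checks are rare in the load-queue use case the abstract mentions, most lookups would take the filtered (zero-cost) path.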
-
Publication number: 20220083369
Abstract: An embodiment of an apparatus includes a processing circuit and a system memory. The processing circuit may store a pending request in a buffer, the pending request corresponding to a transaction that includes a write request to the system memory. The processing circuit may also allocate an entry in a write table corresponding to the transaction. After sending the transaction to the system memory to be processed, the pending request in the buffer may be removed in response to the allocation of the write entry.
Type: Application
Filed: January 6, 2021
Publication date: March 17, 2022
Inventors: Michael D. Snyder, Ronald P. Hall, Deepak Limaye, Brett S. Feero, Rohit K. Gupta
-
Patent number: 11099990
Abstract: A system and method for efficiently forwarding cache misses to another level of the cache hierarchy. Logic in a cache controller receives a first non-cacheable load miss request and stores it in a miss queue. When the logic determines the target address of the first load miss request is within a target address range of an older pending second load miss request stored in the miss queue with an open merge window, the logic merges the two requests into a single merged miss request. Additional requests may be similarly merged. The logic issues the merged miss requests based on determining the merge window has closed. The logic further prevents any other load miss requests, which were not previously merged in the merged miss request before it was issued, from obtaining a copy of data from the returned fill data. Such prevention in a non-coherent memory computing system supports memory ordering.
Type: Grant
Filed: August 20, 2019
Date of Patent: August 24, 2021
Assignee: Apple Inc.
Inventors: Gideon N. Levinsky, Brian R. Mestan, Deepak Limaye, Mridul Agarwal
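A simplified model of the merge-window behavior described above: later load misses that fall in an older pending miss's address range are merged while its window is open; once the window closes the merged request issues, and any subsequent miss to the same range allocates a fresh entry instead of sharing the fill data. Class names, the line size, and the window mechanics are illustrative assumptions, not the patent's implementation:

```python
CACHE_LINE = 64  # bytes; illustrative line size standing in for the address range

class MergedMiss:
    def __init__(self, addr):
        self.base = addr - (addr % CACHE_LINE)
        self.reqs = [addr]
        self.window_open = True   # merge window starts open
        self.issued = False

    def covers(self, addr):
        return self.base <= addr < self.base + CACHE_LINE

class MissQueue:
    def __init__(self):
        self.pending = []

    def load_miss(self, addr):
        # Merge into an older pending miss only while its window is open.
        for m in self.pending:
            if m.window_open and m.covers(addr):
                m.reqs.append(addr)
                return m
        m = MergedMiss(addr)
        self.pending.append(m)
        return m

    def close_window_and_issue(self, m):
        m.window_open = False
        m.issued = True
        # Only the requests merged before issue may consume the fill data.
        return list(m.reqs)
```

Refusing to let post-issue misses snarf the returned fill data is what preserves memory ordering in the non-coherent case the abstract mentions.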
-
Publication number: 20210056024
Abstract: A system and method for efficiently forwarding cache misses to another level of the cache hierarchy. Logic in a cache controller receives a first non-cacheable load miss request and stores it in a miss queue. When the logic determines the target address of the first load miss request is within a target address range of an older pending second load miss request stored in the miss queue with an open merge window, the logic merges the two requests into a single merged miss request. Additional requests may be similarly merged. The logic issues the merged miss requests based on determining the merge window has closed. The logic further prevents any other load miss requests, which were not previously merged in the merged miss request before it was issued, from obtaining a copy of data from the returned fill data. Such prevention in a non-coherent memory computing system supports memory ordering.
Type: Application
Filed: August 20, 2019
Publication date: February 25, 2021
Inventors: Gideon N. Levinsky, Brian R. Mestan, Deepak Limaye, Mridul Agarwal
-
Publication number: 20200264888
Abstract: Techniques are disclosed relating to filtering access to a content-addressable memory (CAM). In some embodiments, a processor monitors for certain microarchitectural states and filters access to the CAM in states where there cannot be a match in the CAM or where matching entries will not be used even if there is a match. In some embodiments, toggle control circuitry prevents toggling of input lines when filtering CAM access, which may reduce dynamic power consumption. In some example embodiments, the CAM is used to access a load queue to validate that out-of-order execution for a set of instructions matches in-order execution, and situations where ordering should be checked are relatively rare.
Type: Application
Filed: February 15, 2019
Publication date: August 20, 2020
Inventors: Deepak Limaye, Brian R. Mestan, Gideon N. Levinsky
-
Publication number: 20190220417
Abstract: In an embodiment, a processor may include a register file including one or more sets of registers for one or more data types specified by the ISA implemented by the processor. The processor may have a processor mode in which the context is reduced, as compared to the full context. For example, for at least one of the data types, the registers included in the reduced context exclude one or more of the registers defined in the ISA for that data type. In an embodiment, one half or more of the registers for the data type may be excluded. When the processor is operating in a reduced context mode, the processor may detect instructions that use excluded registers, and may signal an exception for such instructions to prevent use of the excluded registers.
Type: Application
Filed: January 18, 2018
Publication date: July 18, 2019
Inventors: David J. Williamson, Deepak Limaye, James N. Hardage
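The reduced-context check above amounts to a simple guard on operand register numbers: in reduced mode, any instruction naming a register outside the retained half faults. A sketch, with the register counts and exception name chosen purely for illustration:

```python
FULL_REGS = 32       # registers the ISA defines for a data type (illustrative)
REDUCED_REGS = 16    # half retained in reduced-context mode (illustrative)

class ExcludedRegisterFault(Exception):
    """Stands in for the exception the processor signals."""

def check_operands(regs, reduced_mode):
    """Signal an exception for instructions that reference registers
    excluded from the reduced context; otherwise allow execution."""
    for r in regs:
        if reduced_mode and r >= REDUCED_REGS:
            raise ExcludedRegisterFault(f"r{r} excluded in reduced mode")
    return True
```

Faulting on excluded registers means the processor never has to save or restore them on a context switch while in the reduced mode, which is the payoff of shrinking the context.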
-
Patent number: 9767026
Abstract: In one embodiment, a conflict detection logic is configured to receive a plurality of memory requests from an arbiter of a coherent fabric of a system on a chip (SoC). The conflict detection logic includes snoop filter logic to downgrade a first snooped memory request for a first address to an unsnooped memory request when an indicator associated with the first address indicates that the coherent fabric has control of the first address. Other embodiments are described and claimed.
Type: Grant
Filed: March 15, 2013
Date of Patent: September 19, 2017
Assignee: Intel Corporation
Inventors: Jose S. Niell, Daniel F. Cutter, James D. Allen, Deepak Limaye, Shadi T. Khasawneh
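The downgrade rule above can be modeled in a few lines: if the per-address indicator says the coherent fabric already owns the line, a snooped request is rewritten as unsnooped, saving the snoop traffic. This is a toy sketch with invented names, not the patent's logic:

```python
def filter_request(req, fabric_owned):
    """Downgrade a snooped memory request to unsnooped when the
    indicator shows the coherent fabric has control of the address
    (so no snoop of other caches is needed)."""
    addr, snooped = req
    if snooped and fabric_owned.get(addr, False):
        return (addr, False)   # downgraded to unsnooped
    return req
```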
-
Patent number: 9535860
Abstract: In an embodiment, a shared memory fabric is configured to receive memory requests from multiple agents, where at least some of the requests have an associated deadline value to indicate a maximum latency prior to completion of the memory request. Responsive to the requests, the fabric is to arbitrate between the requests based at least in part on the deadline values. Other embodiments are described and claimed.
Type: Grant
Filed: January 17, 2013
Date of Patent: January 3, 2017
Assignee: Intel Corporation
Inventors: Daniel F. Cutter, Blaise Fanning, Ramadass Nagarajan, Jose S. Niell, Debra Bernstein, Deepak Limaye, Ioannis T. Schoinas, Ravishankar Iyer
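Deadline-based arbitration as described above reduces, in the simplest case, to granting the request with the nearest deadline first. A toy earliest-deadline arbiter (the class name and agent labels are illustrative; real fabric arbitration weighs other factors too, as "based at least in part" signals):

```python
import heapq

class DeadlineArbiter:
    """Grants memory requests in order of their deadline values, i.e.
    the maximum latency each may tolerate before completion."""

    def __init__(self):
        self._heap = []
        self._seq = 0   # tie-breaker preserves arrival order at equal deadlines

    def submit(self, deadline, req):
        heapq.heappush(self._heap, (deadline, self._seq, req))
        self._seq += 1

    def grant(self):
        deadline, _, req = heapq.heappop(self._heap)
        return req
```

Latency-critical agents (e.g. a display controller) submit small deadline values and win arbitration over bandwidth agents that can tolerate delay.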
-
Publication number: 20160092112
Abstract: An apparatus includes a controller and a compression unit. The controller includes logic to receive an input line of data from a data producer and divide the input line of data into a plurality of segments. Each segment corresponds to a compression context and to a multi-line data tile. The controller also includes logic to write a first segment of the input line to a first multi-line data tile, and to write a second segment of the input line to a second multi-line data tile upon reaching a boundary of the first multi-line data tile. The compression unit includes logic to apply a first compression context to the first multi-line data tile and a second compression context to the second multi-line data tile.
Type: Application
Filed: September 25, 2014
Publication date: March 31, 2016
Inventors: Hasmet Akgun, Premkishore Shivakumar, Deepak Limaye
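The segmentation described above can be sketched as slicing each input line at fixed tile boundaries, appending each slice to its own tile, and then compressing each tile with its own context. Tile width, function names, and the use of zlib as the "compression context" are illustrative assumptions:

```python
import zlib

TILE_WIDTH = 8   # bytes per segment / tile row; illustrative

def write_line(line, tiles):
    """Divide an input line into segments and append each segment to
    its own multi-line tile, switching tiles at each tile boundary."""
    for i in range(0, len(line), TILE_WIDTH):
        tile_idx = i // TILE_WIDTH
        while len(tiles) <= tile_idx:
            tiles.append(bytearray())
        tiles[tile_idx] += line[i:i + TILE_WIDTH]

def compress_tiles(tiles):
    # One compression context per tile (here each tile gets an
    # independent zlib stream as a stand-in).
    return [zlib.compress(bytes(t)) for t in tiles]
```

Keeping one context per tile lets each tile be decompressed independently, which matters when consumers fetch tiles rather than whole lines.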
-
Patent number: 8966230
Abstract: Methods and apparatus relating to dynamic selection of execution stage are described. In some embodiments, logic may determine whether to execute an instruction at one of a plurality of stages in a processor. In some embodiments, the plurality of stages are to at least correspond to an address generation stage or an execution stage of the instruction. Other embodiments are also described and claimed.
Type: Grant
Filed: September 30, 2009
Date of Patent: February 24, 2015
Assignee: Intel Corporation
Inventors: Deepak Limaye, Kulin N. Kothari, James D. Allen, James E. Phillips
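The abstract is terse, but the core decision is choosing, per instruction, between executing at the earlier address-generation stage or the later execution stage. One plausible (assumed, not taken from the patent) policy is to use the earlier stage when its resources and operands are available:

```python
AGU_STAGE, EX_STAGE = "address-generation", "execution"

def select_stage(operands_ready_early, agu_idle):
    """Illustrative policy: execute a simple operation at the (earlier)
    address-generation stage when its operands are ready in time and the
    AGU adder is free; otherwise fall back to the execution stage."""
    if operands_ready_early and agu_idle:
        return AGU_STAGE
    return EX_STAGE
```

Executing earlier in the pipeline shortens the effective latency seen by dependent instructions; the fallback keeps correctness when the early slot cannot be used.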
-
Publication number: 20140281197
Abstract: In one embodiment, a conflict detection logic is configured to receive a plurality of memory requests from an arbiter of a coherent fabric of a system on a chip (SoC). The conflict detection logic includes snoop filter logic to downgrade a first snooped memory request for a first address to an unsnooped memory request when an indicator associated with the first address indicates that the coherent fabric has control of the first address. Other embodiments are described and claimed.
Type: Application
Filed: March 15, 2013
Publication date: September 18, 2014
Inventors: Jose S. Niell, Daniel F. Cutter, James D. Allen, Deepak Limaye, Shadi T. Khasawneh
-
Publication number: 20140201471
Abstract: In an embodiment, a shared memory fabric is configured to receive memory requests from multiple agents, where at least some of the requests have an associated deadline value to indicate a maximum latency prior to completion of the memory request. Responsive to the requests, the fabric is to arbitrate between the requests based at least in part on the deadline values. Other embodiments are described and claimed.
Type: Application
Filed: January 17, 2013
Publication date: July 17, 2014
Inventors: Daniel F. Cutter, Blaise Fanning, Ramadass Nagarajan, Jose S. Niell, Debra Bernstein, Deepak Limaye, Ioannis T. Schoinas, Ravishankar Iyer
-
Patent number: 8533721
Abstract: A method and system to schedule out-of-order operations without the requirement to execute compare, ready, and pick logic in a single cycle. A lazy out-of-order scheduler splits each scheduling loop into two consecutive cycles. The scheduling loop includes a compare stage, a ready stage, and a pick stage. The compare stage and the ready stage are executed in the first of the two consecutive cycles and the pick stage is executed in the second. By splitting each scheduling loop into two consecutive cycles, selecting the oldest operation by default, and checking the readiness of the oldest operation, the scheduler relieves the system of tight timing requirements and avoids the need for power-hungry logic. Not every executed operation appears one extra cycle longer, and the lazy out-of-order scheduler retains most of the performance of a full out-of-order scheduler.
Type: Grant
Filed: March 26, 2010
Date of Patent: September 10, 2013
Assignee: Intel Corporation
Inventors: Stephen J. Robinson, Deepak Limaye
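A rough software analogue of the two-cycle split: the compare/ready stage snapshots which queued operations are ready, and the pick stage, one cycle later, selects the oldest operation from that snapshot. Names and details are invented; this models only the pipelining of the scheduling loop, not the hardware:

```python
class LazyScheduler:
    """Two-cycle scheduling loop: compare + ready in cycle 1, pick in
    cycle 2, defaulting to the oldest ready operation."""

    def __init__(self):
        self.queue = []            # operations kept in age order
        self._ready_snapshot = []  # result of the last compare/ready cycle

    def cycle1_compare_ready(self, ready_ops):
        # Compare/ready stage: record which queued ops are now ready.
        self._ready_snapshot = [op for op in self.queue if op in ready_ops]

    def cycle2_pick(self):
        # Pick stage: oldest ready op from the previous cycle's snapshot.
        if not self._ready_snapshot:
            return None
        op = self._ready_snapshot[0]
        self.queue.remove(op)
        self._ready_snapshot = []  # a fresh compare/ready precedes the next pick
        return op
```

Because the pick works from last cycle's readiness, the critical path per cycle is shorter, at the cost of a one-cycle lag that (per the abstract) is often hidden.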
-
Patent number: 8065501
Abstract: A processor is to comprise a central processing unit (CPU), an address generation unit (AGU), an index generation unit, and a translation look-aside buffer (TLB). The CPU of the processor is to generate signals to retrieve instructions from a memory. The AGU is to generate a final linear address and an initial linear address after receiving at least three input source values. An index generation unit coupled to the AGU is to generate a set-index value using the bits of at least the three input source values or the bits of the initial linear address even before the bits of the initial linear address are adjusted for carry. A TLB is to generate a physical address using the final linear address and an entry indexed by the set-index value.
Type: Grant
Filed: October 28, 2008
Date of Patent: November 22, 2011
Assignee: Intel Corporation
Inventors: Deepak Limaye, James Allen
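One way to read the abstract is that the set index is formed from the un-propagated sum of the three source operands, a full carry-propagate delay before the final linear address exists. The sketch below uses a 3:2 carry-save compressor as an assumed model of that early index; whether this matches the patented circuit is my interpretation, and a carry crossing into the index field makes the speculative index disagree with the final one:

```python
INDEX_BITS = 6      # TLB set-index width; illustrative
OFFSET_BITS = 12    # address bits below the index field; illustrative

def carry_save(a, b, c):
    """3:2 compress three address operands into sum and carry vectors,
    as an adder tree would before carry propagation."""
    s = a ^ b ^ c
    cy = ((a & b) | (a & c) | (b & c)) << 1
    return s, cy

def speculative_index(a, b, c):
    # Index from the un-propagated sum bits: available early, but wrong
    # whenever a carry would ripple into the index field.
    s, _ = carry_save(a, b, c)
    return (s >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)

def final_index(a, b, c):
    # Index from the fully adjusted (carry-propagated) linear address.
    return ((a + b + c) >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
```

Hardware using such an early index must detect the mismatch case and replay or repair the lookup; the sketch only exposes when the two disagree.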
-
Publication number: 20110239218
Abstract: A method and system to schedule out-of-order operations without the requirement to execute compare, ready, and pick logic in a single cycle. A lazy out-of-order scheduler splits each scheduling loop into two consecutive cycles. The scheduling loop includes a compare stage, a ready stage, and a pick stage. The compare stage and the ready stage are executed in the first of the two consecutive cycles and the pick stage is executed in the second. By splitting each scheduling loop into two consecutive cycles, selecting the oldest operation by default, and checking the readiness of the oldest operation, the scheduler relieves the system of tight timing requirements and avoids the need for power-hungry logic. Not every executed operation appears one extra cycle longer, and the lazy out-of-order scheduler retains most of the performance of a full out-of-order scheduler.
Type: Application
Filed: March 26, 2010
Publication date: September 29, 2011
Inventors: Stephen J. Robinson, Deepak Limaye
-
Publication number: 20110078486
Abstract: Methods and apparatus relating to dynamic selection of execution stage are described. In some embodiments, logic may determine whether to execute an instruction at one of a plurality of stages in a processor. In some embodiments, the plurality of stages are to at least correspond to an address generation stage or an execution stage of the instruction. Other embodiments are also described and claimed.
Type: Application
Filed: September 30, 2009
Publication date: March 31, 2011
Inventors: Deepak Limaye, Kulin N. Kothari, James D. Allen, James E. Phillips
-
Publication number: 20100228922
Abstract: A method and system to perform background evictions of cache memory lines. In one embodiment of the invention, when a processor of a system determines that the occupancy rate of its bus interface is between a low and a high threshold, the processor performs evictions of cache memory lines that are dirty. In another embodiment of the invention, the processor performs evictions of the dirty cache memory lines when a timer between each periodic clock interrupt of an operating system has expired. By performing background evictions of dirty cache memory lines, the number of dirty cache memory lines required to be evicted before the processor changes its state from a high power state to a low power state is reduced.
Type: Application
Filed: March 9, 2009
Publication date: September 9, 2010
Inventor: Deepak Limaye
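The occupancy-gated embodiment above can be modeled directly: dirty lines are written back only while bus occupancy sits between the two thresholds, so the dirty set is small by the time a low-power transition needs it flushed. Thresholds and the cache representation are illustrative assumptions:

```python
LOW, HIGH = 0.2, 0.8   # illustrative bus-occupancy thresholds

def background_evict(occupancy, cache):
    """Opportunistically write back dirty lines while the bus interface
    is moderately loaded (between the thresholds), shrinking the dirty
    set that must be flushed before entering a low-power state."""
    if not (LOW <= occupancy <= HIGH):
        return []                     # bus too idle-critical or saturated
    evicted = [line for line, dirty in cache.items() if dirty]
    for line in evicted:
        cache[line] = False           # written back; line is now clean
    return evicted
```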
-
Publication number: 20100106937
Abstract: A processor is to comprise a central processing unit (CPU), an address generation unit (AGU), an index generation unit, and a translation look-aside buffer (TLB). The CPU of the processor is to generate signals to retrieve instructions from a memory. The AGU is to generate a final linear address and an initial linear address after receiving at least three input source values. An index generation unit coupled to the AGU is to generate a set-index value using the bits of at least the three input source values or the bits of the initial linear address even before the bits of the initial linear address are adjusted for carry. A TLB is to generate a physical address using the final linear address and an entry indexed by the set-index value.
Type: Application
Filed: October 28, 2008
Publication date: April 29, 2010
Inventors: Deepak Limaye, James Allen
-
Patent number: 7424576
Abstract: Parallel cachelets are provided for a level of cache in a microprocessor. The cachelets may be independently addressable. The level of cache may accept multiple load requests in a single cycle and apply each to a respective cachelet. Depending upon the content stored in each cachelet, the cachelet may generate a hit/miss response to the respective load request. Load requests that hit their cachelets may be satisfied therefrom. Load requests that miss their cachelets may be referred to another level of cache.
Type: Grant
Filed: June 27, 2001
Date of Patent: September 9, 2008
Assignee: Intel Corporation
Inventors: Ryan N. Rakvic, John P. Shen, Deepak Limaye
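A toy model of the cachelet organization above: the level is split into independently addressable cachelets, several loads are accepted together and each routed to its cachelet, and each cachelet answers hit or miss, with misses referred onward. The cachelet count and line-interleaved mapping are illustrative assumptions:

```python
NUM_CACHELETS = 4   # illustrative

class CacheletLevel:
    """A cache level split into independently addressable cachelets;
    multiple loads can be serviced in one cycle, one per cachelet."""

    def __init__(self):
        self.cachelets = [dict() for _ in range(NUM_CACHELETS)]

    def _select(self, addr):
        return (addr // 64) % NUM_CACHELETS   # line-interleaved mapping

    def fill(self, addr, value):
        self.cachelets[self._select(addr)][addr] = value

    def load_many(self, addrs):
        """Apply each load to its respective cachelet; hits are
        satisfied here, misses are referred to the next cache level."""
        hits, misses = {}, []
        for addr in addrs:
            c = self.cachelets[self._select(addr)]
            if addr in c:
                hits[addr] = c[addr]
            else:
                misses.append(addr)
        return hits, misses
```

Since each cachelet has its own ports, loads that map to different cachelets proceed in parallel without a multi-ported monolithic array.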