Patents by Inventor Deepak Limaye

Deepak Limaye has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11893413
    Abstract: An embodiment of an apparatus includes a processing circuit and a system memory. The processing circuit may store a pending request in a buffer, the pending request corresponding to a transaction that includes a write request to the system memory. The processing circuit may also allocate an entry in a write table corresponding to the transaction. After sending the transaction to the system memory to be processed, the pending request in the buffer may be removed in response to the allocation of the write entry.
    Type: Grant
    Filed: January 6, 2021
    Date of Patent: February 6, 2024
    Assignee: Apple Inc.
    Inventors: Michael D. Snyder, Ronald P. Hall, Deepak Limaye, Brett S. Feero, Rohit K. Gupta
  • Patent number: 11347514
    Abstract: Techniques are disclosed relating to filtering access to a content-addressable memory (CAM). In some embodiments, a processor monitors for certain microarchitectural states and filters access to the CAM in states where there cannot be a match in the CAM or where matching entries will not be used even if there is a match. In some embodiments, toggle control circuitry prevents toggling of input lines when filtering CAM access, which may reduce dynamic power consumption. In some example embodiments, the CAM is used to access a load queue to validate that out-of-order execution for a set of instructions matches in-order execution, and situations where ordering should be checked are relatively rare.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: May 31, 2022
    Assignee: Apple Inc.
    Inventors: Deepak Limaye, Brian R. Mestan, Gideon N. Levinsky
  • Publication number: 20220083369
    Abstract: An embodiment of an apparatus includes a processing circuit and a system memory. The processing circuit may store a pending request in a buffer, the pending request corresponding to a transaction that includes a write request to the system memory. The processing circuit may also allocate an entry in a write table corresponding to the transaction. After sending the transaction to the system memory to be processed, the pending request in the buffer may be removed in response to the allocation of the write entry.
    Type: Application
    Filed: January 6, 2021
    Publication date: March 17, 2022
    Inventors: Michael D. Snyder, Ronald P. Hall, Deepak Limaye, Brett S. Feero, Rohit K. Gupta
  • Patent number: 11099990
    Abstract: A system and method for efficiently forwarding cache misses to another level of the cache hierarchy. Logic in a cache controller receives a first non-cacheable load miss request and stores it in a miss queue. When the logic determines the target address of the first load miss request is within a target address range of an older pending second load miss request stored in the miss queue with an open merge window, the logic merges the two requests into a single merged miss request. Additional requests may be similarly merged. The logic issues the merged miss requests based on determining the merge window has closed. The logic further prevents any other load miss requests, which were not previously merged in the merged miss request before it was issued, from obtaining a copy of data from the returned fill data. Such prevention in a non-coherent memory computing system supports memory ordering.
    Type: Grant
    Filed: August 20, 2019
    Date of Patent: August 24, 2021
    Assignee: Apple Inc.
    Inventors: Gideon N. Levinsky, Brian R. Mestan, Deepak Limaye, Mridul Agarwal
  • Publication number: 20210056024
    Abstract: A system and method for efficiently forwarding cache misses to another level of the cache hierarchy. Logic in a cache controller receives a first non-cacheable load miss request and stores it in a miss queue. When the logic determines the target address of the first load miss request is within a target address range of an older pending second load miss request stored in the miss queue with an open merge window, the logic merges the two requests into a single merged miss request. Additional requests may be similarly merged. The logic issues the merged miss requests based on determining the merge window has closed. The logic further prevents any other load miss requests, which were not previously merged in the merged miss request before it was issued, from obtaining a copy of data from the returned fill data. Such prevention in a non-coherent memory computing system supports memory ordering.
    Type: Application
    Filed: August 20, 2019
    Publication date: February 25, 2021
    Inventors: Gideon N. Levinsky, Brian R. Mestan, Deepak Limaye, Mridul Agarwal
  • Publication number: 20200264888
    Abstract: Techniques are disclosed relating to filtering access to a content-addressable memory (CAM). In some embodiments, a processor monitors for certain microarchitectural states and filters access to the CAM in states where there cannot be a match in the CAM or where matching entries will not be used even if there is a match. In some embodiments, toggle control circuitry prevents toggling of input lines when filtering CAM access, which may reduce dynamic power consumption. In some example embodiments, the CAM is used to access a load queue to validate that out-of-order execution for a set of instructions matches in-order execution, and situations where ordering should be checked are relatively rare.
    Type: Application
    Filed: February 15, 2019
    Publication date: August 20, 2020
    Inventors: Deepak Limaye, Brian R. Mestan, Gideon N. Levinsky
  • Publication number: 20190220417
    Abstract: In an embodiment, a processor may include a register file including one or more sets of registers for one or more data types specified by the ISA implemented by the processor. The processor may have a processor mode in which the context is reduced, as compared to the full context. For example, for at least one of the data types, the registers included in the reduced context exclude one or more of the registers defined in the ISA for that data type. In an embodiment, one half or more of the registers for the data type may be excluded. When the processor is operating in a reduced context mode, the processor may detect instructions that use excluded registers, and may signal an exception for such instructions to prevent use of the excluded registers.
    Type: Application
    Filed: January 18, 2018
    Publication date: July 18, 2019
    Inventors: David J. Williamson, Deepak Limaye, James N. Hardage
  • Patent number: 9767026
    Abstract: In one embodiment, a conflict detection logic is configured to receive a plurality of memory requests from an arbiter of a coherent fabric of a system on a chip (SoC). The conflict detection logic includes snoop filter logic to downgrade a first snooped memory request for a first address to an unsnooped memory request when an indicator associated with the first address indicates that the coherent fabric has control of the first address. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: September 19, 2017
    Assignee: Intel Corporation
    Inventors: Jose S. Niell, Daniel F. Cutter, James D. Allen, Deepak Limaye, Shadi T. Khasawneh
  • Patent number: 9535860
    Abstract: In an embodiment, a shared memory fabric is configured to receive memory requests from multiple agents, where at least some of the requests have an associated deadline value to indicate a maximum latency prior to completion of the memory request. Responsive to the requests, the fabric is to arbitrate between the requests based at least in part on the deadline values. Other embodiments are described and claimed.
    Type: Grant
    Filed: January 17, 2013
    Date of Patent: January 3, 2017
    Assignee: Intel Corporation
    Inventors: Daniel F. Cutter, Blaise Fanning, Ramadass Nagarajan, Jose S. Niell, Debra Bernstein, Deepak Limaye, Ioannis T. Schoinas, Ravishankar Iyer
  • Publication number: 20160092112
    Abstract: An apparatus includes a controller and a compression unit. The controller includes logic to receive an input line of data from a data producer and divide the input line of data into a plurality of segments. Each segment corresponds to a compression context and to a multi-line data tile. The controller also includes logic to write a first segment of the input line to a first multi-line data tile, and to write a second segment of the input line to a second multi-line data tile upon reaching a boundary of the first multi-line data tile. The compression unit includes logic to apply a first compression context to the first multi-line data tile and a second compression context to the second multi-line data tile.
    Type: Application
    Filed: September 25, 2014
    Publication date: March 31, 2016
    Inventors: Hasmet Akgun, Premkishore Shivakumar, Deepak Limaye
  • Patent number: 8966230
    Abstract: Methods and apparatus relating to dynamic selection of execution stage are described. In some embodiments, logic may determine whether to execute an instruction at one of a plurality of stages in a processor. In some embodiments, the plurality of stages are to at least correspond to an address generation stage or an execution stage of the instruction. Other embodiments are also described and claimed.
    Type: Grant
    Filed: September 30, 2009
    Date of Patent: February 24, 2015
    Assignee: Intel Corporation
    Inventors: Deepak Limaye, Kulin N. Kothari, James D. Allen, James E. Phillips
  • Publication number: 20140281197
    Abstract: In one embodiment, a conflict detection logic is configured to receive a plurality of memory requests from an arbiter of a coherent fabric of a system on a chip (SoC). The conflict detection logic includes snoop filter logic to downgrade a first snooped memory request for a first address to an unsnooped memory request when an indicator associated with the first address indicates that the coherent fabric has control of the first address. Other embodiments are described and claimed.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Inventors: Jose S. Niell, Daniel F. Cutter, James D. Allen, Deepak Limaye, Shadi T. Khasawneh
  • Publication number: 20140201471
    Abstract: In an embodiment, a shared memory fabric is configured to receive memory requests from multiple agents, where at least some of the requests have an associated deadline value to indicate a maximum latency prior to completion of the memory request. Responsive to the requests, the fabric is to arbitrate between the requests based at least in part on the deadline values. Other embodiments are described and claimed.
    Type: Application
    Filed: January 17, 2013
    Publication date: July 17, 2014
    Inventors: Daniel F. Cutter, Blaise Fanning, Ramadass Nagarajan, Jose S. Niell, Debra Bernstein, Deepak Limaye, Ioannis T. Schoinas, Ravishankar Iyer
  • Patent number: 8533721
    Abstract: A method and system to schedule out-of-order operations without the requirement to execute compare, ready and pick logic in a single cycle. A lazy out-of-order scheduler splits each scheduling loop into two consecutive cycles. The scheduling loop includes a compare stage, a ready stage and a pick stage. The compare stage and the ready stage are executed in a first of the two consecutive cycles and the pick stage is executed in a second of the two consecutive cycles. By splitting each scheduling loop into two consecutive cycles, selecting the oldest operation by default and checking the readiness of the oldest operation, it relieves the system of timing requirements and avoids the need for power-hungry logic. Every execution of an operation does not appear as one extra cycle longer and the lazy out-of-order scheduler retains most of the performance of a full out-of-order scheduler.
    Type: Grant
    Filed: March 26, 2010
    Date of Patent: September 10, 2013
    Assignee: Intel Corporation
    Inventors: Stephen J. Robinson, Deepak Limaye
  • Patent number: 8065501
    Abstract: A processor is to comprise a central processing unit (CPU), an address generation unit (AGU), an index generation unit and a translation look-aside buffer (TLB). The CPU of the processor is to generate a signal to retrieve instructions from a memory. The AGU is to generate a final linear address and an initial linear address after receiving at least three input source values. An index generation unit coupled to the AGU is to generate a set-index value using the bits of at least the three input source values or the bits of the initial linear address even before the bits of the initial linear address are adjusted for carry. A TLB is to generate a physical address using the final linear address and an entry indexed by the set-index value.
    Type: Grant
    Filed: October 28, 2008
    Date of Patent: November 22, 2011
    Assignee: Intel Corporation
    Inventors: Deepak Limaye, James Allen
  • Publication number: 20110239218
    Abstract: A method and system to schedule out-of-order operations without the requirement to execute compare, ready and pick logic in a single cycle. A lazy out-of-order scheduler splits each scheduling loop into two consecutive cycles. The scheduling loop includes a compare stage, a ready stage and a pick stage. The compare stage and the ready stage are executed in a first of the two consecutive cycles and the pick stage is executed in a second of the two consecutive cycles. By splitting each scheduling loop into two consecutive cycles, selecting the oldest operation by default and checking the readiness of the oldest operation, it relieves the system of timing requirements and avoids the need for power-hungry logic. Every execution of an operation does not appear as one extra cycle longer and the lazy out-of-order scheduler retains most of the performance of a full out-of-order scheduler.
    Type: Application
    Filed: March 26, 2010
    Publication date: September 29, 2011
    Inventors: Stephen J. Robinson, Deepak Limaye
  • Publication number: 20110078486
    Abstract: Methods and apparatus relating to dynamic selection of execution stage are described. In some embodiments, logic may determine whether to execute an instruction at one of a plurality of stages in a processor. In some embodiments, the plurality of stages are to at least correspond to an address generation stage or an execution stage of the instruction. Other embodiments are also described and claimed.
    Type: Application
    Filed: September 30, 2009
    Publication date: March 31, 2011
    Inventors: Deepak Limaye, Kulin N. Kothari, James D. Allen, James E. Phillips
  • Publication number: 20100228922
    Abstract: A method and system to perform background evictions of cache memory lines. In one embodiment of the invention, when a processor of a system determines that the occupancy rate of its bus interface is between a low and a high threshold, the processor performs evictions of cache memory lines that are dirty. In another embodiment of the invention, the processor performs evictions of the dirty cache memory lines when a timer between each periodic clock interrupt of an operating system has expired. By performing background evictions of dirty cache memory lines, the number of dirty cache memory lines required to be evicted before the processor changes its state from a high power state to a low power state is reduced.
    Type: Application
    Filed: March 9, 2009
    Publication date: September 9, 2010
    Inventor: Deepak Limaye
  • Publication number: 20100106937
    Abstract: A processor is to comprise a central processing unit (CPU), an address generation unit (AGU), an index generation unit and a translation look-aside buffer (TLB). The CPU of the processor is to generate a signal to retrieve instructions from a memory. The AGU is to generate a final linear address and an initial linear address after receiving at least three input source values. An index generation unit coupled to the AGU is to generate a set-index value using the bits of at least the three input source values or the bits of the initial linear address even before the bits of the initial linear address are adjusted for carry. A TLB is to generate a physical address using the final linear address and an entry indexed by the set-index value.
    Type: Application
    Filed: October 28, 2008
    Publication date: April 29, 2010
    Inventors: Deepak Limaye, James Allen
  • Patent number: 7424576
    Abstract: Parallel cachelets are provided for a level of cache in a microprocessor. The cachelets may be independently addressable. The level of cache may accept multiple load requests in a single cycle and apply each to a respective cachelet. Depending upon the content stored in each cachelet, the cachelet may generate a hit/miss response to the respective load request. Load requests that hit their cachelets may be satisfied therefrom. Load requests that miss their cachelets may be referred to another level of cache.
    Type: Grant
    Filed: June 27, 2001
    Date of Patent: September 9, 2008
    Assignee: Intel Corporation
    Inventors: Ryan N. Rakvic, John P. Shen, Deepak Limaye
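
The parallel-cachelet idea in the abstract of patent 7424576 can be pictured with a small software model. This is only an illustrative sketch, not the patented hardware design: the names `Cachelet`, `service_cycle`, and `next_level` are hypothetical, and the rule that request i is applied to cachelet i is an assumption made for simplicity (the abstract says only that each request is applied to "a respective cachelet").

```python
from dataclasses import dataclass, field


@dataclass
class Cachelet:
    """One independently addressable cachelet (hypothetical software model)."""
    lines: dict = field(default_factory=dict)  # address -> cached data

    def lookup(self, address):
        """Return (hit, data) for a single load request."""
        if address in self.lines:
            return True, self.lines[address]
        return False, None


def service_cycle(cachelets, requests, next_level):
    """Model one cycle: apply each load request to its respective cachelet.

    Assumption: request i is steered to cachelet i. Requests that miss are
    referred to next_level, a dict standing in for another level of cache.
    """
    if len(requests) > len(cachelets):
        raise ValueError("more load requests than cachelets in one cycle")
    results = []
    for cachelet, address in zip(cachelets, requests):
        hit, data = cachelet.lookup(address)
        if not hit:
            # A miss is not satisfied here; it goes to the other cache level.
            data = next_level.get(address)
        results.append((address, hit, data))
    return results
```

Each tuple in the returned list records the address, whether the request hit its cachelet, and the data that was eventually supplied, so hits and referred misses can be told apart.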