Patents by Inventor Jay Fleischman
Jay Fleischman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12333309Abstract: A coprocessor such as a floating-point unit includes a pipeline that is partitioned into a first portion and a second portion. A controller is configured to provide control signals to the first portion and the second portion of the pipeline. A first physical distance traversed by control signals propagating from the controller to the first portion of the pipeline is shorter than a second physical distance traversed by control signals propagating from the controller to the second portion of the pipeline. A scheduler is configured to cause a physical register file to provide a first subset of bits of an instruction to the first portion at a first time. The physical register file provides a second subset of the bits of the instruction to the second portion at a second time subsequent to the first time.Type: GrantFiled: June 16, 2023Date of Patent: June 17, 2025Assignee: Advanced Micro Devices, Inc.Inventors: Jay Fleischman, Michael Estlick, Michael Christopher Sedmak, Erik Swanson, Sneha V. Desai
-
Patent number: 12299445Abstract: An approach is provided for implementing register based single instruction, multiple data (SIMD) lookup table operations. According to the approach, an instruction set architecture (ISA) can support one or more SIMD instructions that enable vectors or multiple values in source data registers to be processed in parallel using a lookup table or truth table stored in one or more function registers. The SIMD instructions can be flexibly configured to support functions with inputs and outputs of various sizes and data formats. Various approaches are also described for supporting very large lookup tables that span multiple registers.Type: GrantFiled: June 6, 2022Date of Patent: May 13, 2025Assignee: Advanced Micro Devices, Inc.Inventors: Gabriel H. Loh, Yasuko Eckert, Bradford Beckmann, Michael Estlick, Jay Fleischman
-
Patent number: 12282428Abstract: In response to generating one or more speculative prefetch requests for a last-level cache, a processor determines prefetch analytics for the generated speculative prefetch requests and compares the determined prefetch analytics of the speculative prefetch requests to selection thresholds. In response to a speculative prefetch request meeting or exceeding a selection threshold, the processor selects the speculative prefetch request for issuance to a memory-side cache controller. When one or more system conditions meet one or more condition thresholds, the processor issues the selected speculative prefetch request to the memory-side cache controller. The memory-side cache controller then retrieves the data indicated in the selected speculative prefetch request from a memory and stores it in a memory-side cache in the data fabric coupled to the last-level cache.Type: GrantFiled: December 28, 2021Date of Patent: April 22, 2025Assignee: Advanced Micro Devices, Inc.Inventors: Tarun Nakra, Akhil Arunkumar, Paul Moyer, Jay Fleischman
-
Patent number: 12204454Abstract: Systems, apparatuses, and methods for employing system probe filter aware last level cache insertion bypassing policies are disclosed. A system includes a plurality of processing nodes, a probe filter, and a shared cache. The probe filter monitors a rate of recall probes that are generated, and if the rate is greater than a first threshold, then the system initiates a cache partitioning and monitoring phase for the shared cache. Accordingly, the cache is partitioned into two portions. If the hit rate of a first portion is greater than a second threshold, then a second portion will have a non-bypass insertion policy since the cache is relatively useful in this scenario. However, if the hit rate of the first portion is less than or equal to the second threshold, then the second portion will have a bypass insertion policy since the cache is less useful in this case.Type: GrantFiled: October 29, 2021Date of Patent: January 21, 2025Assignee: Advanced Micro Devices, Inc.Inventors: Paul James Moyer, Jay Fleischman
-
Patent number: 12153926Abstract: Processor-guided execution of offloaded instructions using fixed function operations is disclosed. Instructions designated for remote execution by a target device are received by a processor. Each instruction includes, as an operand, a target register in the target device. The target register may be an architected virtual register. For each of the plurality of instructions, the processor transmits an offload request in the order that the instructions are received. The offload request includes the instruction designated for remote execution. The target device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.Type: GrantFiled: December 21, 2023Date of Patent: November 26, 2024Assignee: ADVANCED MICRO DEVICES, INC.Inventors: John Kalamatianos, Michael T. Clark, Marius Evers, William L. Walker, Paul Moyer, Jay Fleischman, Jagadish B. Kotra
-
Publication number: 20240320050Abstract: A processor includes two or more core dies each including one or more processor cores. A first core die of the processor is associated with a first operating system and the processor cores of the first core die execute a set of instructions according to the first operating system to produce a first result. A second core of the processor is associated with a second operating system and the processor cores of the second core of the second core die execute the set of instructions according to the second operating system to produce a second result. The first and second core dies provide the first and second results to a voting circuitry that generates an output based on the first and second results.Type: ApplicationFiled: March 24, 2023Publication date: September 26, 2024Inventors: Patrick Pok Man Lai, Vilas Sridharan, Jay Fleischman
-
Publication number: 20240319964Abstract: A processor includes one or more processor cores configured to perform accumulate top (ACCT) and accumulate bottom (ACCB) instructions. To perform such instructions, at least one processor core of the processor includes an ACCT data path that adds a first portion of a block of data to a first lane of a set of lanes of a top accumulator and adds a carry-out bit to a second lane of the set of lanes of the top accumulator. Further, the at least one processor core includes an ACCB data path that adds a second portion of the block of data to a first lane of a set of lanes of a bottom accumulator and adds a carry-out bit to a second lane of the set of lanes of the bottom accumulator.Type: ApplicationFiled: March 24, 2023Publication date: September 26, 2024Inventors: Onur Kayiran, Lee Evan Eisen, Michael Estlick, Jay Fleischman, Matthew R. Poremba, Gabriel H. Loh
-
Publication number: 20240126552Abstract: Processor-guided execution of offloaded instructions using fixed function operations is disclosed. Instructions designated for remote execution by a target device are received by a processor. Each instruction includes, as an operand, a target register in the target device. The target register may be an architected virtual register. For each of the plurality of instructions, the processor transmits an offload request in the order that the instructions are received. The offload request includes the instruction designated for remote execution. The target device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.Type: ApplicationFiled: December 21, 2023Publication date: April 18, 2024Inventors: JOHN KALAMATIANOS, MICHAEL T. CLARK, MARIUS EVERS, WILLIAM L. WALKER, PAUL MOYER, JAY FLEISCHMAN, JAGADISH B. KOTRA
-
Publication number: 20240095180Abstract: The disclosed computer-implemented method for interpolating register-based lookup tables can include identifying, within a set of registers, a lookup table that has been encoded for storage within the set of registers. The method can also include receiving a request to look up a value in the lookup table and responding to the request by interpolating, from the encoded lookup table stored in the set of registers, a representation of the requested value. Various other methods, systems, and computer-readable media are also disclosed.Type: ApplicationFiled: December 23, 2022Publication date: March 21, 2024Applicant: Advanced Micro Devices, Inc.Inventors: Gabriel H. Loh, Michael Estlick, Jay Fleischman, Michael J. Schulte, Bradford Beckmann, Yasuko Eckert
-
Publication number: 20240045694Abstract: A coprocessor such as a floating-point unit includes a pipeline that is partitioned into a first portion and a second portion. A controller is configured to provide control signals to the first portion and the second portion of the pipeline. A first physical distance traversed by control signals propagating from the controller to the first portion of the pipeline is shorter than a second physical distance traversed by control signals propagating from the controller to the second portion of the pipeline. A scheduler is configured to cause a physical register file to provide a first subset of bits of an instruction to the first portion at a first time. The physical register file provides a second subset of the bits of the instruction to the second portion at a second time subsequent to the first time.Type: ApplicationFiled: June 16, 2023Publication date: February 8, 2024Inventors: Jay FLEISCHMAN, Michael ESTLICK, Michael Christopher SEDMAK, Erik SWANSON, Sneha V. DESAI
-
Patent number: 11868777Abstract: Processor-guided execution of offloaded instructions using fixed function operations is disclosed. Instructions designated for remote execution by a target device are received by a processor. Each instruction includes, as an operand, a target register in the target device. The target register may be an architected virtual register. For each of the plurality of instructions, the processor transmits an offload request in the order that the instructions are received. The offload request includes the instruction designated for remote execution. The target device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.Type: GrantFiled: December 16, 2020Date of Patent: January 9, 2024Assignee: ADVANCED MICRO DEVICES, INC.Inventors: John Kalamatianos, Michael T. Clark, Marius Evers, William L. Walker, Paul Moyer, Jay Fleischman, Jagadish B. Kotra
-
Patent number: 11847062Abstract: In response to eviction of a first clean data block from an intermediate level of cache in a multi-cache hierarchy of a processing system, a cache controller accesses an address of the first clean data block. The controller initiates a fetch of the first clean data block from a system memory into a last-level cache using the accessed address.Type: GrantFiled: December 16, 2021Date of Patent: December 19, 2023Assignee: Advanced Micro Devices, Inc.Inventors: Tarun Nakra, Jay Fleischman, Gautam Tarasingh Hazari, Akhil Arunkumar, William L. Walker, Gabriel H. Loh, John Kalamatianos, Marko Scrbak
-
Publication number: 20230393855Abstract: An approach is provided for implementing register based single instruction, multiple data (SIMD) lookup table operations. According to the approach, an instruction set architecture (ISA) can support one or more SIMD instructions that enable vectors or multiple values in source data registers to be processed in parallel using a lookup table or truth table stored in one or more function registers. The SIMD instructions can be flexibly configured to support functions with inputs and outputs of various sizes and data formats. Various approaches are also described for supporting very large lookup tables that span multiple registers.Type: ApplicationFiled: June 6, 2022Publication date: December 7, 2023Inventors: Gabriel H. Loh, Yasuko Eckert, Bradford Beckmann, Michael Estlick, Jay Fleischman
-
Patent number: 11709681Abstract: A coprocessor such as a floating-point unit includes a pipeline that is partitioned into a first portion and a second portion. A controller is configured to provide control signals to the first portion and the second portion of the pipeline. A first physical distance traversed by control signals propagating from the controller to the first portion of the pipeline is shorter than a second physical distance traversed by control signals propagating from the controller to the second portion of the pipeline. A scheduler is configured to cause a physical register file to provide a first subset of bits of an instruction to the first portion at a first time. The physical register file provides a second subset of the bits of the instruction to the second portion at a second time subsequent to the first time.Type: GrantFiled: December 11, 2017Date of Patent: July 25, 2023Assignee: Advanced Micro Devices, Inc.Inventors: Jay Fleischman, Michael Estlick, Michael Christopher Sedmak, Erik Swanson, Sneha V. Desai
-
Publication number: 20230205700Abstract: In response to generating one or more speculative prefetch requests for a last-level cache, a processor determines prefetch analytics for the generated speculative prefetch requests and compares the determined prefetch analytics of the speculative prefetch requests to selection thresholds. In response to a speculative prefetch request meeting or exceeding a selection threshold, the processor selects the speculative prefetch request for issuance to a memory-side cache controller. When one or more system conditions meet one or more condition thresholds, the processor issues the selected speculative prefetch request to the memory-side cache controller. The memory-side cache controller then retrieves the data indicated in the selected speculative prefetch request from a memory and stores it in a memory-side cache in the data fabric coupled to the last-level cache.Type: ApplicationFiled: December 28, 2021Publication date: June 29, 2023Inventors: Tarun NAKRA, Akhil ARUNKUMAR, Paul MOYER, Jay FLEISCHMAN
-
Publication number: 20230195643Abstract: In response to eviction of a first clean data block from an intermediate level of cache in a multi-cache hierarchy of a processing system, a cache controller accesses an address of the first clean data block. The controller initiates a fetch of the first clean data block from a system memory into a last-level cache using the accessed address.Type: ApplicationFiled: December 16, 2021Publication date: June 22, 2023Inventors: Tarun Nakra, Jay Fleischman, Gautam Tarasingh Hazari, Akhil Arunkumar, William L. Walker, Gabriel H. Loh, John Kalamatianos, Marko Scrbak
-
Patent number: 11567554Abstract: A pipeline includes a first portion configured to process a first subset of bits of an instruction and a second portion configured to process a second subset of the bits of the instruction. A first clock mesh is configured to provide a first clock signal to the first portion of the pipeline. A second clock mesh is configured to provide a second clock signal to the second portion of the pipeline. The first and second clock meshes selectively provide the first and second clock signals based on characteristics of in-flight instructions that have been dispatched to the pipeline but not yet retired. In some cases, a physical register file is configured to store values of bits representative of instructions. Only the first subset is stored in the physical register file in response to the value of the zero high bit indicating that the second subset is equal to zero.Type: GrantFiled: December 11, 2017Date of Patent: January 31, 2023Assignee: Advanced Micro Devices, Inc.Inventors: Jay Fleischman, Michael Estlick, Michael Christopher Sedmak, Erik Swanson, Sneha V. Desai
-
Patent number: 11550728Abstract: A processing system includes a processor, a memory, and an operating system that are used to allocate a page table caching memory object (PTCM) for a user of the processing system. An allocation of the PTCM is requested from a PTCM allocation system. In order to allocate the PTCM, a plurality of physical memory pages from a memory are allocated to store a PTCM page table that is associated with the PTCM. A lockable region of a cache is designated to hold a copy of the PTCM page table, after which the lockable region of the cache is subsequently locked. The PTCM page table is populated with page table entries associated with the PTCM and copied to the locked region of the cache.Type: GrantFiled: September 27, 2019Date of Patent: January 10, 2023Assignee: Advanced Micro Devices, Inc.Inventors: Derrick Allen Aguren, Eric H. Van Tassell, Gabriel H. Loh, Jay Fleischman
-
Publication number: 20220188117Abstract: Processor-guided execution of offloaded instructions using fixed function operations is disclosed. Instructions designated for remote execution by a target device are received by a processor. Each instruction includes, as an operand, a target register in the target device. The target register may be an architected virtual register. For each of the plurality of instructions, the processor transmits an offload request in the order that the instructions are received. The offload request includes the instruction designated for remote execution. The target device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.Type: ApplicationFiled: December 16, 2020Publication date: June 16, 2022Inventors: JOHN KALAMATIANOS, MICHAEL T. CLARK, MARIUS EVERS, WILLIAM L. WALKER, PAUL MOYER, JAY FLEISCHMAN, JAGADISH B. KOTRA
-
Publication number: 20220050785Abstract: Systems, apparatuses, and methods for employing system probe filter aware last level cache insertion bypassing policies are disclosed. A system includes a plurality of processing nodes, a probe filter, and a shared cache. The probe filter monitors a rate of recall probes that are generated, and if the rate is greater than a first threshold, then the system initiates a cache partitioning and monitoring phase for the shared cache. Accordingly, the cache is partitioned into two portions. If the hit rate of a first portion is greater than a second threshold, then a second portion will have a non-bypass insertion policy since the cache is relatively useful in this scenario. However, if the hit rate of the first portion is less than or equal to the second threshold, then the second portion will have a bypass insertion policy since the cache is less useful in this case.Type: ApplicationFiled: October 29, 2021Publication date: February 17, 2022Inventors: Paul James Moyer, Jay Fleischman