Patents by Inventor Andrew J. Beaumont-Smith

Andrew J. Beaumont-Smith has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240095037
    Abstract: A prefetcher for a coprocessor is disclosed. An apparatus includes a processor and a coprocessor that are configured to execute processor and coprocessor instructions, respectively. The processor and coprocessor instructions appear together in code sequences fetched by the processor, with the coprocessor instructions being provided to the coprocessor by the processor. The apparatus further includes a coprocessor prefetcher configured to monitor a code sequence fetched by the processor and, in response to identifying a presence of coprocessor instructions in the code sequence, capture the memory addresses, generated by the processor, of operand data for coprocessor instructions. The coprocessor is further configured to issue, for a cache memory accessible to the coprocessor, prefetches for data associated with the memory addresses prior to execution of the coprocessor instructions by the coprocessor.
    Type: Application
    Filed: July 28, 2023
    Publication date: March 21, 2024
    Inventors: Brandon H. Dwiel, Andrew J. Beaumont-Smith, Eric J. Furbish, John D. Pape, Stephen G. Meier, Tyler J. Huberty
  • Publication number: 20240045680
    Abstract: A coprocessor with register renaming is disclosed. An apparatus includes a plurality of processors and a coprocessor respectively configured to execute processor instructions and coprocessor instructions. The coprocessor receives coprocessor instructions from ones of the processors. The coprocessor includes an array of processing elements and a result register set comprising storage elements respectively distributed within the array of processing elements. For a given member of the array of processing elements, a corresponding storage element is configured to store coprocessor instruction results generated by the given member. The result register set implements a plurality of contexts to store respective coprocessor states corresponding to coprocessor instructions received from different processors.
    Type: Application
    Filed: August 21, 2023
    Publication date: February 8, 2024
    Inventors: Ran Aharon Chachick, Aditya Kesiraju, Andrew J. Beaumont-Smith, Jong-Suk Lee
  • Publication number: 20230418724
    Abstract: An apparatus includes a plurality of processor circuits, a cache memory circuit, and a trace control circuit. The trace control circuit may be configured, in response to activation of a mode to record information indicative of program execution of at least one processor circuit of the plurality of processor circuits, to monitor memory requests transmitted between ones of the plurality of processor circuits and the cache memory circuit, and then to select a particular memory request of monitored memory requests using an arbitration algorithm. The trace control circuit may be further configured to allocate space in a trace buffer to the particular memory request, and to store, in the trace buffer, information associated with the particular memory request.
    Type: Application
    Filed: June 29, 2023
    Publication date: December 28, 2023
    Inventors: Andrew J. Beaumont-Smith, Sandeep Gupta, Krishna C. Potnuru, Matthias Knoth
  • Patent number: 11775301
    Abstract: A coprocessor with register renaming is disclosed. An apparatus includes a plurality of processors and a coprocessor respectively configured to execute processor instructions and coprocessor instructions. The coprocessor receives coprocessor instructions from ones of the processors. The coprocessor includes an array of processing elements and a result register set comprising storage elements respectively distributed within the array of processing elements. For a given member of the array of processing elements, a corresponding storage element is configured to store coprocessor instruction results generated by the given member. The result register set implements a plurality of contexts to store respective coprocessor states corresponding to coprocessor instructions received from different processors.
    Type: Grant
    Filed: December 13, 2021
    Date of Patent: October 3, 2023
    Assignee: Apple Inc.
    Inventors: Ran Aharon Chachick, Aditya Kesiraju, Andrew J. Beaumont-Smith, Jong-Suk Lee
  • Patent number: 11768690
    Abstract: A system may include a plurality of processors and a coprocessor. A plurality of coprocessor context priority registers corresponding to a plurality of contexts supported by the coprocessor may be included. The plurality of processors may use the plurality of contexts, and may program the coprocessor context priority register corresponding to a context with a value specifying a priority of the context relative to other contexts. An arbiter may arbitrate among instructions issued by the plurality of processors based on the priorities in the plurality of coprocessor context priority registers. In one embodiment, real-time threads may be assigned higher priorities than bulk processing tasks, improving bandwidth allocated to the real-time threads as compared to the bulk tasks.
    Type: Grant
    Filed: November 22, 2021
    Date of Patent: September 26, 2023
    Assignee: Apple Inc.
    Inventors: Aditya Kesiraju, Andrew J. Beaumont-Smith, Brian P. Lilly, James Vash, Jason M. Kassoff, Krishna C. Potnuru, Rajdeep L. Bhuyar, Ran A. Chachick, Tyler J. Huberty, Derek R. Kumar
  • Patent number: 11755333
    Abstract: A prefetcher for a coprocessor is disclosed. An apparatus includes a processor and a coprocessor that are configured to execute processor and coprocessor instructions, respectively. The processor and coprocessor instructions appear together in code sequences fetched by the processor, with the coprocessor instructions being provided to the coprocessor by the processor. The apparatus further includes a coprocessor prefetcher configured to monitor a code sequence fetched by the processor and, in response to identifying a presence of coprocessor instructions in the code sequence, capture the memory addresses, generated by the processor, of operand data for coprocessor instructions. The coprocessor is further configured to issue, for a cache memory accessible to the coprocessor, prefetches for data associated with the memory addresses prior to execution of the coprocessor instructions by the coprocessor.
    Type: Grant
    Filed: December 10, 2021
    Date of Patent: September 12, 2023
    Assignee: Apple Inc.
    Inventors: Brandon H. Dwiel, Andrew J. Beaumont-Smith, Eric J. Furbish, John D. Pape, Stephen G. Meier, Tyler J. Huberty
  • Patent number: 11740993
    Abstract: An apparatus includes a plurality of processor circuits, a cache memory circuit, and a trace control circuit. The trace control circuit may be configured, in response to activation of a mode to record information indicative of program execution of at least one processor circuit of the plurality of processor circuits, to monitor memory requests transmitted between ones of the plurality of processor circuits and the cache memory circuit, and then to select a particular memory request of monitored memory requests using an arbitration algorithm. The trace control circuit may be further configured to allocate space in a trace buffer to the particular memory request, and to store, in the trace buffer, information associated with the particular memory request.
    Type: Grant
    Filed: November 30, 2021
    Date of Patent: August 29, 2023
    Assignee: Apple Inc.
    Inventors: Andrew J. Beaumont-Smith, Sandeep Gupta, Krishna C. Potnuru, Matthias Knoth
  • Patent number: 11650825
    Abstract: An instruction set architecture including instructions for a processor and instructions for a coprocessor may include synchronizing instructions that may be used to begin and end instruction sequences that include coprocessor instructions (coprocessor sequences). If a terminating synchronizing instruction is followed by an initial synchronizing instruction and the pair are detected in the coprocessor concurrently, the coprocessor may suppress execution of the pair of instructions.
    Type: Grant
    Filed: February 10, 2022
    Date of Patent: May 16, 2023
    Assignee: Apple Inc.
    Inventors: Aditya Kesiraju, Rajdeep L. Bhuyar, Ran A. Chachick, Andrew J. Beaumont-Smith
  • Publication number: 20230095072
    Abstract: A coprocessor with register renaming is disclosed. An apparatus includes a plurality of processors and a coprocessor respectively configured to execute processor instructions and coprocessor instructions. The coprocessor receives coprocessor instructions from ones of the processors. The coprocessor includes an array of processing elements and a result register set comprising storage elements respectively distributed within the array of processing elements. For a given member of the array of processing elements, a corresponding storage element is configured to store coprocessor instruction results generated by the given member. The result register set implements a plurality of contexts to store respective coprocessor states corresponding to coprocessor instructions received from different processors.
    Type: Application
    Filed: December 13, 2021
    Publication date: March 30, 2023
    Inventors: Ran Aharon Chachick, Aditya Kesiraju, Andrew J. Beaumont-Smith, Jong-Suk Lee
  • Publication number: 20230092898
    Abstract: A prefetcher for a coprocessor is disclosed. An apparatus includes a processor and a coprocessor that are configured to execute processor and coprocessor instructions, respectively. The processor and coprocessor instructions appear together in code sequences fetched by the processor, with the coprocessor instructions being provided to the coprocessor by the processor. The apparatus further includes a coprocessor prefetcher configured to monitor a code sequence fetched by the processor and, in response to identifying a presence of coprocessor instructions in the code sequence, capture the memory addresses, generated by the processor, of operand data for coprocessor instructions. The coprocessor is further configured to issue, for a cache memory accessible to the coprocessor, prefetches for data associated with the memory addresses prior to execution of the coprocessor instructions by the coprocessor.
    Type: Application
    Filed: December 10, 2021
    Publication date: March 23, 2023
    Inventors: Brandon H. Dwiel, Andrew J. Beaumont-Smith, Eric J. Furbish, John D. Pape, Stephen G. Meier, Tyler J. Huberty
  • Publication number: 20230061419
    Abstract: An apparatus includes a plurality of processor circuits, a cache memory circuit, and a trace control circuit. The trace control circuit may be configured, in response to activation of a mode to record information indicative of program execution of at least one processor circuit of the plurality of processor circuits, to monitor memory requests transmitted between ones of the plurality of processor circuits and the cache memory circuit, and then to select a particular memory request of monitored memory requests using an arbitration algorithm. The trace control circuit may be further configured to allocate space in a trace buffer to the particular memory request, and to store, in the trace buffer, information associated with the particular memory request.
    Type: Application
    Filed: November 30, 2021
    Publication date: March 2, 2023
    Inventors: Andrew J. Beaumont-Smith, Sandeep Gupta, Krishna C. Potnuru, Matthias Knoth
  • Publication number: 20220358082
    Abstract: In an embodiment, a coprocessor may include a bypass indication which identifies execution circuitry that is not used by a given processor instruction, and thus may be bypassed. The corresponding circuitry may be disabled during execution, preventing evaluation when the output of the circuitry will not be used for the instruction. In another embodiment, the coprocessor may implement a grid of processing elements in rows and columns, where a given coprocessor instruction may specify an operation that causes up to all of the processing elements to operate on vectors of input operands to produce results. Implementations of the coprocessor may implement a portion of the processing elements. The coprocessor control circuitry may be designed to operate with the full grid or partial grid, reissuing instructions in the partial grid case to perform the requested operation. In still another embodiment, the coprocessor may be able to fuse vector mode operations.
    Type: Application
    Filed: July 20, 2022
    Publication date: November 10, 2022
    Inventors: Aditya Kesiraju, Andrew J. Beaumont-Smith, Boris S. Alvarez-Heredia, Pradeep Kanapathipillai, Ran A. Chachick
  • Publication number: 20220350776
    Abstract: In an embodiment, a coprocessor may include a bypass indication which identifies execution circuitry that is not used by a given processor instruction, and thus may be bypassed. The corresponding circuitry may be disabled during execution, preventing evaluation when the output of the circuitry will not be used for the instruction. In another embodiment, the coprocessor may implement a grid of processing elements in rows and columns, where a given coprocessor instruction may specify an operation that causes up to all of the processing elements to operate on vectors of input operands to produce results. Implementations of the coprocessor may implement a portion of the processing elements. The coprocessor control circuitry may be designed to operate with the full grid or partial grid, reissuing instructions in the partial grid case to perform the requested operation. In still another embodiment, the coprocessor may be able to fuse vector mode operations.
    Type: Application
    Filed: July 20, 2022
    Publication date: November 3, 2022
    Inventors: Aditya Kesiraju, Andrew J. Beaumont-Smith, Boris S. Alvarez-Heredia, Ran A. Chachick
  • Patent number: 11429555
    Abstract: In an embodiment, a coprocessor may include a bypass indication which identifies execution circuitry that is not used by a given processor instruction, and thus may be bypassed. The corresponding circuitry may be disabled during execution, preventing evaluation when the output of the circuitry will not be used for the instruction. In another embodiment, the coprocessor may implement a grid of processing elements in rows and columns, where a given coprocessor instruction may specify an operation that causes up to all of the processing elements to operate on vectors of input operands to produce results. Implementations of the coprocessor may implement a portion of the processing elements. The coprocessor control circuitry may be designed to operate with the full grid or partial grid, reissuing instructions in the partial grid case to perform the requested operation. In still another embodiment, the coprocessor may be able to fuse vector mode operations.
    Type: Grant
    Filed: February 26, 2019
    Date of Patent: August 30, 2022
    Assignee: Apple Inc.
    Inventors: Aditya Kesiraju, Andrew J. Beaumont-Smith, Boris S. Alvarez-Heredia, Srikanth Balasubramanian
  • Publication number: 20220214887
    Abstract: An instruction set architecture including instructions for a processor and instructions for a coprocessor may include synchronizing instructions that may be used to begin and end instruction sequences that include coprocessor instructions (coprocessor sequences). If a terminating synchronizing instruction is followed by an initial synchronizing instruction and the pair are detected in the coprocessor concurrently, the coprocessor may suppress execution of the pair of instructions.
    Type: Application
    Filed: February 10, 2022
    Publication date: July 7, 2022
    Inventors: Aditya Kesiraju, Rajdeep L. Bhuyar, Ran A. Chachick, Andrew J. Beaumont-Smith
  • Publication number: 20220083343
    Abstract: A system may include a plurality of processors and a coprocessor. A plurality of coprocessor context priority registers corresponding to a plurality of contexts supported by the coprocessor may be included. The plurality of processors may use the plurality of contexts, and may program the coprocessor context priority register corresponding to a context with a value specifying a priority of the context relative to other contexts. An arbiter may arbitrate among instructions issued by the plurality of processors based on the priorities in the plurality of coprocessor context priority registers. In one embodiment, real-time threads may be assigned higher priorities than bulk processing tasks, improving bandwidth allocated to the real-time threads as compared to the bulk tasks.
    Type: Application
    Filed: November 22, 2021
    Publication date: March 17, 2022
    Inventors: Aditya Kesiraju, Andrew J. Beaumont-Smith, Brian P. Lilly, James Vash, Jason M. Kassoff, Krishna C. Potnuru, Rajdeep L. Bhuyar, Ran A. Chachick, Tyler J. Huberty, Derek R. Kumar
  • Patent number: 11249766
    Abstract: An instruction set architecture including instructions for a processor and instructions for a coprocessor may include synchronizing instructions that may be used to begin and end instruction sequences that include coprocessor instructions (coprocessor sequences). If a terminating synchronizing instruction is followed by an initial synchronizing instruction and the pair are detected in the coprocessor concurrently, the coprocessor may suppress execution of the pair of instructions.
    Type: Grant
    Filed: October 22, 2020
    Date of Patent: February 15, 2022
    Assignee: Apple Inc.
    Inventors: Aditya Kesiraju, Rajdeep L. Bhuyar, Ran A. Chachick, Andrew J. Beaumont-Smith
  • Patent number: 11210104
    Abstract: A system may include a plurality of processors and a coprocessor. A plurality of coprocessor context priority registers corresponding to a plurality of contexts supported by the coprocessor may be included. The plurality of processors may use the plurality of contexts, and may program the coprocessor context priority register corresponding to a context with a value specifying a priority of the context relative to other contexts. An arbiter may arbitrate among instructions issued by the plurality of processors based on the priorities in the plurality of coprocessor context priority registers. In one embodiment, real-time threads may be assigned higher priorities than bulk processing tasks, improving bandwidth allocated to the real-time threads as compared to the bulk tasks.
    Type: Grant
    Filed: September 11, 2020
    Date of Patent: December 28, 2021
    Assignee: Apple Inc.
    Inventors: Aditya Kesiraju, Andrew J. Beaumont-Smith, Brian P. Lilly, James Vash, Jason M. Kassoff, Krishna C. Potnuru, Rajdeep L. Bhuyar, Ran A. Chachick, Tyler J. Huberty, Derek R. Kumar
  • Patent number: 10846091
    Abstract: In an embodiment, a coprocessor includes multiple processing elements arranged in a grid of one or more rows and one or more columns. A given processing element includes an arithmetic/logic unit (ALU) circuit configured to perform an ALU operation specified by an instruction executable by the coprocessor, wherein the ALU circuit is configured to produce a result. The given processing element further comprises a first memory coupled to the execute circuit. The first memory is configured to store results generated by the given processing element. The first memory includes a portion of a result memory implemented by the coprocessor, wherein locations in the result memory are specifiable as destination operands of instructions executable by the coprocessor. The portion of the result memory implemented by the first memory is the portion of the result memory that the given processing element is capable of updating.
    Type: Grant
    Filed: February 26, 2019
    Date of Patent: November 24, 2020
    Assignee: Apple Inc.
    Inventors: Aditya Kesiraju, Andrew J. Beaumont-Smith, Deepankar Duggal, Ran A. Chachick
  • Patent number: 10831488
    Abstract: In an embodiment, a computation engine may offload work from a processor (e.g. a CPU) and efficiently perform computations such as those used in LSTM and other workloads at high performance. In an embodiment, the computation engine may perform computations on input vectors from input memories in the computation engine, and may accumulate results in an output memory within the computation engine. The input memories may be loaded with initial vector data from memory, incurring the memory latency that may be associated with reading the operands. Compute instructions may be performed on the operands, generating results in an output memory. One or more extract instructions may be supported to move data from the output memory to the input memory, permitting additional computation on the data in the output memory without moving the results to main memory.
    Type: Grant
    Filed: August 20, 2018
    Date of Patent: November 10, 2020
    Assignee: Apple Inc.
    Inventors: Eric Bainville, Jeffry E. Gonion, Ali Sazegari, Gerard R. Williams, III, Andrew J. Beaumont-Smith