Patents by Inventor Emil TALPES

Emil TALPES has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210096874
    Abstract: Systems, apparatuses, and methods for compressing multiple instruction operations together into a single retire queue entry are disclosed. A processor includes at least a scheduler, a retire queue, one or more execution units, and control logic. When the control logic detects a given instruction operation being dispatched by the scheduler to an execution unit, the control logic determines if the given instruction operation meets one or more conditions for being compressed with one or more other instruction operations into a single retire queue entry. If the one or more conditions are met, two or more instruction operations are stored together in a single retire queue entry. By compressing multiple instruction operations together into an individual retire queue entry, the retire queue is able to be used more efficiently, and the processor can speculatively execute more instructions without the retire queue exhausting its supply of available entries.
    Type: Application
    Filed: September 27, 2019
    Publication date: April 1, 2021
    Inventors: Matthew T. Sobel, Joshua James Lindner, Neil N. Marketkar, Kai Troester, Emil Talpes, Ashok Tirupathy Venkatachar
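    A minimal Python sketch of the dispatch-time compression described above. The two-ops-per-entry limit and the eligibility test (same dispatch group, no faulting side effects) are assumptions standing in for the unspecified conditions in the abstract, not details from the filing.
    ```python
    from dataclasses import dataclass, field

    @dataclass
    class MicroOp:
        seq: int             # program-order sequence number
        group: int           # dispatch group id
        may_fault: bool = False

    @dataclass
    class RetireEntry:
        ops: list = field(default_factory=list)  # compressed instruction operations

    MAX_OPS = 2  # assumed compression limit per retire queue entry

    def dispatch(retire_queue, op):
        """Compress op into the tail entry when the conditions hold, else allocate a new entry."""
        tail = retire_queue[-1] if retire_queue else None
        compressible = (
            tail is not None
            and len(tail.ops) < MAX_OPS
            and tail.ops[-1].group == op.group   # assumed condition: same dispatch group
            and not op.may_fault                 # assumed condition: no faulting side effects
        )
        if compressible:
            tail.ops.append(op)
        else:
            retire_queue.append(RetireEntry(ops=[op]))

    rq = []
    for i, fault in enumerate([False, False, True, False]):
        dispatch(rq, MicroOp(seq=i, group=0, may_fault=fault))
    print(len(rq), "retire queue entries hold 4 operations")  # 2 entries: compression frees slots
    ```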
  • Publication number: 20210048984
    Abstract: Various embodiments of the disclosure relate to an accelerated mathematical engine. In certain embodiments, the accelerated mathematical engine is applied to image processing such that convolution of an image is accelerated by using a two-dimensional matrix processor comprising sub-circuits that include an ALU, output register and shadow register. This architecture supports a clocked, two-dimensional architecture in which image data and weights are multiplied in a synchronized manner to allow a large number of mathematical operations to be performed in parallel.
    Type: Application
    Filed: May 29, 2020
    Publication date: February 18, 2021
    Inventors: Peter Joseph Bannon, Kevin Altair Hurd, Emil Talpes
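    As a rough illustration of the clocked two-dimensional arrangement in the abstract above, the Python sketch below models a grid of multiply-accumulate sub-circuits, each with an output register and a shadow register that latches a finished result so the next accumulation can begin. The array shape, operand feeding, and latch timing are illustrative assumptions.
    ```python
    ROWS, COLS = 2, 2

    acc    = [[0.0] * COLS for _ in range(ROWS)]  # output registers (accumulators)
    shadow = [[0.0] * COLS for _ in range(ROWS)]  # shadow registers

    def clock(data_col, weight_row):
        """One synchronized step: every cell multiplies its data/weight pair in parallel."""
        for r in range(ROWS):
            for c in range(COLS):
                acc[r][c] += data_col[r] * weight_row[c]

    def latch():
        """Copy accumulators into the shadow registers and clear them for the next tile."""
        for r in range(ROWS):
            for c in range(COLS):
                shadow[r][c] = acc[r][c]
                acc[r][c] = 0.0

    for step in range(3):                 # three synchronized cycles of parallel multiplies
        clock([1.0, 2.0], [0.5, 0.25])
    latch()
    print(shadow)  # each cell holds the sum of its three products
    ```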
  • Publication number: 20200394095
    Abstract: A system for handling errors in a neural network includes a neural network processor for executing a neural network associated with use of a vehicle. The neural network processor includes an error detector configured to detect a data error associated with execution of the neural network and a neural network controller configured to receive a report of the data error from the error detector. In response to receiving the report, the neural network controller is further configured to signal that a pending result of the neural network is tainted without terminating execution of the neural network.
    Type: Application
    Filed: March 30, 2020
    Publication date: December 17, 2020
    Inventors: Christopher Hsiong, Emil Talpes, Debjit Das Sarma, Peter Bannon, Kevin Hurd, Benjamin Floering
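    A small sketch of the "taint rather than terminate" behavior the abstract describes. Using a NaN as the stand-in for a detected data error and substituting a neutral value are assumptions for illustration; the filing specifies neither.
    ```python
    class NeuralNetworkRun:
        """Toy model: an error detector reports a data error to the controller,
        which flags the pending result as tainted but lets execution finish."""

        def __init__(self):
            self.tainted = False

        def report_data_error(self, detail):
            self.tainted = True               # signal the taint; do not abort execution
            print("data error noted:", detail)

        def run(self, weights):
            result = 0.0
            for i, w in enumerate(weights):
                if w != w:                    # NaN check: assumed stand-in for a data error
                    self.report_data_error(f"layer {i}")
                    w = 0.0                   # assumed recovery: substitute a neutral value
                result += w                   # execution continues regardless
            return result, self.tainted

    nn = NeuralNetworkRun()
    print(nn.run([0.5, float("nan"), 0.25]))  # (0.75, True): a result is produced, but flagged
    ```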
  • Publication number: 20200348909
    Abstract: A microprocessor system comprises a matrix computational unit and a control unit. The matrix computational unit includes one or more processing elements. The control unit is configured to provide a matrix processor instruction to the matrix computational unit. The matrix processor instruction specifies a floating-point operand formatted with an exponent that has been biased with a specified bias.
    Type: Application
    Filed: May 23, 2019
    Publication date: November 5, 2020
    Inventors: Debjit Das Sarma, William McGee, Emil Talpes
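    The specified-bias idea can be made concrete with a short decoder. The 8-bit sign/exponent/fraction layout below is a hypothetical format chosen for illustration; the abstract does not give field widths.
    ```python
    def decode_fp(bits, exp_bits=4, frac_bits=3, bias=7):
        """Decode a toy floating-point word whose exponent uses a specified bias.
        Widths and bias values are assumptions; normals only (implicit leading one)."""
        sign = -1.0 if (bits >> (exp_bits + frac_bits)) & 1 else 1.0
        exp  = (bits >> frac_bits) & ((1 << exp_bits) - 1)
        frac = bits & ((1 << frac_bits) - 1)
        mantissa = 1.0 + frac / (1 << frac_bits)
        return sign * mantissa * 2.0 ** (exp - bias)

    # The same bit pattern under two biases: the bias named by the matrix
    # processor instruction shifts the representable range without changing
    # the encoding itself.
    word = 0b0_1001_010                  # sign=0, exponent=9, fraction=2
    print(decode_fp(word, bias=7))       # 1.25 * 2**2  = 5.0
    print(decode_fp(word, bias=15))      # 1.25 * 2**-6 = 0.01953125
    ```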
  • Patent number: 10747844
    Abstract: Presented are systems and methods that accelerate the convolution of an image and similar arithmetic operations by utilizing hardware-specific circuitry that enables a large number of operations to be performed in parallel across a large set of data. In various embodiments, arithmetic operations are further enhanced by reusing data and eliminating redundant steps of storing and fetching intermediate results from registers and memory when performing arithmetic operations.
    Type: Grant
    Filed: December 12, 2017
    Date of Patent: August 18, 2020
    Assignee: Tesla, Inc.
    Inventors: Peter Joseph Bannon, William A. McGee, Emil Talpes
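    To make the data-reuse point concrete: neighboring outputs of a convolution share most of their inputs, so keeping the current window in local storage avoids refetching intermediate values from memory. The 1-D sliding-window sketch below is illustrative only, not the patented circuit.
    ```python
    def conv1d_reuse(signal, kernel):
        """Valid-mode 1-D convolution (correlation form) that keeps the current
        window in a local buffer, fetching only one new value per output."""
        k = len(kernel)
        window = list(signal[:k])        # initial window fetched once
        out = [sum(w * c for w, c in zip(window, kernel))]
        for x in signal[k:]:
            window.pop(0)                # drop the oldest value
            window.append(x)             # fetch exactly one new value
            out.append(sum(w * c for w, c in zip(window, kernel)))
        return out

    print(conv1d_reuse([1, 2, 3, 4, 5], [1, 0, -1]))  # [-2, -2, -2]
    ```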
  • Patent number: 10671349
    Abstract: Various embodiments of the disclosure relate to an accelerated mathematical engine. In certain embodiments, the accelerated mathematical engine is applied to image processing such that convolution of an image is accelerated by using a two-dimensional matrix processor comprising sub-circuits that include an ALU, output register and shadow register. This architecture supports a clocked, two-dimensional architecture in which image data and weights are multiplied in a synchronized manner to allow a large number of mathematical operations to be performed in parallel.
    Type: Grant
    Filed: September 20, 2017
    Date of Patent: June 2, 2020
    Assignee: Tesla, Inc.
    Inventors: Peter Joseph Bannon, Kevin Altair Hurd, Emil Talpes
  • Patent number: 10606678
    Abstract: A system for handling errors in a neural network includes a neural network processor for executing a neural network associated with use of a vehicle. The neural network processor includes an error detector configured to detect a data error associated with execution of the neural network and a neural network controller configured to receive a report of the data error from the error detector. In response to receiving the report, the neural network controller is further configured to signal that a pending result of the neural network is tainted without terminating execution of the neural network.
    Type: Grant
    Filed: November 17, 2017
    Date of Patent: March 31, 2020
    Assignee: Tesla, Inc.
    Inventors: Christopher Hsiong, Emil Talpes, Debjit Das Sarma, Peter Bannon, Kevin Hurd, Benjamin Floering
  • Publication number: 20190361699
    Abstract: Systems, apparatuses, and methods for implementing a fastpath microcode sequencer are disclosed. A processor includes at least an instruction decode unit and first and second microcode units. For each received instruction, the instruction decode unit forwards the instruction to the first microcode unit if the instruction satisfies at least a first condition. In one implementation, the first condition is the instruction being classified as a frequently executed instruction. If a received instruction satisfies at least a second condition, the instruction decode unit forwards the received instruction to a second microcode unit. In one implementation, the first microcode unit is a smaller, faster structure than the second microcode unit. In one implementation, the second condition is the instruction being classified as an infrequently executed instruction.
    Type: Application
    Filed: May 22, 2018
    Publication date: November 28, 2019
    Inventors: Kai Troester, Magiting Talisayon, Hongwen Gao, Benjamin Floering, Emil Talpes
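    A sketch of the routing decision the abstract describes. Modeling the frequency classification as a static opcode set, and the two microcode units as dictionaries, are assumptions; the filing only says instructions are classified as frequently or infrequently executed.
    ```python
    # Hypothetical classification: frequent opcodes go to the small, fast
    # microcode unit; everything else goes to the larger, slower sequencer.
    FASTPATH_OPCODES = {"add", "load", "store", "branch"}

    FAST_ROM = {op: [f"{op}_uop"] for op in FASTPATH_OPCODES}
    SLOW_ROM = {"syscall": ["save_state", "enter_kernel"], "cpuid": ["read_ident"]}

    def decode(opcode):
        """Instruction decode unit: forward to a microcode unit by classification."""
        if opcode in FASTPATH_OPCODES:   # first condition: frequently executed
            return "fast", FAST_ROM[opcode]
        return "slow", SLOW_ROM.get(opcode, ["trap_undefined"])

    for op in ("add", "syscall"):
        print(op, "->", decode(op))
    ```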
  • Patent number: 10416899
    Abstract: In various embodiments, the present invention teaches a sequencer that identifies an address pointer of a first data block within a memory and a length of data that comprises that data block and is related to an input of a matrix processor. The sequencer then calculates, based on the block length, the input length, and a memory map, a block count representative of a number of data blocks that are to be retrieved from the memory. Using the address pointer, the sequencer may retrieve a number of data blocks from the memory in a number of cycles that depends on whether the data blocks are contiguous. In embodiments, based on the length of data, a formatter then maps the data blocks to the input of the matrix processor.
    Type: Grant
    Filed: June 5, 2018
    Date of Patent: September 17, 2019
    Assignee: Tesla, Inc.
    Inventors: Peter Joseph Bannon, Kevin Altair Hurd, Emil Talpes
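    The block-count calculation in this abstract reduces to a ceiling division over the input length. A minimal sketch, with the memory map simplified to a flat base address (an assumption; the filing's map also determines contiguity):
    ```python
    import math

    def plan_fetch(base_addr, block_len, input_len):
        """Compute how many block_len-byte blocks cover input_len bytes and
        the addresses to fetch. Contiguous blocks could be merged into one
        burst; here we only report the count and the addresses."""
        block_count = math.ceil(input_len / block_len)
        addrs = [base_addr + i * block_len for i in range(block_count)]
        return block_count, addrs

    count, addrs = plan_fetch(base_addr=0x1000, block_len=64, input_len=200)
    print(count, [hex(a) for a in addrs])  # 4 blocks: 0x1000, 0x1040, 0x1080, 0x10c0
    ```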
  • Publication number: 20190250830
    Abstract: Presented are systems and methods that allow for efficient data processing that reduce data latency and, thus, power consumption and data management cost. In various embodiments, this is accomplished by using a sequencer that identifies an address pointer of a first data block within a memory and a length of data that comprises that data block and is related to an input of a matrix processor. The sequencer then calculates, based on the block length, the input length, and a memory map, a block count representative of a number of data blocks that are to be retrieved from the memory. Using the address pointer, the sequencer may retrieve a number of data blocks from the memory in a number of cycles that depends on whether the data blocks are contiguous. In embodiments, based on the length of data, a formatter then maps the data blocks to the input of the matrix processor.
    Type: Application
    Filed: June 5, 2018
    Publication date: August 15, 2019
    Applicant: Tesla, Inc.
    Inventors: Peter Joseph Bannon, Kevin Altair Hurd, Emil Talpes
  • Publication number: 20190235866
    Abstract: A microprocessor system comprises a vector computational unit and a control unit. The vector computational unit includes a plurality of processing elements. The control unit is configured to provide at least a single processor instruction to the vector computational unit. The single processor instruction specifies a plurality of component instructions to be executed by the vector computational unit in response to the single processor instruction and each of the plurality of processing elements of the vector computational unit is configured to process different data elements in parallel with other processing elements in response to the single processor instruction.
    Type: Application
    Filed: March 13, 2018
    Publication date: August 1, 2019
    Inventors: Debjit Das Sarma, Emil Talpes, Peter Joseph Bannon
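    The "one instruction, several component instructions, many lanes" structure can be sketched as below. The component encoding ("scale_add" expanding to a multiply then an add) and the lane operations are assumptions for illustration.
    ```python
    # A single processor instruction expands into component instructions, each
    # applied by every processing element (lane) to its own data element.
    COMPONENTS = {"scale_add": ["mul", "add"]}   # hypothetical encoding

    def execute(single_instruction, lanes, scalar):
        for component in COMPONENTS[single_instruction]:
            for i in range(len(lanes)):          # every lane acts in parallel in hardware
                if component == "mul":
                    lanes[i] *= scalar
                elif component == "add":
                    lanes[i] += 1.0
        return lanes

    print(execute("scale_add", [1.0, 2.0, 3.0], scalar=2.0))  # [3.0, 5.0, 7.0]
    ```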
  • Publication number: 20190179870
    Abstract: Presented are systems and methods that accelerate the convolution of an image and similar arithmetic operations by utilizing hardware-specific circuitry that enables a large number of operations to be performed in parallel across a large set of data. In various embodiments, arithmetic operations are further enhanced by reusing data and eliminating redundant steps of storing and fetching intermediate results from registers and memory when performing arithmetic operations.
    Type: Application
    Filed: December 12, 2017
    Publication date: June 13, 2019
    Applicant: Tesla, Inc.
    Inventors: Peter Joseph Bannon, William A. McGee, Emil Talpes
  • Publication number: 20190155678
    Abstract: A system for handling errors in a neural network includes a neural network processor for executing a neural network associated with use of a vehicle. The neural network processor includes an error detector configured to detect a data error associated with execution of the neural network and a neural network controller configured to receive a report of the data error from the error detector. In response to receiving the report, the neural network controller is further configured to signal that a pending result of the neural network is tainted without terminating execution of the neural network.
    Type: Application
    Filed: November 17, 2017
    Publication date: May 23, 2019
    Applicant: Tesla, Inc.
    Inventors: Christopher Hsiong, Emil Talpes, Debjit Das Sarma, Peter Bannon, Kevin Hurd, Benjamin Floering
  • Publication number: 20190026249
    Abstract: A microprocessor system comprises a computational array and a hardware data formatter. The computational array includes a plurality of computation units that each operates on a corresponding value addressed from memory. The values operated on by the computation units are synchronously provided together to the computational array as a group of values to be processed in parallel. The hardware data formatter is configured to gather the group of values, wherein the group of values includes a first subset of values located consecutively in memory and a second subset of values located consecutively in memory. The first subset of values is not required to be located consecutively in the memory with the second subset of values.
    Type: Application
    Filed: March 13, 2018
    Publication date: January 24, 2019
    Inventors: Emil Talpes, William McGee, Peter Joseph Bannon
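    The formatter's gather of two internally consecutive but mutually independent runs maps naturally onto slicing. A minimal sketch, with run lengths and placement chosen arbitrarily:
    ```python
    def gather_group(memory, runs):
        """Gather a group of values from several internally consecutive runs.
        Each run is (start_index, length); the runs themselves need not be
        adjacent to one another, matching the abstract."""
        group = []
        for start, length in runs:
            group.extend(memory[start:start + length])  # one consecutive slice per run
        return group

    mem = list(range(100))
    # First subset at 10..13 and second subset at 40..43: each consecutive
    # internally, but not consecutive with each other.
    print(gather_group(mem, [(10, 4), (40, 4)]))  # [10, 11, 12, 13, 40, 41, 42, 43]
    ```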
  • Publication number: 20190026250
    Abstract: A microprocessor system comprises a computational array and a vector computational unit. The computational array includes a plurality of computation units. The vector computational unit is in communication with the computational array and includes a plurality of processing elements. The processing elements are configured to receive output data elements from the computational array and process in parallel the received output data elements.
    Type: Application
    Filed: March 13, 2018
    Publication date: January 24, 2019
    Inventors: Debjit Das Sarma, Emil Talpes, Peter Joseph Bannon
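    Continuing the picture from the entry above: the vector unit's processing elements take the computational array's output elements and post-process them element-wise in parallel. In this small sketch, ReLU is an assumed example operation, not one named in the filing.
    ```python
    def vector_unit(array_outputs, op=lambda x: max(x, 0.0)):
        """Each processing element applies op to one output element received
        from the computational array; ReLU here is an assumed example."""
        return [op(x) for x in array_outputs]  # element-wise, parallel in hardware

    print(vector_unit([-1.5, 0.0, 2.5]))  # [0.0, 0.0, 2.5]
    ```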
  • Publication number: 20190026237
    Abstract: A microprocessor system comprises a computational array and a hardware arbiter. The computational array includes a plurality of computation units. Each of the plurality of computation units operates on a corresponding value addressed from memory. The hardware arbiter is configured to control issuing of at least one memory request for one or more of the corresponding values addressed from the memory for the computation units. The hardware arbiter is also configured to schedule a control signal to be issued based on the issuing of the memory requests.
    Type: Application
    Filed: March 13, 2018
    Publication date: January 24, 2019
    Inventors: Emil Talpes, Peter Joseph Bannon, Kevin Altair Hurd
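    A toy model of the arbiter's two jobs per the abstract: granting memory requests and scheduling the control signal so it arrives when the corresponding data returns. The fixed three-cycle read latency and one grant per cycle are assumptions.
    ```python
    from collections import deque

    MEM_LATENCY = 3  # assumed fixed read latency, in cycles

    def simulate(requests, cycles=8):
        """Grant one memory request per cycle; schedule the control signal that
        tells the computation units when the corresponding value is valid."""
        pending = deque(requests)
        control = {}                               # cycle -> address whose data is valid
        for now in range(cycles):
            if pending:
                addr = pending.popleft()           # arbiter grants one request
                control[now + MEM_LATENCY] = addr  # control signal timed to the data return
            if now in control:
                print(f"cycle {now}: data for {hex(control[now])} valid, signal issued")

    simulate([0x100, 0x140, 0x180])
    ```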
  • Publication number: 20190026078
    Abstract: Various embodiments of the disclosure relate to an accelerated mathematical engine. In certain embodiments, the accelerated mathematical engine is applied to image processing such that convolution of an image is accelerated by using a two-dimensional matrix processor comprising sub-circuits that include an ALU, output register and shadow register. This architecture supports a clocked, two-dimensional architecture in which image data and weights are multiplied in a synchronized manner to allow a large number of mathematical operations to be performed in parallel.
    Type: Application
    Filed: September 20, 2017
    Publication date: January 24, 2019
    Applicant: Tesla, Inc.
    Inventors: Peter Joseph Bannon, Kevin Altair Hurd, Emil Talpes
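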
  • Patent number: 9727340
    Abstract: The present invention provides a method and apparatus for scheduling based on tags of different types. Some embodiments of the method include broadcasting a first tag to entries in a queue of a scheduler. The first tag is broadcast in response to a first instruction associated with a first entry in the queue being picked for execution. The first tag includes information identifying the first entry and information indicating a type of the first tag. Some embodiments of the method also include marking at least one second entry in the queue as ready to be picked for execution in response to at least one second tag associated with the second entry matching the first tag.
    Type: Grant
    Filed: July 17, 2013
    Date of Patent: August 8, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Michael Achenbach, Teik Tan, Gregory W. Smaus, Ganesh Venkataramanan, Emil Talpes
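    The broadcast-and-match mechanism can be sketched with a small wakeup loop. The tag-type policy below (only a "final" tag completes a wakeup, so a speculative one could be revoked later) is an assumed example of why the type field matters; the abstract does not define the types.
    ```python
    from dataclasses import dataclass

    @dataclass
    class Entry:
        instr: str
        waiting_on: set        # tags of the producing entries this entry awaits
        ready: bool = False

    def broadcast(queue, tag, tag_type):
        """On pick, broadcast (entry tag, type); matching consumers update readiness."""
        for e in queue:
            if tag in e.waiting_on:
                e.waiting_on.discard(tag)
                if not e.waiting_on and tag_type == "final":   # assumed type policy
                    e.ready = True

    queue = [Entry("add r3,r1,r2", {"t1", "t2"}), Entry("mul r4,r3,r3", {"t3"})]
    broadcast(queue, "t1", "final")
    broadcast(queue, "t2", "final")
    print([(e.instr, e.ready) for e in queue])  # add is ready; mul still waits on t3
    ```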
  • Patent number: 9652305
    Abstract: A processor includes an execution unit to execute instructions and a scheduler unit to store a queue of instructions for execution by the execution unit. The scheduler unit includes a wake array including a plurality of source slots to store source identifiers for sources associated with the instructions, a picker to schedule a particular instruction for execution in the execution unit and broadcast a destination identifier associated with the particular instruction to a first subset of the source slots, and a delay element to receive the destination identifier broadcast by the picker and communicate a delayed version of the destination identifier to a second subset of the source slots different from the first subset.
    Type: Grant
    Filed: August 6, 2014
    Date of Patent: May 16, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Srikanth Arekapudi, Emil Talpes, Sahil Arora
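    The split broadcast (immediate to one subset of source slots, one cycle later to the rest through the delay element) can be modeled with a one-entry delay line. Which slots fall into which subset is an assumption.
    ```python
    class WakeArray:
        """Destination id reaches 'near' source slots in the pick cycle and 'far'
        slots one cycle later via a delay element (the partition is assumed)."""

        def __init__(self, near_slots, far_slots):
            self.near, self.far = near_slots, far_slots  # slot -> awaited source id
            self.delayed = None                          # one-cycle delay element

        def cycle(self, picked_dest=None):
            if self.delayed is not None:                 # delayed broadcast arrives
                for slot, src in self.far.items():
                    if src == self.delayed:
                        print(f"far slot {slot} woken (one cycle late)")
            for slot, src in self.near.items():          # immediate broadcast
                if src == picked_dest:
                    print(f"near slot {slot} woken (same cycle)")
            self.delayed = picked_dest

    wa = WakeArray(near_slots={0: "r7"}, far_slots={5: "r7"})
    wa.cycle(picked_dest="r7")   # near slot wakes now
    wa.cycle()                   # far slot wakes one cycle later
    ```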
  • Patent number: 9489206
    Abstract: A method includes suppressing execution of at least one dependent instruction of a first instruction by a processor responsive to an invalid status of an ancestor load instruction associated with the first instruction. A processor includes an instruction pipeline having an execution unit to execute instructions, a load store unit for retrieving data from a memory hierarchy, and a scheduler unit. The scheduler unit selects for execution in the execution unit a first load instruction having at least one dependent instruction linked to the first load instruction for data forwarding from the load store unit and suppresses execution of a second dependent instruction of the first dependent instruction responsive to an invalid status of the first load instruction.
    Type: Grant
    Filed: July 16, 2013
    Date of Patent: November 8, 2016
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Francesco Spadini, Michael Achenbach, Emil Talpes, Ganesh Venkataramanan
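    A sketch of the suppression chain: when a load turns out to be invalid, its direct dependents that were set up for data forwarding, and their dependents in turn, are suppressed rather than executed on bad data. The instruction-to-consumers map is an assumed bookkeeping structure.
    ```python
    def suppress_dependents(deps, failed_load):
        """Walk the forwarding chain from an invalid load and suppress every
        transitively dependent instruction. deps maps instruction -> consumers."""
        suppressed, frontier = set(), [failed_load]
        while frontier:
            instr = frontier.pop()
            for consumer in deps.get(instr, []):
                if consumer not in suppressed:
                    suppressed.add(consumer)   # never execute with forwarded bad data
                    frontier.append(consumer)  # second-level dependents as well
        return suppressed

    deps = {"load r1": ["add r2,r1"], "add r2,r1": ["mul r3,r2"]}
    print(suppress_dependents(deps, "load r1"))  # {'add r2,r1', 'mul r3,r2'}
    ```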