Patents Examined by Michael J Metzger
  • Patent number: 11138009
    Abstract: Systems and methods for an efficient and robust multiprocessor-coprocessor interface that may be used between a streaming multiprocessor and an acceleration coprocessor in a GPU are provided. According to an example implementation, in order to perform an acceleration of a particular operation using the coprocessor, the multiprocessor: issues a series of write instructions to write input data for the operation into coprocessor-accessible storage locations, issues an operation instruction to cause the coprocessor to execute the particular operation; and then issues a series of read instructions to read result data of the operation from coprocessor-accessible storage locations to multiprocessor-accessible storage locations.
    Type: Grant
    Filed: August 10, 2018
    Date of Patent: October 5, 2021
    Assignee: NVIDIA CORPORATION
    Inventors: Ronald Charles Babich, Jr., John Burgess, Jack Choquette, Tero Karras, Samuli Laine, Ignacio Llamas, Gregory Muthler, William Parsons Newhall, Jr.
  • Patent number: 11126428
    Abstract: Embodiments detailed herein relate to arithmetic operations of float-point values. An exemplary processor includes decoding circuitry to decode an instruction, where the instruction specifies locations of a plurality of operands, values of which being in a floating-point format. The exemplary processor further includes execution circuitry to execute the decoded instruction, where the execution includes to: convert the values for each operand, each value being converted into a plurality of lower precision values, where an exponent is to be stored for each operand; perform arithmetic operations among lower precision values converted from values for the plurality of the operands; and generate a floating-point value by converting a resulting value from the arithmetic operations into the floating-point format and store the floating-point value.
    Type: Grant
    Filed: December 17, 2020
    Date of Patent: September 21, 2021
    Assignee: INTEL CORPORATION
    Inventors: Gregory Henry, Alexander Heinecke
  • Patent number: 11126511
    Abstract: Distributed processors and methods for compiling code for execution by distributed processors are disclosed. In one implementation, a distributed processor may include a substrate; a memory array disposed on the substrate; and a processing array disposed on the substrate. The memory array may include a plurality of discrete memory banks, and the processing array may include a plurality of processor subunits, each one of the processor subunits being associated with a corresponding, dedicated one of the plurality of discrete memory banks. The distributed processor may further include a first plurality of buses, each connecting one of the plurality of processor subunits to its corresponding, dedicated memory bank, and a second plurality of buses, each connecting one of the plurality of processor subunits to another of the plurality of processor subunits.
    Type: Grant
    Filed: July 16, 2019
    Date of Patent: September 21, 2021
    Assignee: NeuroBlade, Ltd.
    Inventors: Elad Sity, Eliad Hillel
  • Patent number: 11128443
    Abstract: A processor includes a decode unit to decode an SM3 two round state word update instruction. The instruction is to indicate one or more source packed data operands. The source packed data operand(s) are to have eight 32-bit state words Aj, Bj, Cj, Dj, Ej, Fj, Gj, and Hj that are to correspond to a round (j) of an SM3 hash algorithm. The source packed data operand(s) are also to have a set of messages sufficient to evaluate two rounds of the SM3 hash algorithm. An execution unit coupled with the decode unit is operable, in response to the instruction, to store one or more result packed data operands, in one or more destination storage locations. The result packed data operand(s) are to have at least four two-round updated 32-bit state words Aj+2, Bj+2, Ej+2, and Fj+2, which are to correspond to a round (j+2) of the SM3 hash algorithm.
    Type: Grant
    Filed: November 6, 2020
    Date of Patent: September 21, 2021
    Assignee: Intel Corporation
    Inventors: Shay Gueron, Vlad Krasnov
  • Patent number: 11126587
    Abstract: Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: September 21, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer
  • Patent number: 11119972
    Abstract: Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: September 14, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer
  • Patent number: 11119782
    Abstract: Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: September 14, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer
  • Patent number: 11119765
    Abstract: A processor having a systolic array that can perform operations efficiently is provided. The processor includes multiple processing cores aligned in a matrix, and each of the processing cores includes an arithmetic unit array including multiple arithmetic units that can form a systolic array. Each of the processing cores includes a first memory that stores first data, a second memory that stores second data, a first multiplexer that connects a first input for receiving the first data at the arithmetic unit array to an output of the first memory in the processing core or an output of the arithmetic unit array in an adjacent processing core, and a second multiplexer that connects a second input for receiving the second data at the arithmetic unit array to an output of the second memory in the processing core or an output of the arithmetic unit array in an adjacent processing core.
    Type: Grant
    Filed: November 1, 2019
    Date of Patent: September 14, 2021
    Assignee: Preferred Networks, Inc.
    Inventor: Tanvir Ahmed
  • Patent number: 11113065
    Abstract: A technique for speculatively executing load-dependent instructions includes detecting that a memory ordering consistency queue is full for a completed load instruction. The technique also includes storing data loaded by the completed load instruction into a storage location for storing data when the memory ordering consistency queue is full. The technique further includes speculatively executing instructions that are dependent on the completed load instruction. The technique also includes in response to a slot becoming available in the memory ordering consistency queue, replaying the load instruction. The technique further includes in response to receiving loaded data for the replayed load instruction, testing for a data mis-speculation by comparing the loaded data for the replayed load instruction with the data loaded by the completed load instruction that is stored in the storage location.
    Type: Grant
    Filed: October 31, 2019
    Date of Patent: September 7, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: John Kalamatianos, Susumu Mashimo, Krishnan V. Ramani, Scott Thomas Bingham
  • Patent number: 11093439
    Abstract: A processor for performing deep learning is provided herein. The processor includes a processing element unit including a plurality of processing elements arranged in a matrix form including a first row of processing elements and a second row of processing elements. The processing elements are fed with filter data by a first data input unit which is connected to the first row processing elements. A second data input unit feeds target data to the processing elements. A shifter composed of registers feeds instructions to the processing elements. A controller in the processor controls the processing elements, the first data input unit and second data input unit to process the filter data and target data, thus providing sum of products (convolution) functionality.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: August 17, 2021
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Kyoung-hoon Kim, Young-hwan Park, Dong-kwan Suh, Keshava prasad Nagaraja, Suk-jin Kim, Han-su Cho, Hyun-jung Kim
  • Patent number: 11068263
    Abstract: Disclosed embodiments relate to systems and methods for performing instructions to convert to 16-bit floating-point format. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a first source vector comprising N single-precision elements, and a destination vector comprising at least N 16-bit floating-point elements, the opcode to indicate execution circuitry is to convert each of the elements of the specified source vector to 16-bit floating-point, the conversion to include truncation and rounding, as necessary, and to store each converted element into a corresponding location of the specified destination vector, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: July 20, 2021
    Assignee: Intel Corporation
    Inventors: Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Raanan Sade, Menachem Adelman, Zeev Sperber, Amit Gradstein, Simon Rubanovich
  • Patent number: 11068262
    Abstract: Disclosed embodiments relate to systems and methods for performing instructions to convert to 16-bit floating-point format. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a first source vector comprising N single-precision elements, and a destination vector comprising at least N 16-bit floating-point elements, the opcode to indicate execution circuitry is to convert each of the elements of the specified source vector to 16-bit floating-point, the conversion to include truncation and rounding, as necessary, and to store each converted element into a corresponding location of the specified destination vector, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: July 20, 2021
    Assignee: Intel Corporation
    Inventors: Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Raanan Sade, Menachem Adelman, Zeev Sperber, Amit Gradstein, Simon Rubanovich
  • Patent number: 11061854
    Abstract: A vector reduction circuit configured to reduce an input vector of elements comprises a plurality of cells, wherein each of the plurality of cells other than a designated first cell that receives a designated first element of the input vector is configured to receive a particular element of the input vector, receive, from another of the one or more cells, a temporary reduction element, perform a reduction operation using the particular element and the temporary reduction element, and provide, as a new temporary reduction element, a result of performing the reduction operation using the particular element and the temporary reduction element. The vector reduction circuit also comprises an output circuit configured to provide, for output as a reduction of the input vector, a new temporary reduction element corresponding to a result of performing the reduction operation using a last element of the input vector.
    Type: Grant
    Filed: July 1, 2020
    Date of Patent: July 13, 2021
    Assignee: Google LLC
    Inventors: Gregory Michael Thorson, Andrew Everett Phelps, Olivier Temam
  • Patent number: 11061682
    Abstract: The invention relates to a method for processing instructions out-of-order on a processor comprising an arrangement of execution units. The inventive method comprises looking up operand sources in a Register Positioning Table and setting operand input references of the instruction to be issued accordingly, checking for an Execution Unit (EXU) available for receiving a new instruction, and issuing the instruction to the available Execution Unit and entering a reference of the result register addressed by the instruction to be issued to the Execution Unit into the Register Positioning Table (RPT).
    Type: Grant
    Filed: December 13, 2015
    Date of Patent: July 13, 2021
    Inventor: Martin Vorbach
  • Patent number: 11055100
    Abstract: Embodiments of the present disclosure relate to a method for processing information, and a processor. The processor includes an arithmetic and logic unit, a bypass unit, a queue unit, a multiplexer, and a register file. The bypass unit includes a data processing subunit; the data processing subunit is configured to acquire at least one valid processing result outputted by the arithmetic and logic unit, determine a processing result from the at least one valid processing result, output the determined processing result to the multiplexer, and output processing results except for the determined processing result of among the at least one valid processing result to the queue unit; and the multiplexer is configured to sequentially output more than one valid processing results to the register file.
    Type: Grant
    Filed: July 3, 2019
    Date of Patent: July 6, 2021
    Assignee: Beijing Baidu Netcom Science and Technology Co., Ltd.
    Inventor: Jian Ouyang
  • Patent number: 11048609
    Abstract: A trace module has monitoring circuitry for monitoring processing of instructions by processing circuitry, and trace output circuitry for outputting a sequence of elements indicative of outcomes of the processing of instructions by the processing circuitry. The trace module supports output of a commit window move element indicating that a commit window, representing a portion of the trace stream comprising at least one speculative element representing at least one speculatively executed instruction, should move while the oldest remaining speculative element of the trace stream remains uncommitted. This can be useful for tracing of transactional memory functionality within program code.
    Type: Grant
    Filed: December 10, 2018
    Date of Patent: June 29, 2021
    Assignee: Arm Limited
    Inventor: Michael John Gibbs
  • Patent number: 11036504
    Abstract: Disclosed embodiments relate to systems and methods for performing 16-bit floating-point vector dot product instructions. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first source, second source, and destination vectors, the opcode to indicate execution circuitry is to multiply N pairs of 16-bit floating-point formatted elements of the specified first and second sources, and accumulate the resulting products with previous contents of a corresponding single-precision element of the specified destination, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: June 15, 2021
    Assignee: Intel Corporation
    Inventors: Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Raanan Sade, Menachem Adelman, Zeev Sperber, Amit Gradstein, Simon Rubanovich
  • Patent number: 11029950
    Abstract: A move data instruction to move data from one location to another location is obtained. Based on obtaining the move data instruction, a determination is made as to whether the data to be moved is located in a buffer. The buffer is configured to maintain the data for use by multiple move data instructions. The buffer is used to move the data from the one location to the other location, based on determining that the data to be moved is in the buffer.
    Type: Grant
    Filed: July 3, 2019
    Date of Patent: June 8, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Yossi Shapira, Yair Fried, Eyal Naor, Amir Turi
  • Patent number: 11023230
    Abstract: The apparatus and method for calculating and retaining a bound on error during floating-point operations inserts an additional bounding field into the standard floating-point format that records the retained significant bits of the calculation with notification upon insufficient retention. The bounding field, accounting for both rounding and cancellation errors, includes the lost bits D Field and the accumulated rounding error R Field. The D Field states the number of bits in the floating-point representation that are no longer meaningful. The bounds on the represented real value are determined by the truncated floating-point value and the addition of the error determined by the number of lost bits. The true, real value is absolutely contained by these bounds. The allowable loss (optionally programmable) of significant digits provides a fail-safe, real-time notification of loss of significant digits. This allows representation of real numbers accurate to the last digit.
    Type: Grant
    Filed: January 20, 2020
    Date of Patent: June 1, 2021
    Inventor: Alan A. Jorgensen
  • Patent number: 11016765
    Abstract: Systems, apparatuses, and methods related to bit string operations using a computing tile are described. An example apparatus includes a computing device (or “tile”) including a processing unit and a memory resource configured as a cache for the processing unit. The computing device can include circuitry to receive a command to initiate an operation to convert data comprising a bit string having a first format that supports arithmetic operations to a first level of precision to a bit string having a second format that supports arithmetic operations to a second level of precision. The computing device can receive, by the memory resource, the bit string based, at least in part, on receipt of the command and, responsive to receipt of the data, perform the operation on the bit string to convert the data from the first format to the second format.
    Type: Grant
    Filed: April 29, 2019
    Date of Patent: May 25, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Vijay S. Ramesh