Patents Examined by Michael J Metzger
-
Patent number: 11138009Abstract: Systems and methods for an efficient and robust multiprocessor-coprocessor interface that may be used between a streaming multiprocessor and an acceleration coprocessor in a GPU are provided. According to an example implementation, in order to perform an acceleration of a particular operation using the coprocessor, the multiprocessor: issues a series of write instructions to write input data for the operation into coprocessor-accessible storage locations, issues an operation instruction to cause the coprocessor to execute the particular operation; and then issues a series of read instructions to read result data of the operation from coprocessor-accessible storage locations to multiprocessor-accessible storage locations.Type: GrantFiled: August 10, 2018Date of Patent: October 5, 2021Assignee: NVIDIA CORPORATIONInventors: Ronald Charles Babich, Jr., John Burgess, Jack Choquette, Tero Karras, Samuli Laine, Ignacio Llamas, Gregory Muthler, William Parsons Newhall, Jr.
-
Patent number: 11126428Abstract: Embodiments detailed herein relate to arithmetic operations of float-point values. An exemplary processor includes decoding circuitry to decode an instruction, where the instruction specifies locations of a plurality of operands, values of which being in a floating-point format. The exemplary processor further includes execution circuitry to execute the decoded instruction, where the execution includes to: convert the values for each operand, each value being converted into a plurality of lower precision values, where an exponent is to be stored for each operand; perform arithmetic operations among lower precision values converted from values for the plurality of the operands; and generate a floating-point value by converting a resulting value from the arithmetic operations into the floating-point format and store the floating-point value.Type: GrantFiled: December 17, 2020Date of Patent: September 21, 2021Assignee: INTEL CORPORATIONInventors: Gregory Henry, Alexander Heinecke
-
Patent number: 11126511Abstract: Distributed processors and methods for compiling code for execution by distributed processors are disclosed. In one implementation, a distributed processor may include a substrate; a memory array disposed on the substrate; and a processing array disposed on the substrate. The memory array may include a plurality of discrete memory banks, and the processing array may include a plurality of processor subunits, each one of the processor subunits being associated with a corresponding, dedicated one of the plurality of discrete memory banks. The distributed processor may further include a first plurality of buses, each connecting one of the plurality of processor subunits to its corresponding, dedicated memory bank, and a second plurality of buses, each connecting one of the plurality of processor subunits to another of the plurality of processor subunits.Type: GrantFiled: July 16, 2019Date of Patent: September 21, 2021Assignee: NeuroBlade, Ltd.Inventors: Elad Sity, Eliad Hillel
-
Patent number: 11128443Abstract: A processor includes a decode unit to decode an SM3 two round state word update instruction. The instruction is to indicate one or more source packed data operands. The source packed data operand(s) are to have eight 32-bit state words Aj, Bj, Cj, Dj, Ej, Fj, Gj, and Hj that are to correspond to a round (j) of an SM3 hash algorithm. The source packed data operand(s) are also to have a set of messages sufficient to evaluate two rounds of the SM3 hash algorithm. An execution unit coupled with the decode unit is operable, in response to the instruction, to store one or more result packed data operands, in one or more destination storage locations. The result packed data operand(s) are to have at least four two-round updated 32-bit state words Aj+2, Bj+2, Ej+2, and Fj+2, which are to correspond to a round (j+2) of the SM3 hash algorithm.Type: GrantFiled: November 6, 2020Date of Patent: September 21, 2021Assignee: Intel CorporationInventors: Shay Gueron, Vlad Krasnov
-
Patent number: 11126587Abstract: Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.Type: GrantFiled: April 30, 2019Date of Patent: September 21, 2021Assignee: Micron Technology, Inc.Inventor: Tony M. Brewer
-
Patent number: 11119972Abstract: Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.Type: GrantFiled: April 30, 2019Date of Patent: September 14, 2021Assignee: Micron Technology, Inc.Inventor: Tony M. Brewer
-
Patent number: 11119782Abstract: Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.Type: GrantFiled: April 30, 2019Date of Patent: September 14, 2021Assignee: Micron Technology, Inc.Inventor: Tony M. Brewer
-
Patent number: 11119765Abstract: A processor having a systolic array that can perform operations efficiently is provided. The processor includes multiple processing cores aligned in a matrix, and each of the processing cores includes an arithmetic unit array including multiple arithmetic units that can form a systolic array. Each of the processing cores includes a first memory that stores first data, a second memory that stores second data, a first multiplexer that connects a first input for receiving the first data at the arithmetic unit array to an output of the first memory in the processing core or an output of the arithmetic unit array in an adjacent processing core, and a second multiplexer that connects a second input for receiving the second data at the arithmetic unit array to an output of the second memory in the processing core or an output of the arithmetic unit array in an adjacent processing core.Type: GrantFiled: November 1, 2019Date of Patent: September 14, 2021Assignee: Preferred Networks, Inc.Inventor: Tanvir Ahmed
-
Speculative instruction wakeup to tolerate draining delay of memory ordering violation check buffers
Patent number: 11113065Abstract: A technique for speculatively executing load-dependent instructions includes detecting that a memory ordering consistency queue is full for a completed load instruction. The technique also includes storing data loaded by the completed load instruction into a storage location for storing data when the memory ordering consistency queue is full. The technique further includes speculatively executing instructions that are dependent on the completed load instruction. The technique also includes in response to a slot becoming available in the memory ordering consistency queue, replaying the load instruction. The technique further includes in response to receiving loaded data for the replayed load instruction, testing for a data mis-speculation by comparing the loaded data for the replayed load instruction with the data loaded by the completed load instruction that is stored in the storage location.Type: GrantFiled: October 31, 2019Date of Patent: September 7, 2021Assignee: Advanced Micro Devices, Inc.Inventors: John Kalamatianos, Susumu Mashimo, Krishnan V. Ramani, Scott Thomas Bingham -
Patent number: 11093439Abstract: A processor for performing deep learning is provided herein. The processor includes a processing element unit including a plurality of processing elements arranged in a matrix form including a first row of processing elements and a second row of processing elements. The processing elements are fed with filter data by a first data input unit which is connected to the first row processing elements. A second data input unit feeds target data to the processing elements. A shifter composed of registers feeds instructions to the processing elements. A controller in the processor controls the processing elements, the first data input unit and second data input unit to process the filter data and target data, thus providing sum of products (convolution) functionality.Type: GrantFiled: September 27, 2018Date of Patent: August 17, 2021Assignee: SAMSUNG ELECTRONICS CO., LTD.Inventors: Kyoung-hoon Kim, Young-hwan Park, Dong-kwan Suh, Keshava prasad Nagaraja, Suk-jin Kim, Han-su Cho, Hyun-jung Kim
-
Patent number: 11068263Abstract: Disclosed embodiments relate to systems and methods for performing instructions to convert to 16-bit floating-point format. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a first source vector comprising N single-precision elements, and a destination vector comprising at least N 16-bit floating-point elements, the opcode to indicate execution circuitry is to convert each of the elements of the specified source vector to 16-bit floating-point, the conversion to include truncation and rounding, as necessary, and to store each converted element into a corresponding location of the specified destination vector, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.Type: GrantFiled: December 23, 2020Date of Patent: July 20, 2021Assignee: Intel CorporationInventors: Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Raanan Sade, Menachem Adelman, Zeev Sperber, Amit Gradstein, Simon Rubanovich
-
Patent number: 11068262Abstract: Disclosed embodiments relate to systems and methods for performing instructions to convert to 16-bit floating-point format. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a first source vector comprising N single-precision elements, and a destination vector comprising at least N 16-bit floating-point elements, the opcode to indicate execution circuitry is to convert each of the elements of the specified source vector to 16-bit floating-point, the conversion to include truncation and rounding, as necessary, and to store each converted element into a corresponding location of the specified destination vector, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.Type: GrantFiled: December 23, 2020Date of Patent: July 20, 2021Assignee: Intel CorporationInventors: Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Raanan Sade, Menachem Adelman, Zeev Sperber, Amit Gradstein, Simon Rubanovich
-
Patent number: 11061854Abstract: A vector reduction circuit configured to reduce an input vector of elements comprises a plurality of cells, wherein each of the plurality of cells other than a designated first cell that receives a designated first element of the input vector is configured to receive a particular element of the input vector, receive, from another of the one or more cells, a temporary reduction element, perform a reduction operation using the particular element and the temporary reduction element, and provide, as a new temporary reduction element, a result of performing the reduction operation using the particular element and the temporary reduction element. The vector reduction circuit also comprises an output circuit configured to provide, for output as a reduction of the input vector, a new temporary reduction element corresponding to a result of performing the reduction operation using a last element of the input vector.Type: GrantFiled: July 1, 2020Date of Patent: July 13, 2021Assignee: Google LLCInventors: Gregory Michael Thorson, Andrew Everett Phelps, Olivier Temam
-
Patent number: 11061682Abstract: The invention relates to a method for processing instructions out-of-order on a processor comprising an arrangement of execution units. The inventive method comprises looking up operand sources in a Register Positioning Table and setting operand input references of the instruction to be issued accordingly, checking for an Execution Unit (EXU) available for receiving a new instruction, and issuing the instruction to the available Execution Unit and entering a reference of the result register addressed by the instruction to be issued to the Execution Unit into the Register Positioning Table (RPT).Type: GrantFiled: December 13, 2015Date of Patent: July 13, 2021Inventor: Martin Vorbach
-
Patent number: 11055100Abstract: Embodiments of the present disclosure relate to a method for processing information, and a processor. The processor includes an arithmetic and logic unit, a bypass unit, a queue unit, a multiplexer, and a register file. The bypass unit includes a data processing subunit; the data processing subunit is configured to acquire at least one valid processing result outputted by the arithmetic and logic unit, determine a processing result from the at least one valid processing result, output the determined processing result to the multiplexer, and output processing results except for the determined processing result of among the at least one valid processing result to the queue unit; and the multiplexer is configured to sequentially output more than one valid processing results to the register file.Type: GrantFiled: July 3, 2019Date of Patent: July 6, 2021Assignee: Beijing Baidu Netcom Science and Technology Co., Ltd.Inventor: Jian Ouyang
-
Patent number: 11048609Abstract: A trace module has monitoring circuitry for monitoring processing of instructions by processing circuitry, and trace output circuitry for outputting a sequence of elements indicative of outcomes of the processing of instructions by the processing circuitry. The trace module supports output of a commit window move element indicating that a commit window, representing a portion of the trace stream comprising at least one speculative element representing at least one speculatively executed instruction, should move while the oldest remaining speculative element of the trace stream remains uncommitted. This can be useful for tracing of transactional memory functionality within program code.Type: GrantFiled: December 10, 2018Date of Patent: June 29, 2021Assignee: Arm LimitedInventor: Michael John Gibbs
-
Patent number: 11036504Abstract: Disclosed embodiments relate to systems and methods for performing 16-bit floating-point vector dot product instructions. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first source, second source, and destination vectors, the opcode to indicate execution circuitry is to multiply N pairs of 16-bit floating-point formatted elements of the specified first and second sources, and accumulate the resulting products with previous contents of a corresponding single-precision element of the specified destination, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.Type: GrantFiled: December 23, 2020Date of Patent: June 15, 2021Assignee: Intel CorporationInventors: Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Raanan Sade, Menachem Adelman, Zeev Sperber, Amit Gradstein, Simon Rubanovich
-
Patent number: 11029950Abstract: A move data instruction to move data from one location to another location is obtained. Based on obtaining the move data instruction, a determination is made as to whether the data to be moved is located in a buffer. The buffer is configured to maintain the data for use by multiple move data instructions. The buffer is used to move the data from the one location to the other location, based on determining that the data to be moved is in the buffer.Type: GrantFiled: July 3, 2019Date of Patent: June 8, 2021Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Yossi Shapira, Yair Fried, Eyal Naor, Amir Turi
-
Patent number: 11023230Abstract: The apparatus and method for calculating and retaining a bound on error during floating-point operations inserts an additional bounding field into the standard floating-point format that records the retained significant bits of the calculation with notification upon insufficient retention. The bounding field, accounting for both rounding and cancellation errors, includes the lost bits D Field and the accumulated rounding error R Field. The D Field states the number of bits in the floating-point representation that are no longer meaningful. The bounds on the represented real value are determined by the truncated floating-point value and the addition of the error determined by the number of lost bits. The true, real value is absolutely contained by these bounds. The allowable loss (optionally programmable) of significant digits provides a fail-safe, real-time notification of loss of significant digits. This allows representation of real numbers accurate to the last digit.Type: GrantFiled: January 20, 2020Date of Patent: June 1, 2021Inventor: Alan A. Jorgensen
-
Patent number: 11016765Abstract: Systems, apparatuses, and methods related to bit string operations using a computing tile are described. An example apparatus includes a computing device (or “tile”) including a processing unit and a memory resource configured as a cache for the processing unit. The computing device can include circuitry to receive a command to initiate an operation to convert data comprising a bit string having a first format that supports arithmetic operations to a first level of precision to a bit string having a second format that supports arithmetic operations to a second level of precision. The computing device can receive, by the memory resource, the bit string based, at least in part, on receipt of the command and, responsive to receipt of the data, perform the operation on the bit string to convert the data from the first format to the second format.Type: GrantFiled: April 29, 2019Date of Patent: May 25, 2021Assignee: Micron Technology, Inc.Inventor: Vijay S. Ramesh