Floating Point Or Vector Patents (Class 712/222)
-
Patent number: 12147804
Abstract: Embodiments detailed herein relate to matrix operations. In particular, matrix (tile) multiply accumulate and negated matrix (tile) multiply accumulate are discussed. For example, in some embodiments decode circuitry to decode an instruction having fields for an opcode, an identifier for a first source matrix operand, an identifier of a second source matrix operand, and an identifier for a source/destination matrix operand; and execution circuitry to execute the decoded instruction to multiply the identified first source matrix operand by the identified second source matrix operand, add a result of the multiplication to the identified source/destination matrix operand, store a result of the addition in the identified source/destination matrix operand, and zero unconfigured columns of the identified source/destination matrix operand are detailed.
Type: Grant
Filed: July 22, 2021
Date of Patent: November 19, 2024
Assignee: Intel Corporation
Inventors: Robert Valentine, Zeev Sperber, Mark J. Charney, Bret L. Toll, Rinat Rappoport, Stanislav Shwartsman, Dan Baum, Igor Yanover, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman, Jesus Corbal, Yuri Gebil, Simon Rubanovich
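A minimal reference model of the multiply-accumulate behaviour described above, written in plain Python. The matrix sizes, element types, and the `configured_cols` parameter are illustrative assumptions, not the architected instruction format.

```python
def tile_multiply_accumulate(src_dst, src1, src2, configured_cols, negate=False):
    """Reference semantics: src_dst += src1 @ src2 (or -= when negated),
    then zero any destination columns beyond the configured width."""
    m, k = len(src1), len(src1[0])
    n = len(src2[0])
    for i in range(m):
        for j in range(n):
            acc = src_dst[i][j]
            for p in range(k):
                prod = src1[i][p] * src2[p][j]
                acc += -prod if negate else prod
            src_dst[i][j] = acc
        # zero unconfigured columns of the source/destination tile
        for j in range(configured_cols, n):
            src_dst[i][j] = 0
    return src_dst

# Example: 2x2 tiles, only the first column configured
C = [[1.0, 1.0], [1.0, 1.0]]
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(tile_multiply_accumulate(C, A, B, configured_cols=1))  # [[20.0, 0], [44.0, 0]]
```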
-
Patent number: 12106100
Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address; and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, wherein a tile is a set of 2-dimensional registers, are discussed.
Type: Grant
Filed: July 1, 2017
Date of Patent: October 1, 2024
Assignee: Intel Corporation
Inventors: Robert Valentine, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall, Dan Baum, Zeev Sperber, Jesus Corbal, Bret L. Toll, Raanan Sade, Igor Yanover, Yuri Gebil, Rinat Rappoport, Stanislav Shwartsman, Menachem Adelman, Simon Rubanovich
-
Patent number: 12056788
Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a mixed-precision core including mixed-precision execution circuitry to execute one or more mixed-precision instructions to perform a mixed-precision dot-product operation comprising a set of multiply and accumulate operations.
Type: Grant
Filed: March 1, 2022
Date of Patent: August 6, 2024
Assignee: Intel Corporation
Inventors: Abhishek R. Appu, Altug Koker, Linda L. Hurd, Dukhwan Kim, Mike B. Macpherson, John C. Weast, Feng Chen, Farshad Akhbari, Narayan Srinivasa, Nadathur Rajagopalan Satish, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman
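A small Python sketch of the idea behind a mixed-precision dot product: operands are held at reduced precision (here a bfloat16-style truncation, used purely as an illustrative assumption) while the running sum is accumulated at full precision.

```python
import struct

def to_bf16(x: float) -> float:
    """Truncate a float to a bfloat16-like value (keep the top 16 bits of float32)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def mixed_precision_dot(a, b):
    """Multiply reduced-precision elements, accumulate in full precision."""
    acc = 0.0  # wide accumulator
    for x, y in zip(a, b):
        acc += to_bf16(x) * to_bf16(y)  # each product comes from low-precision inputs
    return acc

print(mixed_precision_dot([0.1, 0.2, 0.3], [1.5, 2.5, 3.5]))
```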
-
Patent number: 12001385
Abstract: Systems, methods, and apparatuses relating to one or more instructions for loading a tile of a matrix operations accelerator are described.
Type: Grant
Filed: December 24, 2020
Date of Patent: June 4, 2024
Assignee: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall
-
Patent number: 11960438
Abstract: A stacked processor-plus-memory device includes a processing die with an array of processing elements of an artificial neural network. Each processing element multiplies a first operand (e.g., a weight) by a second operand to produce a partial result that is passed to a subsequent processing element. To prepare for these computations, a sequencer loads the weights into the processing elements as a sequence of operands that step through the processing elements, each operand stored in the corresponding processing element. The operands can be sequenced directly from memory to the processing elements or can be stored first in cache. The processing elements include streaming logic that disregards interruptions in the stream of operands.
Type: Grant
Filed: August 24, 2021
Date of Patent: April 16, 2024
Assignee: Rambus Inc.
Inventors: Steven C. Woo, Michael Raymond Miller
-
Patent number: 11900112
Abstract: A method to reverse source data in a processor in response to a vector reverse instruction includes specifying, in respective fields of the vector reverse instruction, a source register containing the source data and a destination register. The source register includes a plurality of lanes and each lane contains a data element, and the destination register includes a plurality of lanes corresponding to the lanes of the source register. The method further includes executing the vector reverse instruction by creating reversed source data by reversing the order of the data elements, and storing the reversed source data in the destination register.
Type: Grant
Filed: March 28, 2022
Date of Patent: February 13, 2024
Assignee: Texas Instruments Incorporated
Inventors: Timothy D. Anderson, Duc Bui
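A behavioural sketch of the lane-reversal semantics in Python; the lane count and element values are illustrative, since the real instruction operates on architected vector registers.

```python
def vector_reverse(src):
    """Copy the source lanes into the destination in reversed lane order."""
    dst = [0] * len(src)
    for lane, element in enumerate(src):
        dst[len(src) - 1 - lane] = element
    return dst

# 8-lane example
print(vector_reverse([10, 11, 12, 13, 14, 15, 16, 17]))
# -> [17, 16, 15, 14, 13, 12, 11, 10]
```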
-
Patent number: 11875154
Abstract: Systems, methods, and apparatuses relating to instructions to multiply floating-point values of about zero are described.
Type: Grant
Filed: December 13, 2019
Date of Patent: January 16, 2024
Assignee: Intel Corporation
Inventors: Mohamed Elmalaki, Elmoustapha Ould-Ahmed-Vall
-
Patent number: 11847450
Abstract: Systems, methods, and apparatuses relating to instructions to multiply values of zero are described.
Type: Grant
Filed: December 13, 2019
Date of Patent: December 19, 2023
Assignee: Intel Corporation
Inventors: Mohamed Elmalaki, Elmoustapha Ould-Ahmed-Vall
-
Patent number: 11816482
Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
Type: Grant
Filed: August 18, 2022
Date of Patent: November 14, 2023
Assignee: NVIDIA Corporation
Inventors: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
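A rough Python model of the dot-product flow described above: partial products are formed, aligned to a common exponent, and then accumulated with integer additions. The fixed fraction width and the single shared exponent are assumptions made for the sketch, not details from the patent.

```python
import math

def aligned_dot_product(a, b, frac_bits=24):
    """Form partial products, align them to the largest product exponent,
    then accumulate the aligned values and rescale the result."""
    partials = []
    for x, y in zip(a, b):
        mx, ex = math.frexp(x)   # x = mx * 2**ex, with 0.5 <= |mx| < 1
        my, ey = math.frexp(y)
        partials.append((mx * my, ex + ey))  # product exponent = sum of exponents
    max_exp = max(e for _, e in partials)
    acc = 0
    for frac, exp in partials:
        shift = max_exp - exp                 # align to the common exponent
        acc += int(round(frac * (1 << frac_bits))) >> shift
    return (acc / (1 << frac_bits)) * 2.0 ** max_exp

print(aligned_dot_product([1.5, 2.0, -0.25], [4.0, 3.0, 8.0]))  # 10.0
```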
-
Patent number: 11816481
Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
Type: Grant
Filed: August 18, 2022
Date of Patent: November 14, 2023
Assignee: NVIDIA Corporation
Inventors: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
-
Patent number: 11755764
Abstract: Methods, systems, and computer-readable media for a client-side filesystem for a remote repository are disclosed. One or more files of a repository are sent from a storage service to a client device. The file(s) are obtained by the client using a credential sent by a repository manager. Local copies of the file(s) are accessible via a local filesystem mounted at the client device. One or more new files associated with the repository are generated at the client device. Using the credential, the one or more new files are obtained at the storage service from the client device. The one or more new files are added to the repository.
Type: Grant
Filed: July 1, 2022
Date of Patent: September 12, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Marvin Michael Theimer, Julien Jacques Ellie, Colin Watson, Ullas Sankhla, Swapandeep Singh, Kerry Hart, Paul Anderson, Brian Dahmen, Suchi Nandini, Yunhan Chen, Shu Liu, Arjun Raman, Yuxin Xie, Fengjia Xiong
-
Patent number: 11640285
Abstract: Disclosed embodiments relate to a method and device for optimizing compilation of source code. The proposed method receives a first intermediate representation code of a source code and analyses each basic block instruction of the plurality of basic block instructions contained in the first intermediate representation code for blockification. In order to blockify the identical instructions, the one or more groups of basic block instructions are assessed for eligibility of blockification. Upon being determined eligible, the group of basic block instructions is blockified using either one-dimensional or two-dimensional SIMD vectorization. The method further generates a second intermediate representation of the source code, which is translated to executable target code with more efficient processing capacity.
Type: Grant
Filed: April 5, 2022
Date of Patent: May 2, 2023
Assignee: Blaize, Inc.
Inventors: Ravi Korsa, Aravind Rajulapudi, Pathikonda Datta Nagraj
-
Patent number: 11640302
Abstract: A computing device comprising: a plurality of ALUs; a set of registers; a memory; a memory interface between the registers and the memory; and a control unit controlling the ALUs by generating: at least one cycle i including both implementing at least one first computing operation by way of an arithmetic logic unit and downloading a first dataset from the memory to at least one register; and at least one cycle ii, following the at least one cycle i, including implementing a second computing operation by way of an arithmetic logic unit, for which second computing operation at least part of the first dataset forms at least one operand.
Type: Grant
Filed: May 21, 2019
Date of Patent: May 2, 2023
Assignee: VSORA
Inventors: Khaled Maalej, Trung-Dung Nguyen, Julien Schmitt, Pierre-Emmanuel Bernard
-
Patent number: 11467832
Abstract: A method to classify source data in a processor in response to a vector floating-point classification instruction includes specifying, in respective fields of the vector floating-point classification instruction, a source register containing the source data and a destination register to store classification indications for the source data. The source register includes a plurality of lanes that each contains a floating-point value, and the destination register includes a plurality of lanes corresponding to the lanes of the source register. The method further includes executing the vector floating-point classification instruction by, for each lane in the source register, classifying the floating-point value in the lane to identify a type of the floating-point value, and storing a value indicative of the identified type in the corresponding lane of the destination register.
Type: Grant
Filed: March 29, 2021
Date of Patent: October 11, 2022
Assignee: Texas Instruments Incorporated
Inventors: Joseph Zbiciak, Brett L. Huber, Duc Bui
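An illustrative Python model of per-lane floating-point classification. The class codes below are made-up values for the sketch, not the encoding the instruction actually produces.

```python
import math

# hypothetical class codes, one per recognised type
ZERO, SUBNORMAL, NORMAL, INFINITE, NAN = range(5)
SMALLEST_NORMAL = 2.0 ** -1022  # smallest normal magnitude for IEEE 754 double precision

def classify(value: float) -> int:
    """Identify the type of a single floating-point value."""
    if math.isnan(value):
        return NAN
    if math.isinf(value):
        return INFINITE
    if value == 0.0:
        return ZERO
    if abs(value) < SMALLEST_NORMAL:
        return SUBNORMAL
    return NORMAL

def vector_classify(src_lanes):
    """Classify each lane and store the code in the matching destination lane."""
    return [classify(v) for v in src_lanes]

print(vector_classify([0.0, 1.5, float("inf"), float("nan"), 5e-324]))
# -> [0, 2, 3, 4, 1]
```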
-
Patent number: 11442696
Abstract: A floating-point multiply-add, accumulate unit supporting the BF16 format for multiply-accumulate operations and FP32 single-precision addition complying with the IEEE 754 standard is described, including exception handling. Operations including exception handling (performed in a way that does not interfere with execution of data-flow operations), overflow detection, zero detection, and sign extension are adopted for 2's complement and carry-save formats.
Type: Grant
Filed: November 23, 2021
Date of Patent: September 13, 2022
Assignee: SAMBANOVA SYSTEMS, INC.
Inventors: Vojin G. Oklobdzija, Matthew M. Kim
-
Patent number: 11403097
Abstract: Disclosed embodiments relate to systems and methods to skip inconsequential matrix operations. In one example, a processor includes decode circuitry to decode an instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode indicating that the processor is to multiply each element at row M and column K of the first source matrix with a corresponding element at row K and column N of the second source matrix, and accumulate a resulting product with previous contents of a corresponding element at row M and column N of the destination matrix, the processor to skip multiplications that, based on detected values of corresponding multiplicands, would generate inconsequential results; scheduling circuitry to schedule execution of the instruction; and execution circuitry to execute the instruction as per the opcode.
Type: Grant
Filed: June 26, 2019
Date of Patent: August 2, 2022
Assignee: Intel Corporation
Inventors: Elmoustapha Ould-Ahmed-Vall, William Rash, Subramaniam Maiyuran, Varghese George, Rajesh Sankaran
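A Python sketch of the skip idea: multiplications whose detected multiplicand is zero cannot change the accumulator, so they are counted as skipped instead of performed. The zero test is the simplest possible notion of "inconsequential"; the patent's detection criteria may be broader.

```python
def sparse_matmul_accumulate(dst, a, b):
    """dst[m][n] += sum over k of a[m][k] * b[k][n], skipping zero multiplicands."""
    skipped = 0
    rows, inner, cols = len(a), len(b), len(b[0])
    for m in range(rows):
        for n in range(cols):
            for k in range(inner):
                if a[m][k] == 0 or b[k][n] == 0:
                    skipped += 1          # inconsequential: the product would be 0
                    continue
                dst[m][n] += a[m][k] * b[k][n]
    return dst, skipped

dst = [[0, 0], [0, 0]]
a = [[1, 0], [0, 3]]
b = [[2, 0], [0, 4]]
print(sparse_matmul_accumulate(dst, a, b))  # ([[2, 0], [0, 12]], 6)
```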
-
Patent number: 11360771
Abstract: Disclosed embodiments relate to a new instruction for performing data-ready memory access operations. In one example, a system includes circuits to fetch, decode, and execute an instruction that includes an opcode, at least one memory location identifier identifying at least one data element, a register identifier, a data readiness indicator identifying at least one data access condition, and a data readiness mask, wherein the execution circuit is to, for each data element of the at least one data element, determine whether a memory request for the data element satisfies the at least one data access condition identified by the data readiness indicator, and in response to determining that the memory request does not satisfy the data access condition: generate a prefetch request for the data element, and set a value in a corresponding data element position of the data readiness mask to indicate that the memory request for the data element does not satisfy the at least one data access condition.
Type: Grant
Filed: June 30, 2017
Date of Patent: June 14, 2022
Assignee: Intel Corporation
Inventors: William M. Brown, Mikhail Plotnikov, Christopher J. Hughes
-
Patent number: 11249755
Abstract: Disclosed embodiments relate to executing a vector unsigned multiplication and accumulation instruction. In one example, a processor includes fetch circuitry to fetch a vector unsigned multiplication and accumulation instruction having fields for an opcode, first and second source identifiers, a destination identifier, and an immediate, wherein the identified sources and destination are same-sized registers; decode circuitry to decode the fetched instruction; and execution circuitry to execute the decoded instruction, on each corresponding pair of first and second quadwords of the identified first and second sources, to: generate a sum of products of two doublewords of the first quadword and either two lower words or two upper words of the second quadword, based on the immediate; zero-extend the sum to a quadword-sized sum; and accumulate the quadword-sized sum with a previous value of a destination quadword in the same relative register position as the first and second quadwords.
Type: Grant
Filed: September 27, 2017
Date of Patent: February 15, 2022
Assignee: Intel Corporation
Inventors: Venkateswara R. Madduri, Carl Murray, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Robert Valentine, Jesus Corbal
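A Python sketch of one 64-bit lane of that operation. The pairing of the low doubleword with the lower of the two selected words (and the high doubleword with the upper one) is an assumption made for illustration; only the overall shape (two doubleword-by-word products, summed, zero-extended, accumulated) follows the abstract.

```python
MASK64 = (1 << 64) - 1

def vumac_quadword(dst_q, src1_q, src2_q, use_upper_words):
    """One lane: sum of two unsigned doubleword-by-word products,
    zero-extended to 64 bits and accumulated into the destination quadword."""
    d0 = src1_q & 0xFFFFFFFF             # low doubleword of source 1
    d1 = (src1_q >> 32) & 0xFFFFFFFF     # high doubleword of source 1
    base = 32 if use_upper_words else 0  # immediate selects upper or lower words
    w0 = (src2_q >> base) & 0xFFFF
    w1 = (src2_q >> (base + 16)) & 0xFFFF
    total = d0 * w0 + d1 * w1            # sum of products (unsigned)
    return (dst_q + (total & MASK64)) & MASK64

def vumac(dst, src1, src2, use_upper_words=False):
    """Apply the per-quadword operation across all lanes of same-sized registers."""
    return [vumac_quadword(d, a, b, use_upper_words)
            for d, a, b in zip(dst, src1, src2)]

print(vumac([0, 1],
            [0x00000002_00000003, 0x00000001_00000001],
            [0x0004_0005_0006_0007, 0x0001_0002_0003_0004]))  # [33, 8]
```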
-
Patent number: 11222554
Abstract: A system, method, and computer-readable medium for format-preserving encryption of a numerical value, including: storing a binary numerical value, the binary numerical value comprising a plurality of binary bits; dividing the plurality of binary bits into a plurality of bit groups and storing the plurality of bit groups in a plurality of bytes; encrypting each byte in the plurality of bytes using a radix value corresponding to a quantity of binary bits in a bit group corresponding to that byte to generate a plurality of ciphertext bytes; and combining a quantity of least-significant bits from each ciphertext byte in the plurality of ciphertext bytes to generate a binary ciphertext value, the quantity of least-significant bits combined from each ciphertext byte corresponding to the radix value used to generate that ciphertext byte.
Type: Grant
Filed: August 16, 2019
Date of Patent: January 11, 2022
Assignee: INFORMATICA LLC
Inventors: Igor Balabine, Rajagopal Guduru, Ramesh Nallamothu
-
Patent number: 11216280
Abstract: Exception control circuitry controls exception handling for processing circuitry. In response to an initial exception occurring when the processing circuitry is in a given exception level, the initial exception to be handled in a target exception level, the exception control circuitry stores exception control information to at least one exception control register associated with the target exception level, indicating at least one property of the initial exception or of processor state at a time the initial exception occurred. When at least one exception intercept configuration parameter stored in a configuration register indicates that exception interception is enabled, after storing the exception control information, and before the processing circuitry starts processing an exception handler for handling the initial exception in the target exception level, the exception control circuitry triggers a further exception to be handled in a predetermined exception level.
Type: Grant
Filed: November 26, 2019
Date of Patent: January 4, 2022
Assignee: Arm Limited
Inventor: Simon John Craske
-
Patent number: 11194578
Abstract: A computer system, processor, and method for processing information is disclosed that includes at least one computer processor; a register file associated with the at least one processor, preferably a condition register that stores status information, the register file having multiple locations for storing data; and multiple ports to write data to and read data from the register file. The system or processor includes an execution area, and the processor is configured to read from all the read ports in a first cycle, and to read from all the read ports in a second cycle. In an embodiment, the execution area includes a staging latch to store data from a first-cycle read operation, and in an aspect the computer system is configured to combine the data stored in the staging latch during a first read cycle with the data read from the second cycle.
Type: Grant
Filed: May 23, 2018
Date of Patent: December 7, 2021
Assignee: International Business Machines Corporation
Inventors: Steven J. Battle, Brian D. Barrick, Joshua W. Bowman, Susan E. Eisen, Brandon Goddard, Cliff Kucharski, Dung Q. Nguyen, David S. Walder
-
Patent number: 11188337
Abstract: Micro-architecture designs and methods are provided. A computer processing architecture may include an instruction cache for storing producer instructions, a half-instruction cache for storing half instructions, and eager shelves for storing a result of a first producer instruction. The computer processing architecture may fetch the first producer instruction and a first half instruction; send the first half instruction to the eager shelves; based on execution of the first producer instruction, send a second half instruction to the eager shelves; assemble the first producer instruction in the eager shelves based on the first half instruction and the second half instruction; and dispatch the first producer instruction for execution.
Type: Grant
Filed: September 30, 2019
Date of Patent: November 30, 2021
Assignees: The Florida State University Research Foundation, Inc., Michigan Technological University
Inventors: David Whalley, Soner Onder
-
Patent number: 11175890
Abstract: Examples of techniques for hexadecimal exponent alignment for a binary floating point unit (BFU) of a computer processor are described herein. An aspect includes receiving, by the BFU, a first operand comprising a first fraction and a first exponent, and a second operand comprising a second fraction and a second exponent. Another aspect includes, based on the first operand and the second operand being in a first floating point format, multiplying each of the first exponent and the second exponent by a factor corresponding to a number of bits in a digit in the first floating point format.
Type: Grant
Filed: April 30, 2019
Date of Patent: November 16, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Kerstin Claudia Schelm, Petra Leber, Nicol Hofmann, Michael Klein
-
Patent number: 11169809
Abstract: Method and apparatus for converting scatter control elements to gather control elements used to permute vector data elements is described herein. One embodiment of a method includes decoding an instruction having a field for a source vector operand storing a plurality of data elements, wherein each of the data elements includes a set bit and a plurality of unset bits. Each of the set bits is set at a unique bit offset within the respective data element. The method further includes executing the decoded instruction by generating, for each bit offset across the plurality of data elements in the source vector operand, a count of unset bits between a first data element having a bit set at a current bit offset and a second data element comprising a least significant bit (LSB). A set of control elements is generated based on the count of unset bits generated for each bit offset.
Type: Grant
Filed: March 31, 2017
Date of Patent: November 9, 2021
Assignee: Intel Corporation
Inventor: Mikhail Plotnikov
-
Patent number: 11138151
Abstract: Some embodiments provide a non-transitory machine-readable medium that stores a program. The program determines a scale value based on a plurality of floating point values. The program further scales the plurality of floating point values based on the scale value. The program also converts the plurality of floating point values to a plurality of integer values. The program further determines an integer encoding scheme from a plurality of integer encoding schemes. The program also encodes the plurality of integer values based on the determined integer encoding scheme.
Type: Grant
Filed: August 30, 2018
Date of Patent: October 5, 2021
Assignee: SAP SE
Inventor: Martin Rupp
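A rough Python sketch of that pipeline under simple assumptions: the scale is chosen from the number of decimal places, and the integers are then encoded with whichever of two illustrative schemes (fixed-width vs. byte-length/varint-style) would be smaller. The scheme names and selection rule are inventions for the sketch, not the patent's schemes.

```python
def choose_scale(values, max_decimals=6):
    """Smallest power of ten that makes every value an integer (capped)."""
    for d in range(max_decimals + 1):
        scale = 10 ** d
        if all(float(v * scale).is_integer() for v in values):
            return scale
    return 10 ** max_decimals

def encode(values):
    scale = choose_scale(values)                              # determine scale value
    ints = [int(round(v * scale)) for v in values]            # scale and convert to integers
    fixed_bytes = 8 * len(ints)                               # cost of a 64-bit fixed-width scheme
    varint_bytes = sum(max(1, (abs(n).bit_length() + 7) // 8) for n in ints)
    scheme = "fixed64" if fixed_bytes <= varint_bytes else "varint"
    return {"scale": scale, "scheme": scheme, "integers": ints}

print(encode([1.25, 2.5, 3.75]))
# -> {'scale': 100, 'scheme': 'varint', 'integers': [125, 250, 375]}
```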
-
Patent number: 11137912
Abstract: Provided herein may be a memory controller and a method of operating the same. The memory controller may include a command processor configured to generate a flush command in response to a flush request input from an external host and to assign a slot number corresponding to the flush command; a sequence generator configured to determine flush data to be stored in response to the flush command, and to generate a write sequence in which the flush data is to be stored based on a size of the flush data and an assigned device sequence of a plurality of memory devices; and a memory operation controller configured to control the plurality of memory devices to store the flush data in the plurality of memory devices.
Type: Grant
Filed: July 31, 2020
Date of Patent: October 5, 2021
Assignee: SK hynix Inc.
Inventors: Sung Kwan Hong, Yeong Sik Yi
-
Patent number: 11126691
Abstract: An apparatus is provided that receives a scalar start value, an adjust amount, and wrapping control information, and includes vector generating circuitry for generating a vector comprising a plurality of elements such that a value of a first element is dependent on the scalar start value, and values of the plurality of elements follow a regularly progressing sequence that is constrained to wrap as required to ensure that each value is within bounds determined from the wrapping control information. The adjust amount is used to determine a difference between values of adjacent elements in the regularly progressing sequence. The vector generating circuitry has first adder circuitry for generating a plurality of first candidate values for the plurality of elements, assuming absence of a wrapping condition, and second adder circuitry for generating a plurality of second candidate values for the plurality of elements, assuming presence of a wrapping condition.
Type: Grant
Filed: June 23, 2020
Date of Patent: September 21, 2021
Assignee: Arm Limited
Inventor: Jack William Derek Andrew
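A behavioural sketch in Python of the wrapping sequence generation. Treating the bounds as a half-open range [lower, upper) and assuming the adjust amount never exceeds the range span are simplifying assumptions about how the wrapping control information is interpreted.

```python
def generate_wrapping_vector(start, adjust, lower, upper, num_elements):
    """Regularly progressing sequence whose values wrap back into [lower, upper)."""
    span = upper - lower
    vector = []
    value = start
    for _ in range(num_elements):
        vector.append(value)
        value += adjust
        if value >= upper:      # wrap when the upper bound is exceeded
            value -= span
        elif value < lower:     # wrap the other way for a negative adjust amount
            value += span
    return vector

# start at 6, step by 3, stay within [0, 10)
print(generate_wrapping_vector(6, 3, 0, 10, 6))  # [6, 9, 2, 5, 8, 1]
```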
-
Patent number: 11115015
Abstract: A circuit component has an address determined from a voltage level applied to a single electrical contact of the circuit component. The circuit component is configured to be assigned one of at least three unique addresses and to select from among the at least three unique addresses based on the voltage level.
Type: Grant
Filed: November 25, 2019
Date of Patent: September 7, 2021
Assignee: SKYWORKS SOLUTIONS, INC.
Inventors: Bo Zhou, Guillaume Alexandre Blin
-
Patent number: 11080046
Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.
Type: Grant
Filed: February 5, 2021
Date of Patent: August 3, 2021
Assignee: Intel Corporation
Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
-
Patent number: 11042370
Abstract: Embodiments described herein provide for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes a systolic dot product unit to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.
Type: Grant
Filed: April 19, 2018
Date of Patent: June 22, 2021
Assignee: Intel Corporation
Inventors: Subramaniam Maiyuran, Guei-Yuan Lueh, Supratim Pal, Ashutosh Garg, Chandra S. Gurram, Jorge E. Parra, Junjie Gu, Konrad Trifunovic, Hong Bin Liao, Mike B. Macpherson, Shubh B. Shah, Shubra Marwaha, Stephen Junkins, Timothy R. Bauer, Varghese George, Weiyu Chen
-
Patent number: 11042375
Abstract: An apparatus and method of operating the apparatus are provided for performing a count operation. Instruction decoder circuitry is responsive to a count instruction specifying an input data item to generate control signals to control the data processing circuitry to perform a count operation. The count operation determines a count value indicative of a number of input elements of a subset of elements in the specified input data item which have a value which matches a reference value in a reference element in a reference data item. A plurality of count operations may be performed to determine a count data item corresponding to the input data item. A register scatter storage instruction, a gather index generation instruction, and respective apparatuses responsive to them, as well as simulator implementations, are also provided.
Type: Grant
Filed: August 1, 2017
Date of Patent: June 22, 2021
Assignee: ARM Limited
Inventors: Mbou Eyole, Jesse Garrett Beu, Alejandro Martinez Vicente, Timothy Hayes
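A small Python model of the count operation: for each reference element, count how many elements of the corresponding subset of the input data item match it. The consecutive, equal-sized subset boundaries are an assumption made for illustration.

```python
def count_matches(input_elements, reference_value):
    """Count value for one subset: how many input elements equal the reference."""
    return sum(1 for element in input_elements if element == reference_value)

def count_data_item(input_item, reference_item, subset_size):
    """Run one count operation per reference element over consecutive subsets."""
    counts = []
    for i, ref in enumerate(reference_item):
        subset = input_item[i * subset_size:(i + 1) * subset_size]
        counts.append(count_matches(subset, ref))
    return counts

# two subsets of four elements, counted against two reference values
print(count_data_item([3, 1, 3, 2, 7, 7, 0, 7], [3, 7], subset_size=4))  # [2, 3]
```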
-
Patent number: 11036511
Abstract: An apparatus has a processing pipeline, and first and second register files. A temporary-register-using instruction is supported which controls the pipeline to perform an operation using a temporary variable derived from an operand stored in the first register file. In response to the instruction, when a predetermined condition is not satisfied, the pipeline processes at least one register move micro-operation to transfer data from the at least one source register of the first register file to at least one newly allocated temporary register of the second register file. When the condition is satisfied, the operation can be performed using a temporary variable already stored in the temporary register of the second register file used by an earlier temporary-register-using instruction specifying the same source register for determining the temporary variable, in the absence of an intervening instruction for rewriting the source register.
Type: Grant
Filed: July 29, 2019
Date of Patent: June 15, 2021
Assignee: Arm Limited
Inventors: Xiaoyang Shen, Damien Robin Martin, Cédric Denis Robert Airaud, Luca Nassi, François Donati
-
Patent number: 10996957
Abstract: A system and corresponding method map instructions in an out-of-order (OoO) processor. The system comprises a mapper, integer snapshot circuitry, and floating-point (FP) snapshot circuitry. The mapper maps instructions by mapping integer and FP architectural registers (ARs) of the instructions to integer and FP physical registers of the OoO processor, respectively. The mapper records, via at least one FP present indicator, the presence of FP ARs used as destinations in the instructions. The mapper copies, periodically, the integer mapper state to the integer snapshot circuitry and copies, intermittently, based on the at least one FP present indicator, the FP mapper state to the FP snapshot circuitry. Copies of the integer and FP mapper state in the integer and FP snapshot circuitry, respectively, improve performance for instruction unwinding caused, for example, by an exception, branch/jump mispredict, etc. By copying the FP mapper state intermittently, power efficiency of the OoO processor is improved.
Type: Grant
Filed: June 20, 2019
Date of Patent: May 4, 2021
Assignee: MARVELL ASIA PTE, LTD.
Inventor: David A. Carlson
-
Patent number: 10970072
Abstract: Disclosed embodiments relate to transposing vectors while loading from memory. In one example, a processor includes a register file, a memory interface, fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction having fields to specify an opcode, a destination vector register, and a source vector having N groups of elements, N being a positive integer, the opcode to indicate the processor is to fetch the source vector, generate write data comprising one or more N-tuples, each N-tuple comprising corresponding elements from each of the N groups of elements, and write the write data to the destination vector register, and execution circuitry to execute the decoded instruction as per the opcode. The execution circuitry has a shuffle pipeline disposed between the memory and the register file, the shuffle pipeline to fetch, decode, and execute further instances of the instruction at one instruction per clock cycle.
Type: Grant
Filed: December 21, 2018
Date of Patent: April 6, 2021
Assignee: Intel Corporation
Inventors: Alexander F. Heinecke, Evangelos Georganas, Christopher J. Hughes, Raanan Sade, Robert Valentine
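A Python sketch of the interleaving described above: the source vector is treated as N equal groups of elements and the write data is built from N-tuples that take one corresponding element from each group, which is effectively a transpose of an N-by-(length/N) layout. Group sizes and element values are illustrative.

```python
def transpose_on_load(source_vector, n_groups):
    """Build write data as N-tuples of corresponding elements from each group."""
    assert len(source_vector) % n_groups == 0
    group_len = len(source_vector) // n_groups
    groups = [source_vector[g * group_len:(g + 1) * group_len] for g in range(n_groups)]
    write_data = []
    for i in range(group_len):
        for g in range(n_groups):      # one element from each group, in order
            write_data.append(groups[g][i])
    return write_data

# two groups of four elements -> pairs of corresponding elements
print(transpose_on_load([0, 1, 2, 3, 10, 11, 12, 13], n_groups=2))
# -> [0, 10, 1, 11, 2, 12, 3, 13]
```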
-
Patent number: 10963247
Abstract: A method to classify source data in a processor in response to a vector floating-point classification instruction includes specifying, in respective fields of the vector floating-point classification instruction, a source register containing the source data and a destination register to store classification indications for the source data. The source register includes a plurality of lanes that each contains a floating-point value, and the destination register includes a plurality of lanes corresponding to the lanes of the source register. The method further includes executing the vector floating-point classification instruction by, for each lane in the source register, classifying the floating-point value in the lane to identify a type of the floating-point value, and storing a value indicative of the identified type in the corresponding lane of the destination register.
Type: Grant
Filed: May 24, 2019
Date of Patent: March 30, 2021
Assignee: Texas Instruments Incorporated
Inventors: Joseph Zbiciak, Brett L. Huber, Duc Bui
-
Patent number: 10963246
Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify an M by N destination matrix, a first source identifier to identify an M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M, K) of the specified first source matrix by a corresponding nibble of a doubleword element (K, N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.
Type: Grant
Filed: November 9, 2018
Date of Patent: March 30, 2021
Assignee: Intel Corporation
Inventors: Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Raanan Sade, Menachem Adelman, Zeev Sperber, Amit Gradstein, Simon Rubanovich
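A Python model of the per-element flow: split each 32-bit source element into eight 4-bit nibbles, multiply corresponding nibbles, and accumulate the eight products into the 32-bit destination element with saturation. The sketch assumes unsigned nibbles and unsigned saturation; the real instruction's signedness may differ.

```python
UINT32_MAX = (1 << 32) - 1

def nibbles(dword):
    """Eight 4-bit fields of a 32-bit value, least significant first."""
    return [(dword >> (4 * i)) & 0xF for i in range(8)]

def dot_nibbles_saturate(dst_dword, src1_dword, src2_dword):
    """Accumulate the eight nibble products into the destination, saturating at 2**32 - 1."""
    total = dst_dword
    for a, b in zip(nibbles(src1_dword), nibbles(src2_dword)):
        total += a * b
    return min(total, UINT32_MAX)

print(hex(dot_nibbles_saturate(0x10, 0x11111111, 0x22222222)))       # 0x20
print(hex(dot_nibbles_saturate(0xFFFFFFF0, 0xFFFFFFFF, 0xFFFFFFFF)))  # 0xffffffff (saturated)
```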
-
Patent number: 10942879
Abstract: A first operation identifier is assigned to a current operation directed to a memory component, the first operation identifier having a first entry in a first data structure that associates the first operation identifier with a first buffer identifier. It is determined whether the current operation collides with a prior operation assigned a second operation identifier, the second operation identifier having a second entry in the first data structure that associates the second operation identifier with a second buffer identifier. A latest flag is updated to indicate that the first entry is the latest operation directed to an address (1) in response to determining that the current operation collides with the prior operation and that the current and prior operations are read operations, or (2) in response to determining that the current operation does not collide with a prior operation.
Type: Grant
Filed: May 28, 2020
Date of Patent: March 9, 2021
Assignee: MICRON TECHNOLOGY, INC.
Inventors: Lyle E. Adams, Mark Ish, Pushpa Seetamraju, Karl D. Schuh, Dan Tupy
-
Patent number: 10761728
Abstract: Provided herein may be a memory controller and a method of operating the same. The memory controller may include a command processor configured to generate a flush command in response to a flush request input from an external host and to assign a slot number corresponding to the flush command; a sequence generator configured to determine flush data to be stored in response to the flush command, and to generate a write sequence in which the flush data is to be stored based on a size of the flush data and an assigned device sequence of a plurality of memory devices; and a memory operation controller configured to control the plurality of memory devices to store the flush data in the plurality of memory devices.
Type: Grant
Filed: December 10, 2018
Date of Patent: September 1, 2020
Assignee: SK hynix Inc.
Inventors: Sung Kwan Hong, Yeong Sik Yi
-
Patent number: 10761970
Abstract: The invention is notably directed to a computer-implemented method for performing safety check operations. The method comprises steps that are implemented while executing a computer program, which is instrumented with safety check operations. As a result, this computer program forms a sequence of ordered instructions. Such instructions comprise safety check operation instructions, in addition to generic execution instructions and system inputs. System inputs allow the executing program to interact with an operating system, which manages resources for the computer program to execute. A series of instructions are identified while executing the computer program. Namely, a first instruction is identified in the sequence, as one of the safety check operation instructions, in view of its subsequent execution. After having identified the first instruction, a second instruction is identified in the sequence. The second instruction is identified as one of the generic computer program instructions.
Type: Grant
Filed: October 20, 2017
Date of Patent: September 1, 2020
Assignee: International Business Machines Corporation
Inventors: Anil Kurmus, Matthias Neugschwandtner, Alessandro Sorniotti
-
Patent number: 10740067
Abstract: Setting or updating of floating point controls is managed. Floating point controls include controls used for floating point operations, such as rounding mode and/or other controls. Further, floating point controls include status associated with floating point operations, such as floating point exceptions and/or others. The management of the floating point controls includes efficiently updating the controls, while reducing costs associated therewith.
Type: Grant
Filed: June 23, 2017
Date of Patent: August 11, 2020
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Michael K. Gschwind, Valentina Salapura
-
Patent number: 10726329
Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.
Type: Grant
Filed: April 17, 2018
Date of Patent: July 28, 2020
Assignee: Cerebras Systems Inc.
Inventors: Sean Lie, Michael Morrison, Srikanth Arekapudi, Gary R. Lauterbach, Michael Edwin James
-
Patent number: 10705842
Abstract: Methods and apparatuses relating to high-performance authenticated encryption are described.
Type: Grant
Filed: April 2, 2018
Date of Patent: July 7, 2020
Assignee: INTEL CORPORATION
Inventors: Vikram Suresh, Sanu Mathew, Sudhir Satpathy, Vinodh Gopal
-
Patent number: 10672100
Abstract: An apparatus determines a second pixel range of an uncorrected image necessary to generate a first pixel range having pixels in a preset range of a corrected image, including a cache unit that determines the second pixel range and reads and holds the second pixel range from memory before executing correction. Correspondences indicating positions of the uncorrected image corresponding to positions of pixels of the corrected image, respectively, are preset. The cache unit specifies a position of the uncorrected image corresponding to a pixel at one of the four corners of a rectangular third pixel range including the first pixel range based on the correspondences, specifies pixel ranges of the uncorrected image necessary for pixel value generation at each of the four corners of the third pixel range based on the specified position, and determines a pixel range including a convex set containing the specified pixel ranges as the second pixel range.
Type: Grant
Filed: October 12, 2016
Date of Patent: June 2, 2020
Assignee: Hitachi Automotive Systems, Ltd.
Inventors: Yusuke Uchida, Tetsuya Yamada, Shigeru Matsuo, Manabu Sasamoto
-
Patent number: 10592243
Abstract: A streaming engine employed in a digital data processor specifies a fixed read-only data stream defined by plural nested loops. An address generator produces addresses of data elements. A stream head register stores the data elements next to be supplied to functional units for use as operands. The streaming engine fetches stream data ahead of use by the central processing unit core in a stream buffer constructed like a cache. The stream buffer cache includes plural cache lines, each of which includes tag bits, at least one valid bit, and data bits. Cache lines are allocated to store newly fetched stream data. Cache lines are deallocated upon consumption of the data by a central processing unit core functional unit. Instructions preferably include operand fields with a first subset of codings corresponding to registers, a stream read only operand coding, and a stream read and advance operand coding.
Type: Grant
Filed: September 10, 2018
Date of Patent: March 17, 2020
Assignee: TEXAS INSTRUMENTS INCORPORATED
Inventor: Joseph Zbiciak
-
Patent number: 10567249
Abstract: A method and system are described. The method and system include determining a grouping characteristic for a plurality of nodes and a corresponding plurality of links. The nodes and the links correspond to components of a network and are associated with network performance information. The grouping characteristic includes at least one of partitionability into pages and a hop distance. The method and system also include generating a graphical visualization based on the grouping characteristic, the nodes, and the links.
Type: Grant
Filed: March 18, 2019
Date of Patent: February 18, 2020
Assignee: ThousandEyes, Inc.
Inventors: John Moeses Ercia Bauan, Sunil Bandla, Ricardo V. Oliveira
-
Patent number: 10565283
Abstract: A method of an aspect includes receiving an instruction indicating a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four consecutive non-negative integers in numerical order. In an aspect, the instruction does not indicate a source packed data operand having a plurality of packed data elements in an architecturally-visible storage location. Other methods, apparatus, systems, and instructions are disclosed.
Type: Grant
Filed: December 22, 2011
Date of Patent: February 18, 2020
Assignee: Intel Corporation
Inventors: Seth Abraham, Robert Valentine, Elmoustapha Ould-Ahmed-Vall, Zeev Sperber, Amit Gradstein
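The described result is easy to model: a destination whose elements hold consecutive non-negative integers in numerical order, with no source packed data operand needed. A minimal sketch, where the starting value and element count are illustrative assumptions:

```python
def consecutive_integers(num_elements=4, start=0):
    """Result written to the destination: consecutive non-negative integers in order."""
    if num_elements < 4 or start < 0:
        raise ValueError("need at least four elements and a non-negative start")
    return [start + i for i in range(num_elements)]

print(consecutive_integers())        # [0, 1, 2, 3]
print(consecutive_integers(8, 10))   # [10, 11, 12, 13, 14, 15, 16, 17]
```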
-
Patent number: 10521383
Abstract: A first operation identifier is assigned to a first operation directed to a memory component, the first operation identifier having an entry in a first data structure that associates the first operation identifier with a first plurality of buffer identifiers. It is determined whether the first operation collides with a prior operation assigned a second operation identifier, the second operation identifier having an entry in the first data structure that associates the second operation identifier with a second plurality of buffer identifiers. It is determined whether the first operation is a read or a write operation. In response to determining that the first operation collides with the prior operation and that the first operation is a read operation, the first plurality of buffer identifiers is updated with a buffer identifier included in the second plurality of buffer identifiers.
Type: Grant
Filed: December 17, 2018
Date of Patent: December 31, 2019
Assignee: MICRON TECHNOLOGY, INC.
Inventors: Lyle E. Adams, Mark Ish, Pushpa Seetamraju, Karl D. Schuh, Dan Tupy
-
Patent number: 10467185
Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector element routing circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
Type: Grant
Filed: April 24, 2017
Date of Patent: November 5, 2019
Assignee: Intel Corporation
Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Suleyman Sair
-
Patent number: 10360036
Abstract: A computer processing system is provided. The computer processing system includes a processor configured to crack a Move-To-FPSCR instruction into two internal instructions. A first one of the two internal instructions executes out-of-order to update a control field, and a second one of the two internal instructions executes in-order to compute a trap decision.
Type: Grant
Filed: July 12, 2017
Date of Patent: July 23, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Brian J. D. Barrick, Maarten J. Boersma, Niels Fricke, Michael J. Genden
-
Patent number: 10339057
Abstract: A streaming engine employed in a digital data processor specifies a fixed read-only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores the data elements next to be supplied to functional units for use as operands. A stream template specifies a loop count and a loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently, enabling a trade-off between the number of loops supported and the size of the loop counts and loop dimensions.
Type: Grant
Filed: December 20, 2016
Date of Patent: July 2, 2019
Assignee: TEXAS INSTRUMENTS INCORPORATED
Inventor: Joseph Zbiciak