Floating Point Or Vector Patents (Class 712/222)
  • Patent number: 12147804
    Abstract: Embodiments detailed herein relate to matrix operations. In particular, matrix (tile) multiply accumulate and negated matrix (tile) multiply accumulate are discussed. For example, some embodiments detail decode circuitry to decode an instruction having fields for an opcode, an identifier of a first source matrix operand, an identifier of a second source matrix operand, and an identifier of a source/destination matrix operand; and execution circuitry to execute the decoded instruction to multiply the identified first source matrix operand by the identified second source matrix operand, add the result of the multiplication to the identified source/destination matrix operand, store the result of the addition in the identified source/destination matrix operand, and zero unconfigured columns of the identified source/destination matrix operand.
    Type: Grant
    Filed: July 22, 2021
    Date of Patent: November 19, 2024
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Zeev Sperber, Mark J. Charney, Bret L. Toll, Rinat Rappoport, Stanislav Shwartsman, Dan Baum, Igor Yanover, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman, Jesus Corbal, Yuri Gebil, Simon Rubanovich
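    Illustrative sketch (not from the patent; tile shapes and the helper name are assumptions): a minimal Python model of the multiply-accumulate-and-zero semantics described in the abstract of 12147804, i.e. accumulate a @ b into the destination tile and then zero the columns beyond the configured tile width.
      # Simplified software model of the tile multiply-accumulate instruction.
      import numpy as np

      def tile_mma(dst, a, b, configured_cols):
          """dst += a @ b, then zero the unconfigured columns of dst."""
          dst = dst + a @ b                      # multiply sources, accumulate into the tile
          dst[:, configured_cols:] = 0.0         # zero columns beyond the configured width
          return dst

      acc = np.zeros((2, 4), dtype=np.float32)   # 2x4 destination tile, 3 columns configured
      a = np.arange(6, dtype=np.float32).reshape(2, 3)
      b = np.ones((3, 4), dtype=np.float32)
      print(tile_mma(acc, a, b, configured_cols=3))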
  • Patent number: 12106100
    Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address, and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, are discussed, where a tile is a set of 2-dimensional registers.
    Type: Grant
    Filed: July 1, 2017
    Date of Patent: October 1, 2024
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall, Dan Baum, Zeev Sperber, Jesus Corbal, Bret L. Toll, Raanan Sade, Igor Yanover, Yuri Gebil, Rinat Rappoport, Stanislav Shwartsman, Menachem Adelman, Simon Rubanovich
  • Patent number: 12056788
    Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a mixed-precision core including mixed-precision execution circuitry to execute one or more mixed-precision instructions to perform a mixed-precision dot-product operation comprising a set of multiply and accumulate operations.
    Type: Grant
    Filed: March 1, 2022
    Date of Patent: August 6, 2024
    Assignee: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Linda L. Hurd, Dukhwan Kim, Mike B. Macpherson, John C. Weast, Feng Chen, Farshad Akhbari, Narayan Srinivasa, Nadathur Rajagopalan Satish, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman
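    Illustrative sketch (an assumption about the operation's shape, not Intel's implementation): a mixed-precision dot product in Python, multiplying low-precision (float16) inputs and accumulating the products in a wider float32 accumulator, in the spirit of the abstract of 12056788.
      # Simplified model: fp16 inputs, fp32 multiply-accumulate chain.
      import numpy as np

      def mixed_precision_dot(a_fp16, b_fp16):
          acc = np.float32(0.0)
          for x, y in zip(a_fp16, b_fp16):
              acc += np.float32(x) * np.float32(y)   # widen, then multiply and accumulate
          return acc

      a = np.array([1.5, 2.25, -0.5], dtype=np.float16)
      b = np.array([4.0, 0.125, 8.0], dtype=np.float16)
      print(mixed_precision_dot(a, b))   # 1.5*4 + 2.25*0.125 - 0.5*8 = 2.28125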
  • Patent number: 12001385
    Abstract: Systems, methods, and apparatuses relating to one or more instructions for loading a tile of a matrix operations accelerator are described.
    Type: Grant
    Filed: December 24, 2020
    Date of Patent: June 4, 2024
    Assignee: Intel Corporation
    Inventor: Elmoustapha Ould-Ahmed-Vall
  • Patent number: 11960438
    Abstract: A stacked processor-plus-memory device includes a processing die with an array of processing elements of an artificial neural network. Each processing element multiplies a first operand (e.g., a weight) by a second operand to produce a partial result that is passed to a subsequent processing element. To prepare for these computations, a sequencer loads the weights into the processing elements as a sequence of operands that step through the processing elements, each operand stored in the corresponding processing element. The operands can be sequenced directly from memory to the processing elements or can be stored first in cache. The processing elements include streaming logic that disregards interruptions in the stream of operands.
    Type: Grant
    Filed: August 24, 2021
    Date of Patent: April 16, 2024
    Assignee: Rambus Inc.
    Inventors: Steven C. Woo, Michael Raymond Miller
  • Patent number: 11900112
    Abstract: A method to reverse source data in a processor in response to a vector reverse instruction includes specifying, in respective fields of the vector reverse instruction, a source register containing the source data and a destination register. The source register includes a plurality of lanes and each lane contains a data element, and the destination register includes a plurality of lanes corresponding to the lanes of the source register. The method further includes executing the vector reverse instruction by creating reversed source data by reversing the order of the data elements, and storing the reversed source data in the destination register.
    Type: Grant
    Filed: March 28, 2022
    Date of Patent: February 13, 2024
    Assignee: Texas Instruments Incorporated
    Inventors: Timothy D. Anderson, Duc Bui
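    Illustrative sketch (a minimal software model, not the TI instruction encoding): the lane-reversal semantics of the vector reverse instruction in 11900112, expressed over a Python list standing in for the lanes of a source register.
      # Simplified model: each list element stands in for one vector lane.
      def vector_reverse(src_lanes):
          return src_lanes[::-1]                 # reversed order of the data elements

      src = [10, 20, 30, 40, 50, 60, 70, 80]
      print(vector_reverse(src))                 # [80, 70, 60, 50, 40, 30, 20, 10]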
  • Patent number: 11875154
    Abstract: Systems, methods, and apparatuses relating to instructions to multiply floating-point values of about zero are described.
    Type: Grant
    Filed: December 13, 2019
    Date of Patent: January 16, 2024
    Assignee: Intel Corporation
    Inventors: Mohamed Elmalaki, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 11847450
    Abstract: Systems, methods, and apparatuses relating to instructions to multiply values of zero are described.
    Type: Grant
    Filed: December 13, 2019
    Date of Patent: December 19, 2023
    Assignee: Intel Corporation
    Inventors: Mohamed Elmalaki, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 11816482
    Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
    Type: Grant
    Filed: August 18, 2022
    Date of Patent: November 14, 2023
    Assignee: NVIDIA Corporation
    Inventors: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
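    Illustrative sketch (fraction width and rounding are assumptions, not NVIDIA's datapath): a Python model of the dot-product step described in 11816482, forming all partial products, aligning them to a common exponent, and accumulating the aligned fixed-point values.
      # Simplified model: align partial products by exponent, then sum as integers.
      import math

      def aligned_dot(a, b, frac_bits=24):
          partials = [x * y for x, y in zip(a, b)]
          exps = [math.frexp(p)[1] for p in partials if p != 0.0]
          max_exp = max(exps) if exps else 0
          acc = 0
          for p in partials:
              if p == 0.0:
                  continue
              m, e = math.frexp(p)                                 # p = m * 2**e, 0.5 <= |m| < 1
              acc += round(m * 2 ** (frac_bits - (max_exp - e)))   # shift right by the exponent gap
          return acc * 2.0 ** (max_exp - frac_bits)

      print(aligned_dot([1.5, 2.0, -0.25], [4.0, 0.5, 8.0]))   # 5.0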
  • Patent number: 11816481
    Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
    Type: Grant
    Filed: August 18, 2022
    Date of Patent: November 14, 2023
    Assignee: NVIDIA Corporation
    Inventors: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
  • Patent number: 11755764
    Abstract: Methods, systems, and computer-readable media for a client-side filesystem for a remote repository are disclosed. One or more files of a repository are sent from a storage service to a client device. The file(s) are obtained by the client using a credential sent by a repository manager. Local copies of the file(s) are accessible via a local filesystem mounted at the client device. One or more new files associated with the repository are generated at the client device. Using the credential, the one or more new files are obtained at the storage service from the client device. The one or more new files are added to the repository.
    Type: Grant
    Filed: July 1, 2022
    Date of Patent: September 12, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Marvin Michael Theimer, Julien Jacques Ellie, Colin Watson, Ullas Sankhla, Swapandeep Singh, Kerry Hart, Paul Anderson, Brian Dahmen, Suchi Nandini, Yunhan Chen, Shu Liu, Arjun Raman, Yuxin Xie, Fengjia Xiong
  • Patent number: 11640285
    Abstract: Disclosed embodiments relate to a method and device for optimizing compilation of source code. The proposed method receives a first intermediate representation code of a source code and analyses each basic block instruction of the plurality of basic block instructions contained in the first intermediate representation code for blockification. In order to blockify identical instructions, one or more groups of basic block instructions are assessed for eligibility for blockification. Upon being determined eligible, a group of basic block instructions is blockified using either one-dimensional or two-dimensional SIMD vectorization. The method further generates a second intermediate representation of the source code, which is translated to executable target code with more efficient processing capacity.
    Type: Grant
    Filed: April 5, 2022
    Date of Patent: May 2, 2023
    Assignee: Blaize, Inc.
    Inventors: Ravi Korsa, Aravind Rajulapudi, Pathikonda Datta Nagraj
  • Patent number: 11640302
    Abstract: A computing device comprising: a plurality of ALUs; a set of registers; a memory; a memory interface between the registers and the memory; a control unit controlling the ALUs by generating: at least one cycle i including both implementing at least one first computing operation by way of an arithmetic logic unit and downloading a first dataset from the memory to at least one register; at least one cycle ii, following the at least one cycle i, including implementing a second computing operation by way of an arithmetic logic unit, for which second computing operation at least part of the first dataset forms at least one operand.
    Type: Grant
    Filed: May 21, 2019
    Date of Patent: May 2, 2023
    Assignee: VSORA
    Inventors: Khaled Maalej, Trung-Dung Nguyen, Julien Schmitt, Pierre-Emmanuel Bernard
  • Patent number: 11467832
    Abstract: A method to classify source data in a processor in response to a vector floating-point classification instruction includes specifying, in respective fields of the vector floating-point classification instruction, a source register containing the source data and a destination register to store classification indications for the source data. The source register includes a plurality of lanes that each contains a floating-point value and the destination register includes a plurality of lanes corresponding to the lanes of the source register. The method further includes executing the vector floating-point classification instruction by, for each lane in the source register, classifying the floating-point value in the lane to identify a type of the floating-point value, and storing a value indicative of the identified type in the corresponding lane of the destination register.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: October 11, 2022
    Assignee: Texas Instruments Incorporated
    Inventors: Joseph Zbiciak, Brett L. Huber, Duc Bui
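    Illustrative sketch (string classification codes are assumptions; the patent stores a value indicative of the type per lane): a per-lane floating-point classification in Python in the spirit of 11467832, identifying NaN, infinity, zero, subnormal, or normal values.
      # Simplified model: classify each lane of a source vector, one result per lane.
      import math

      def classify_lane(x):
          if math.isnan(x):
              return "nan"
          if math.isinf(x):
              return "-inf" if x < 0 else "+inf"
          if x == 0.0:
              return "zero"
          if abs(x) < 2.2250738585072014e-308:   # below the smallest normal double
              return "subnormal"
          return "normal"

      src = [float("nan"), float("inf"), -0.0, 1.0, 5e-324]
      print([classify_lane(x) for x in src])     # ['nan', '+inf', 'zero', 'normal', 'subnormal']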
  • Patent number: 11442696
    Abstract: A floating-point multiply-add and accumulate unit supporting the BF16 format for multiply-accumulate operations and FP32 single-precision addition complying with the IEEE 754 Standard is described, including exception handling. Exception handling that does not interfere with execution of data flow operations, overflow detection, zero detection, and sign extension are adopted for 2's-complement and carry-save formats.
    Type: Grant
    Filed: November 23, 2021
    Date of Patent: September 13, 2022
    Assignee: SAMBANOVA SYSTEMS, INC.
    Inventors: Vojin G. Oklobdzija, Matthew M. Kim
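    Illustrative sketch (truncation to bfloat16 is a simplification; real converters usually round to nearest even): a Python model of a BF16 multiply with FP32 accumulation as outlined in 11442696, without the exception-handling paths.
      # Simplified model: bf16 x bf16 product accumulated in float32.
      import struct

      def to_bf16(x):
          bits = struct.unpack("<I", struct.pack("<f", x))[0]
          return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]   # keep top 16 bits

      def bf16_mac(acc_fp32, a, b):
          prod = to_bf16(a) * to_bf16(b)
          return struct.unpack("<f", struct.pack("<f", acc_fp32 + prod))[0]     # fp32 addition

      print(bf16_mac(1.0, 1.3, 2.0))   # 3.59375 (1.3 loses precision as bfloat16)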
  • Patent number: 11403097
    Abstract: Disclosed embodiments relate to systems and methods to skip inconsequential matrix operations. In one example, a processor includes decode circuitry to decode an instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode indicating that the processor is to multiply each element at row M and column K of the first source matrix with a corresponding element at row K and column N of the second source matrix, and accumulate the resulting product with previous contents of a corresponding element at row M and column N of the destination matrix, the processor to skip multiplications that, based on detected values of corresponding multiplicands, would generate inconsequential results; scheduling circuitry to schedule execution of the instruction; and execution circuitry to execute the instruction as per the opcode.
    Type: Grant
    Filed: June 26, 2019
    Date of Patent: August 2, 2022
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, William Rash, Subramaniam Maiyuran, Varghese George, Rajesh Sankaran
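    Illustrative sketch (zero multiplicands stand in for the "inconsequential" values the hardware detects): a Python model of the skip logic of 11403097 inside a matrix multiply-accumulate loop.
      # Simplified model: skip products whose multiplicands make the result inconsequential.
      import numpy as np

      def skip_aware_mma(dst, a, b):
          M, K = a.shape
          _, N = b.shape
          for m in range(M):
              for n in range(N):
                  for k in range(K):
                      if a[m, k] == 0.0 or b[k, n] == 0.0:
                          continue                 # inconsequential product: skipped
                      dst[m, n] += a[m, k] * b[k, n]
          return dst

      a = np.array([[0.0, 2.0], [3.0, 0.0]])
      b = np.array([[5.0, 0.0], [0.0, 7.0]])
      print(skip_aware_mma(np.zeros((2, 2)), a, b))   # [[ 0. 14.] [15.  0.]]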
  • Patent number: 11360771
    Abstract: Disclosed embodiments relate to a new instruction for performing data-ready memory access operations. In one example, a system includes circuits to fetch, decode, and execute an instruction that includes an opcode, at least one memory location identifier identifying at least one data element, a register identifier, a data readiness indicator identifying at least one data access condition, and a data readiness mask, wherein the execution circuit is to, for each data element of the at least one data element, determine whether a memory request for the data element satisfies the at least one data access condition identified by the data readiness indicator, and in response to determining that the data access condition is not satisfied: generate a prefetch request for the data element, and set a value in a corresponding data element position of the data readiness mask to indicate that the memory request for the data element does not satisfy the at least one data access condition.
    Type: Grant
    Filed: June 30, 2017
    Date of Patent: June 14, 2022
    Assignee: Intel Corporation
    Inventors: William M. Brown, Mikhail Plotnikov, Christopher J. Hughes
  • Patent number: 11249755
    Abstract: Disclosed embodiments relate to executing a vector unsigned multiplication and accumulation instruction. In one example, a processor includes fetch circuitry to fetch a vector unsigned multiplication and accumulation instruction having fields for an opcode, first and second source identifiers, a destination identifier, and an immediate, wherein the identified sources and destination are same-sized registers, decode circuitry to decode the fetched instruction, and execution circuitry to execute the decoded instruction, on each corresponding pair of first and second quadwords of the identified first and second sources, to: generate a sum of products of two doublewords of the first quadword and either two lower words or two upper words of the second quadword, based on the immediate, zero-extend the sum to a quadword-sized sum, and accumulate the quadword-sized sum with a previous value of a destination quadword in a same relative register position as the first and second quadwords.
    Type: Grant
    Filed: September 27, 2017
    Date of Patent: February 15, 2022
    Assignee: Intel Corporation
    Inventors: Venkateswara R. Madduri, Carl Murray, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Robert Valentine, Jesus Corbal
  • Patent number: 11222554
    Abstract: A system, method and computer-readable medium for format-preserving encryption of a numerical value, including storing a binary numerical value, the binary numerical value comprising a plurality of binary bits, dividing the plurality of binary bits into a plurality of bit groups and storing the plurality of bit groups in a plurality of bytes, encrypting each byte in the plurality of bytes using a radix value corresponding to a quantity of binary bits in a bit group corresponding to that byte to generate a plurality of ciphertext bytes, and combining a quantity of least-significant bits from each ciphertext byte in the plurality of ciphertext bytes to generate a binary ciphertext value, the quantity of least-significant bits combined from each ciphertext byte corresponding to the radix value used to generate that ciphertext byte.
    Type: Grant
    Filed: August 16, 2019
    Date of Patent: January 11, 2022
    Assignee: INFORMATICA LLC
    Inventors: Igor Balabine, Rajagopal Guduru, Ramesh Nallamothu
  • Patent number: 11216280
    Abstract: Exception control circuitry controls exception handling for processing circuitry. In response to an initial exception occurring when the processing circuitry is in a given exception level, the initial exception to be handled in a target exception level, the exception control circuitry stores exception control information to at least one exception control register associated with the target exception level, indicating at least one property of the initial exception or of processor state at a time the initial exception occurred. When at least one exception intercept configuration parameter stored in a configuration register indicates that exception interception is enabled, after storing the exception control information, and before the processing circuitry starts processing an exception handler for handling the initial exception in the target exception level, the exception control circuitry triggers a further exception to be handled in a predetermined exception level.
    Type: Grant
    Filed: November 26, 2019
    Date of Patent: January 4, 2022
    Assignee: Arm Limited
    Inventor: Simon John Craske
  • Patent number: 11194578
    Abstract: A computer system, processor, and method for processing information is disclosed that includes at least one computer processor, a register file associated with the at least one processor, preferably a condition register that stores status information, the register file having multiple locations for storing data, multiple ports to write data to and read data from the register file. The system or processor includes an execution area, and the processor is configured to read from all the read ports in a first cycle, and to read from all the read ports in a second cycle. In an embodiment, the execution area includes a staging latch to store data from a first cycle read operation, and in an aspect the computer system is configured to combine the data stored in the staging latch during a first read cycle with the data read from the second cycle.
    Type: Grant
    Filed: May 23, 2018
    Date of Patent: December 7, 2021
    Assignee: International Business Machines Corporation
    Inventors: Steven J. Battle, Brian D. Barrick, Joshua W. Bowman, Susan E. Eisen, Brandon Goddard, Cliff Kucharski, Dung Q. Nguyen, David S. Walder
  • Patent number: 11188337
    Abstract: Micro-architecture designs and methods are provided. A computer processing architecture may include an instruction cache for storing producer instructions, a half-instruction cache for storing half instructions, and eager shelves for storing a result of a first producer instruction. The computer processing architecture may fetch the first producer instruction and a first half instruction; send the first half instruction to the eager shelves; based on execution of the first producer instruction, send a second half instruction to the eager shelves; assemble the first producer instruction in the eager shelves based on the first half instruction and the second half instruction; and dispatch the first producer instruction for execution.
    Type: Grant
    Filed: September 30, 2019
    Date of Patent: November 30, 2021
    Assignees: The Florida State University Research Foundation, Inc., Michigan Technological University
    Inventors: David Whalley, Soner Onder
  • Patent number: 11175890
    Abstract: Examples of techniques for hexadecimal exponent alignment for a binary floating point unit (BFU) of a computer processor are described herein. An aspect includes receiving, by the BFU, a first operand comprising a first fraction and a first exponent, and a second operand comprising a second fraction and a second exponent. Another aspect includes, based on the first operand and the second operand being in a first floating point format, multiplying each of the first exponent and the second exponent by a factor corresponding to a number of bits in a digit in the first floating point format.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: November 16, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Kerstin Claudia Schelm, Petra Leber, Nicol Hofmann, Michael Klein
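    Illustrative sketch (a worked example of the scaling, not the BFU datapath): hexadecimal floating point uses base-16 exponents, so multiplying the exponent by the digit width (4 bits) converts it to the equivalent binary exponent, as in 11175890.
      # Simplified model: m * 16**e == m * 2**(4*e), so scale the exponent by 4.
      def hfp_exponent_to_binary(hex_exponent, bits_per_digit=4):
          return hex_exponent * bits_per_digit

      e_hex = 3
      print(0.5 * 16 ** e_hex == 0.5 * 2 ** hfp_exponent_to_binary(e_hex))   # True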
  • Patent number: 11169809
    Abstract: A method and apparatus for converting scatter control elements to gather control elements used to permute vector data elements are described herein. One embodiment of a method includes decoding an instruction having a field for a source vector operand storing a plurality of data elements, wherein each of the data elements includes a set bit and a plurality of unset bits. Each of the set bits is set at a unique bit offset within the respective data element. The method further includes executing the decoded instruction by generating, for each bit offset across the plurality of data elements in the source vector operand, a count of unset bits between a first data element having a bit set at a current bit offset and a second data element comprising a least significant bit (LSB). A set of control elements is generated based on the count of unset bits generated for each bit offset.
    Type: Grant
    Filed: March 31, 2017
    Date of Patent: November 9, 2021
    Assignee: Intel Corporation
    Inventor: Mikhail Plotnikov
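    Illustrative sketch (this models the overall scatter-to-gather inversion, not the patent's unset-bit counting method): each one-hot source element names the destination lane it scatters to, and the derived gather control tells each destination lane which source element to read, in the spirit of 11169809.
      # Simplified model: invert a one-hot scatter control into gather indices.
      def scatter_to_gather(one_hot_elements, width):
          gather = [0] * width
          for src_idx, elem in enumerate(one_hot_elements):
              dst_lane = elem.bit_length() - 1     # offset of the single set bit
              gather[dst_lane] = src_idx
          return gather

      scatter_ctrl = [0b0100, 0b0001, 0b1000, 0b0010]    # elements scatter to lanes 2, 0, 3, 1
      print(scatter_to_gather(scatter_ctrl, width=4))    # [1, 3, 0, 2]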
  • Patent number: 11138151
    Abstract: Some embodiments provide a non-transitory machine-readable medium that stores a program. The program determines a scale value based on a plurality of floating point values. The program further scales the plurality of floating point values based on the scale value. The program also converts the plurality of floating point values to a plurality of integer values. The program further determines an integer encoding scheme from a plurality of integer encoding schemes. The program also encodes the plurality of integer values based on the determined integer encoding scheme.
    Type: Grant
    Filed: August 30, 2018
    Date of Patent: October 5, 2021
    Assignee: SAP SE
    Inventor: Martin Rupp
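    Illustrative sketch (the scale rule and the bit-width "encoding scheme" are assumptions standing in for the patent's schemes): scale a batch of floats by a common factor, convert to integers, and pick an integer encoding for the result, following the steps listed in 11138151.
      # Simplified model: common scale factor, integer conversion, width-based encoding choice.
      def encode_floats(values, scale_bits=16):
          scale = max(abs(v) for v in values) or 1.0
          ints = [round(v / scale * (2 ** scale_bits - 1)) for v in values]
          width = max(abs(i).bit_length() + 1 for i in ints)   # +1 for a sign bit
          return scale, width, ints

      print(encode_floats([0.25, -1.0, 0.5]))   # (1.0, 17, [16384, -65535, 32768])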
  • Patent number: 11137912
    Abstract: Provided herein may be a memory controller and a method of operating the same. The memory controller may include a command processor configured to generate a flush command in response to a flush request input from an external host and to assign a slot number corresponding to the flush command; a sequence generator configured to determine flush data to be stored in response to the flush command, and to generate a write sequence in which the flush data is to be stored based on a size of the flush data and an assigned device sequence of the plurality of memory devices; and a memory operation controller configured to control the plurality of memory devices to store the flush data in the plurality of memory devices.
    Type: Grant
    Filed: July 31, 2020
    Date of Patent: October 5, 2021
    Assignee: SK hynix Inc.
    Inventors: Sung Kwan Hong, Yeong Sik Yi
  • Patent number: 11126691
    Abstract: An apparatus is provided that receives a scalar start value, an adjust amount and wrapping control information, and includes vector generating circuitry for generating a vector comprising a plurality of elements such that a value of a first element is dependent on the scalar start value, and values of the plurality of elements follow a regularly progressing sequence that is constrained to wrap as required to ensure that each value is within bounds determined from the wrapping control information. The adjust amount is used to determine a difference between values of adjacent elements in the regularly progressing sequence. The vector generating circuitry has first adder circuitry for generating a plurality of first candidate values for the plurality of elements, assuming absence of a wrapping condition, and second adder circuitry for generating a plurality of second candidate values for the plurality of elements, assuming presence of a wrapping condition.
    Type: Grant
    Filed: June 23, 2020
    Date of Patent: September 21, 2021
    Assignee: Arm Limited
    Inventor: Jack William Derek Andrew
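    Illustrative sketch (modular wrap into [0, bound) is an assumption about the wrapping control information): generating a regularly progressing index vector from a scalar start value and adjust amount, wrapping as required, as in 11126691.
      # Simplified model: start + i*adjust, wrapped back into the configured bounds.
      def wrapping_index_vector(start, adjust, bound, num_elements=8):
          return [(start + i * adjust) % bound for i in range(num_elements)]

      print(wrapping_index_vector(5, 3, 8))   # [5, 0, 3, 6, 1, 4, 7, 2]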
  • Patent number: 11115015
    Abstract: A circuit component has an address determined from a voltage level applied to a single electrical contact of the circuit component. The circuit component is configured to be assigned one of at least three unique addresses and to select from among the at least three unique addresses based on the voltage level.
    Type: Grant
    Filed: November 25, 2019
    Date of Patent: September 7, 2021
    Assignee: SKYWORKS SOLUTIONS, INC.
    Inventors: Bo Zhou, Guillaume Alexandre Blin
  • Patent number: 11080046
    Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.
    Type: Grant
    Filed: February 5, 2021
    Date of Patent: August 3, 2021
    Assignee: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Patent number: 11042370
    Abstract: Embodiments described herein provide for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes a systolic dot product unit to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.
    Type: Grant
    Filed: April 19, 2018
    Date of Patent: June 22, 2021
    Assignee: Intel Corporation
    Inventors: Subramaniam Maiyuran, Guei-Yuan Lueh, Supratim Pal, Ashutosh Garg, Chandra S. Gurram, Jorge E. Parra, Junjie Gu, Konrad Trifunovic, Hong Bin Liao, Mike B. Macpherson, Shubh B. Shah, Shubra Marwaha, Stephen Junkins, Timothy R. Bauer, Varghese George, Weiyu Chen
  • Patent number: 11042375
    Abstract: An apparatus and method of operating the apparatus are provided for performing a count operation. Instruction decoder circuitry is responsive to a count instruction specifying an input data item to generate control signals to control the data processing circuitry to perform a count operation. The count operation determines a count value indicative of a number of input elements of a subset of elements in the specified input data item which have a value which matches a reference value in a reference element in a reference data item. A plurality of count operations may be performed to determine a count data item corresponding to the input data item. A register scatter storage instruction, a gather index generation instruction, and respective apparatuses responsive to them, as well as simulator implementations, are also provided.
    Type: Grant
    Filed: August 1, 2017
    Date of Patent: June 22, 2021
    Assignee: ARM Limited
    Inventors: Mbou Eyole, Jesse Garrett Beu, Alejandro Martinez Vicente, Timothy Hayes
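    Illustrative sketch (a scalar model of one count operation; the instruction operates on whole data items): counting how many elements of an input subset match a reference value, as in 11042375.
      # Simplified model: one count value for one reference element.
      def count_matches(input_elements, reference_value):
          return sum(1 for e in input_elements if e == reference_value)

      print(count_matches([3, 7, 3, 3, 9, 3], 3))   # 4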
  • Patent number: 11036511
    Abstract: An apparatus has a processing pipeline, and first and second register files. A temporary-register-using instruction is supported which controls the pipeline to perform an operation using a temporary variable derived from an operand stored in the first register file. In response to the instruction, when a predetermined condition is not satisfied, the pipeline processes at least one register move micro-operation to transfer data from the at least one source register of the first register file to at least one newly allocated temporary register of the second register file. When the condition is satisfied, the operation can be performed using a temporary variable already stored in the temporary register of the second register file used by an earlier temporary-register-using instruction specifying the same source register for determining the temporary variable, in the absence of an intervening instruction for rewriting the source register.
    Type: Grant
    Filed: July 29, 2019
    Date of Patent: June 15, 2021
    Assignee: Arm Limited
    Inventors: Xiaoyang Shen, Damien Robin Martin, Cédric Denis Robert Airaud, Luca Nassi, François Donati
  • Patent number: 10996957
    Abstract: A system and corresponding method map instructions in an out-of-order (OoO) processor. The system comprises a mapper, integer snapshot circuitry, and floating-point (FP) snapshot circuitry. The mapper maps instructions by mapping integer and FP architectural registers (ARs) of the instructions to integer and FP physical registers of the OoO processor, respectively. The mapper records, via at least one present FP indicator, presence of FP ARs used as destinations in the instructions. The mapper copies, periodically, the integer mapper state to the integer snapshot circuitry and copies, intermittently, based on the at least one FP present indicator, the FP mapper state to the FP snapshot circuitry. Copies of the integer and FP mapper state in the integer and FP snapshot circuitry, respectively, improve performance for instruction unwinding caused, for example, by an exception, branch/jump mispredict, etc. By copying the FP mapper state, intermittently, power efficiency of the OoO processor is improved.
    Type: Grant
    Filed: June 20, 2019
    Date of Patent: May 4, 2021
    Assignee: MARVELL ASIA PTE, LTD.
    Inventor: David A. Carlson
  • Patent number: 10970072
    Abstract: Disclosed embodiments relate to transposing vectors while loading from memory. In one example, a processor includes a register file, a memory interface, fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction having fields to specify an opcode, a destination vector register, and a source vector having N groups of elements, N being a positive integer, the opcode to indicate the processor is to fetch the source vector, generate write data comprising one or more N-tuples, each N-tuple comprising corresponding elements from each of the N groups of elements, and write the write data to the destination vector register, and execution circuitry to execute the decoded instruction as per the opcode, wherein the execution circuitry has a shuffle pipeline disposed between the memory and the register file, the shuffle pipeline to fetch, decode, and execute further instances of the instruction at one instruction per clock cycle.
    Type: Grant
    Filed: December 21, 2018
    Date of Patent: April 6, 2021
    Assignee: Intel Corporation
    Inventors: Alexander F. Heinecke, Evangelos Georganas, Christopher J. Hughes, Raanan Sade, Robert Valentine
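    Illustrative sketch (list slicing stands in for the shuffle pipeline): interleaving the N groups of a source vector into N-tuples while "loading", as described in 10970072.
      # Simplified model: write corresponding elements of each group next to each other.
      def transpose_load(source_vector, n_groups):
          group_len = len(source_vector) // n_groups
          groups = [source_vector[g * group_len:(g + 1) * group_len] for g in range(n_groups)]
          out = []
          for j in range(group_len):
              out.extend(group[j] for group in groups)   # one N-tuple per element index
          return out

      print(transpose_load(["a0", "a1", "a2", "a3", "b0", "b1", "b2", "b3"], 2))
      # ['a0', 'b0', 'a1', 'b1', 'a2', 'b2', 'a3', 'b3']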
  • Patent number: 10963247
    Abstract: A method to classify source data in a processor in response to a vector floating-point classification instruction includes specifying, in respective fields of the vector floating-point classification instruction, a source register containing the source data and a destination register to store classification indications for the source data. The source register includes a plurality of lanes that each contains a floating-point value and the destination register includes a plurality of lanes corresponding to the lanes of the source register. The method further includes executing the vector floating-point classification instruction by, for each lane in the source register, classifying the floating-point value in the lane to identify a type of the floating-point value, and storing a value indicative of the identified type in the corresponding lane of the destination register.
    Type: Grant
    Filed: May 24, 2019
    Date of Patent: March 30, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Joseph Zbiciak, Brett L. Huber, Duc Bui
  • Patent number: 10963246
    Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.
    Type: Grant
    Filed: November 9, 2018
    Date of Patent: March 30, 2021
    Assignee: Intel Corporation
    Inventors: Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Raanan Sade, Menachem Adelman, Zeev Sperber, Amit Gradstein, Simon Rubanovich
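    Illustrative sketch (unsigned nibbles and int32 saturation are assumptions): one accumulation step of the nibble dot product in 10963246, multiplying the eight 4-bit nibbles of two doublewords and saturating the accumulated sum.
      # Simplified model: nibble-wise products of two 32-bit doublewords, saturating accumulate.
      def nibbles(dword):
          return [(dword >> (4 * i)) & 0xF for i in range(8)]

      def dword_nibble_dot_acc(acc, a_dword, b_dword):
          total = acc + sum(x * y for x, y in zip(nibbles(a_dword), nibbles(b_dword)))
          return max(-2**31, min(2**31 - 1, total))    # saturate to the int32 range

      print(dword_nibble_dot_acc(100, 0x00000021, 0x00000034))   # 100 + 1*4 + 2*3 = 110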
  • Patent number: 10942879
    Abstract: A first operation identifier is assigned to a current operation directed to a memory component, the first operation identifier having a first entry in a first data structure that associates the first operation identifier with a first buffer identifier. It is determined whether the current operation collides with a prior operation assigned a second operation identifier, the second operation identifier having a second entry in the first data structure that associates the second operation identifier with a second buffer identifier. A latest flag is updated to indicate that the first entry is a latest operation directed to an address (1) in response to determining that the current operation collides with the prior operation and that the current and prior operations are read operations, or (2) in response to determining that the current operation does not collide with a prior operation.
    Type: Grant
    Filed: May 28, 2020
    Date of Patent: March 9, 2021
    Assignee: MICRON TECHNOLOGY, INC.
    Inventors: Lyle E. Adams, Mark Ish, Pushpa Seetamraju, Karl D. Schuh, Dan Tupy
  • Patent number: 10761728
    Abstract: Provided herein may be a memory controller and a method of operating the same. The memory controller may include a command processor configured to generate a flush command in response to a flush request input from an external host and to assign a slot number corresponding to the flush command; a sequence generator configured to determine flush data to be stored in response to the flush command, and to generate a write sequence in which the flush data is to be stored based on a size of the flush data and an assigned device sequence of the plurality of memory devices; and a memory operation controller configured to control the plurality of memory devices to store the flush data in the plurality of memory devices.
    Type: Grant
    Filed: December 10, 2018
    Date of Patent: September 1, 2020
    Assignee: SK hynix Inc.
    Inventors: Sung Kwan Hong, Yeong Sik Yi
  • Patent number: 10761970
    Abstract: The invention is notably directed to a computer-implemented method for performing safety check operations. The method comprises steps that are implemented while executing a computer program, which is instrumented with safety check operations. As a result, this computer program forms a sequence of ordered instructions. Such instructions comprise safety check operation instructions, in addition to generic execution instructions and system inputs. System inputs allow the executing program to interact with an operating system, which manages resources for the computer program to execute. A series of instructions are identified while executing the computer program. Namely, a first instruction is identified in the sequence, as one of the safety check operation instructions, in view of its subsequent execution. After having identified the first instruction, a second instruction is identified in the sequence. The second instruction is identified as one of the generic computer program instructions.
    Type: Grant
    Filed: October 20, 2017
    Date of Patent: September 1, 2020
    Assignee: International Business Machines Corporation
    Inventors: Anil Kurmus, Matthias Neugschwandtner, Alessandro Sorniotti
  • Patent number: 10740067
    Abstract: Setting or updating of floating point controls is managed. Floating point controls include controls used for floating point operations, such as rounding mode and/or other controls. Further, floating point controls include status associated with floating point operations, such as floating point exceptions and/or others. The management of the floating point controls includes efficiently updating the controls, while reducing costs associated therewith.
    Type: Grant
    Filed: June 23, 2017
    Date of Patent: August 11, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 10726329
    Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.
    Type: Grant
    Filed: April 17, 2018
    Date of Patent: July 28, 2020
    Assignee: Cerebras Systems Inc.
    Inventors: Sean Lie, Michael Morrison, Srikanth Arekapudi, Gary R. Lauterbach, Michael Edwin James
  • Patent number: 10705842
    Abstract: Methods and apparatuses relating to high-performance authenticated encryption are described.
    Type: Grant
    Filed: April 2, 2018
    Date of Patent: July 7, 2020
    Assignee: INTEL CORPORATION
    Inventors: Vikram Suresh, Sanu Mathew, Sudhir Satpathy, Vinodh Gopal
  • Patent number: 10672100
    Abstract: An apparatus determines a second pixel range of an uncorrected image necessary to generate a first pixel range having pixels in a preset range of a corrected image, including a cache unit determining the second pixel range and reading and holding the second pixel range from memory before executing correction. Correspondences indicating positions of the uncorrected image corresponding to positions of pixels of the corrected image, respectively, are preset. The cache unit specifies a position of the uncorrected image corresponding to a pixel of one of four corners of a rectangular third pixel range including the first pixel range based on the correspondence, specifies pixel ranges of the uncorrected image necessary for pixel value generation, respectively, at the four corners of the third pixel range based on the specified position, and determines a pixel range including a convex set including the specified pixel ranges as the second pixel range.
    Type: Grant
    Filed: October 12, 2016
    Date of Patent: June 2, 2020
    Assignee: Hitachi Automotive Systems, Ltd.
    Inventors: Yusuke Uchida, Tetsuya Yamada, Shigeru Matsuo, Manabu Sasamoto
  • Patent number: 10592243
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A stream head register stores data elements next to be supplied to functional units for use as operands. The streaming engine fetches stream data ahead of use by the central processing unit core in a stream buffer constructed like a cache. The stream buffer cache includes plural cache lines, each including tag bits, at least one valid bit and data bits. Cache lines are allocated to store newly fetched stream data. Cache lines are deallocated upon consumption of the data by a central processing unit core functional unit. Instructions preferably include operand fields with a first subset of codings corresponding to registers, a stream read only operand coding and a stream read and advance operand coding.
    Type: Grant
    Filed: September 10, 2018
    Date of Patent: March 17, 2020
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventor: Joseph Zbiciak
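    Illustrative sketch (two loop levels and byte strides are assumptions): generating the element addresses of a read-only stream defined by nested loops, as the streaming engine in 10592243 does; here loop_counts[0]/loop_dims[0] describe the innermost loop.
      # Simplified model of nested-loop stream address generation.
      def stream_addresses(base, loop_counts, loop_dims):
          addrs = []
          def walk(level, offset):
              if level < 0:
                  addrs.append(base + offset)
                  return
              for i in range(loop_counts[level]):
                  walk(level - 1, offset + i * loop_dims[level])
          walk(len(loop_counts) - 1, 0)
          return addrs

      # Inner loop: 4 contiguous 4-byte elements; outer loop: 3 rows 64 bytes apart.
      print([hex(a) for a in stream_addresses(0x1000, [4, 3], [4, 64])])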
  • Patent number: 10567249
    Abstract: A method and system are described. The method and system include determining a grouping characteristic for a plurality of nodes and a corresponding plurality of links. The nodes and the links correspond to components of a network and are associated with network performance information. The grouping characteristic includes at least one of partitionability into pages and a hop distance. The method and system also include generating a graphical visualization based on the grouping characteristic, the nodes and the links.
    Type: Grant
    Filed: March 18, 2019
    Date of Patent: February 18, 2020
    Assignee: ThousandEyes, Inc.
    Inventors: John Moeses Ercia Bauan, Sunil Bandla, Ricardo V. Oliveira
  • Patent number: 10565283
    Abstract: A method of an aspect includes receiving an instruction indicating a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four consecutive non-negative integers in numerical order. In an aspect, the instruction does not indicate a source packed data operand having a plurality of packed data elements in an architecturally-visible storage location. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: December 22, 2011
    Date of Patent: February 18, 2020
    Assignee: Intel Corporation
    Inventors: Seth Abraham, Robert Valentine, Elmoustapha Ould-Ahmed-Vall, Zeev Sperber, Amit Gradstein
  • Patent number: 10521383
    Abstract: A first operation identifier is assigned to a first operation directed to a memory component, the first operation identifier having an entry in a first data structure that associates the first operation identifier with a first plurality of buffer identifiers. It is determined whether the first operation collides with a prior operation assigned a second operation identifier, the second operation identifier having an entry in the first data structure that associates the second operation identifier with a second plurality of buffer identifiers. It is determined whether the first operation is a read or a write operation. In response to determining that the first operation collides with the prior operation and that the first operation is a read operation, the first plurality of buffer identifiers are updated with a buffer identifier included in the second plurality of buffer identifiers.
    Type: Grant
    Filed: December 17, 2018
    Date of Patent: December 31, 2019
    Assignee: MICRON TECHNOLOGY, INC.
    Inventors: Lyle E. Adams, Mark Ish, Pushpa Seetamraju, Karl D. Schuh, Dan Tupy
  • Patent number: 10467185
    Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
    Type: Grant
    Filed: April 24, 2017
    Date of Patent: November 5, 2019
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Suleyman Sair
  • Patent number: 10360036
    Abstract: A computer processing system is provided. The computer processing system includes a processor configured to crack a Move-To-FPSCR instruction into two internal instructions. A first one of the two internal instructions executes out-of-order to update a control field and a second one of the two internal instructions executes in-order to compute a trap decision.
    Type: Grant
    Filed: July 12, 2017
    Date of Patent: July 23, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Brian J. D. Barrick, Maarten J. Boersma, Niels Fricke, Michael J. Genden
  • Patent number: 10339057
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores data elements next to be supplied to functional units for use as operands. A stream template specifies loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently, enabling a trade-off between the number of loops supported and the size of the loop counts and loop dimensions.
    Type: Grant
    Filed: December 20, 2016
    Date of Patent: July 2, 2019
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventor: Joseph Zbiciak