Floating Point Or Vector Patents (Class 712/222)
-
Patent number: 12147804
Abstract: Embodiments detailed herein relate to matrix operations. In particular, matrix (tile) multiply accumulate and negated matrix (tile) multiply accumulate are discussed. For example, in some embodiments decode circuitry to decode an instruction having fields for an opcode, an identifier for a first source matrix operand, an identifier of a second source matrix operand, and an identifier for a source/destination matrix operand; and execution circuitry to execute the decoded instruction to multiply the identified first source matrix operand by the identified second source matrix operand, add a result of the multiplication to the identified source/destination matrix operand, store a result of the addition in the identified source/destination matrix operand, and zero unconfigured columns of the identified source/destination matrix operand are detailed.
Type: Grant
Filed: July 22, 2021
Date of Patent: November 19, 2024
Assignee: Intel Corporation
Inventors: Robert Valentine, Zeev Sperber, Mark J. Charney, Bret L. Toll, Rinat Rappoport, Stanislav Shwartsman, Dan Baum, Igor Yanover, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman, Jesus Corbal, Yuri Gebil, Simon Rubanovich
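A minimal reference model of the multiply-accumulate behaviour described above, written in plain Python. The matrix sizes, element types, and the `configured_cols` parameter are illustrative assumptions, not the architected instruction format.

```python
def tile_multiply_accumulate(src_dst, src1, src2, configured_cols, negate=False):
    """Reference semantics: src_dst += src1 @ src2 (or -= when negated),
    then zero any destination columns beyond the configured width."""
    m, k = len(src1), len(src1[0])
    n = len(src2[0])
    for i in range(m):
        for j in range(n):
            acc = src_dst[i][j]
            for p in range(k):
                prod = src1[i][p] * src2[p][j]
                acc += -prod if negate else prod
            src_dst[i][j] = acc
        # zero unconfigured columns of the source/destination tile
        for j in range(configured_cols, n):
            src_dst[i][j] = 0
    return src_dst

# Example: 2x2 tiles, only the first column configured
C = [[1.0, 1.0], [1.0, 1.0]]
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(tile_multiply_accumulate(C, A, B, configured_cols=1))  # [[20.0, 0], [44.0, 0]]
```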
-
Patent number: 12106100
Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address; and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, wherein a tile is a set of 2-dimensional registers, are discussed.
Type: Grant
Filed: July 1, 2017
Date of Patent: October 1, 2024
Assignee: Intel Corporation
Inventors: Robert Valentine, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall, Dan Baum, Zeev Sperber, Jesus Corbal, Bret L. Toll, Raanan Sade, Igor Yanover, Yuri Gebil, Rinat Rappoport, Stanislav Shwartsman, Menachem Adelman, Simon Rubanovich
-
Patent number: 12056788
Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a mixed-precision core including mixed-precision execution circuitry to execute one or more mixed-precision instructions to perform a mixed-precision dot-product operation comprising a set of multiply and accumulate operations.
Type: Grant
Filed: March 1, 2022
Date of Patent: August 6, 2024
Assignee: Intel Corporation
Inventors: Abhishek R. Appu, Altug Koker, Linda L. Hurd, Dukhwan Kim, Mike B. Macpherson, John C. Weast, Feng Chen, Farshad Akhbari, Narayan Srinivasa, Nadathur Rajagopalan Satish, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman
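A small Python sketch of the idea behind a mixed-precision dot product: operands are held at reduced precision (here a bfloat16-style truncation, used purely as an illustrative assumption) while the running sum is accumulated at full precision.

```python
import struct

def to_bf16(x: float) -> float:
    """Truncate a float to a bfloat16-like value (keep the top 16 bits of float32)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def mixed_precision_dot(a, b):
    """Multiply reduced-precision elements, accumulate in full precision."""
    acc = 0.0  # wide accumulator
    for x, y in zip(a, b):
        acc += to_bf16(x) * to_bf16(y)  # each product comes from low-precision inputs
    return acc

print(mixed_precision_dot([0.1, 0.2, 0.3], [1.5, 2.5, 3.5]))
```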
-
Patent number: 12001385
Abstract: Systems, methods, and apparatuses relating to one or more instructions for loading a tile of a matrix operations accelerator are described.
Type: Grant
Filed: December 24, 2020
Date of Patent: June 4, 2024
Assignee: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall
-
Patent number: 11960438
Abstract: A stacked processor-plus-memory device includes a processing die with an array of processing elements of an artificial neural network. Each processing element multiplies a first operand (e.g., a weight) by a second operand to produce a partial result that is passed to a subsequent processing element. To prepare for these computations, a sequencer loads the weights into the processing elements as a sequence of operands that step through the processing elements, each operand stored in the corresponding processing element. The operands can be sequenced directly from memory to the processing elements or can be stored first in cache. The processing elements include streaming logic that disregards interruptions in the stream of operands.
Type: Grant
Filed: August 24, 2021
Date of Patent: April 16, 2024
Assignee: Rambus Inc.
Inventors: Steven C. Woo, Michael Raymond Miller
-
Patent number: 11900112
Abstract: A method to reverse source data in a processor in response to a vector reverse instruction includes specifying, in respective fields of the vector reverse instruction, a source register containing the source data and a destination register. The source register includes a plurality of lanes and each lane contains a data element, and the destination register includes a plurality of lanes corresponding to the lanes of the source register. The method further includes executing the vector reverse instruction by creating reversed source data by reversing the order of the data elements, and storing the reversed source data in the destination register.
Type: Grant
Filed: March 28, 2022
Date of Patent: February 13, 2024
Assignee: Texas Instruments Incorporated
Inventors: Timothy D. Anderson, Duc Bui
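A behavioural sketch of the lane-reversal semantics in Python; the lane count and element values are illustrative, since the real instruction operates on architected vector registers.

```python
def vector_reverse(src):
    """Copy the source lanes into the destination in reversed lane order."""
    dst = [0] * len(src)
    for lane, element in enumerate(src):
        dst[len(src) - 1 - lane] = element
    return dst

# 8-lane example
print(vector_reverse([10, 11, 12, 13, 14, 15, 16, 17]))
# -> [17, 16, 15, 14, 13, 12, 11, 10]
```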
-
Patent number: 11875154
Abstract: Systems, methods, and apparatuses relating to instructions to multiply floating-point values of about zero are described.
Type: Grant
Filed: December 13, 2019
Date of Patent: January 16, 2024
Assignee: Intel Corporation
Inventors: Mohamed Elmalaki, Elmoustapha Ould-Ahmed-Vall
-
Patent number: 11847450
Abstract: Systems, methods, and apparatuses relating to instructions to multiply values of zero are described.
Type: Grant
Filed: December 13, 2019
Date of Patent: December 19, 2023
Assignee: Intel Corporation
Inventors: Mohamed Elmalaki, Elmoustapha Ould-Ahmed-Vall
-
Patent number: 11816482
Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
Type: Grant
Filed: August 18, 2022
Date of Patent: November 14, 2023
Assignee: NVIDIA Corporation
Inventors: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
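A rough Python model of the dot-product flow described above: partial products are formed, aligned to a common exponent, and then accumulated with integer additions. The fixed fraction width and the single shared exponent are assumptions made for the sketch, not details from the patent.

```python
import math

def aligned_dot_product(a, b, frac_bits=24):
    """Form partial products, align them to the largest product exponent,
    then accumulate the aligned values and rescale the result."""
    partials = []
    for x, y in zip(a, b):
        mx, ex = math.frexp(x)   # x = mx * 2**ex, with 0.5 <= |mx| < 1
        my, ey = math.frexp(y)
        partials.append((mx * my, ex + ey))  # product exponent = sum of exponents
    max_exp = max(e for _, e in partials)
    acc = 0
    for frac, exp in partials:
        shift = max_exp - exp                 # align to the common exponent
        acc += int(round(frac * (1 << frac_bits))) >> shift
    return (acc / (1 << frac_bits)) * 2.0 ** max_exp

print(aligned_dot_product([1.5, 2.0, -0.25], [4.0, 3.0, 8.0]))  # 10.0
```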
-
Patent number: 11816481
Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
Type: Grant
Filed: August 18, 2022
Date of Patent: November 14, 2023
Assignee: NVIDIA Corporation
Inventors: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
-
Patent number: 11755764
Abstract: Methods, systems, and computer-readable media for a client-side filesystem for a remote repository are disclosed. One or more files of a repository are sent from a storage service to a client device. The file(s) are obtained by the client using a credential sent by a repository manager. Local copies of the file(s) are accessible via a local filesystem mounted at the client device. One or more new files associated with the repository are generated at the client device. Using the credential, the one or more new files are obtained at the storage service from the client device. The one or more new files are added to the repository.
Type: Grant
Filed: July 1, 2022
Date of Patent: September 12, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Marvin Michael Theimer, Julien Jacques Ellie, Colin Watson, Ullas Sankhla, Swapandeep Singh, Kerry Hart, Paul Anderson, Brian Dahmen, Suchi Nandini, Yunhan Chen, Shu Liu, Arjun Raman, Yuxin Xie, Fengjia Xiong
-
Patent number: 11640285
Abstract: Disclosed embodiments relate to a method and device for optimizing compilation of source code. The proposed method receives a first intermediate representation code of a source code and analyses each basic block instruction of the plurality of basic block instructions contained in the first intermediate representation code for blockification. In order to blockify the identical instructions, the one or more groups of basic block instructions are assessed for eligibility of blockification. Upon being determined eligible, the group of basic block instructions is blockified using either one-dimensional or two-dimensional SIMD vectorization. The method further generates a second intermediate representation of the source code, which is translated to executable target code with more efficient processing capacity.
Type: Grant
Filed: April 5, 2022
Date of Patent: May 2, 2023
Assignee: Blaize, Inc.
Inventors: Ravi Korsa, Aravind Rajulapudi, Pathikonda Datta Nagraj
-
Patent number: 11640302
Abstract: A computing device comprising: a plurality of ALUs; a set of registers; a memory; a memory interface between the registers and the memory; and a control unit controlling the ALUs by generating: at least one cycle i including both implementing at least one first computing operation by way of an arithmetic logic unit and downloading a first dataset from the memory to at least one register; and at least one cycle ii, following the at least one cycle i, including implementing a second computing operation by way of an arithmetic logic unit, for which second computing operation at least part of the first dataset forms at least one operand.
Type: Grant
Filed: May 21, 2019
Date of Patent: May 2, 2023
Assignee: VSORA
Inventors: Khaled Maalej, Trung-Dung Nguyen, Julien Schmitt, Pierre-Emmanuel Bernard
-
Patent number: 11467832
Abstract: A method to classify source data in a processor in response to a vector floating-point classification instruction includes specifying, in respective fields of the vector floating-point classification instruction, a source register containing the source data and a destination register to store classification indications for the source data. The source register includes a plurality of lanes that each contains a floating-point value, and the destination register includes a plurality of lanes corresponding to the lanes of the source register. The method further includes executing the vector floating-point classification instruction by, for each lane in the source register, classifying the floating-point value in the lane to identify a type of the floating-point value, and storing a value indicative of the identified type in the corresponding lane of the destination register.
Type: Grant
Filed: March 29, 2021
Date of Patent: October 11, 2022
Assignee: Texas Instruments Incorporated
Inventors: Joseph Zbiciak, Brett L. Huber, Duc Bui
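An illustrative Python model of per-lane floating-point classification. The class codes below are made-up values for the sketch, not the encoding the instruction actually produces.

```python
import math

# hypothetical class codes, one per recognised type
ZERO, SUBNORMAL, NORMAL, INFINITE, NAN = range(5)
SMALLEST_NORMAL = 2.0 ** -1022  # smallest normal magnitude for IEEE 754 double precision

def classify(value: float) -> int:
    """Identify the type of a single floating-point value."""
    if math.isnan(value):
        return NAN
    if math.isinf(value):
        return INFINITE
    if value == 0.0:
        return ZERO
    if abs(value) < SMALLEST_NORMAL:
        return SUBNORMAL
    return NORMAL

def vector_classify(src_lanes):
    """Classify each lane and store the code in the matching destination lane."""
    return [classify(v) for v in src_lanes]

print(vector_classify([0.0, 1.5, float("inf"), float("nan"), 5e-324]))
# -> [0, 2, 3, 4, 1]
```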
-
Patent number: 11442696
Abstract: A floating-point multiply-add, accumulate unit supporting the BF16 format for multiply-accumulate operations and FP32 single-precision addition complying with the IEEE 754 standard is described, including exception handling. Operations including exception handling (performed in a way that does not interfere with execution of data-flow operations), overflow detection, zero detection, and sign extension are adopted for 2's complement and carry-save formats.
Type: Grant
Filed: November 23, 2021
Date of Patent: September 13, 2022
Assignee: SAMBANOVA SYSTEMS, INC.
Inventors: Vojin G. Oklobdzija, Matthew M. Kim
-
Patent number: 11403097
Abstract: Disclosed embodiments relate to systems and methods to skip inconsequential matrix operations. In one example, a processor includes decode circuitry to decode an instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode indicating that the processor is to multiply each element at row M and column K of the first source matrix with a corresponding element at row K and column N of the second source matrix, and accumulate a resulting product with previous contents of a corresponding element at row M and column N of the destination matrix, the processor to skip multiplications that, based on detected values of corresponding multiplicands, would generate inconsequential results; scheduling circuitry to schedule execution of the instruction; and execution circuitry to execute the instruction as per the opcode.
Type: Grant
Filed: June 26, 2019
Date of Patent: August 2, 2022
Assignee: Intel Corporation
Inventors: Elmoustapha Ould-Ahmed-Vall, William Rash, Subramaniam Maiyuran, Varghese George, Rajesh Sankaran
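A Python sketch of the skip idea: multiplications whose detected multiplicand is zero cannot change the accumulator, so they are counted as skipped instead of performed. The zero test is the simplest possible notion of "inconsequential"; the patent's detection criteria may be broader.

```python
def sparse_matmul_accumulate(dst, a, b):
    """dst[m][n] += sum over k of a[m][k] * b[k][n], skipping zero multiplicands."""
    skipped = 0
    rows, inner, cols = len(a), len(b), len(b[0])
    for m in range(rows):
        for n in range(cols):
            for k in range(inner):
                if a[m][k] == 0 or b[k][n] == 0:
                    skipped += 1          # inconsequential: the product would be 0
                    continue
                dst[m][n] += a[m][k] * b[k][n]
    return dst, skipped

dst = [[0, 0], [0, 0]]
a = [[1, 0], [0, 3]]
b = [[2, 0], [0, 4]]
print(sparse_matmul_accumulate(dst, a, b))  # ([[2, 0], [0, 12]], 6)
```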
-
Patent number: 11360771
Abstract: Disclosed embodiments relate to a new instruction for performing data-ready memory access operations. In one example, a system includes circuits to fetch, decode, and execute an instruction that includes an opcode, at least one memory location identifier identifying at least one data element, a register identifier, a data readiness indicator identifying at least one data access condition, and a data readiness mask, wherein the execution circuit is to, for each data element of the at least one data element, determine whether a memory request for the data element satisfies the at least one data access condition identified by the data readiness indicator, and in response to determining that the memory request does not satisfy the data access condition: generate a prefetch request for the data element, and set a value in a corresponding data element position of the data readiness mask to indicate that the memory request for the data element does not satisfy the at least one data access condition.
Type: Grant
Filed: June 30, 2017
Date of Patent: June 14, 2022
Assignee: Intel Corporation
Inventors: William M. Brown, Mikhail Plotnikov, Christopher J. Hughes
-
Patent number: 11249755
Abstract: Disclosed embodiments relate to executing a vector unsigned multiplication and accumulation instruction. In one example, a processor includes fetch circuitry to fetch a vector unsigned multiplication and accumulation instruction having fields for an opcode, first and second source identifiers, a destination identifier, and an immediate, wherein the identified sources and destination are same-sized registers; decode circuitry to decode the fetched instruction; and execution circuitry to execute the decoded instruction, on each corresponding pair of first and second quadwords of the identified first and second sources, to: generate a sum of products of two doublewords of the first quadword and either two lower words or two upper words of the second quadword, based on the immediate; zero-extend the sum to a quadword-sized sum; and accumulate the quadword-sized sum with a previous value of a destination quadword in the same relative register position as the first and second quadwords.
Type: Grant
Filed: September 27, 2017
Date of Patent: February 15, 2022
Assignee: Intel Corporation
Inventors: Venkateswara R. Madduri, Carl Murray, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Robert Valentine, Jesus Corbal
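A Python sketch of one 64-bit lane of that operation. The pairing of the low doubleword with the lower of the two selected words (and the high doubleword with the upper one) is an assumption made for illustration; only the overall shape (two doubleword-by-word products, summed, zero-extended, accumulated) follows the abstract.

```python
MASK64 = (1 << 64) - 1

def vumac_quadword(dst_q, src1_q, src2_q, use_upper_words):
    """One lane: sum of two unsigned doubleword-by-word products,
    zero-extended to 64 bits and accumulated into the destination quadword."""
    d0 = src1_q & 0xFFFFFFFF             # low doubleword of source 1
    d1 = (src1_q >> 32) & 0xFFFFFFFF     # high doubleword of source 1
    base = 32 if use_upper_words else 0  # immediate selects upper or lower words
    w0 = (src2_q >> base) & 0xFFFF
    w1 = (src2_q >> (base + 16)) & 0xFFFF
    total = d0 * w0 + d1 * w1            # sum of products (unsigned)
    return (dst_q + (total & MASK64)) & MASK64

def vumac(dst, src1, src2, use_upper_words=False):
    """Apply the per-quadword operation across all lanes of same-sized registers."""
    return [vumac_quadword(d, a, b, use_upper_words)
            for d, a, b in zip(dst, src1, src2)]

print(vumac([0, 1],
            [0x00000002_00000003, 0x00000001_00000001],
            [0x0004_0005_0006_0007, 0x0001_0002_0003_0004]))  # [33, 8]
```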
-
Patent number: 11222554
Abstract: A system, method, and computer-readable medium for format-preserving encryption of a numerical value, including: storing a binary numerical value, the binary numerical value comprising a plurality of binary bits; dividing the plurality of binary bits into a plurality of bit groups and storing the plurality of bit groups in a plurality of bytes; encrypting each byte in the plurality of bytes using a radix value corresponding to a quantity of binary bits in a bit group corresponding to that byte to generate a plurality of ciphertext bytes; and combining a quantity of least-significant bits from each ciphertext byte in the plurality of ciphertext bytes to generate a binary ciphertext value, the quantity of least-significant bits combined from each ciphertext byte corresponding to the radix value used to generate that ciphertext byte.
Type: Grant
Filed: August 16, 2019
Date of Patent: January 11, 2022
Assignee: INFORMATICA LLC
Inventors: Igor Balabine, Rajagopal Guduru, Ramesh Nallamothu
-
Patent number: 11216280
Abstract: Exception control circuitry controls exception handling for processing circuitry. In response to an initial exception occurring when the processing circuitry is in a given exception level, the initial exception to be handled in a target exception level, the exception control circuitry stores exception control information to at least one exception control register associated with the target exception level, indicating at least one property of the initial exception or of processor state at a time the initial exception occurred. When at least one exception intercept configuration parameter stored in a configuration register indicates that exception interception is enabled, after storing the exception control information, and before the processing circuitry starts processing an exception handler for handling the initial exception in the target exception level, the exception control circuitry triggers a further exception to be handled in a predetermined exception level.
Type: Grant
Filed: November 26, 2019
Date of Patent: January 4, 2022
Assignee: Arm Limited
Inventor: Simon John Craske
-
Patent number: 11194578
Abstract: A computer system, processor, and method for processing information is disclosed that includes at least one computer processor; a register file associated with the at least one processor, preferably a condition register that stores status information, the register file having multiple locations for storing data; and multiple ports to write data to and read data from the register file. The system or processor includes an execution area, and the processor is configured to read from all the read ports in a first cycle, and to read from all the read ports in a second cycle. In an embodiment, the execution area includes a staging latch to store data from a first-cycle read operation, and in an aspect the computer system is configured to combine the data stored in the staging latch during a first read cycle with the data read from the second cycle.
Type: Grant
Filed: May 23, 2018
Date of Patent: December 7, 2021
Assignee: International Business Machines Corporation
Inventors: Steven J. Battle, Brian D. Barrick, Joshua W. Bowman, Susan E. Eisen, Brandon Goddard, Cliff Kucharski, Dung Q. Nguyen, David S. Walder
-
Patent number: 11188337
Abstract: Micro-architecture designs and methods are provided. A computer processing architecture may include an instruction cache for storing producer instructions, a half-instruction cache for storing half instructions, and eager shelves for storing a result of a first producer instruction. The computer processing architecture may fetch the first producer instruction and a first half instruction; send the first half instruction to the eager shelves; based on execution of the first producer instruction, send a second half instruction to the eager shelves; assemble the first producer instruction in the eager shelves based on the first half instruction and the second half instruction; and dispatch the first producer instruction for execution.
Type: Grant
Filed: September 30, 2019
Date of Patent: November 30, 2021
Assignees: The Florida State University Research Foundation, Inc., Michigan Technological University
Inventors: David Whalley, Soner Onder
-
Patent number: 11175890
Abstract: Examples of techniques for hexadecimal exponent alignment for a binary floating point unit (BFU) of a computer processor are described herein. An aspect includes receiving, by the BFU, a first operand comprising a first fraction and a first exponent, and a second operand comprising a second fraction and a second exponent. Another aspect includes, based on the first operand and the second operand being in a first floating point format, multiplying each of the first exponent and the second exponent by a factor corresponding to a number of bits in a digit in the first floating point format.
Type: Grant
Filed: April 30, 2019
Date of Patent: November 16, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Kerstin Claudia Schelm, Petra Leber, Nicol Hofmann, Michael Klein
-
Patent number: 11169809
Abstract: Method and apparatus for converting scatter control elements to gather control elements used to permute vector data elements is described herein. One embodiment of a method includes decoding an instruction having a field for a source vector operand storing a plurality of data elements, wherein each of the data elements includes a set bit and a plurality of unset bits. Each of the set bits is set at a unique bit offset within the respective data element. The method further includes executing the decoded instruction by generating, for each bit offset across the plurality of data elements in the source vector operand, a count of unset bits between a first data element having a bit set at a current bit offset and a second data element comprising a least significant bit (LSB). A set of control elements is generated based on the count of unset bits generated for each bit offset.
Type: Grant
Filed: March 31, 2017
Date of Patent: November 9, 2021
Assignee: Intel Corporation
Inventor: Mikhail Plotnikov
-
Patent number: 11138151
Abstract: Some embodiments provide a non-transitory machine-readable medium that stores a program. The program determines a scale value based on a plurality of floating point values. The program further scales the plurality of floating point values based on the scale value. The program also converts the plurality of floating point values to a plurality of integer values. The program further determines an integer encoding scheme from a plurality of integer encoding schemes. The program also encodes the plurality of integer values based on the determined integer encoding scheme.
Type: Grant
Filed: August 30, 2018
Date of Patent: October 5, 2021
Assignee: SAP SE
Inventor: Martin Rupp
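A rough Python sketch of that pipeline under simple assumptions: the scale is chosen from the number of decimal places, and the integers are then encoded with whichever of two illustrative schemes (fixed-width vs. byte-length/varint-style) would be smaller. The scheme names and selection rule are inventions for the sketch, not the patent's schemes.

```python
def choose_scale(values, max_decimals=6):
    """Smallest power of ten that makes every value an integer (capped)."""
    for d in range(max_decimals + 1):
        scale = 10 ** d
        if all(float(v * scale).is_integer() for v in values):
            return scale
    return 10 ** max_decimals

def encode(values):
    scale = choose_scale(values)                              # determine scale value
    ints = [int(round(v * scale)) for v in values]            # scale and convert to integers
    fixed_bytes = 8 * len(ints)                               # cost of a 64-bit fixed-width scheme
    varint_bytes = sum(max(1, (abs(n).bit_length() + 7) // 8) for n in ints)
    scheme = "fixed64" if fixed_bytes <= varint_bytes else "varint"
    return {"scale": scale, "scheme": scheme, "integers": ints}

print(encode([1.25, 2.5, 3.75]))
# -> {'scale': 100, 'scheme': 'varint', 'integers': [125, 250, 375]}
```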
-
Patent number: 11137912
Abstract: Provided herein may be a memory controller and a method of operating the same. The memory controller may include a command processor configured to generate a flush command in response to a flush request input from an external host and to assign a slot number corresponding to the flush command; a sequence generator configured to determine flush data to be stored in response to the flush command, and to generate a write sequence in which the flush data is to be stored based on a size of the flush data and an assigned device sequence of a plurality of memory devices; and a memory operation controller configured to control the plurality of memory devices to store the flush data in the plurality of memory devices.
Type: Grant
Filed: July 31, 2020
Date of Patent: October 5, 2021
Assignee: SK hynix Inc.
Inventors: Sung Kwan Hong, Yeong Sik Yi
-
Patent number: 11126691
Abstract: An apparatus is provided that receives a scalar start value, an adjust amount, and wrapping control information, and includes vector generating circuitry for generating a vector comprising a plurality of elements such that a value of a first element is dependent on the scalar start value, and values of the plurality of elements follow a regularly progressing sequence that is constrained to wrap as required to ensure that each value is within bounds determined from the wrapping control information. The adjust amount is used to determine a difference between values of adjacent elements in the regularly progressing sequence. The vector generating circuitry has first adder circuitry for generating a plurality of first candidate values for the plurality of elements, assuming absence of a wrapping condition, and second adder circuitry for generating a plurality of second candidate values for the plurality of elements, assuming presence of a wrapping condition.
Type: Grant
Filed: June 23, 2020
Date of Patent: September 21, 2021
Assignee: Arm Limited
Inventor: Jack William Derek Andrew
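A behavioural sketch in Python of the wrapping sequence generation. Treating the bounds as a half-open range [lower, upper) and assuming the adjust amount never exceeds the range span are simplifying assumptions about how the wrapping control information is interpreted.

```python
def generate_wrapping_vector(start, adjust, lower, upper, num_elements):
    """Regularly progressing sequence whose values wrap back into [lower, upper)."""
    span = upper - lower
    vector = []
    value = start
    for _ in range(num_elements):
        vector.append(value)
        value += adjust
        if value >= upper:      # wrap when the upper bound is exceeded
            value -= span
        elif value < lower:     # wrap the other way for a negative adjust amount
            value += span
    return vector

# start at 6, step by 3, stay within [0, 10)
print(generate_wrapping_vector(6, 3, 0, 10, 6))  # [6, 9, 2, 5, 8, 1]
```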
-
Patent number: 11115015
Abstract: A circuit component has an address determined from a voltage level applied to a single electrical contact of the circuit component. The circuit component is configured to be assigned one of at least three unique addresses and to select from among the at least three unique addresses based on the voltage level.
Type: Grant
Filed: November 25, 2019
Date of Patent: September 7, 2021
Assignee: SKYWORKS SOLUTIONS, INC.
Inventors: Bo Zhou, Guillaume Alexandre Blin
-
Patent number: 11080046
Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.
Type: Grant
Filed: February 5, 2021
Date of Patent: August 3, 2021
Assignee: Intel Corporation
Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
-
Patent number: 11042370
Abstract: Embodiments described herein provide for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes a systolic dot product unit to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.
Type: Grant
Filed: April 19, 2018
Date of Patent: June 22, 2021
Assignee: Intel Corporation
Inventors: Subramaniam Maiyuran, Guei-Yuan Lueh, Supratim Pal, Ashutosh Garg, Chandra S. Gurram, Jorge E. Parra, Junjie Gu, Konrad Trifunovic, Hong Bin Liao, Mike B. Macpherson, Shubh B. Shah, Shubra Marwaha, Stephen Junkins, Timothy R. Bauer, Varghese George, Weiyu Chen
-
Patent number: 11042375
Abstract: An apparatus and method of operating the apparatus are provided for performing a count operation. Instruction decoder circuitry is responsive to a count instruction specifying an input data item to generate control signals to control the data processing circuitry to perform a count operation. The count operation determines a count value indicative of a number of input elements of a subset of elements in the specified input data item which have a value which matches a reference value in a reference element in a reference data item. A plurality of count operations may be performed to determine a count data item corresponding to the input data item. A register scatter storage instruction, a gather index generation instruction, and respective apparatuses responsive to them, as well as simulator implementations, are also provided.
Type: Grant
Filed: August 1, 2017
Date of Patent: June 22, 2021
Assignee: ARM Limited
Inventors: Mbou Eyole, Jesse Garrett Beu, Alejandro Martinez Vicente, Timothy Hayes
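A small Python model of the count operation: for each reference element, count how many elements of the corresponding subset of the input data item match it. The consecutive, equal-sized subset boundaries are an assumption made for illustration.

```python
def count_matches(input_elements, reference_value):
    """Count value for one subset: how many input elements equal the reference."""
    return sum(1 for element in input_elements if element == reference_value)

def count_data_item(input_item, reference_item, subset_size):
    """Run one count operation per reference element over consecutive subsets."""
    counts = []
    for i, ref in enumerate(reference_item):
        subset = input_item[i * subset_size:(i + 1) * subset_size]
        counts.append(count_matches(subset, ref))
    return counts

# two subsets of four elements, counted against two reference values
print(count_data_item([3, 1, 3, 2, 7, 7, 0, 7], [3, 7], subset_size=4))  # [2, 3]
```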
-
Patent number: 11036511
Abstract: An apparatus has a processing pipeline, and first and second register files. A temporary-register-using instruction is supported which controls the pipeline to perform an operation using a temporary variable derived from an operand stored in the first register file. In response to the instruction, when a predetermined condition is not satisfied, the pipeline processes at least one register move micro-operation to transfer data from the at least one source register of the first register file to at least one newly allocated temporary register of the second register file. When the condition is satisfied, the operation can be performed using a temporary variable already stored in the temporary register of the second register file used by an earlier temporary-register-using instruction specifying the same source register for determining the temporary variable, in the absence of an intervening instruction for rewriting the source register.
Type: Grant
Filed: July 29, 2019
Date of Patent: June 15, 2021
Assignee: Arm Limited
Inventors: Xiaoyang Shen, Damien Robin Martin, Cédric Denis Robert Airaud, Luca Nassi, François Donati
-
Patent number: 10996957
Abstract: A system and corresponding method map instructions in an out-of-order (OoO) processor. The system comprises a mapper, integer snapshot circuitry, and floating-point (FP) snapshot circuitry. The mapper maps instructions by mapping integer and FP architectural registers (ARs) of the instructions to integer and FP physical registers of the OoO processor, respectively. The mapper records, via at least one FP present indicator, the presence of FP ARs used as destinations in the instructions. The mapper copies, periodically, the integer mapper state to the integer snapshot circuitry and copies, intermittently, based on the at least one FP present indicator, the FP mapper state to the FP snapshot circuitry. Copies of the integer and FP mapper state in the integer and FP snapshot circuitry, respectively, improve performance for instruction unwinding caused, for example, by an exception, branch/jump mispredict, etc. By copying the FP mapper state intermittently, power efficiency of the OoO processor is improved.
Type: Grant
Filed: June 20, 2019
Date of Patent: May 4, 2021
Assignee: MARVELL ASIA PTE, LTD.
Inventor: David A. Carlson
-
Patent number: 10970072
Abstract: Disclosed embodiments relate to transposing vectors while loading from memory. In one example, a processor includes a register file, a memory interface, fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction having fields to specify an opcode, a destination vector register, and a source vector having N groups of elements, N being a positive integer, the opcode to indicate the processor is to fetch the source vector, generate write data comprising one or more N-tuples, each N-tuple comprising corresponding elements from each of the N groups of elements, and write the write data to the destination vector register, and execution circuitry to execute the decoded instruction as per the opcode. The execution circuitry has a shuffle pipeline disposed between the memory and the register file, the shuffle pipeline to fetch, decode, and execute further instances of the instruction at one instruction per clock cycle.
Type: Grant
Filed: December 21, 2018
Date of Patent: April 6, 2021
Assignee: Intel Corporation
Inventors: Alexander F. Heinecke, Evangelos Georganas, Christopher J. Hughes, Raanan Sade, Robert Valentine
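A Python sketch of the interleaving described above: the source vector is treated as N equal groups of elements and the write data is built from N-tuples that take one corresponding element from each group, which is effectively a transpose of an N-by-(length/N) layout. Group sizes and element values are illustrative.

```python
def transpose_on_load(source_vector, n_groups):
    """Build write data as N-tuples of corresponding elements from each group."""
    assert len(source_vector) % n_groups == 0
    group_len = len(source_vector) // n_groups
    groups = [source_vector[g * group_len:(g + 1) * group_len] for g in range(n_groups)]
    write_data = []
    for i in range(group_len):
        for g in range(n_groups):      # one element from each group, in order
            write_data.append(groups[g][i])
    return write_data

# two groups of four elements -> pairs of corresponding elements
print(transpose_on_load([0, 1, 2, 3, 10, 11, 12, 13], n_groups=2))
# -> [0, 10, 1, 11, 2, 12, 3, 13]
```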
-
Patent number: 10963247
Abstract: A method to classify source data in a processor in response to a vector floating-point classification instruction includes specifying, in respective fields of the vector floating-point classification instruction, a source register containing the source data and a destination register to store classification indications for the source data. The source register includes a plurality of lanes that each contains a floating-point value, and the destination register includes a plurality of lanes corresponding to the lanes of the source register. The method further includes executing the vector floating-point classification instruction by, for each lane in the source register, classifying the floating-point value in the lane to identify a type of the floating-point value, and storing a value indicative of the identified type in the corresponding lane of the destination register.
Type: Grant
Filed: May 24, 2019
Date of Patent: March 30, 2021
Assignee: Texas Instruments Incorporated
Inventors: Joseph Zbiciak, Brett L. Huber, Duc Bui
-
Patent number: 10963246
Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify an M by N destination matrix, a first source identifier to identify an M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M, K) of the specified first source matrix by a corresponding nibble of a doubleword element (K, N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.
Type: Grant
Filed: November 9, 2018
Date of Patent: March 30, 2021
Assignee: Intel Corporation
Inventors: Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Raanan Sade, Menachem Adelman, Zeev Sperber, Amit Gradstein, Simon Rubanovich
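A Python model of the per-element flow: split each 32-bit source element into eight 4-bit nibbles, multiply corresponding nibbles, and accumulate the eight products into the 32-bit destination element with saturation. The sketch assumes unsigned nibbles and unsigned saturation; the real instruction's signedness may differ.

```python
UINT32_MAX = (1 << 32) - 1

def nibbles(dword):
    """Eight 4-bit fields of a 32-bit value, least significant first."""
    return [(dword >> (4 * i)) & 0xF for i in range(8)]

def dot_nibbles_saturate(dst_dword, src1_dword, src2_dword):
    """Accumulate the eight nibble products into the destination, saturating at 2**32 - 1."""
    total = dst_dword
    for a, b in zip(nibbles(src1_dword), nibbles(src2_dword)):
        total += a * b
    return min(total, UINT32_MAX)

print(hex(dot_nibbles_saturate(0x10, 0x11111111, 0x22222222)))       # 0x20
print(hex(dot_nibbles_saturate(0xFFFFFFF0, 0xFFFFFFFF, 0xFFFFFFFF)))  # 0xffffffff (saturated)
```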
-
Patent number: 10942879
Abstract: A first operation identifier is assigned to a current operation directed to a memory component, the first operation identifier having a first entry in a first data structure that associates the first operation identifier with a first buffer identifier. It is determined whether the current operation collides with a prior operation assigned a second operation identifier, the second operation identifier having a second entry in the first data structure that associates the second operation identifier with a second buffer identifier. A latest flag is updated to indicate that the first entry is the latest operation directed to an address (1) in response to determining that the current operation collides with the prior operation and that the current and prior operations are read operations, or (2) in response to determining that the current operation does not collide with a prior operation.
Type: Grant
Filed: May 28, 2020
Date of Patent: March 9, 2021
Assignee: MICRON TECHNOLOGY, INC.
Inventors: Lyle E. Adams, Mark Ish, Pushpa Seetamraju, Karl D. Schuh, Dan Tupy
-
Patent number: 10761728
Abstract: Provided herein may be a memory controller and a method of operating the same. The memory controller may include a command processor configured to generate a flush command in response to a flush request input from an external host and to assign a slot number corresponding to the flush command; a sequence generator configured to determine flush data to be stored in response to the flush command, and to generate a write sequence in which the flush data is to be stored based on a size of the flush data and an assigned device sequence of a plurality of memory devices; and a memory operation controller configured to control the plurality of memory devices to store the flush data in the plurality of memory devices.
Type: Grant
Filed: December 10, 2018
Date of Patent: September 1, 2020
Assignee: SK hynix Inc.
Inventors: Sung Kwan Hong, Yeong Sik Yi
-
Patent number: 10761970
Abstract: The invention is notably directed to a computer-implemented method for performing safety check operations. The method comprises steps that are implemented while executing a computer program, which is instrumented with safety check operations. As a result, this computer program forms a sequence of ordered instructions. Such instructions comprise safety check operation instructions, in addition to generic execution instructions and system inputs. System inputs allow the executing program to interact with an operating system, which manages resources for the computer program to execute. A series of instructions are identified while executing the computer program. Namely, a first instruction is identified in the sequence, as one of the safety check operation instructions, in view of its subsequent execution. After having identified the first instruction, a second instruction is identified in the sequence. The second instruction is identified as one of the generic computer program instructions.
Type: Grant
Filed: October 20, 2017
Date of Patent: September 1, 2020
Assignee: International Business Machines Corporation
Inventors: Anil Kurmus, Matthias Neugschwandtner, Alessandro Sorniotti
-
Patent number: 10740067
Abstract: Setting or updating of floating point controls is managed. Floating point controls include controls used for floating point operations, such as rounding mode and/or other controls. Further, floating point controls include status associated with floating point operations, such as floating point exceptions and/or others. The management of the floating point controls includes efficiently updating the controls, while reducing costs associated therewith.
Type: Grant
Filed: June 23, 2017
Date of Patent: August 11, 2020
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Michael K. Gschwind, Valentina Salapura
-
Patent number: 10726329
Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.
Type: Grant
Filed: April 17, 2018
Date of Patent: July 28, 2020
Assignee: Cerebras Systems Inc.
Inventors: Sean Lie, Michael Morrison, Srikanth Arekapudi, Gary R. Lauterbach, Michael Edwin James
-
Patent number: 10705842
Abstract: Methods and apparatuses relating to high-performance authenticated encryption are described.
Type: Grant
Filed: April 2, 2018
Date of Patent: July 7, 2020
Assignee: INTEL CORPORATION
Inventors: Vikram Suresh, Sanu Mathew, Sudhir Satpathy, Vinodh Gopal
-
Patent number: 10672100
Abstract: An apparatus determines a second pixel range of an uncorrected image necessary to generate a first pixel range having pixels in a preset range of a corrected image, including a cache unit that determines the second pixel range and reads and holds the second pixel range from memory before executing correction. Correspondences indicating positions of the uncorrected image corresponding to positions of pixels of the corrected image, respectively, are preset. The cache unit specifies a position of the uncorrected image corresponding to a pixel at one of the four corners of a rectangular third pixel range including the first pixel range based on the correspondences, specifies pixel ranges of the uncorrected image necessary for pixel value generation at each of the four corners of the third pixel range based on the specified position, and determines a pixel range including a convex set containing the specified pixel ranges as the second pixel range.
Type: Grant
Filed: October 12, 2016
Date of Patent: June 2, 2020
Assignee: Hitachi Automotive Systems, Ltd.
Inventors: Yusuke Uchida, Tetsuya Yamada, Shigeru Matsuo, Manabu Sasamoto
-
Patent number: 10592243
Abstract: A streaming engine employed in a digital data processor specifies a fixed read-only data stream defined by plural nested loops. An address generator produces addresses of data elements. A stream head register stores the data elements next to be supplied to functional units for use as operands. The streaming engine fetches stream data ahead of use by the central processing unit core in a stream buffer constructed like a cache. The stream buffer cache includes plural cache lines, each of which includes tag bits, at least one valid bit, and data bits. Cache lines are allocated to store newly fetched stream data. Cache lines are deallocated upon consumption of the data by a central processing unit core functional unit. Instructions preferably include operand fields with a first subset of codings corresponding to registers, a stream read only operand coding, and a stream read and advance operand coding.
Type: Grant
Filed: September 10, 2018
Date of Patent: March 17, 2020
Assignee: TEXAS INSTRUMENTS INCORPORATED
Inventor: Joseph Zbiciak
-
Patent number: 10567249
Abstract: A method and system are described. The method and system include determining a grouping characteristic for a plurality of nodes and a corresponding plurality of links. The nodes and the links correspond to components of a network and are associated with network performance information. The grouping characteristic includes at least one of partitionability into pages and a hop distance. The method and system also include generating a graphical visualization based on the grouping characteristic, the nodes, and the links.
Type: Grant
Filed: March 18, 2019
Date of Patent: February 18, 2020
Assignee: ThousandEyes, Inc.
Inventors: John Moeses Ercia Bauan, Sunil Bandla, Ricardo V. Oliveira
-
Patent number: 10565283
Abstract: A method of an aspect includes receiving an instruction indicating a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four consecutive non-negative integers in numerical order. In an aspect, the instruction does not indicate a source packed data operand having a plurality of packed data elements in an architecturally-visible storage location. Other methods, apparatus, systems, and instructions are disclosed.
Type: Grant
Filed: December 22, 2011
Date of Patent: February 18, 2020
Assignee: Intel Corporation
Inventors: Seth Abraham, Robert Valentine, Elmoustapha Ould-Ahmed-Vall, Zeev Sperber, Amit Gradstein
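The described result is easy to model: a destination whose elements hold consecutive non-negative integers in numerical order, with no source packed data operand needed. A minimal sketch, where the starting value and element count are illustrative assumptions:

```python
def consecutive_integers(num_elements=4, start=0):
    """Result written to the destination: consecutive non-negative integers in order."""
    if num_elements < 4 or start < 0:
        raise ValueError("need at least four elements and a non-negative start")
    return [start + i for i in range(num_elements)]

print(consecutive_integers())        # [0, 1, 2, 3]
print(consecutive_integers(8, 10))   # [10, 11, 12, 13, 14, 15, 16, 17]
```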
-
Patent number: 10521383
Abstract: A first operation identifier is assigned to a first operation directed to a memory component, the first operation identifier having an entry in a first data structure that associates the first operation identifier with a first plurality of buffer identifiers. It is determined whether the first operation collides with a prior operation assigned a second operation identifier, the second operation identifier having an entry in the first data structure that associates the second operation identifier with a second plurality of buffer identifiers. It is determined whether the first operation is a read or a write operation. In response to determining that the first operation collides with the prior operation and that the first operation is a read operation, the first plurality of buffer identifiers is updated with a buffer identifier included in the second plurality of buffer identifiers.
Type: Grant
Filed: December 17, 2018
Date of Patent: December 31, 2019
Assignee: MICRON TECHNOLOGY, INC.
Inventors: Lyle E. Adams, Mark Ish, Pushpa Seetamraju, Karl D. Schuh, Dan Tupy
-
Patent number: 10467185
Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector element routing circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
Type: Grant
Filed: April 24, 2017
Date of Patent: November 5, 2019
Assignee: Intel Corporation
Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Suleyman Sair
-
Patent number: 10360036
Abstract: A computer processing system is provided. The computer processing system includes a processor configured to crack a Move-To-FPSCR instruction into two internal instructions. A first one of the two internal instructions executes out-of-order to update a control field, and a second one of the two internal instructions executes in-order to compute a trap decision.
Type: Grant
Filed: July 12, 2017
Date of Patent: July 23, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Brian J. D. Barrick, Maarten J. Boersma, Niels Fricke, Michael J. Genden
-
Patent number: 10339057
Abstract: A streaming engine employed in a digital data processor specifies a fixed read-only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores the data elements next to be supplied to functional units for use as operands. A stream template specifies a loop count and a loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently, enabling a trade-off between the number of loops supported and the size of the loop counts and loop dimensions.
Type: Grant
Filed: December 20, 2016
Date of Patent: July 2, 2019
Assignee: TEXAS INSTRUMENTS INCORPORATED
Inventor: Joseph Zbiciak