Patents by Inventor Jesus Corbal

Jesus Corbal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Optimized compute hardware for machine learning operations

Patent number: 11334796

Abstract: A processing cluster of a processing cluster array comprises a plurality of registers to store input values of vector input operands, the input values of at least some of the vector input operands having different bit lengths than those of other input values of other vector input operands, and a compute unit to execute a dot-product instruction with the vector input operands to perform a number of parallel multiply operations and an accumulate operation per 32-bit lane based on a bit length of the smallest-sized input value of a first vector input operand relative to the 32-bit lane.

Type: Grant

Filed: August 3, 2020

Date of Patent: May 17, 2022

Assignee: Intel Corporation

Inventors: Dipankar Das, Roger Gramunt, Mikhail Smelyanskiy, Jesus Corbal, Dheevatsa Mudigere, Naveen K. Mellempudi, Alexander F. Heinecke
VECTOR FRIENDLY INSTRUCTION FORMAT AND EXECUTION THEREOF

Publication number: 20220129274

Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.

Type: Application

Filed: November 11, 2021

Publication date: April 28, 2022

Applicant: Intel Corporation

Inventors: Robert C. VALENTINE, Jesus Corbal SAN ADRIAN, Roger Espasa SANS, Robert D. CAVIN, Bret L. TOLL, Santiago Galan DURAN, Jeffrey G. WIEDEMEIER, Sridhar SAMUDRALA, Milind Baburao GIRKAR, Edward Thomas GROCHOWSKI, Jonathan Cannon HALL, Dennis R. BRADFORD, Elmoustapha OULD-AHMED-VALL, James C ABEL, Mark CHARNEY, Seth ABRAHAM, Suleyman SAIR, Andrew Thomas FORSYTH, Lisa WU, Charles YOUNT
APPARATUS AND METHOD FOR COMPLEX MULTIPLICATION

Publication number: 20220129264

Abstract: An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiplication of a first complex number and a second complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder, a first source register, and a second source register. The decoder is to decode an instruction to generate the decoded instruction. The first source register is to provide the first complex number and the second source register is to provide the second complex number.

Type: Application

Filed: November 2, 2021

Publication date: April 28, 2022

Applicant: Intel Corporation

Inventors: Robert Valentine, Mark Charney, Raanan Sade, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Roman S. Dubtsov
APPARATUS AND METHOD FOR VECTOR MULTIPLY AND SUBTRACTION OF SIGNED DOUBLEWORDS

Publication number: 20220129273

Abstract: An apparatus and method for performing signed multiplication of packed signed doublewords and accumulation with a signed quadword. For example, one exemplary processor comprises three registers and execution circuitry. The execution circuitry is to multiply first and second packed signed doubleword data elements from the first register with third and fourth packed signed doubleword data elements from the second register, respectively, to generate first and second temporary products. It is also to select first, second, third, and fourth signed doubleword data elements. It is also to combine the first temporary products with a first packed signed quadword value read from the third register to generate a first accumulated result and to combine the second temporary product with a second packed signed quadword value read from the third source register to generate a second accumulated result. The third register is to store the results.

Type: Application

Filed: November 3, 2021

Publication date: April 28, 2022

Inventors: ElMoustapha OULD-AHMED-VALL, Robert VALENTINE, Mark CHARNEY, Jesus CORBAL, Venkateswara MADDURI
Instruction execution that broadcasts and masks data values at different levels of granularity

Patent number: 11301581

Abstract: An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second data instruction to create a second replication data structure.

Type: Grant

Filed: December 30, 2019

Date of Patent: April 12, 2022

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney
Instruction execution that broadcasts and masks data values at different levels of granularity

Patent number: 11301580

Abstract: An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second data instruction to create a second replication data structure.

Type: Grant

Filed: December 30, 2019

Date of Patent: April 12, 2022

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney
Apparatuses and methods for a processor architecture

Patent number: 11294809

Abstract: Embodiments of an invention a processor architecture are disclosed. In an embodiment, a processor includes a decoder, an execution unit, a coherent cache, and an interconnect. The decoder is to decode an instruction to zero a cache line. The execution unit is to issue a write command to initiate a cache line sized write of zeros. The coherent cache is to receive the write command, to determine whether there is a hit in the coherent cache and whether a cache coherency protocol state of the hit cache line is a modified state or an exclusive state, to configure a cache line to indicate all zeros, and to issue the write command toward the interconnect. The interconnect is to, responsive to receipt of the write command, issue a snoop to each of a plurality of other coherent caches for which it must be determined if there is a hit.

Type: Grant

Filed: August 28, 2018

Date of Patent: April 5, 2022

Assignee: Intel Corporation

Inventors: Jason W. Brandt, Robert S. Chappell, Jesus Corbal, Edward T. Grochowski, Stephen H. Gunther, Buford M. Guy, Thomas R. Huff, Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Ronak Singhal, Seyed Yahya Sotoudeh, Bret L. Toll, Lihu Rappoport, David Papworth, James D. Allen
APPARATUSES, METHODS, AND SYSTEMS FOR A CONFIGURABLE ACCELERATOR HAVING DATAFLOW EXECUTION CIRCUITS

Publication number: 20220100680

Abstract: Systems, methods, and apparatuses relating to a configurable accelerator having dataflow execution circuits are described.

Type: Application

Filed: September 26, 2020

Publication date: March 31, 2022

Inventors: GEORGE CHRYSOS, BHARGAVI NARAYANASETTY, JESUS CORBAL, CHING-KAI LIANG, CHINMAY ASHOK, FRANCIS TSENG
Systems, methods, and apparatus for matrix move

Patent number: 11288068

Abstract: Detailed herein are embodiment systems, processors, and methods for matrix move. For example, a processor comprising decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry to execute the decoded instruction to move each data element of the identified source matrix operand to corresponding data element position of the identified destination matrix operand is described.

Type: Grant

Filed: July 1, 2017

Date of Patent: March 29, 2022

Assignee: Intel Corporation

Inventors: Robert Valentine, Zeev Sperber, Mark J. Charney, Bret L. Toll, Jesus Corbal, Dan Baum, Alexander Heinecke, Elmoustapha Ould-Ahmed-Vall
Systems, methods, and apparatuses for tile store

Patent number: 11288069

Abstract: Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory.

Type: Grant

Filed: July 1, 2017

Date of Patent: March 29, 2022

Assignee: Intel Corporation

Inventors: Robert Valentine, Menachem Adelman, Elmoustapha Ould-Ahmed-Vall, Bret L. Toll, Milind B. Girkar, Zeev Sperber, Mark J. Charney, Rinat Rappoport, Jesus Corbal, Stanislav Shwartsman, Igor Yanover, Alexander F. Heinecke, Barukh Ziv, Dan Baum, Yuri Gebil
SYSTEMS AND METHODS TO LOAD A TILE REGISTER PAIR

Publication number: 20220091848

Abstract: Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.

Type: Application

Filed: August 10, 2021

Publication date: March 24, 2022

Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
Apparatus and method of improved insert instructions

Patent number: 11275583

Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instruction. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group.

Type: Grant

Filed: August 3, 2017

Date of Patent: March 15, 2022

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
Systems, methods, and apparatuses for tile broadcast

Patent number: 11263008

Abstract: Embodiments detailed herein relate to matrix operations. In particular, embodiment of broadcasting elements are described. For example, some embodiments describe broadcasting a scalar to all configured data element positons of a destination matrix (tile). For example, some embodiments describe broadcasting a row to all configured data element positons of a destination matrix (tile). For example, some embodiments describe broadcasting a column to all configured data element positons of a destination matrix (tile).

Type: Grant

Filed: July 1, 2017

Date of Patent: March 1, 2022

Assignee: Intel Corporation

Inventors: Robert Valentine, Zeev Sperber, Mark J. Charney, Bret L. Toll, Jesus Corbal, Alexander Heinecke, Barukh Ziv, Dan Baum, Elmoustapha Ould-Ahmed-Vall, Stanislav Shwartsman
SYSTEMS, METHODS, AND APPARATUSES FOR DOT PRODUCTION OPERATIONS

Publication number: 20220058021

Abstract: Embodiments detailed herein relate to matrix operations. For example, embodiments of instruction support for matrix (tile) dot product operations are detailed. Exemplary instructions including computing a dot product of signed words and accumulating in a double word with saturation; computing a dot product of bytes and accumulating in to a dword with saturation, where the input bytes can be signed or unsigned and the dword accumulation has output saturation; etc.

Type: Application

Filed: November 1, 2021

Publication date: February 24, 2022

Applicant: Intel Corporation

Inventors: Robert VALENTINE, Dan BAUM, Zeev SPERBER, Jesus CORBAL, Elmoustapha OULD-AHMED-VALL, Bret L. TOLL, Mark J. CHARNEY, Menachem ADELMAN, Barukh ZIV, Alexander HEINECKE, Simon RUBANOVICH
Apparatus and method for complex by complex conjugate multiplication

Patent number: 11256504

Abstract: An apparatus and method for multiplying packed real and imaginary components of complex numbers are described. A processor embodiment includes: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed real and imaginary data elements; a second source register to store a second plurality of packed real and imaginary data elements; and execution circuitry to execute the decoded instruction.

Type: Grant

Filed: September 29, 2017

Date of Patent: February 22, 2022

Assignee: Intel Corporation

Inventors: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Mark Charney, Robert Valentine, Binwei Yang
Systems, Apparatuses, And Methods For Fused Multiply Add

Publication number: 20220050678

Abstract: Embodiments of systems, apparatuses, and methods for fused multiple add. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for a first, second, and third packed data source operand, wherein packed data elements of the first and second packed data source operand are of a first, different size than a second size of packed data elements of the third packed data operand.

Type: Application

Filed: September 3, 2021

Publication date: February 17, 2022

Inventors: Robert Valentine, Galina Ryvchin, Piotr Majcher, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Milind B. Girkar, Zeev Sperber, Simon Rubanovich, Amit Gradstein
Instruction execution that broadcasts and masks data values at different levels of granularity

Patent number: 11250154

Abstract: An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second data instruction to create a second replication data structure.

Type: Grant

Filed: December 30, 2019

Date of Patent: February 15, 2022

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney
Vector instructions for selecting and extending an unsigned sum of products of words and doublewords for accumulation

Patent number: 11249755

Abstract: Disclosed embodiments relate to executing a vector unsigned multiplication and accumulation instruction. In one example, a processor includes fetch circuitry to fetch a vector unsigned multiplication and accumulation instruction having fields for an opcode, first and second source identifiers, a destination identifier, and an immediate, wherein the identified sources and destination are same-sized registers, decode circuitry to decode the fetched instruction, and execution circuitry to execute the decoded instruction, on each corresponding pair of first and second quadwords of the identified first and second sources, to: generate a sum of products of two doublewords of the first quadword and either two lower words or two upper words of the second quadword, based on the immediate, zero-extend the sum to a quadword-sized sum, and accumulate the quadword-sized sum with a previous value of a destination quadword in a same relative register position as the first and second quadwords.

Type: Grant

Filed: September 27, 2017

Date of Patent: February 15, 2022

Assignee: Intel Corporation

Inventors: Venkateswara R. Madduri, Carl Murray, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Robert Valentine, Jesus Corbal
SYSTEMS, METHODS, AND APPARATUS FOR TILE CONFIGURATION

Publication number: 20220043652

Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address; and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, wherein a tile a set of 2-dimensional registers are discussed.

Type: Application

Filed: June 28, 2021

Publication date: February 10, 2022

Inventors: Menachem ADELMAN, Robert VALENTINE, Zeev SPERBER, Mark J. CHARNEY, Bret L. TOLL, Rinat RAPPOPORT, Jesus CORBAL, Dan BAUM, Alexander F. HEINECKE, Elmoustapha OULD-AHMED-VALL, Yuri GEBIL, Raanan SADE
Apparatus and method for scaling pre-scaled results of complex multiply-accumulate operations on packed real and imaginary data elements

Patent number: 11243765

Abstract: Apparatus and method to transform complex data including a processor that comprises: multiplier circuitry to multiply packed complex N-bit data elements with packed complex M-bit data elements to generate at least four real products; adder circuitry to subtract a first real product from a second real product to generate a first temporary result, subtract a third real product from a fourth real product to generate a second temporary result, add the first temporary result to a first packed N-bit data element to generate a first pre-scaled result, subtract the first temporary result from the first packed N-bit data element to generate a second pre-scaled result, add the second temporary result to a second packed N-bit data element to generate a third pre-scaled result, and subtract the second temporary result from the second packed N-bit data element to generate a fourth pre-scaled result; and scaling circuitry to scale the pre-scaled results.

Type: Grant

Filed: September 29, 2017

Date of Patent: February 8, 2022

Assignee: Intel Corporation

Inventors: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Mark Charney, Robert Valentine, Jesus Corbal, Binwei Yang

prev 1 2 3 4 5 6 7 8 … next