Patents by Inventor Jesus Corbal

Jesus Corbal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

INSTRUCTIONS FOR VECTOR UNSIGNED MULTIPLICATION AND ACCUMULATION OF UNSIGNED DOUBLEWORDS

Publication number: 20200201633

Abstract: Disclosed embodiments relate to executing a vector unsigned multiplication and accumulation instruction. In one example, a processor includes fetch circuitry to fetch a vector unsigned multiplication and accumulation instruction having fields for an opcode, first and second source identifiers, a destination identifier, and an immediate, wherein the identified sources and destination are same-sized registers, decode circuitry to decode the fetched instruction, and execution circuitry to execute the decoded instruction, on each corresponding pair of first and second quadwords of the identified first and second sources, to: generate a sum of products of two doublewords of the first quadword and either two lower words or two upper words of the second quadword, based on the immediate, zero-extend the sum to a quadword-sized sum, and accumulate the quadword-sized sum with a previous value of a destination quadword in a same relative register position as the first and second quadwords.

Type: Application

Filed: September 27, 2017

Publication date: June 25, 2020

Applicant: Intel Corporation

Inventors: Venkateswara R. MADDURI, Carl MURRAY, Elmoustapha OULD-AHMED-VALL, Mark J. CHARNEY, Robert VALENTINE, Jesus CORBAL
Method and apparatus for performing a vector bit reversal and crossing

Patent number: 10691452

Abstract: An apparatus and method for performing a vector bit reversal and crossing. For example, one embodiment of a processor comprises: a first source vector register to store a first plurality of source bit groups, wherein a size for the bit groups is to be specified in an immediate of an instruction; a second source vector to store a second plurality of source bit groups; vector bit reversal and crossing logic to determine a bit group size from the immediate and to responsively reverse positions of contiguous bit groups within the first source vector register to generate a set of reversed bit groups, wherein the vector bit reversal and crossing logic is to additionally interleave the set of reversed bit groups with the second plurality of bit groups; and a destination vector register to store the reversed bit groups interleaved with the first plurality of bit groups.

Type: Grant

Filed: October 10, 2017

Date of Patent: June 23, 2020

Assignee: Intel Corporation

Inventors: Jesus Corbal San Adrian, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Mark J. Charney
APPARATUS AND METHOD FOR COMPLEX MULTIPLICATION

Publication number: 20200192663

Abstract: An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiplication of a first complex number and a second complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder, a first source register, and a second source register. The decoder is to decode an instruction to generate the decoded instruction. The first source register is to provide the first complex number and the second source register is to provide the second complex number.

Type: Application

Filed: October 18, 2019

Publication date: June 18, 2020

Applicant: Intel Corporation

Inventors: Robert Valentine, Mark Charney, Raanan Sade, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Roman S. Dubtsov
Apparatus and method for converting a floating-point value from half precision to single precision

Patent number: 10684854

Abstract: An embodiment of the invention is a processor including execution circuitry to, in response to a decoded instruction, convert a half-precision floating-point value to a single-precision floating-point value and store the single-precision floating-point value in each of the plurality of element locations of a destination register. The processor also includes a decoder and the destination register. The decoder is to decode an instruction to generate the decoded instruction.

Type: Grant

Filed: November 28, 2017

Date of Patent: June 16, 2020

Assignee: Intel Corporation

Inventors: Robert Valentine, Mark Charney, Raanan Sade, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal
PACKED DATA OPERATION MASK SHIFT PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS

Publication number: 20200183688

Abstract: A method of an aspect includes receiving a packed data operation mask shift instruction. The packed data operation mask shift instruction indicates a source having a packed data operation mask, indicates a shift count number of bits, and indicates a destination. The method further includes storing a result in the destination in response to the packed data operation mask shift instruction. The result includes a sequence of bits of the packed data operation mask that have been shifted by the shift count number of bits. Other methods, apparatus, systems, and instructions are disclosed.

Type: Application

Filed: February 11, 2020

Publication date: June 11, 2020

Inventors: Bret L. Toll, Robert Valentine, Jesus Corbal San Andrian, Elmoustapha Ould-Ahmed Vall, Mark J. Charney
APPARATUS AND METHOD FOR PERFORMING DUAL SIGNED AND UNSIGNED MULTIPLICATION OF PACKED DATA ELEMENTS

Publication number: 20200174788

Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements.

Type: Application

Filed: November 1, 2019

Publication date: June 4, 2020

Applicant: Intel Corporation

Inventors: VENKATESWARA MADDURI, ELMOUSTAPHA OULD-AHMED-VALL, MARK CHARNEY, ROBERT VALENTINE, JESUS CORBAL, BINWEI YANG
Apparatus and method for vector multiply and accumulate of unsigned doublewords

Patent number: 10664270

Abstract: An apparatus and method for performing signed multiplication of packed signed/unsigned doublewords and accumulation with a quadword.

Type: Grant

Filed: December 21, 2017

Date of Patent: May 26, 2020

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Mark Charney, Jesus Corbal, Venkateswara Madduri
Systems and methods for implementing chained tile operations

Patent number: 10664287

Abstract: Disclosed embodiments relate to systems and methods for implementing chained tile operations. In one example, a processor includes fetch circuitry to fetch one or more instructions until a plurality of instructions has been fetched, each instruction to specify source and destination tile operands, decode circuitry to decode the fetched instructions, and execution circuitry, responsive to the decoded instructions, to: identify first and second decoded instructions belonging to a chain of instructions, dynamically select and configure a SIMD path comprising first and second processing engines (PE) to execute the first and second decoded instructions, and set aside the specified destination of the first decoded instruction, and instead route a result of the first decoded instruction from the first PE to be used by the second PE to perform the second decoded instruction.

Type: Grant

Filed: March 30, 2018

Date of Patent: May 26, 2020

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, Alexander F. Heinecke, Robert Valentine, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall
Apparatus and method for processing reciprocal square root operations

Patent number: 10664237

Abstract: An apparatus and method for performing a reciprocal square root. For example one embodiment of a processor comprises: a decoder to decode a reciprocal square root instruction to generate a decoded reciprocal square root instruction; a source register to store at least one packed input data element; a destination register to store a result data element; and reciprocal square root execution circuitry to execute the decoded reciprocal square root instruction, the reciprocal square root execution circuitry to use a first portion of the packed input data element as an index to a data structure containing a plurality of sets of coefficients to identify a first set of coefficients from the plurality of sets, the reciprocal square root execution circuitry to generate a reciprocal square root of the packed input data element using a combination of the coefficients and a second portion of the packed input data element.

Type: Grant

Filed: December 21, 2017

Date of Patent: May 26, 2020

Assignee: Intel Corporation

Inventors: Cristina Anderson, Elmoustapha Ould-Ahmed-Vall, Marius Cornea-Hasegan, Robert Valentine, Mark Charney, Jesus Corbal, Venkateswara Madduri
Systems, apparatuses and methods for dual complex by complex conjugate multiply of signed words

Patent number: 10664277

Abstract: Embodiments of systems, apparatuses, and methods for dual complex number by complex conjugate multiplication in a processor are described. For example, execution circuitry executes a decoded instruction to multiplex data values from a plurality of packed data element positions in the first and second packed data source operands to at least one multiplier circuit, the first and second packed data source operands including a plurality of pairs complex numbers, each pair of complex numbers including data values at shared packed data element positions in the first and second packed data source operands; calculate a real part and an imaginary part of a product of a first complex number and a complex conjugate of a second complex number; and store the real result to a first packed data element position in the destination operand and store the imaginary result to a second packed data element position in the destination operand.

Type: Grant

Filed: September 29, 2017

Date of Patent: May 26, 2020

Assignee: Intel Corporation

Inventors: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Mark Charney
Fixed point to floating point conversion

Patent number: 10656942

Abstract: Embodiments of instructions and methods of execution of said instructions and resources to execute said instructions are detailed. For example, in an embodiment, a processor comprising: decode circuitry to decode an instruction having fields for an opcode, a packed data source operand identifier, and a packed data destination operand identifier; and execution circuitry to execute the decoded instruction to convert a data element from a least significant packed data element position of the identified packed data source operand from a fixed-point representation to a floating point representation, store the floating point representation into a 32-bit least significant packed data element position of the identified packed data destination operand, and zero all remaining packed data elements of the identified packed data destination operand is described.

Type: Grant

Filed: March 4, 2019

Date of Patent: May 19, 2020

Assignee: Intel Corporation

Inventors: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Mark Charney
Method and apparatus for efficient matrix transpose

Patent number: 10649772

Abstract: Disclosed embodiments relate to a method and apparatus for efficient matrix transpose. In one example, a processor to execute a matrix transpose instruction includes fetch circuitry to fetch the matrix transpose instruction specifying a destination matrix and a source matrix having (N×M) elements and (M×N) elements, respectively, a (N×M) load buffer, decode circuitry to decode the fetched matrix transpose instruction, and execution circuitry, responsive to the decoded matrix transpose instruction to, for each row X of M rows of the specified source matrix: fetch and buffer N elements of the row in a load register, and cause the N buffered elements to be written, in the same relative order as in the row, to column X of M columns of the load buffer, and the execution circuitry subsequently to write each of N rows of the load buffer to a same row of the load buffer.

Type: Grant

Filed: March 30, 2018

Date of Patent: May 12, 2020

Assignee: Intel Corporation

Inventors: Dennis Ryan Bradford, Jesus Corbal, Brian Hickmann, Rohan Sharma
INSTRUCTION EXECUTION THAT BROADCASTS AND MASKS DATA VALUES AT DIFFERENT LEVELS OF GRANULARITY

Publication number: 20200134226

Abstract: An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second data instruction to create a second replication data structure.

Type: Application

Filed: December 30, 2019

Publication date: April 30, 2020

Inventors: ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, JESUS CORBAL, BRET L. TOLL, MARK J. CHARNEY
INSTRUCTION EXECUTION THAT BROADCASTS AND MASKS DATA VALUES AT DIFFERENT LEVELS OF GRANULARITY

Publication number: 20200134224

Abstract: An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second data instruction to create a second replication data structure.

Type: Application

Filed: December 30, 2019

Publication date: April 30, 2020

Inventors: ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, JESUS CORBAL, BRET L. TOLL, MARK J. CHARNEY
INSTRUCTION EXECUTION THAT BROADCASTS AND MASKS DATA VALUES AT DIFFERENT LEVELS OF GRANULARITY

Publication number: 20200134225

Abstract: An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second data instruction to create a second replication data structure.

Type: Application

Filed: December 30, 2019

Publication date: April 30, 2020

Inventors: ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, JESUS CORBAL, BRET L. TOLL, MARK J. CHARNEY
METHOD AND APPARATUS FOR PERFORMING A VECTOR PERMUTE WITH AN INDEX AND AN IMMEDIATE

Publication number: 20200097290

Abstract: An apparatus and method for performing a vector permute.

Type: Application

Filed: September 4, 2019

Publication date: March 26, 2020

Inventors: JESUS CORBAL SAN ADRIAN, ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, MARK J. CHARNEY, MILIND B. GIRKAR, BRET L. TOLL, ROGER ESPASA, GUILLEM SOLE, JAIRO BALART, BRIAN HICKMANN
DATA ELEMENT COMPARISON PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS

Publication number: 20200089494

Abstract: A processor includes a decode unit to decode an instruction that is to indicate a first source packed data operand that is to include at least four data elements, to indicate a second source packed data operand that is to include at least four data elements, and to indicate one or more destination storage locations. The execution unit, in response to the instruction, is to store at least one result mask operand in the destination storage location(s). The at least one result mask operand is to include a different mask element for each corresponding data element in one of the first and second source packed data operands in a same relative position. Each mask element is to indicate whether the corresponding data element in said one of the source packed data operands equals any of the data elements in the other of the source packed data operands.

Type: Application

Filed: September 23, 2019

Publication date: March 19, 2020

Inventors: Asit K. MISHRA, Edward T. GROCHOWSKI, Jonathan D. PEARCE, Deborah T. MARR, Ehud COHEN, Elmoustapha OULD-AHMED-VALL, Jesus Corbal SAN ADRIAN, Robert VALENTINE, Mark J. CHARNEY, Christopher J. HUGHES, Milind B. GIRKAR
SYSTEMS, APPARATUSES, AND METHODS FOR CONTROLLABLE SINE AND/OR COSINE OPERATIONS

Publication number: 20200073658

Abstract: Embodiments of systems, apparatuses, and methods for performing controllable sine and/or cosine operations in a processor are described. For example, execution circuitry executes a decoded instruction to compute at least a real output value and an imaginary output value based on at least a cosine calculation and a sine calculation, the cosine and sine calculations each based on an index value from a packed data source operand, add the index value with an index increment value from the packed data source operand to create an updated index value, and store the real output value, the imaginary output value, and the updated index value to a packed data destination operand.

Type: Application

Filed: June 30, 2017

Publication date: March 5, 2020

Inventors: Venkateswara R. MADDURI, Elmoustapha OULD-AHMED-VALL, Robert VALENTINE, Jesus CORBAL, Mark J. CHARNEY, Carl MURRAY, Milind GIRKAR, Bret TOLL
SYSTEMS, APPARATUSES, AND METHODS FOR VECTOR-PACKED FRACTIONAL MULTIPLICATION OF SIGNED WORDS WITH ROUNDING, SATURATION, AND HIGH-RESULT SELECTION

Publication number: 20200073635

Abstract: Embodiments of systems, apparatuses, and methods for vector-packed fractional multiplication of signed words with rounding, saturation, and high-result selection in a processor are described. For example, execution circuitry executes a decoded instruction to perform a fractional multiplication operation for each of a plurality of pairs of packed data elements to yield a plurality of output values, round each of the plurality of output values, detect whether any of the plurality of output values reflect an overflow or underflow, for any of the plurality of output values that reflect an overflow or underflow, saturate the output value, and store the plurality of output values into a corresponding plurality of positions of the packed data destination operand.

Type: Application

Filed: June 29, 2017

Publication date: March 5, 2020

Inventors: Venkateswara R. MADDURI, Elmoustapha OULD-AHMED-VALL, Robert VALENTINE, Jesus CORBAL, Mark J. CHARNEY, Carl MURRAY, Milind GIRKAR, Bret TOLL
SYSTEMS, METHODS, AND APPARATUSES FOR MATRIX OPERATIONS

Publication number: 20200065352

Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address; and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, wherein a tile a set of 2-dimensional registers are discussed.

Type: Application

Filed: July 1, 2017

Publication date: February 27, 2020

Applicant: Intel Corporation

Inventors: Robert VALENTINE, Mark J. CHARNEY, Elmoustapha OULD-AHMED-VALL, Dan BAUM, Zeev SPERBER, Jesus CORBAL, Bret L. TOLL, Raanan SADE, Igor YANOVER, Yuri GEBIL, Rinat RAPPOPORT, Stanislav SHWARTSMAN, Menachem ADELMAN, Simon RUBANOVICH

prev … 4 5 6 7 8 9 10 11 12 … next