Patents by Inventor Mark J. Charney

Mark J. Charney has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Vector store/load instructions for array of structures

Patent number: 10019262

Abstract: A processor comprises a plurality of vector registers, and an execution unit, operatively coupled to the plurality of vector registers, the execution unit comprising a logic circuit implementing a load instruction for loading, into two or more vector registers, two or more data items associated with a data structure stored in a memory, wherein each one of the two or more vector registers is to store a data item associated with a certain position number within the data structure.

Type: Grant

Filed: December 22, 2015

Date of Patent: July 10, 2018

Assignee: Intel Corporation

Inventors: Ashish Jha, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Mark J. Charney, Milind B. Girkar
Method and apparatus for performing a vector bit reversal

Patent number: 10013253

Abstract: An apparatus and method for performing a vector bit reversal. For example, one embodiment of a processor comprises: a source vector register to store a plurality of source bit groups, wherein a size for the bit groups is to be specified in an immediate of an instruction; vector bit reversal logic to determine a bit group size from the immediate and to responsively reverse positions of contiguous bit groups within the source vector register to generate a set of reversed bit groups; and a destination vector register to store the reversed bit groups.

Type: Grant

Filed: December 23, 2014

Date of Patent: July 3, 2018

Assignee: Intel Corporation

Inventors: Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Mark J. Charney
Fused multiply-add (FMA) low functional unit

Patent number: 9996320

Abstract: An example processor includes a register and a fused multiply-add (FMA) low functional unit. The register stores first, second, and third floating point (FP) values. The FMA low functional unit receives a request to perform an FMA low operation: multiplies the first FP value with the second FP value to obtain a first product value; adds the first product with the third FP value to generate a first result value; rounds the first result to generate a first FMA value; multiplies the first FP value with the second FP value to obtain a second product value; adds the second product value with the third FP value to generate a second result value; and subtracts the FMA value from the second result value to obtain a third result value, which can then be normalized and rounded (FMA low result) and sent the FMA low result to an application.

Type: Grant

Filed: December 23, 2015

Date of Patent: June 12, 2018

Assignee: Intel Corporation

Inventors: Cristina S. Anderson, Marius A. Cornea-Hasegan, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Nikita Astafev, Mark J. Charney, Milind B. Girkar, Amit Gradstein, Simon Rubanovich, Zeev Sperber
Floating point (FP) add low instructions functional unit

Patent number: 9996319

Abstract: An example processor includes a register and an ADD low functional unit. The register stores first, second, and third floating point (FP) values. The ADD low functional unit receives a request to perform an ADD low operation and, responsive to the request: adds the first FP value with the second FP value to obtain a first sum value; rounds the first sum value to generate an ADD value; adds the first FP value with the second FP value to obtain a second sum value; subtracts the ADD value from the second sum value to generate a difference value; normalizes the difference value to obtain a normalized difference value; rounds the normalized difference value to generate an ADD low value; and sends the ADD low value to an application.

Type: Grant

Filed: December 23, 2015

Date of Patent: June 12, 2018

Assignee: Intel Corporation

Inventors: Cristina S. Anderson, Marius A. Cornea-Hasegan, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Nikita Astafev, Mark J. Charney, Milind B. Girkar, Amit Gradstein, Simon Rubanovich, Zeev Sperber
GATHERING AND SCATTERING MULTIPLE DATA ELEMENTS

Publication number: 20180150301

Abstract: According to a first aspect, efficient data transfer operations can be achieved by: decoding by a processor device, a single instruction specifying a transfer operation for a plurality of data elements between a first storage location and a second storage location; issuing the single instruction for execution by an execution unit in the processor; detecting an occurrence of an exception during execution of the single instruction; and in response to the exception, delivering pending traps or interrupts to an exception handler prior to delivering the exception.

Type: Application

Filed: May 20, 2013

Publication date: May 31, 2018

Inventors: Christopher J. Hughes, Yen-Kuang (Y.K.) Chen, Mayank Bomb, Jason W. Brandt, Mark J. Buxton, Mark J. Charney, Srinivas Chennupaty, Jesus Corbal, Martin G. Dixon, Milind B. Girkar, Jonathan C. Hall, Hideki (Saito) Ido, Peter Lachner, Gilbert Neiger, Chris J. Newburn, Rajesh S. Parthasarathy, Bret L. Toll, Robert Valentine, Jeffrey G. Wiedemeier
Systems, apparatuses, and methods for performing mask bit compression

Patent number: 9983873

Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor mask bit compression in response to a single mask bit compression instruction that includes a source writemask register operand, a destination writemask register operand, and an opcode are described.

Type: Grant

Filed: May 31, 2016

Date of Patent: May 29, 2018

Assignee: Intel Corporation

Inventors: Bret L. Toll, Robert Valentine, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney
GATHERING AND SCATTERING MULTIPLE DATA ELEMENTS

Publication number: 20180129506

Abstract: According to a first aspect, efficient data transfer operations can be achieved by: decoding by a processor device, a single instruction specifying a transfer operation for a plurality of data elements between a first storage location and a second storage location; issuing the single instruction for execution by an execution unit in the processor; detecting an occurrence of an exception during execution of the single instruction; and in response to the exception, delivering pending traps or interrupts to an exception handler prior to delivering the exception.

Type: Application

Filed: January 4, 2018

Publication date: May 10, 2018

Inventors: Christopher J. Hughes, Yen-Kuang (Y.K.) Chen, Mayank Bomb, Jason W. Brandt, Mark J. Buxton, Mark J. Charney, Srinivas Chennupaty, Jesus Corbal, Martin G. Dixon, Milind B. Girkar, Jonathan C. Hall, Hideki (Saito) Ido, Peter Lachner, Gilbert Neiger, Chris J. Newburn, Rajesh S. Parthasarathy, Bret L. Toll, Robert Valentine, Jeffrey G. Wiedemeier
Apparatus and method of improved permute instructions with multiple granularities

Patent number: 9946540

Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.

Type: Grant

Filed: May 22, 2017

Date of Patent: April 17, 2018

Assignee: INTEL CORPORATION

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
APPARATUS AND METHOD OF IMPROVED PERMUTE INSTRUCTIONS

Publication number: 20180074822

Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.

Type: Application

Filed: November 9, 2017

Publication date: March 15, 2018

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
APPARATUS AND METHOD OF IMPROVED PERMUTE INSTRUCTIONS

Publication number: 20180067742

Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.

Type: Application

Filed: November 9, 2017

Publication date: March 8, 2018

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
Instruction and logic to perform a centrifuge operation

Patent number: 9904548

Abstract: A processing device implements a set of instructions to perform a centrifuge operation using vector or general purpose registers. In one embodiment, the centrifuge operation separates bits in a source register to opposing regions of a destination register based on a control mask, where each source register bit with a corresponding control mask value of one is written to one region in a destination register, while source register bits with a corresponding control mask value of zero are written to an opposing region of the destination register.

Type: Grant

Filed: December 22, 2014

Date of Patent: February 27, 2018

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Mark J. Charney
Loop vectorization methods and apparatus

Patent number: 9898266

Abstract: Loop vectorization methods and apparatus are disclosed. An example method includes prior to executing an original loop having iterations, analyzing, via a processor, the iterations of the original loop, identifying a dependency between a first one of the iterations of the original loop and a second one of the iterations of the original loop, after identifying the dependency, vectorizing a first group of the iterations of the original loop based on the identified dependency to form a vectorization loop, and setting a dynamic adjustment value of the vectorization loop based on the identified dependency.

Type: Grant

Filed: January 25, 2016

Date of Patent: February 20, 2018

Assignee: Intel Corporation

Inventors: Nalini Vasudevan, Jayashankar Bharadwaj, Christopher J. Hughes, Milind B. Girkar, Mark J. Charney, Robert Valentine, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi
Methods and apparatus for fusing instructions to provide OR-test and AND-test functionality on multiple test sources

Patent number: 9886277

Abstract: Methods and apparatus are disclosed for fusing instructions to provide OR-test and AND-test functionality on multiple test sources. Some embodiments include fetching instructions, said instructions including a first instruction specifying a first operand destination, a second instruction specifying a second operand source, and a third instruction specifying a branch condition. A portion of the plurality of instructions are fused into a single micro-operation, the portion including both the first and second instructions if said first operand destination and said second operand source are the same, and said branch condition is dependent upon the second instruction. Some embodiments generate a novel test instruction dynamically by fusing one logical instruction with a prior-art test instruction. Other embodiments generate the novel test instruction through a just-in-time compiler.

Type: Grant

Filed: March 15, 2013

Date of Patent: February 6, 2018

Assignee: Intel Corporation

Inventors: Maxim Loktyukhin, Robert Valentine, Julian C. Horn, Mark J. Charney
INTERRUPTIBLE AND RESTARTABLE MATRIX MULTIPLICATION INSTRUCTIONS, PROCESSORS, METHODS, AND SYSTEMS

Publication number: 20180004510

Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.

Type: Application

Filed: July 2, 2016

Publication date: January 4, 2018

Applicant: Intel Corporation

Inventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, JR.
ARCHITECTURAL REGISTER REPLACEMENT FOR INSTRUCTIONS THAT USE MULTIPLE ARCHITECTURAL REGISTERS

Publication number: 20180004523

Abstract: A processor of an aspect includes a decode unit to decode an instruction. The instruction is to explicitly specify a first architectural register and is to implicitly indicate at least a second architectural register. The second architectural register is implicitly to be at a higher register number than the first architectural register. The processor also includes an architectural register replacement unit coupled with the decode unit. The architectural register replacement unit is to replace the first architectural register with a third architectural register, and is to replace the second architectural register with a fourth architectural register. The third architectural register is to be at a lower register number than the first architectural register. The fourth architectural register is to be at a lower register number than the second architectural register. Other processors are also disclosed, as are methods and systems.

Type: Application

Filed: July 1, 2016

Publication date: January 4, 2018

Applicant: Intel Corporation

Inventors: Mark J. Charney, Robert Valentine, Milind B. Girkar, Ashish Jha, Bret L. Toll, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal San Adrian, Jason W. Brandt
PACKED DATA OPERATION MASK CONCATENATION PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS

Publication number: 20170300327

Abstract: A method of an aspect includes receiving a packed data operation mask concatenation instruction. The packed data operation mask concatenation instruction indicates a first source having a first packed data operation mask, indicates a second source having a second packed data operation mask, and indicates a destination. A result is stored in the destination in response to the packed data operation mask concatenation instruction. The result includes the first packed data operation mask concatenated with the second packed data operation mask. Other methods, apparatus, systems, and instructions are disclosed.

Type: Application

Filed: February 27, 2017

Publication date: October 19, 2017

Applicant: lntel Corporation

Inventors: Bret L. Toll, Robert Valentine, Jesus Corbal San Andrian, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney
Method and apparatus for performing a vector bit reversal and crossing

Patent number: 9785437

Abstract: An apparatus and method for performing a vector bit reversal and crossing. For example, one embodiment of a processor comprises: a first source vector register to store a first plurality of source bit groups, wherein a size for the bit groups is to be specified in an immediate of an instruction; a second source vector to store a second plurality of source bit groups; vector bit reversal and crossing logic to determine a bit group size from the immediate and to responsively reverse positions of contiguous bit groups within the first source vector register to generate a set of reversed bit groups, wherein the vector bit reversal and crossing logic is to additionally interleave the set of reversed bit groups with the second plurality of bit groups; and a destination vector register to store the reversed bit groups interleaved with the first plurality of bit groups.

Type: Grant

Filed: December 23, 2014

Date of Patent: October 10, 2017

Assignee: Intel Corporation

Inventors: Jesus Corbal San Adrian, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Mark J. Charney
APPARATUS AND METHOD OF IMPROVED PERMUTE INSTRUCTIONS

Publication number: 20170262282

Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.

Type: Application

Filed: May 22, 2017

Publication date: September 14, 2017

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
Packed data operation mask register arithmetic combination processors, methods, systems, and instructions

Patent number: 9760371

Abstract: A method of an aspect includes receiving a packed data operation mask register arithmetic combination instruction. The packed data operation mask register arithmetic combination instruction indicates a first packed data operation mask register, indicates a second packed data operation mask register, and indicates a destination storage location. An arithmetic combination of at least a portion of bits of the first packed data operation mask register and at least a corresponding portion of bits of the second packed data operation mask register is stored in the destination storage location in response to the packed data operation mask register arithmetic combination instruction. Other methods, apparatus, systems, and instructions are disclosed.

Type: Grant

Filed: December 22, 2011

Date of Patent: September 12, 2017

Assignee: Intel Corporation

Inventors: Bret L. Toll, Robert Valentine, Jesus Corbal San Adrian, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney
Systems, Apparatuses, and Methods for Strided Loads

Publication number: 20170192781

Abstract: Detailed herein are systems, apparatuses, and methods for strided loads. In an embodiment, an apparatus includes a decoder to decode an instruction, wherein the instruction to include fields a starting source memory address operand and a starting destination register operand; and execution circuitry to execute the decoded instruction to extract data elements of a defined number of types from contiguous memory beginning at the starting source memory address and, for each type, store the extracted data elements in a packed data register dedicated to that type beginning with starting destination register operand.

Type: Application

Filed: December 30, 2015

Publication date: July 6, 2017

Inventors: Robert Valentine, Elmoustapha Ould-Ahmed-Vall, Jason W. Brandt, Mark J. Charney, Ashish Jha, Milind B. Girkar, Bret L. Toll, Evgeny V. Stupachenko, Sergey Y. Ostanevich

prev … 5 6 7 8 9 10 11 12 next