Patents by Inventor Tal Uliel

Tal Uliel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Computation engine with strided dot product

Patent number: 10990401

Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.

Type: Grant

Filed: April 1, 2020

Date of Patent: April 27, 2021

Assignee: Apple Inc.

Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
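
As a rough illustration of the strided dot product described in the abstract above, the following Python sketch stores the first operand as a flat list of fixed-size groups and uses the stride to pick one element from each group. The function name `strided_dot` and the `offset` parameter are hypothetical, chosen only for this example; the sketch makes no claim about how the patented hardware works.

```python
def strided_dot(a, b, stride, offset=0):
    """Dot product of a[offset], a[offset + stride], ... with every element of b.

    `stride` and `offset` are illustrative parameters (assumptions), not
    names taken from the patent.
    """
    selected = a[offset::stride]  # subset of the first operand
    if len(selected) != len(b):
        raise ValueError("strided selection and second operand differ in length")
    return sum(x * y for x, y in zip(selected, b))


# First operand: four 2-element groups stored flat: (1, 10), (2, 20), (3, 30), (4, 40)
a = [1, 10, 2, 20, 3, 30, 4, 40]
b = [1, 1, 1, 1]

first = strided_dot(a, b, stride=2, offset=0)   # uses 1, 2, 3, 4
second = strided_dot(a, b, stride=2, offset=1)  # uses 10, 20, 30, 40
```

Here `offset` plays the role of the "specified element of each second vector": changing it selects a different element from every group without rearranging the data.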
Computation engine with upsize/interleave and downsize/deinterleave options

Patent number: 10970078

Abstract: In an embodiment, a computation engine may perform computations on input vectors having vector elements of a first precision and data type. The computation engine may convert the vector elements from the first precision to a second precision and may also interleave the vector elements as specified by an instruction issued by the processor to the computation engine. The interleave may be based on a ratio of a result precision and the second precision. An extract instruction may be supported to extract results from the computations and convert and deinterleave the vector elements to provide a compact result in a desired order.

Type: Grant

Filed: April 5, 2018

Date of Patent: April 6, 2021

Assignee: Apple Inc.

Inventors: Eric Bainville, Tal Uliel, Jeffry E. Gonion, Ali Sazegari, Erik K. Norden
Matrix computation engine

Patent number: 10877754

Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.

Type: Grant

Filed: March 13, 2020

Date of Patent: December 29, 2020

Assignee: Apple Inc.

Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
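
The multiply-accumulate behavior described in the abstract above can be sketched in plain Python. The function name `matmul_accumulate` and the use of nested lists as the "result memory" are assumptions for illustration; the loops run sequentially here, whereas the abstract describes the multiplications being performed in parallel.

```python
def matmul_accumulate(result, a, b):
    """Add the matrix product a @ b into `result` in place and return it.

    `result` stands in for the result memory (an assumption of this sketch).
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                # One multiply-accumulate per (i, j, k) triple.
                result[i][j] += a[i][k] * b[k][j]
    return result


acc = [[0, 0], [0, 0]]
matmul_accumulate(acc, [[1, 2], [3, 4]], [[5, 6], [7, 8]])
after_first = [row[:] for row in acc]   # [[19, 22], [43, 50]]
matmul_accumulate(acc, [[1, 0], [0, 1]], [[1, 1], [1, 1]])
after_second = [row[:] for row in acc]  # [[20, 23], [44, 51]]
```

Because the second call adds to the existing contents rather than overwriting them, a larger product can be built up from several smaller ones.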
Matrix Computation Engine

Publication number: 20200272464

Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.

Type: Application

Filed: March 13, 2020

Publication date: August 27, 2020

Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
Range Mapping of Input Operands for Transcendental Functions

Publication number: 20200241876

Abstract: In an embodiment, a processor (e.g. a CPU) may offload transcendental computation to a computation engine that may efficiently perform transcendental functions. The computation engine may implement a range instruction that may be included in a program being executed by the CPU. The CPU may dispatch the range instruction to the computation engine. The range instruction may take an input operand (that is to be evaluated in a transcendental function, for example) and may reference a range table that defines a set of ranges for the transcendental function. The range instruction may identify one of the set of ranges that includes the input operand. For example, the range instruction may output an interval number identifying which interval of an overall set of valid input values contains the input operand. In an embodiment, the range instruction may take an input vector operand and output a vector of interval identifiers.

Type: Application

Filed: April 13, 2020

Publication date: July 30, 2020

Inventors: O-Cheng Chang, Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
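
A minimal sketch of the range lookup described in the abstract above, assuming the range table is a sorted list of interval boundaries. The name `range_instruction`, the half-open interval convention, and the `-1` result for out-of-range operands are all assumptions made for this example.

```python
from bisect import bisect_right


def range_instruction(operands, boundaries):
    """Return, for each operand, the number of the interval that contains it.

    Interval i covers [boundaries[i], boundaries[i + 1]). Operands outside
    the table map to -1 (an assumption of this sketch).
    """
    interval_ids = []
    for x in operands:
        if x < boundaries[0] or x >= boundaries[-1]:
            interval_ids.append(-1)
        else:
            interval_ids.append(bisect_right(boundaries, x) - 1)
    return interval_ids


range_table = [0.0, 0.5, 1.0, 2.0]  # three intervals
ids = range_instruction([0.25, 0.5, 1.75, 3.0], range_table)
```

The vector of interval numbers could then select, per element, whichever approximation is used for that interval of the transcendental function.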
Computation Engine with Strided Dot Product

Publication number: 20200225958

Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.

Type: Application

Filed: April 1, 2020

Publication date: July 16, 2020

Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
Computation engine with strided dot product

Patent number: 10642620

Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.

Type: Grant

Filed: April 5, 2018

Date of Patent: May 5, 2020

Assignee: Apple Inc.

Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
Matrix computation engine

Patent number: 10592239

Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.

Type: Grant

Filed: May 28, 2019

Date of Patent: March 17, 2020

Assignee: Apple Inc.

Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
Functional unit for instruction execution pipeline capable of shifting different chunks of a packed data operand by different amounts

Patent number: 10496411

Abstract: A method is described that includes fetching an instruction. The method further includes decoding the instruction. The instruction specifies an operation, a first operand and a second operand. The method further includes fetching the first and second operands of the instruction. The first and second operands are each composed of a plurality of larger chunks having constituent elements. The method further includes performing the operation specified by the instruction including generating a resultant composed of a plurality of larger chunks having constituent elements. The generating of the resultant includes selecting for each element in the resultant a contiguous group of bits from a same positioned chunk of the first operand as the chunk of the element in the resultant, the contiguous group of bits being identified by a same positioned element of the second operand as the element in the resultant.

Type: Grant

Filed: December 20, 2017

Date of Patent: December 3, 2019

Assignee: Intel Corporation

Inventors: Tal Uliel, Robert Valentine
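
The per-element bit selection described in the abstract above can be sketched as follows, assuming 64-bit chunks made of 8-bit elements. The function name `select_bit_groups`, the chunk and element widths, and the wrap-around behavior when a selection runs past the top of a chunk are assumptions for illustration.

```python
def select_bit_groups(chunks, offsets, chunk_bits=64, elem_bits=8):
    """For each chunk, extract one elem_bits-wide group per offset.

    chunks:  list of integers, one per chunk of the first operand.
    offsets: list of lists; offsets[c][j] is the starting bit (within chunk c)
             of the group that becomes result element j of chunk c.
    Selections that run past the top of the chunk wrap around (assumption).
    """
    chunk_mask = (1 << chunk_bits) - 1
    elem_mask = (1 << elem_bits) - 1
    result = []
    for chunk, chunk_offsets in zip(chunks, offsets):
        elements = []
        for off in chunk_offsets:
            off %= chunk_bits
            # Rotate right by `off`, then keep the low elem_bits bits.
            rotated = ((chunk >> off) | (chunk << (chunk_bits - off))) & chunk_mask
            elements.append(rotated & elem_mask)
        result.append(elements)
    return result


chunks = [0x0123456789ABCDEF, 0xFF00FF00FF00FF00]
offsets = [
    [0, 4, 8, 12, 16, 20, 24, 28],    # a different shift amount per element
    [0, 8, 16, 24, 32, 40, 48, 56],
]
selected = select_bit_groups(chunks, offsets)
```

Each result element draws only on the chunk at the same position in the first operand, so different chunks can be shifted by different amounts in one operation.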
Apparatus and method for down conversion of data types

Patent number: 10474463

Abstract: An apparatus and method are described for down-converting from a source operand to a destination operand with masking. For example, a method according to one embodiment includes the following operations: reading a source operand value to be down-converted from a first value to a down-converted value and stored in a destination location; reading each mask register bit stored in a mask register, the mask register bit(s) indicating whether to perform a masking operation or a conversion operation on the source operand value; if the mask register bit(s) indicates that a masking operation is to be performed, then performing a specified masking operation and storing the results of the masking operation in the destination location; and if the mask register bit indicates that a masking operation is not to be performed, then down-converting the source operand value and storing the down-converted value in the specified destination location.

Type: Grant

Filed: December 23, 2011

Date of Patent: November 12, 2019

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Tal Uliel, Jesus Corbal, Zeev Sperber, Amit Gradstein
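
A hedged sketch of the masked down-conversion described in the abstract above. It assumes a saturating conversion from wider integers to signed 8-bit values, a mask in which a set bit means "convert this element", and two possible masking operations (keep the old destination value, or write zero). None of these choices is confirmed by the abstract, and `down_convert_masked` is a hypothetical name.

```python
INT8_MIN, INT8_MAX = -128, 127


def down_convert_masked(src, dest, mask, zeroing=False):
    """Down-convert src elements to signed 8-bit, one mask bit per element.

    Bit i of `mask` set   -> element i is converted (saturating; assumption).
    Bit i of `mask` clear -> masking operation: keep dest[i], or write 0
                             when `zeroing` is True (assumption).
    """
    out = []
    for i, (value, old) in enumerate(zip(src, dest)):
        if (mask >> i) & 1:
            out.append(max(INT8_MIN, min(INT8_MAX, value)))
        else:
            out.append(0 if zeroing else old)
    return out


src = [300, -5, 42, 12]
dest = [9, 9, 9, 9]
merged = down_convert_masked(src, dest, 0b0111)
zeroed = down_convert_masked(src, dest, 0b0111, zeroing=True)
```

Element 0 saturates to 127, elements 1 and 2 convert unchanged, and element 3 is masked, so it either keeps the old destination value or becomes zero.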
Instruction and logic to provide vector compress and rotate functionality

Patent number: 10459877

Abstract: Instructions and logic provide vector compress and rotate functionality. A processor may include a mask register, a decoder, and an execution unit. The mask register may include a data field, wherein the data field corresponds to an element location in a vector. The decoder may be coupled to the mask register. The decoder may decode an instruction to obtain a decoded instruction. The decoded instruction may specify a vector source, the mask register, a vector destination, and a vector destination offset location. The execution unit is coupled to the decoder. The execution unit may read an unmasked value in the data field; copy a vector element from the vector source to a location adjacent to the element; change the unmasked value to a masked value; determine that the vector destination is full; store a vector destination operand associated with the vector destination in a memory; and re-execute the instruction using the masked value and the vector destination offset location.

Type: Grant

Filed: March 17, 2017

Date of Patent: October 29, 2019

Assignee: Intel Corporation

Inventors: Tal Uliel, Elmoustapha Ould-Ahmed-Vall, Robert Valentine
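
The compress part of the abstract above can be approximated by a simple loop that packs the unmasked elements next to each other and stores the destination whenever it fills up. This is a simplified sketch: `compress_store`, the list-based `memory`, and the `width` parameter are hypothetical, and the sketch does not model the rotate functionality or the re-execution mechanism.

```python
def compress_store(source, mask, width, memory):
    """Pack unmasked source elements into a destination of `width` lanes.

    Each time the destination fills, it is appended to `memory` and emptied.
    Returns the partially filled destination; its length corresponds to the
    destination offset at which a later call would continue.
    """
    dest = []
    for element, unmasked in zip(source, mask):
        if unmasked:
            dest.append(element)      # copy next to the previously copied element
            if len(dest) == width:    # destination full: store it to memory
                memory.extend(dest)
                dest = []
    return dest


memory = []
leftover = compress_store(
    source=[10, 11, 12, 13, 14, 15, 16, 17],
    mask=[1, 0, 1, 1, 0, 1, 1, 1],
    width=4,
    memory=memory,
)
```

After the call, `memory` holds the first full destination and `leftover` holds the elements still waiting for the destination to fill.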
Computation Engine with Upsize/Interleave and Downsize/Deinterleave Options

Publication number: 20190310854

Abstract: In an embodiment, a computation engine may perform computations on input vectors having vector elements of a first precision and data type. The computation engine may convert the vector elements from the first precision to a second precision and may also interleave the vector elements as specified by an instruction issued by the processor to the computation engine. The interleave may be based on a ratio of a result precision and the second precision. An extract instruction may be supported to extract results from the computations and convert and deinterleave the vector elements to provide a compact result in a desired order.

Type: Application

Filed: April 5, 2018

Publication date: October 10, 2019

Inventors: Eric Bainville, Tal Uliel, Jeffry E. Gonion, Ali Sazegari, Erik K. Norden
Computation Engine with Strided Dot Product

Publication number: 20190310855

Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.

Type: Application

Filed: April 5, 2018

Publication date: October 10, 2019

Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
Matrix Computation Engine

Publication number: 20190294441

Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.

Type: Application

Filed: May 28, 2019

Publication date: September 26, 2019

Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
Range Mapping of Input Operands for Transcendental Functions

Publication number: 20190250917

Abstract: In an embodiment, a computation engine may offload a processor (e.g. a CPU) and efficiently perform transcendental functions. The computation engine may implement a range instruction that may be included in a program being executed by the CPU. The CPU may dispatch the range instruction to the computation engine. The range instruction may take an input operand (that is to be evaluated in a transcendental function, for example) and may reference a range table that defines a set of ranges for the transcendental function. The range instruction may identify one of the set of ranges that includes the input operand. For example, the range instruction may output an interval number identifying which interval of an overall set of valid input values contains the input operand. In an embodiment, the range instruction may take an input vector operand and output a vector of interval identifiers.

Type: Application

Filed: February 14, 2018

Publication date: August 15, 2019

Inventors: O-Cheng Chang, Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
Matrix computation engine

Patent number: 10346163

Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.

Type: Grant

Filed: November 1, 2017

Date of Patent: July 9, 2019

Assignee: Apple Inc.

Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
Matrix Computation Engine

Publication number: 20190129719

Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.

Type: Application

Filed: November 1, 2017

Publication date: May 2, 2019

Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
Functional Unit for Instruction Execution Pipeline Capable of Shifting Different Chunks of a Packed Data Operand by Different Amounts

Publication number: 20180217841

Abstract: A method is described that includes fetching an instruction. The method further includes decoding the instruction. The instruction specifies an operation, a first operand and a second operand. The method further includes fetching the first and second operands of the instruction. The first and second operands are each composed of a plurality of larger chunks having constituent elements. The method further includes performing the operation specified by the instruction including generating a resultant composed of a plurality of larger chunks having constituent elements. The generating of the resultant includes selecting for each element in the resultant a contiguous group of bits from a same positioned chunk of the first operand as the chunk of the element in the resultant, the contiguous group of bits being identified by a same positioned element of the second operand as the element in the resultant.

Type: Application

Filed: December 20, 2017

Publication date: August 2, 2018

Inventors: Tal Uliel, Robert Valentine
Fused Multiply-Add that Accepts Sources at a First Precision and Generates Results at a Second Precision

Publication number: 20180121199

Abstract: In an embodiment, a processor may implement a fused multiply-add (FMA) instruction that accepts vector operands having vector elements with a first precision, and performing both the multiply and add operations at a higher precision. The add portion of the operation may add adjacent pairs of multiplication results from the multiply portion of the operation, which may allow the result to be stored in a vector register of the same overall length as the input vector registers but with fewer, higher precision vector elements, in an embodiment. Additionally, the overall operation may have high accuracy because of the higher precision throughout the operation.

Type: Application

Filed: June 21, 2017

Publication date: May 3, 2018

Inventors: Tal Uliel, Jeffry E. Gonion, Ali Sazegari, Eric Bainville
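
To illustrate the widening multiply-add described in the abstract above, the sketch below uses Python integers as stand-ins for vector elements: inputs are checked against a 16-bit range, and each result combines an adjacent pair of products with an accumulator element. The name `widening_fma_pairs`, the choice of 16-bit integer inputs, and the accumulator operand `c` are assumptions for this example; the abstract does not specify the element types.

```python
INT16_MIN, INT16_MAX = -2**15, 2**15 - 1


def widening_fma_pairs(a, b, c):
    """Return c[i] + a[2i]*b[2i] + a[2i+1]*b[2i+1] for each i.

    a, b: equal-length lists of values in the 16-bit range (first precision).
    c:    accumulator with half as many, wider elements (second precision).
    """
    if len(a) != len(b) or len(a) != 2 * len(c):
        raise ValueError("a and b must match and be twice as long as c")
    for x in a + b:
        if not INT16_MIN <= x <= INT16_MAX:
            raise ValueError("input element outside the 16-bit range")
    # Products are formed at full width, then adjacent pairs are added.
    return [
        c[i] + a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1]
        for i in range(len(c))
    ]


result = widening_fma_pairs([300, 200, -150, 100], [300, 100, 200, 50], [1, 2])
```

The product 300 * 300 = 90000 does not fit in 16 bits. Halving the element count is what lets the wider results occupy a register of the same overall length as the inputs.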
Methods to optimize a program loop via vector instructions using a shuffle table

Patent number: 9886242

Abstract: According to one embodiment, a code optimizer is configured to receive first code having a program loop implemented with scalar instructions to store values of a first array to a second array based on values of a third array and to generate second code representing the program loop using at least one vector instruction. The second code includes a shuffle instruction to shuffle elements of the first array based on the third array using a shuffle table in a vector manner and a store instruction to store the shuffled elements of the first array in the second array.

Type: Grant

Filed: February 6, 2015

Date of Patent: February 6, 2018

Assignee: Intel Corporation

Inventors: Tal Uliel, Elmoustapha Ould-Ahmed-Vall, Bret T. Toll
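
The loop transformation described in the abstract above can be sketched by comparing a scalar loop with a block-wise version driven by a shuffle table. The sketch assumes the loop is a conditional copy (store `a[i]` to the next free slot of `b` when `c[i] > 0`) and that vectors have four lanes. Both are assumptions for illustration, as are the names `scalar_loop`, `vector_loop`, and `SHUFFLE_TABLE`.

```python
LANES = 4

# SHUFFLE_TABLE[mask] lists the lanes whose mask bit is set, in order, so the
# selected elements of a block end up packed at the front.
SHUFFLE_TABLE = [
    [lane for lane in range(LANES) if (mask >> lane) & 1]
    for mask in range(1 << LANES)
]


def scalar_loop(a, c):
    """Reference loop: copy a[i] to b whenever c[i] > 0."""
    b = []
    for i in range(len(a)):
        if c[i] > 0:
            b.append(a[i])
    return b


def vector_loop(a, c):
    """Same result, processing LANES elements per step via the shuffle table."""
    b = []
    for start in range(0, len(a), LANES):
        block = a[start:start + LANES]
        mask = 0
        for lane, flag in enumerate(c[start:start + LANES]):
            if flag > 0:
                mask |= 1 << lane                              # vector compare
        b.extend(block[lane] for lane in SHUFFLE_TABLE[mask])  # shuffle + store
    return b


a = [5, 6, 7, 8, 9, 10]
c = [1, 0, 0, 1, 1, 1]
packed = vector_loop(a, c)
```

The table is computed once, so each block needs only a compare, one table lookup, a shuffle, and a store in place of a data-dependent branch per element.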
