Patents by Inventor Tal Uliel
Tal Uliel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10990401
Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.
Type: Grant
Filed: April 1, 2020
Date of Patent: April 27, 2021
Assignee: Apple Inc.
Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
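The strided dot product described in this abstract can be illustrated with a minimal sketch. The function name `strided_dot` and the `stride` and `offset` parameters are hypothetical, chosen only for illustration; the abstract does not define an instruction encoding.

```python
def strided_dot(first, second, stride, offset=0):
    """Dot product of a strided subset of `first` with every element of `second`.

    `first` is treated as a vector of sub-vectors of length `stride`;
    `offset` picks which element of each sub-vector takes part.
    Names and parameters are illustrative assumptions, not from the patent.
    """
    selected = first[offset::stride]  # skip stride-1 elements between picks
    if len(selected) != len(second):
        raise ValueError("strided subset and second operand differ in length")
    return sum(a * b for a, b in zip(selected, second))


# `first` viewed as four 2-element sub-vectors: (1, 10), (2, 20), (3, 30), (4, 40)
first = [1, 10, 2, 20, 3, 30, 4, 40]
second = [1, 1, 1, 1]
low = strided_dot(first, second, stride=2, offset=0)   # uses 1, 2, 3, 4
high = strided_dot(first, second, stride=2, offset=1)  # uses 10, 20, 30, 40
```

Changing `offset` selects a different element of each sub-vector without rearranging the first operand in memory.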
-
Patent number: 10970078
Abstract: In an embodiment, a computation engine may perform computations on input vectors having vector elements of a first precision and data type. The computation engine may convert the vector elements from the first precision to a second precision and may also interleave the vector elements as specified by an instruction issued by the processor to the computation engine. The interleave may be based on a ratio of a result precision and the second precision. An extract instruction may be supported to extract results from the computations and convert and deinterleave the vector elements to provide a compact result in a desired order.
Type: Grant
Filed: April 5, 2018
Date of Patent: April 6, 2021
Assignee: Apple Inc.
Inventors: Eric Bainville, Tal Uliel, Jeffry E. Gonion, Ali Sazegari, Erik K. Norden
-
Patent number: 10877754
Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.
Type: Grant
Filed: March 13, 2020
Date of Patent: December 29, 2020
Assignee: Apple Inc.
Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
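As a rough software analogue of the multiply-accumulate behavior in this abstract, the sketch below adds each product into a persistent accumulator that stands in for the result memory. The name `matmul_accumulate` is hypothetical, and the sequential loops only model the arithmetic, not the parallel hardware.

```python
def matmul_accumulate(acc, a, b):
    """Accumulate the matrix product a @ b into `acc` in place.

    `acc` plays the role of the result memory: it is never cleared here,
    so repeated calls keep summing products into the same elements.
    Illustrative sketch only; hardware would do the multiplies in parallel.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                acc[i][j] += a[i][k] * b[k][j]  # one multiply-accumulate
    return acc


acc = [[0, 0], [0, 0]]
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
matmul_accumulate(acc, a, b)  # acc now holds a @ b
matmul_accumulate(acc, a, b)  # acc now holds 2 * (a @ b)
```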
-
Publication number: 20200272464
Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.
Type: Application
Filed: March 13, 2020
Publication date: August 27, 2020
Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
-
Publication number: 20200241876
Abstract: In an embodiment, a processor (e.g. a CPU) may offload transcendental computation to a computation engine that may efficiently perform transcendental functions. The computation engine may implement a range instruction that may be included in a program being executed by the CPU. The CPU may dispatch the range instruction to the computation engine. The range instruction may take an input operand (that is to be evaluated in a transcendental function, for example) and may reference a range table that defines a set of ranges for the transcendental function. The range instruction may identify one of the set of ranges that includes the input operand. For example, the range instruction may output an interval number identifying which interval of an overall set of valid input values contains the input operand. In an embodiment, the range instruction may take an input vector operand and output a vector of interval identifiers.
Type: Application
Filed: April 13, 2020
Publication date: July 30, 2020
Inventors: O-Cheng Chang, Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
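The range lookup in this abstract resembles a search over sorted interval boundaries. The sketch below is a software analogue under that assumption; `range_lookup` and the layout of `boundaries` (a sorted list where interval i covers `boundaries[i] <= x < boundaries[i + 1]`) are hypothetical and not taken from the application.

```python
from bisect import bisect_right


def range_lookup(values, boundaries):
    """Return, for each input value, the index of the interval containing it.

    `boundaries` is an assumed range-table layout: a sorted list in which
    interval i covers boundaries[i] <= x < boundaries[i + 1].
    """
    intervals = []
    for x in values:
        if not boundaries[0] <= x < boundaries[-1]:
            raise ValueError(f"{x} is outside the valid input range")
        intervals.append(bisect_right(boundaries, x) - 1)
    return intervals


# Three intervals: [0, 0.5), [0.5, 1), [1, 2)
table = [0.0, 0.5, 1.0, 2.0]
ids = range_lookup([0.25, 0.5, 1.75], table)  # vector in, vector of ids out
```

A caller could then use each interval identifier to pick per-interval polynomial coefficients, which is one common way transcendental functions are approximated.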
-
Publication number: 20200225958
Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.
Type: Application
Filed: April 1, 2020
Publication date: July 16, 2020
Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari, PhD
-
Patent number: 10642620
Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.
Type: Grant
Filed: April 5, 2018
Date of Patent: May 5, 2020
Assignee: Apple Inc.
Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
-
Patent number: 10592239
Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.
Type: Grant
Filed: May 28, 2019
Date of Patent: March 17, 2020
Assignee: Apple Inc.
Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
-
Patent number: 10496411
Abstract: A method is described that includes fetching an instruction. The method further includes decoding the instruction. The instruction specifies an operation, a first operand and a second operand. The method further includes fetching the first and second operands of the instruction. The first and second operands are each composed of a plurality of larger chunks having constituent elements. The method further includes performing the operation specified by the instruction including generating a resultant composed of a plurality of larger chunks having constituent elements. The generating of the resultant includes selecting for each element in the resultant a contiguous group of bits from a same positioned chunk of the first operand as the chunk of the element in the resultant, the contiguous group of bits being identified by a same positioned element of the second operand as the element in the resultant.
Type: Grant
Filed: December 20, 2017
Date of Patent: December 3, 2019
Assignee: Intel Corporation
Inventors: Tal Uliel, Robert Valentine
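The per-element bit selection in this abstract can be sketched in software. The sizes below (64-bit chunks, 8-bit elements), the use of the low six bits of each control element as a bit offset, and the wrap-around at the chunk boundary are all assumptions made for illustration; the abstract does not fix any of them.

```python
CHUNK_BITS = 64   # assumed chunk size
ELEM_BITS = 8     # assumed element size
ELEMS_PER_CHUNK = CHUNK_BITS // ELEM_BITS
CHUNK_MASK = (1 << CHUNK_BITS) - 1
ELEM_MASK = (1 << ELEM_BITS) - 1


def select_bit_groups(src_chunks, ctrl_chunks):
    """For each result element, pick 8 contiguous bits from the same chunk of src.

    The same-positioned element of the control operand supplies the starting
    bit offset (low 6 bits). Groups that run past bit 63 wrap around, which
    is an assumption of this sketch.
    """
    result = []
    for src, ctrl in zip(src_chunks, ctrl_chunks):
        out = 0
        for i in range(ELEMS_PER_CHUNK):
            offset = (ctrl >> (i * ELEM_BITS)) & (CHUNK_BITS - 1)
            # Rotate the chunk right by `offset` so the wanted bits land at bit 0.
            rotated = ((src >> offset) | (src << (CHUNK_BITS - offset))) & CHUNK_MASK
            out |= (rotated & ELEM_MASK) << (i * ELEM_BITS)
        result.append(out)
    return result


# Control offsets 56, 48, ..., 0 reverse the byte order of one chunk.
ctrl = sum((8 * (7 - i)) << (8 * i) for i in range(8))
reversed_chunk = select_bit_groups([0x0807060504030201], [ctrl])
```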
-
Patent number: 10474463
Abstract: An apparatus and method are described for down-converting from a source operand to a destination operand with masking. For example, a method according to one embodiment includes the following operations: reading a source operand value to be down-converted from a first value to a down-converted value and stored in a destination location; reading each mask register bit stored in a mask register, the mask register bit(s) indicating whether to perform a masking operation or a conversion operation on the source operand value; if the mask register bit(s) indicates that a masking operation is to be performed, then performing a specified masking operation and storing the results of the masking operation in the destination location; and if the mask register bit indicates that a masking operation is not to be performed, then down-converting the source operand value and storing the down-converted value in the specified destination location.
Type: Grant
Filed: December 23, 2011
Date of Patent: November 12, 2019
Assignee: INTEL CORPORATION
Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Tal Uliel, Jesus Corbal, Zeev Sperber, Amit Gradstein
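The mask-controlled choice between converting and masking can be sketched per element as below. The function name, the convention that a set mask bit means "convert", the truncating conversion, and the two masking modes (zeroing or keeping the old destination value) are assumptions for illustration; the abstract only says that a specified masking operation is performed.

```python
def masked_down_convert(src, mask, dest, zeroing=False, bits=8):
    """Down-convert selected elements of `src`; mask the rest.

    Bit i of `mask` set   -> element i is down-converted (truncated to `bits`).
    Bit i of `mask` clear -> masking operation: write 0 if `zeroing`,
                             otherwise keep the previous dest[i].
    All of these conventions are assumptions of this sketch.
    """
    limit = (1 << bits) - 1
    out = []
    for i, value in enumerate(src):
        if (mask >> i) & 1:
            out.append(value & limit)  # truncating down-conversion
        elif zeroing:
            out.append(0)
        else:
            out.append(dest[i])
    return out


src = [0x1234, 0xFF, 0x1FF, 7]
merged = masked_down_convert(src, 0b0101, dest=[9, 9, 9, 9])
zeroed = masked_down_convert(src, 0b0101, dest=[9, 9, 9, 9], zeroing=True)
```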
-
Patent number: 10459877
Abstract: Instructions and logic provide vector compress and rotate functionality. A processor may include a mask register, a decoder, and an execution unit. The mask register may include a data field, wherein the data field corresponds to an element location in a vector. The decoder may be coupled to the mask register. The decoder may decode an instruction to obtain a decoded instruction. The decoded instruction may specify a vector source, the mask register, a vector destination, and a vector destination offset location. The execution unit is coupled to the decoder. The execution unit may read an unmasked value in the data field; copy a vector element from the vector source to a location adjacent to the element; change the unmasked value to a masked value; determine that the vector destination is full; store a vector destination operand associated with the vector destination in a memory; and re-execute the instruction using the masked value and the vector destination offset location.
Type: Grant
Filed: March 17, 2017
Date of Patent: October 29, 2019
Assignee: Intel Corporation
Inventors: Tal Uliel, Elmoustapha Ould-Ahmed-Vall, Robert Valentine
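A simplified software model of the compress behavior in this abstract is sketched below: unmasked source elements are packed into adjacent destination slots starting at an offset, and a full destination is flushed to memory before packing continues. The name `compress_store` and its signature are hypothetical, and the sketch folds the "re-execute with updated mask and offset" step into a single loop.

```python
def compress_store(source, mask, dest, offset, memory):
    """Pack unmasked elements of `source` into `dest`, starting at `offset`.

    When `dest` fills up it is appended to `memory` and packing restarts at
    slot 0, which stands in for re-executing the instruction. Returns the
    next free offset. Illustrative sketch, not the patented mechanism.
    """
    width = len(dest)
    for i, value in enumerate(source):
        if (mask >> i) & 1:          # element i is unmasked
            dest[offset] = value     # copy next to the previously packed element
            offset += 1
            if offset == width:      # destination full: store it and start over
                memory.extend(dest)
                offset = 0
    return offset


dest = [0, 0, 0, 0]
memory = []
offset = compress_store([1, 2, 3, 4], 0b1011, dest, 0, memory)       # packs 1, 2, 4
offset = compress_store([5, 6, 7, 8], 0b0110, dest, offset, memory)  # packs 6, 7
```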
-
Publication number: 20190310854
Abstract: In an embodiment, a computation engine may perform computations on input vectors having vector elements of a first precision and data type. The computation engine may convert the vector elements from the first precision to a second precision and may also interleave the vector elements as specified by an instruction issued by the processor to the computation engine. The interleave may be based on a ratio of a result precision and the second precision. An extract instruction may be supported to extract results from the computations and convert and deinterleave the vector elements to provide a compact result in a desired order.
Type: Application
Filed: April 5, 2018
Publication date: October 10, 2019
Inventors: Eric Bainville, Tal Uliel, Jeffry E. Gonion, Ali Sazegari, Erik K. Norden
-
Publication number: 20190310855
Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.
Type: Application
Filed: April 5, 2018
Publication date: October 10, 2019
Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
-
Publication number: 20190294441
Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.
Type: Application
Filed: May 28, 2019
Publication date: September 26, 2019
Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
-
Publication number: 20190250917
Abstract: In an embodiment, a computation engine may offload a processor (e.g. a CPU) and efficiently perform transcendental functions. The computation engine may implement a range instruction that may be included in a program being executed by the CPU. The CPU may dispatch the range instruction to the computation engine. The range instruction may take an input operand (that is to be evaluated in a transcendental function, for example) and may reference a range table that defines a set of ranges for the transcendental function. The range instruction may identify one of the set of ranges that includes the input operand. For example, the range instruction may output an interval number identifying which interval of an overall set of valid input values contains the input operand. In an embodiment, the range instruction may take an input vector operand and output a vector of interval identifiers.
Type: Application
Filed: February 14, 2018
Publication date: August 15, 2019
Inventors: O-Cheng Chang, Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
-
Patent number: 10346163
Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.
Type: Grant
Filed: November 1, 2017
Date of Patent: July 9, 2019
Assignee: Apple Inc.
Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
-
Publication number: 20190129719
Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.
Type: Application
Filed: November 1, 2017
Publication date: May 2, 2019
Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
-
Publication number: 20180217841
Abstract: A method is described that includes fetching an instruction. The method further includes decoding the instruction. The instruction specifies an operation, a first operand and a second operand. The method further includes fetching the first and second operands of the instruction. The first and second operands are each composed of a plurality of larger chunks having constituent elements. The method further includes performing the operation specified by the instruction including generating a resultant composed of a plurality of larger chunks having constituent elements. The generating of the resultant includes selecting for each element in the resultant a contiguous group of bits from a same positioned chunk of the first operand as the chunk of the element in the resultant, the contiguous group of bits being identified by a same positioned element of the second operand as the element in the resultant.
Type: Application
Filed: December 20, 2017
Publication date: August 2, 2018
Inventors: TAL ULIEL, ROBERT VALENTINE
-
Publication number: 20180121199
Abstract: In an embodiment, a processor may implement a fused multiply-add (FMA) instruction that accepts vector operands having vector elements with a first precision, and performing both the multiply and add operations at a higher precision. The add portion of the operation may add adjacent pairs of multiplication results from the multiply portion of the operation, which may allow the result to be stored in a vector register of the same overall length as the input vector registers but with fewer, higher precision vector elements, in an embodiment. Additionally, the overall operation may have high accuracy because of the higher precision throughout the operation.
Type: Application
Filed: June 21, 2017
Publication date: May 3, 2018
Inventors: Tal Uliel, Jeffry E. Gonion, Ali Sazegari, Eric Bainville
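The pairwise structure of this FMA variant can be shown with a short sketch. The name `pairwise_fma` and the separate accumulator operand `c` are assumptions for illustration; Python numbers do not model the two precisions, so only the data flow (N inputs producing N/2 wider results) is shown.

```python
def pairwise_fma(a, b, c):
    """Multiply element-wise, then add adjacent pairs of products to `c`.

    `a` and `b` hold N lower-precision elements; `c` and the result hold
    N/2 higher-precision elements. The accumulator operand `c` is an
    assumption of this sketch, not stated in the abstract.
    """
    if len(a) != len(b) or len(a) != 2 * len(c):
        raise ValueError("expected len(a) == len(b) == 2 * len(c)")
    products = [x * y for x, y in zip(a, b)]  # multiply portion
    return [
        products[2 * i] + products[2 * i + 1] + c[i]  # add adjacent pairs
        for i in range(len(c))
    ]


result = pairwise_fma([1, 2, 3, 4], [5, 6, 7, 8], [100, 200])
```

Halving the element count while widening each element is what lets the result occupy a register of the same overall length as the inputs.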
-
Patent number: 9886242
Abstract: According to one embodiment, a code optimizer is configured to receive first code having a program loop implemented with scalar instructions to store values of a first array to a second array based on values of a third array and to generate second code representing the program loop using at least one vector instruction. The second code includes a shuffle instruction to shuffle elements of the first array based on the third array using a shuffle table in a vector manner and a store instruction to store the shuffled elements of the first array in the second array.
Type: Grant
Filed: February 6, 2015
Date of Patent: February 6, 2018
Assignee: Intel Corporation
Inventors: Tal Uliel, Elmoustapha Ould-Ahmedvall, Bret T. Toll
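One possible reading of this abstract is a conditional-copy loop in which the third array decides which elements of the first array reach the second. The sketch below shows that scalar loop next to a block-wise version driven by a precomputed shuffle table. The loop shape, the 4-lane width, and every name here are assumptions for illustration, not details confirmed by the abstract.

```python
LANES = 4  # assumed vector width

# SHUFFLE_TABLE[mask] lists the lanes whose mask bit is set, in order,
# i.e. the permutation that moves selected lanes to the front.
SHUFFLE_TABLE = [
    [lane for lane in range(LANES) if (mask >> lane) & 1]
    for mask in range(1 << LANES)
]


def scalar_loop(a, c):
    """Original scalar form: copy a[i] to b whenever c[i] is truthy."""
    b = []
    for i in range(len(a)):
        if c[i]:
            b.append(a[i])
    return b


def vector_loop(a, c):
    """Block-wise form: build a mask per block, shuffle via the table, store."""
    b = []
    for base in range(0, len(a), LANES):
        block = a[base:base + LANES]
        mask = 0
        for lane, flag in enumerate(c[base:base + LANES]):
            if flag:
                mask |= 1 << lane
        shuffled = [block[lane] for lane in SHUFFLE_TABLE[mask]]  # shuffle step
        b.extend(shuffled)                                        # store step
    return b


a = list(range(10))
c = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
same = scalar_loop(a, c) == vector_loop(a, c)
```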