Patents by Inventor Tal Uliel

Tal Uliel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10990401
    Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.
    Type: Grant
    Filed: April 1, 2020
    Date of Patent: April 27, 2021
    Assignee: Apple Inc.
    Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
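To make the strided selection concrete, here is a minimal Python sketch of the dot product described in the entry above (patent 10990401). The function name, the vector-of-vectors layout, and the use of plain Python lists are illustrative assumptions, not the computation engine's actual interface.

```python
# Illustrative model only: the first operand is a vector whose elements are
# themselves small vectors, and the stride selects one element out of each
# sub-vector before the dot product is applied.
def strided_dot(first_operand, second_operand, element_index):
    """Dot product of the selected elements of first_operand with second_operand."""
    selected = [sub[element_index] for sub in first_operand]  # stride skips the other elements
    return sum(a * b for a, b in zip(selected, second_operand))

# Example: sub-vectors of length 4; the stride selects element 2 of each.
first = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
second = [1, 1, 1]
print(strided_dot(first, second, 2))  # 3 + 7 + 11 = 21
```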
  • Patent number: 10970078
    Abstract: In an embodiment, a computation engine may perform computations on input vectors having vector elements of a first precision and data type. The computation engine may convert the vector elements from the first precision to a second precision and may also interleave the vector elements as specified by an instruction issued by the processor to the computation engine. The interleave may be based on a ratio of a result precision and the second precision. An extract instruction may be supported to extract results from the computations and convert and deinterleave the vector elements to provide a compact result in a desired order.
    Type: Grant
    Filed: April 5, 2018
    Date of Patent: April 6, 2021
    Assignee: Apple Inc.
    Inventors: Eric Bainville, Tal Uliel, Jeffry E. Gonion, Ali Sazegari, Erik K. Norden
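The convert-and-interleave load and the matching extract described above can be modeled in a few lines of Python. The interleave factor stands in for the ratio of result precision to intermediate precision, and widening integers to Python floats stands in for the precision conversion; both are assumptions made for the illustration.

```python
def convert_and_interleave(elements, ratio):
    """Widen each element, then interleave `ratio` contiguous groups."""
    widened = [float(e) for e in elements]  # first precision -> second precision
    group_len = len(widened) // ratio
    groups = [widened[i * group_len:(i + 1) * group_len] for i in range(ratio)]
    # Weave the groups together: one element from each group in turn.
    return [g[i] for i in range(group_len) for g in groups]

def extract_and_deinterleave(elements, ratio):
    """Inverse of the interleave: recover a compact result in the original order."""
    groups = [elements[i::ratio] for i in range(ratio)]
    return [e for g in groups for e in g]

data = [1, 2, 3, 4, 5, 6, 7, 8]
woven = convert_and_interleave(data, ratio=2)    # [1.0, 5.0, 2.0, 6.0, 3.0, 7.0, 4.0, 8.0]
print(extract_and_deinterleave(woven, ratio=2))  # [1.0, 2.0, ..., 8.0]
```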
  • Patent number: 10877754
    Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.
    Type: Grant
    Filed: March 13, 2020
    Date of Patent: December 29, 2020
    Assignee: Apple Inc.
    Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
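The multiply-accumulate behaviour described above, accumulating into a result memory rather than overwriting it, can be sketched in plain Python. The function name and the in-place list-of-lists result buffer are assumptions chosen for the illustration; the actual engine operates on hardware registers and result memories.

```python
def matmul_accumulate(result, x, y):
    """result[i][j] += sum over k of x[i][k] * y[k][j], in place."""
    for i in range(len(x)):
        for j in range(len(y[0])):
            acc = result[i][j]                # previously accumulated value
            for k in range(len(y)):
                acc += x[i][k] * y[k][j]      # multiply-accumulate per element
            result[i][j] = acc
    return result

# Two products accumulated into the same result memory.
z = [[0, 0], [0, 0]]
matmul_accumulate(z, [[1, 2], [3, 4]], [[1, 0], [0, 1]])
matmul_accumulate(z, [[1, 1], [1, 1]], [[2, 2], [2, 2]])
print(z)  # [[5, 6], [7, 8]]
```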
  • Publication number: 20200272464
    Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.
    Type: Application
    Filed: March 13, 2020
    Publication date: August 27, 2020
    Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
  • Publication number: 20200241876
    Abstract: In an embodiment, a processor (e.g. a CPU) may offload transcendental computation to a computation engine that may efficiently perform transcendental functions. The computation engine may implement a range instruction that may be included in a program being executed by the CPU. The CPU may dispatch the range instruction to the computation engine. The range instruction may take an input operand (that is to be evaluated in a transcendental function, for example) and may reference a range table that defines a set of ranges for the transcendental function. The range instruction may identify one of the set of ranges that includes the input operand. For example, the range instruction may output an interval number identifying which interval of an overall set of valid input values contains the input operand. In an embodiment, the range instruction may take an input vector operand and output a vector of interval identifiers.
    Type: Application
    Filed: April 13, 2020
    Publication date: July 30, 2020
    Inventors: O-Cheng Chang, Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
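A short Python model of the range instruction described in this publication may help: given a table of interval boundaries, it returns the index of the interval containing each input element. Using bisect and a simple boundary list are assumptions for the illustration; the hardware lookup mechanism is not specified here.

```python
import bisect

def range_instruction(inputs, range_table):
    """Return, for each input element, the index of the range-table interval
    that contains it. range_table holds ascending interval boundaries; a
    value between boundary i and boundary i+1 maps to interval i."""
    return [bisect.bisect_right(range_table, x) - 1 for x in inputs]

# Example: argument-reduction intervals with boundaries at 0, 0.5, 1 and 2.
table = [0.0, 0.5, 1.0, 2.0]
print(range_instruction([0.1, 0.75, 1.5], table))  # [0, 1, 2]
```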
  • Publication number: 20200225958
    Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.
    Type: Application
    Filed: April 1, 2020
    Publication date: July 16, 2020
    Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
  • Patent number: 10642620
    Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.
    Type: Grant
    Filed: April 5, 2018
    Date of Patent: May 5, 2020
    Assignee: Apple Inc.
    Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
  • Patent number: 10592239
    Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.
    Type: Grant
    Filed: May 28, 2019
    Date of Patent: March 17, 2020
    Assignee: Apple Inc.
    Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
  • Patent number: 10496411
    Abstract: A method is described that includes fetching an instruction. The method further includes decoding the instruction. The instruction specifies an operation, a first operand and a second operand. The method further includes fetching the first and second operands of the instruction. The first and second operands are each composed of a plurality of larger chunks having constituent elements. The method further includes performing the operation specified by the instruction including generating a resultant composed of a plurality of larger chunks having constituent elements. The generating of the resultant includes selecting for each element in the resultant a contiguous group of bits from a same positioned chunk of the first operand as the chunk of the element in the resultant, the contiguous group of bits being identified by a same positioned element of the second operand as the element in the resultant.
    Type: Grant
    Filed: December 20, 2017
    Date of Patent: December 3, 2019
    Assignee: Intel Corporation
    Inventors: Tal Uliel, Robert Valentine
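A small Python model of the bit-group extraction described in this patent is shown below. The 64-bit chunk width and 8-bit element width are assumptions chosen to keep the example concrete; the claims cover the general scheme.

```python
CHUNK_BITS = 64
ELEM_BITS = 8
ELEMS_PER_CHUNK = CHUNK_BITS // ELEM_BITS

def extract_bit_groups(src_chunks, index_chunks):
    """For each result element, take ELEM_BITS contiguous bits from the
    same-positioned source chunk, starting at the bit offset given by the
    same-positioned element of the index operand."""
    result_chunks = []
    for src, idx in zip(src_chunks, index_chunks):
        chunk = 0
        for e in range(ELEMS_PER_CHUNK):
            offset = (idx >> (e * ELEM_BITS)) & 0xFF           # index element e
            field = (src >> offset) & ((1 << ELEM_BITS) - 1)   # contiguous bit group
            chunk |= field << (e * ELEM_BITS)
        result_chunks.append(chunk)
    return result_chunks

src = [0x1122334455667788]
idx = [0x0000000000000800]  # element 1 reads from bit 8, the rest from bit 0
print(hex(extract_bit_groups(src, idx)[0]))  # 0x8888888888887788
```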
  • Patent number: 10474463
    Abstract: An apparatus and method are described for down-converting from a source operand to a destination operand with masking. For example, a method according to one embodiment includes the following operations: reading a source operand value to be down-converted from a first value to a down-converted value and stored in a destination location; reading each mask register bit stored in a mask register, the mask register bit(s) indicating whether to perform a masking operation or a conversion operation on the source operand value; if the mask register bit(s) indicates that a masking operation is to be performed, then performing a specified masking operation and storing the results of the masking operation in the destination location; and if the mask register bit indicates that a masking operation is not to be performed, then down-converting the source operand value and storing the down-converted value in the specified destination location.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: November 12, 2019
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Tal Uliel, Jesus Corbal, Zeev Sperber, Amit Gradstein
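The masked down-conversion described above reduces to a simple per-element loop in software. Zero-masking and truncation from 32 to 16 bits are assumptions made for this illustration; the patent also covers merge-masking and other element widths.

```python
def down_convert_masked(src, mask):
    """Where the mask bit is set, down-convert the 32-bit source element to
    16 bits; where it is clear, apply the masking operation (here: zero)."""
    out = []
    for value, bit in zip(src, mask):
        if bit:
            out.append(value & 0xFFFF)  # keep the low 16 bits (truncating down-convert)
        else:
            out.append(0)               # zero-masking
    return out

src = [0x00012345, 0x0000BEEF, 0xFFFF0001]
mask = [1, 0, 1]
print([hex(v) for v in down_convert_masked(src, mask)])  # ['0x2345', '0x0', '0x1']
```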
  • Patent number: 10459877
    Abstract: Instructions and logic provide vector compress and rotate functionality. A processor may include a mask register, a decoder, and an execution unit. The mask register may include a data field, wherein the data field corresponds to an element location in a vector. The decoder may be coupled to the mask register. The decoder may decode an instruction to obtain a decoded instruction. The decoded instruction may specify a vector source, the mask register, a vector destination, and a vector destination offset location. The execution unit is coupled to the decoder. The execution unit may read an unmasked value in the data field; copy a vector element from the vector source to a location adjacent to the element; change the unmasked value to a masked value; determine that the vector destination is full; store a vector destination operand associated with the vector destination in a memory; and re-execute the instruction using the masked value and the vector destination offset location.
    Type: Grant
    Filed: March 17, 2017
    Date of Patent: October 29, 2019
    Assignee: Intel Corporation
    Inventors: Tal Uliel, Elmoustapha Ould-Ahmed-Vall, Robert Valentine
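The compress-and-rotate flow described above can be walked through in Python: unmasked source elements are packed into the destination starting at an offset, the destination is stored to memory when it fills, and processing resumes with the updated mask. The 4-element destination and the list-based memory are assumptions for the illustration.

```python
VLEN = 4  # destination register length, chosen for the example

def compress_store(source, mask, memory):
    dest, offset = [0] * VLEN, 0
    for i, element in enumerate(source):
        if not mask[i]:
            continue              # skip masked-out elements
        dest[offset] = element    # copy the next unmasked element
        mask[i] = 0               # mark it consumed in the mask register
        offset += 1
        if offset == VLEN:        # destination full: store it and start over
            memory.extend(dest)
            dest, offset = [0] * VLEN, 0
    memory.extend(dest[:offset])  # flush the partially filled tail
    return memory

print(compress_store([10, 20, 30, 40, 50, 60], [1, 0, 1, 1, 1, 1], []))
# [10, 30, 40, 50, 60]
```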
  • Publication number: 20190310854
    Abstract: In an embodiment, a computation engine may perform computations on input vectors having vector elements of a first precision and data type. The computation engine may convert the vector elements from the first precision to a second precision and may also interleave the vector elements as specified by an instruction issued by the processor to the computation engine. The interleave may be based on a ratio of a result precision and the second precision. An extract instruction may be supported to extract results from the computations and convert and deinterleave the vector elements to provide a compact result in a desired order.
    Type: Application
    Filed: April 5, 2018
    Publication date: October 10, 2019
    Inventors: Eric Bainville, Tal Uliel, Jeffry E. Gonion, Ali Sazegari, Erik K. Norden
  • Publication number: 20190310855
    Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.
    Type: Application
    Filed: April 5, 2018
    Publication date: October 10, 2019
    Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
  • Publication number: 20190294441
    Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.
    Type: Application
    Filed: May 28, 2019
    Publication date: September 26, 2019
    Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
  • Publication number: 20190250917
    Abstract: In an embodiment, a computation engine may offload a processor (e.g. a CPU) and efficiently perform transcendental functions. The computation engine may implement a range instruction that may be included in a program being executed by the CPU. The CPU may dispatch the range instruction to the computation engine. The range instruction may take an input operand (that is to be evaluated in a transcendental function, for example) and may reference a range table that defines a set of ranges for the transcendental function. The range instruction may identify one of the set of ranges that includes the input operand. For example, the range instruction may output an interval number identifying which interval of an overall set of valid input values contains the input operand. In an embodiment, the range instruction may take an input vector operand and output a vector of interval identifiers.
    Type: Application
    Filed: February 14, 2018
    Publication date: August 15, 2019
    Inventors: O-Cheng Chang, Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
  • Patent number: 10346163
    Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.
    Type: Grant
    Filed: November 1, 2017
    Date of Patent: July 9, 2019
    Assignee: Apple Inc.
    Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
  • Publication number: 20190129719
    Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.
    Type: Application
    Filed: November 1, 2017
    Publication date: May 2, 2019
    Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
  • Publication number: 20180217841
    Abstract: A method is described that includes fetching an instruction. The method further includes decoding the instruction. The instruction specifies an operation, a first operand and a second operand. The method further includes fetching the first and second operands of the instruction. The first and second operands are each composed of a plurality of larger chunks having constituent elements. The method further includes performing the operation specified by the instruction including generating a resultant composed of a plurality of larger chunks having constituent elements. The generating of the resultant includes selecting for each element in the resultant a contiguous group of bits from a same positioned chunk of the first operand as the chunk of the element in the resultant, the contiguous group of bits being identified by a same positioned element of the second operand as the element in the resultant.
    Type: Application
    Filed: December 20, 2017
    Publication date: August 2, 2018
    Inventors: Tal Uliel, Robert Valentine
  • Publication number: 20180121199
    Abstract: In an embodiment, a processor may implement a fused multiply-add (FMA) instruction that accepts vector operands having vector elements with a first precision, and performing both the multiply and add operations at a higher precision. The add portion of the operation may add adjacent pairs of multiplication results from the multiply portion of the operation, which may allow the result to be stored in a vector register of the same overall length as the input vector registers but with fewer, higher precision vector elements, in an embodiment. Additionally, the overall operation may have high accuracy because of the higher precision throughout the operation.
    Type: Application
    Filed: June 21, 2017
    Publication date: May 3, 2018
    Inventors: Tal Uliel, Jeffry E. Gonion, Ali Sazegari, Eric Bainville
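The pairwise fused multiply-add described above halves the element count: products are formed at higher precision and adjacent pairs are summed into the accumulator. Treating Python floats as the higher precision is an assumption made for this sketch.

```python
def pairwise_fma(acc, a, b):
    """acc[i] += a[2i] * b[2i] + a[2i+1] * b[2i+1]"""
    products = [x * y for x, y in zip(a, b)]                 # multiply at higher precision
    return [acc[i] + products[2 * i] + products[2 * i + 1]   # add adjacent pairs
            for i in range(len(acc))]

a = [1.0, 2.0, 3.0, 4.0]                # four lower-precision inputs...
b = [5.0, 6.0, 7.0, 8.0]
print(pairwise_fma([0.0, 0.0], a, b))   # ...two higher-precision results: [17.0, 53.0]
```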
  • Patent number: 9886242
    Abstract: According to one embodiment, a code optimizer is configured to receive first code having a program loop implemented with scalar instructions to store values of a first array to a second array based on values of a third array and to generate second code representing the program loop using at least one vector instruction. The second code includes a shuffle instruction to shuffle elements of the first array based on the third array using a shuffle table in a vector manner and a store instruction to store the shuffled elements of the first array in the second array.
    Type: Grant
    Filed: February 6, 2015
    Date of Patent: February 6, 2018
    Assignee: Intel Corporation
    Inventors: Tal Uliel, Elmoustapha Ould-Ahmed-Vall, Bret T. Toll
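The loop transformation described in this last patent can be illustrated with a scalar reference and a shuffle-table version in Python. The 4-element block size, the precomputed table, and the list-based store are assumptions for the sketch; real implementations emit SIMD shuffle and store instructions.

```python
BLOCK = 4
# SHUFFLE_TABLE[mask] lists, in order, the lanes whose condition bit is set.
SHUFFLE_TABLE = [[lane for lane in range(BLOCK) if mask & (1 << lane)]
                 for mask in range(1 << BLOCK)]

def scalar_loop(a, c):
    """First code: conditionally copy elements of a into b based on c."""
    b = []
    for value, keep in zip(a, c):
        if keep:
            b.append(value)
    return b

def vectorized_loop(a, c):
    """Second code: per block, shuffle the selected elements together, then store."""
    b = []
    for base in range(0, len(a), BLOCK):
        mask = sum(1 << i for i in range(BLOCK) if c[base + i])
        b.extend(a[base + lane] for lane in SHUFFLE_TABLE[mask])
    return b

a = [10, 20, 30, 40, 50, 60, 70, 80]
c = [1, 0, 1, 1, 0, 0, 1, 0]
print(scalar_loop(a, c), vectorized_loop(a, c))  # both [10, 30, 40, 70]
```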