Patents by Inventor Jeffry E. Gonion

Jeffry E. Gonion has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10831488
    Abstract: In an embodiment, a computation engine may offload work from a processor (e.g. a CPU) and efficiently perform computations such as those used in LSTM and other workloads at high performance. In an embodiment, the computation engine may perform computations on input vectors from input memories in the computation engine, and may accumulate results in an output memory within the computation engine. The input memories may be loaded with initial vector data from memory, incurring the memory latency that may be associated with reading the operands. Compute instructions may be performed on the operands, generating results in an output memory. One or more extract instructions may be supported to move data from the output memory to the input memory, permitting additional computation on the data in the output memory without moving the results to main memory.
    Type: Grant
    Filed: August 20, 2018
    Date of Patent: November 10, 2020
    Assignee: Apple Inc.
    Inventors: Eric Bainville, Jeffry E. Gonion, Ali Sazegari, Gerard R. Williams, III, Andrew J. Beaumont-Smith
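    Illustrative sketch: the data flow in the abstract above can be modeled in a few lines of C. The names (input_mem_x, output_mem, the compute and extract routines) are hypothetical stand-ins, not the engine's actual instruction set; the point is only that extracting results back into an input memory allows further computation without a round trip to main memory.

        #include <stdio.h>

        #define N 4

        /* Hypothetical model of the engine's local storage (names are illustrative). */
        static float input_mem_x[N];   /* first input memory  */
        static float input_mem_y[N];   /* second input memory */
        static float output_mem[N];    /* result memory       */

        /* "Compute instruction": operate on local input memories, accumulate locally. */
        static void compute_mac(void) {
            for (int i = 0; i < N; i++)
                output_mem[i] += input_mem_x[i] * input_mem_y[i];
        }

        /* "Extract instruction": move results back into an input memory so further
         * computation can proceed without writing results to main memory. */
        static void extract_to_input(float *dst_input_mem) {
            for (int i = 0; i < N; i++)
                dst_input_mem[i] = output_mem[i];
        }

        int main(void) {
            /* Load initial vectors from "main memory" (incurring that latency once). */
            for (int i = 0; i < N; i++) {
                input_mem_x[i] = (float)(i + 1);
                input_mem_y[i] = 2.0f;
            }
            compute_mac();                   /* first pass: out = x * y          */
            extract_to_input(input_mem_x);   /* feed results back as new operand */
            compute_mac();                   /* second pass reuses local results */
            for (int i = 0; i < N; i++)
                printf("%g ", output_mem[i]);
            printf("\n");
            return 0;
        }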
  • Publication number: 20200348934
    Abstract: In an embodiment, a computation engine is configured to perform vector multiplications, producing either vector results or outer product (matrix) results. The instructions provided to the computation engine specify a matrix mode or a vector mode for the instructions. The computation engine performs the specified operation. The computation engine may perform numerous computations in parallel, in an embodiment. In an embodiment, the instructions may also specify an offset within the input memories, providing additional flexibility in the location of operands. More particularly, the computation engine may be configured to perform numerous multiplication operations in parallel and to accumulate results in a result memory, performing multiply-accumulate operations for each matrix/vector element in the targeted locations of the output memory.
    Type: Application
    Filed: July 14, 2020
    Publication date: November 5, 2020
    Inventors: Eric Bainville, Jeffry E. Gonion, Ali Sazegari, PhD, Gerard R. Williams, III
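    Illustrative sketch: a minimal C model of the two instruction modes described above. The functions and the offset handling are hypothetical; they show only that the same pair of operand vectors can produce either an elementwise (vector) result or an outer-product (matrix) result, with instruction-specified offsets selecting where the operands start in an input memory.

        #include <stdio.h>

        #define N 3

        /* Vector mode: elementwise multiply-accumulate into a vector result. */
        static void mul_vector_mode(const float *x, const float *y, float *z) {
            for (int i = 0; i < N; i++)
                z[i] += x[i] * y[i];
        }

        /* Matrix (outer product) mode: every x[i]*y[j] accumulates into z[i][j]. */
        static void mul_matrix_mode(const float *x, const float *y, float z[N][N]) {
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    z[i][j] += x[i] * y[j];
        }

        int main(void) {
            /* An instruction-specified offset selects where each operand starts
             * in the input memory (offset handling here is illustrative). */
            float input_mem[8] = {9, 9, 1, 2, 3, 4, 5, 6};
            const float *x = &input_mem[2];  /* offset = 2 -> {1,2,3} */
            const float *y = &input_mem[5];  /* offset = 5 -> {4,5,6} */

            float zv[N] = {0};
            float zm[N][N] = {{0}};
            mul_vector_mode(x, y, zv);
            mul_matrix_mode(x, y, zm);

            printf("vector result: %g %g %g\n", zv[0], zv[1], zv[2]);
            printf("matrix row 0:  %g %g %g\n", zm[0][0], zm[0][1], zm[0][2]);
            return 0;
        }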
  • Patent number: 10769065
    Abstract: Systems, apparatuses, and methods for efficiently moving data for storage and processing are described. A compression unit within a processor includes multiple hardware lanes, selects two or more input words to compress, and assigns them to two or more of the multiple hardware lanes. As each assigned input word is processed, each word is compared to an entry of a plurality of entries of a table. If it is determined that each of the assigned input words indexes the same entry of the table, the hardware lane with the oldest input word generates a single read request for the table entry and the hardware lane with the youngest input word generates a single write request for updating the table entry upon completing compression. Each hardware lane generates a compressed packet based on its assigned input word.
    Type: Grant
    Filed: June 10, 2019
    Date of Patent: September 8, 2020
    Assignee: Apple Inc.
    Inventors: Ali Sazegari, Charles E. Tucker, Jeffry E. Gonion, Gerard R. Williams, III, Chris Cheng-Chieh Lee
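    Illustrative sketch: the shared-table-entry case from the abstract above, modeled in C for two lanes. The table size, index function, and input words are made up; the sketch shows only how two words that index the same entry can be served by one read (issued for the oldest word) and one write (issued for the youngest word).

        #include <stdio.h>
        #include <stdint.h>

        #define TABLE_SIZE 16

        /* Hypothetical dictionary used by the compression lanes (illustrative). */
        static uint32_t table[TABLE_SIZE];

        /* Index function: which table entry a word maps to (illustrative hash). */
        static unsigned table_index(uint32_t word) {
            return (word >> 10) % TABLE_SIZE;
        }

        int main(void) {
            /* Two input words assigned to two hardware lanes; lane 0 holds the
             * older word, lane 1 the younger. These two collide by construction. */
            uint32_t lane_word[2] = {0x12345678u, 0x12345699u};
            unsigned idx0 = table_index(lane_word[0]);
            unsigned idx1 = table_index(lane_word[1]);

            if (idx0 == idx1) {
                /* Both lanes index the same entry: the oldest lane issues the one
                 * read, the youngest lane issues the one write after compression. */
                uint32_t entry = table[idx0];            /* single read  */
                printf("shared entry %u, old value 0x%08x\n", idx0, (unsigned)entry);
                table[idx1] = lane_word[1];              /* single write */
            } else {
                /* Otherwise each lane reads and updates its own entry. */
                table[idx0] = lane_word[0];
                table[idx1] = lane_word[1];
            }
            return 0;
        }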
  • Publication number: 20200272464
    Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.
    Type: Application
    Filed: March 13, 2020
    Publication date: August 27, 2020
    Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
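    Illustrative sketch: a software model of the multiply-accumulate behavior described above. A real engine performs these multiplications in parallel in hardware; this C sketch only shows the accumulate-in-result-memory semantics.

        #include <stdio.h>

        #define N 2

        /* Result memory Z accumulates A*B across calls, so repeated calls
         * perform multiply-accumulate for each matrix element. */
        static void matmul_acc(const float A[N][N], const float B[N][N], float Z[N][N]) {
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    for (int k = 0; k < N; k++)
                        Z[i][j] += A[i][k] * B[k][j];  /* one MAC per element pair */
        }

        int main(void) {
            float A[N][N] = {{1, 2}, {3, 4}};
            float B[N][N] = {{5, 6}, {7, 8}};
            float Z[N][N] = {{0}};

            matmul_acc(A, B, Z);   /* Z  = A*B                                    */
            matmul_acc(A, B, Z);   /* Z += A*B again: results accumulate in place */

            printf("%g %g\n%g %g\n", Z[0][0], Z[0][1], Z[1][0], Z[1][1]);
            return 0;
        }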
  • Patent number: 10754649
    Abstract: In an embodiment, a computation engine is configured to perform vector multiplications, producing either vector results or outer product (matrix) results. The instructions provided to the computation engine specify a matrix mode or a vector mode for the instructions. The computation engine performs the specified operation. The computation engine may perform numerous computations in parallel, in an embodiment. In an embodiment, the instructions may also specify an offset within the input memories, providing additional flexibility in the location of operands. More particularly, the computation engine may be configured to perform numerous multiplication operations in parallel and to accumulate results in a result memory, performing multiply-accumulate operations for each matrix/vector element in the targeted locations of the output memory.
    Type: Grant
    Filed: July 24, 2018
    Date of Patent: August 25, 2020
    Assignee: Apple Inc.
    Inventors: Eric Bainville, Jeffry E. Gonion, Ali Sazegari, Gerard R. Williams, III
  • Publication number: 20200241876
    Abstract: In an embodiment, a processor (e.g. a CPU) may offload transcendental computation to a computation engine that may efficiently perform transcendental functions. The computation engine may implement a range instruction that may be included in a program being executed by the CPU. The CPU may dispatch the range instruction to the computation engine. The range instruction may take an input operand (that is to be evaluated in a transcendental function, for example) and may reference a range table that defines a set of ranges for the transcendental function. The range instruction may identify one of the set of ranges that includes the input operand. For example, the range instruction may output an interval number identifying which interval of an overall set of valid input values contains the input operand. In an embodiment, the range instruction may take an input vector operand and output a vector of interval identifiers.
    Type: Application
    Filed: April 13, 2020
    Publication date: July 30, 2020
    Inventors: O-Cheng Chang, Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
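    Illustrative sketch: a C model of the range instruction described above. The range table values are invented; the sketch shows only the lookup that maps each input element to the interval of the table that contains it.

        #include <stdio.h>

        /* Hypothetical range table: boundaries partitioning the valid inputs of a
         * transcendental function into intervals (values here are made up). */
        static const float range_table[] = {0.0f, 0.5f, 1.0f, 2.0f, 4.0f};
        #define NUM_BOUNDS (sizeof(range_table) / sizeof(range_table[0]))

        /* Software model of the "range instruction": return the interval number
         * that contains x, or -1 if x is outside the valid input set. */
        static int range_lookup(float x) {
            for (unsigned i = 0; i + 1 < NUM_BOUNDS; i++)
                if (x >= range_table[i] && x < range_table[i + 1])
                    return (int)i;
            return -1;
        }

        int main(void) {
            /* Vector form: one interval identifier per input element. */
            float v[4] = {0.25f, 0.75f, 1.5f, 3.0f};
            for (int i = 0; i < 4; i++)
                printf("x=%g -> interval %d\n", v[i], range_lookup(v[i]));
            return 0;
        }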
  • Publication number: 20200225958
    Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.
    Type: Application
    Filed: April 1, 2020
    Publication date: July 16, 2020
    Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari, PhD
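    Illustrative sketch: the strided dot product from the abstract above, in C. The layout (a first operand holding small inner vectors, with a stride selecting one element of each) is as described; the function name and parameters are hypothetical.

        #include <stdio.h>

        /* x is a vector whose "elements" are themselves small vectors of length
         * `stride`; picking one lane per inner vector is a stride-`stride` walk
         * starting at `lane`. Each selected element is multiplied by y[i]. */
        static float dot_strided(const float *x, const float *y,
                                 int n, int stride, int lane) {
            float acc = 0.0f;
            for (int i = 0; i < n; i++)
                acc += x[i * stride + lane] * y[i];  /* one element per group */
            return acc;
        }

        int main(void) {
            /* x holds 3 inner vectors of length 2: {1,10}, {2,20}, {3,30}. */
            float x[6] = {1, 10, 2, 20, 3, 30};
            float y[3] = {4, 5, 6};

            /* lane 0 uses {1,2,3}; lane 1 uses {10,20,30}. */
            printf("lane 0: %g\n", dot_strided(x, y, 3, 2, 0));  /* 4+10+18 = 32     */
            printf("lane 1: %g\n", dot_strided(x, y, 3, 2, 1));  /* 40+100+180 = 320 */
            return 0;
        }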
  • Publication number: 20200192672
    Abstract: A system and method for efficiently protecting branch prediction information. In various embodiments, a computing system includes at least one processor with a branch predictor storing branch target addresses and security tags in a table. The security tag includes one or more components of machine context. When the branch predictor receives a portion of a first program counter of a first branch instruction, and hits on a first table entry during an access, the branch predictor reads out a first security tag. The branch predictor compares one or more components of machine context of the first security tag to one or more components of machine context of the first branch instruction. When there is at least one mismatch, the branch prediction information of the first table entry is not used. Additionally, there is no updating of any branch prediction training information of the first table entry.
    Type: Application
    Filed: December 14, 2018
    Publication date: June 18, 2020
    Inventors: Jeffry E. Gonion, Ian D. Kountanis, Conrado Blasco, Steven Andrew Myers, Yannick L. Sierra
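    Illustrative sketch: a C model of tag-checked prediction lookup. The specific machine-context components (a privilege level and an address-space identifier here) are examples, not the claimed set; the sketch shows only that a tag mismatch suppresses use of the entry, and that a suppressed entry would also receive no training update.

        #include <stdio.h>
        #include <stdint.h>
        #include <stdbool.h>

        /* Hypothetical machine-context components forming a security tag. */
        typedef struct {
            uint8_t  privilege_level;
            uint16_t asid;           /* address-space identifier */
        } security_tag_t;

        typedef struct {
            uint64_t       target;   /* predicted branch target        */
            security_tag_t tag;      /* context that created the entry */
            bool           valid;
        } bp_entry_t;

        #define BP_ENTRIES 256
        static bp_entry_t bp_table[BP_ENTRIES];

        /* Lookup: hit only if the entry is valid AND the stored security tag
         * matches the current machine context; otherwise the prediction is not
         * used (and the entry's training state is left untouched). */
        static bool bp_predict(uint64_t pc, security_tag_t ctx, uint64_t *target) {
            bp_entry_t *e = &bp_table[pc % BP_ENTRIES];
            if (!e->valid)
                return false;
            if (e->tag.privilege_level != ctx.privilege_level ||
                e->tag.asid != ctx.asid)
                return false;  /* mismatch: do not use, do not train */
            *target = e->target;
            return true;
        }

        int main(void) {
            security_tag_t kernel = {0, 1}, user = {3, 42};
            bp_table[0x40 % BP_ENTRIES] = (bp_entry_t){0x1000, kernel, true};

            uint64_t t;
            printf("kernel lookup: %s\n", bp_predict(0x40, kernel, &t) ? "hit" : "miss");
            printf("user lookup:   %s\n", bp_predict(0x40, user, &t) ? "hit" : "miss");
            return 0;
        }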
  • Publication number: 20200192673
    Abstract: Techniques are disclosed relating to protecting branch prediction information. In various embodiments, an integrated circuit includes branch prediction logic having a table that maintains a plurality of entries storing encrypted target address information for branch instructions. The branch prediction logic is configured to receive machine context information for a branch instruction having a target address being predicted by the branch prediction logic, the machine context information including a program counter associated with the branch instruction. The branch prediction logic is configured to use the machine context information to decrypt encrypted target address information stored in one of the plurality of entries identified based on the program counter.
    Type: Application
    Filed: October 25, 2019
    Publication date: June 18, 2020
    Inventors: Steven A. Myers, Jeffry E. Gonion, Yannick L. Sierra, Thomas Icart
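    Illustrative sketch: the decrypt-with-machine-context idea above, reduced to a toy in C. The XOR keystream stands in for a real cipher purely to make the data flow visible: the same context recovers the stored target, while a different context gets an unusable value.

        #include <stdio.h>
        #include <stdint.h>

        /* Toy "cipher": XOR with a key derived from machine context. A real design
         * would use a proper cipher; XOR here only makes the data flow visible. */
        static uint64_t ctx_key(uint64_t pc, uint64_t asid) {
            return (pc * 0x9E3779B97F4A7C15ull) ^ (asid * 0xC2B2AE3D27D4EB4Full);
        }

        int main(void) {
            uint64_t pc = 0x100004000ull, asid = 7;
            uint64_t target = 0x100008888ull;

            /* Fill: encrypt the target with the context that installed it. */
            uint64_t stored = target ^ ctx_key(pc, asid);

            /* Predict: decrypting with the same context recovers the target... */
            printf("same context:  0x%llx\n",
                   (unsigned long long)(stored ^ ctx_key(pc, asid)));

            /* ...but a different context (e.g. another address space) recovers
             * garbage, so it cannot be steered to an attacker-chosen target. */
            printf("other context: 0x%llx\n",
                   (unsigned long long)(stored ^ ctx_key(pc, 9)));
            return 0;
        }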
  • Patent number: 10642620
    Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.
    Type: Grant
    Filed: April 5, 2018
    Date of Patent: May 5, 2020
    Assignee: Apple Inc.
    Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
  • Patent number: 10592239
    Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.
    Type: Grant
    Filed: May 28, 2019
    Date of Patent: March 17, 2020
    Assignee: Apple Inc.
    Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
  • Publication number: 20200034145
    Abstract: In an embodiment, a computation engine is configured to perform vector multiplications, producing either vector results or outer product (matrix) results. The instructions provided to the computation engine specify a matrix mode or a vector mode for the instructions. The computation engine performs the specified operation. The computation engine may perform numerous computations in parallel, in an embodiment. In an embodiment, the instructions may also specify an offset within the input memories, providing additional flexibility in the location of operands. More particularly, the computation engine may be configured to perform numerous multiplication operations in parallel and to accumulate results in a result memory, performing multiply-accumulate operations for each matrix/vector element in the targeted locations of the output memory.
    Type: Application
    Filed: July 24, 2018
    Publication date: January 30, 2020
    Inventors: Eric Bainville, Jeffry E. Gonion, Ali Sazegari, Gerard R. Williams, III
  • Publication number: 20190310854
    Abstract: In an embodiment, a computation engine may perform computations on input vectors having vector elements of a first precision and data type. The computation engine may convert the vector elements from the first precision to a second precision and may also interleave the vector elements as specified by an instruction issued by the processor to the computation engine. The interleave may be based on a ratio of a result precision and the second precision. An extract instruction may be supported to extract results from the computations and convert and deinterleave the vector elements to provide a compact result in a desired order.
    Type: Application
    Filed: April 5, 2018
    Publication date: October 10, 2019
    Inventors: Eric Bainville, Tal Uliel, Jeffry E. Gonion, Ali Sazegari, Erik K. Norden
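    Illustrative sketch: a C model of the convert-and-interleave step described above, assuming 8-bit inputs, a 16-bit second precision, and a 32-bit result precision (so the interleave ratio is 2). The exact interleave pattern an engine would use is not specified in the abstract; the pattern here is only one plausible arrangement.

        #include <stdio.h>
        #include <stdint.h>

        #define N 8

        int main(void) {
            /* Input: 8-bit elements (first precision). */
            int8_t in[N] = {1, 2, 3, 4, 5, 6, 7, 8};

            /* Convert to 16-bit (second precision) and interleave with ratio 2
             * (32-bit result precision / 16-bit second precision), grouping the
             * elements that will fold into the same result together. */
            int16_t out[N];
            const int ratio = 2;
            for (int i = 0; i < N; i++) {
                int group = i / ratio, lane = i % ratio;
                out[lane * (N / ratio) + group] = (int16_t)in[i];
            }

            for (int i = 0; i < N; i++)
                printf("%d ", out[i]);
            printf("\n");  /* prints: 1 3 5 7 2 4 6 8 */
            return 0;
        }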
  • Publication number: 20190310855
    Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.
    Type: Application
    Filed: April 5, 2018
    Publication date: October 10, 2019
    Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
  • Publication number: 20190294541
    Abstract: Systems, apparatuses, and methods for efficiently moving data for storage and processing are described. In various embodiments, a compression unit within a processor includes multiple hardware lanes, selects two or more input words to compress, and assigns them to two or more of the multiple hardware lanes. As each assigned input word is processed, each word is compared to an entry of a plurality of entries of a table. If it is determined that each of the assigned input words indexes the same entry of the table, the hardware lane with the oldest input word generates a single read request for the table entry and the hardware lane with the youngest input word generates a single write request for updating the table entry upon completing compression. Each hardware lane generates a compressed packet based on its assigned input word.
    Type: Application
    Filed: June 10, 2019
    Publication date: September 26, 2019
    Inventors: Ali Sazegari, Charles E. Tucker, Jeffry E. Gonion, Gerard R. Williams, III, Chris Cheng-Chieh Lee
  • Publication number: 20190294441
    Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.
    Type: Application
    Filed: May 28, 2019
    Publication date: September 26, 2019
    Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
  • Patent number: 10409600
    Abstract: In an embodiment, a processor includes hardware circuitry and/or supports instructions which may be used to detect that a return address or jump address has been modified since it was written to memory. In response to detecting the modification, the processor may be configured to signal an exception or otherwise initiate error handling to prevent execution at the modified address. In an embodiment, the processor may perform a cryptographic sign operation on the return address/jump address before writing the signed return address/jump address to memory, and the signature may be verified before the address is used as a return target or jump target. Security of the system may be improved by foiling return-oriented programming (ROP) and jump-oriented programming (JOP) attacks.
    Type: Grant
    Filed: July 5, 2016
    Date of Patent: September 10, 2019
    Assignee: Apple Inc.
    Inventors: Yannick L. Sierra, Jeffry E. Gonion, Thomas Roche, Jerrold V. Hauck
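    Illustrative sketch: sign-then-verify for a saved return address, in C. The 16-bit toy MAC and the use of the stack pointer as context are stand-ins for a real cryptographic sign operation; the sketch shows only that tampering with the stored address breaks verification before the address can be used as a return target.

        #include <stdio.h>
        #include <stdint.h>

        /* Toy MAC standing in for the processor's cryptographic sign operation
         * (a real implementation uses a keyed cipher; this is only illustrative). */
        static uint16_t toy_mac(uint64_t addr, uint64_t key, uint64_t context) {
            uint64_t h = (addr ^ context) * key;
            return (uint16_t)(h >> 48);
        }

        /* Sign: fold the MAC into the unused upper bits of the pointer before it
         * is written to memory. */
        static uint64_t sign_addr(uint64_t addr, uint64_t key, uint64_t ctx) {
            return addr | ((uint64_t)toy_mac(addr, key, ctx) << 48);
        }

        /* Verify before using the address as a return/jump target; a mismatch
         * would raise an exception in hardware. */
        static int verify_addr(uint64_t signed_addr, uint64_t key, uint64_t ctx) {
            uint64_t addr = signed_addr & 0x0000FFFFFFFFFFFFull;
            return (uint16_t)(signed_addr >> 48) == toy_mac(addr, key, ctx);
        }

        int main(void) {
            uint64_t key = 0x9E3779B97F4A7C15ull, sp = 0x7FFE0000ull;
            uint64_t ra = 0x100004000ull;

            uint64_t stored = sign_addr(ra, key, sp);
            printf("intact:   %s\n", verify_addr(stored, key, sp) ? "ok" : "FAULT");

            stored ^= 0x40;  /* an attacker overwrites the saved return address */
            printf("tampered: %s\n", verify_addr(stored, key, sp) ? "ok" : "FAULT");
            return 0;
        }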
  • Publication number: 20190250917
    Abstract: In an embodiment, a computation engine may offload work from a processor (e.g. a CPU) and efficiently perform transcendental functions. The computation engine may implement a range instruction that may be included in a program being executed by the CPU. The CPU may dispatch the range instruction to the computation engine. The range instruction may take an input operand (that is to be evaluated in a transcendental function, for example) and may reference a range table that defines a set of ranges for the transcendental function. The range instruction may identify one of the set of ranges that includes the input operand. For example, the range instruction may output an interval number identifying which interval of an overall set of valid input values contains the input operand. In an embodiment, the range instruction may take an input vector operand and output a vector of interval identifiers.
    Type: Application
    Filed: February 14, 2018
    Publication date: August 15, 2019
    Inventors: O-Cheng Chang, Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
  • Patent number: 10346163
    Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.
    Type: Grant
    Filed: November 1, 2017
    Date of Patent: July 9, 2019
    Assignee: Apple Inc.
    Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
  • Patent number: 10331558
    Abstract: Systems, apparatuses, and methods for efficiently moving data for storage and processing. A compression unit within a processor includes multiple hardware lanes, selects two or more input words to compress, and assigns them to two or more of the multiple hardware lanes. As each assigned input word is processed, each word is compared to an entry of a plurality of entries of a table. If it is determined that each of the assigned input words indexes the same entry of the table, the hardware lane with the oldest input word generates a single read request for the table entry and the hardware lane with the youngest input word generates a single write request for updating the table entry upon completing compression. Each hardware lane generates a compressed packet based on its assigned input word.
    Type: Grant
    Filed: July 28, 2017
    Date of Patent: June 25, 2019
    Assignee: Apple Inc.
    Inventors: Ali Sazegari, Charles E. Tucker, Jeffry E. Gonion, Gerard R. Williams, III, Chris Cheng-Chieh Lee