Patents by Inventor Jeffry E. Gonion
Jeffry E. Gonion has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10831488Abstract: In an embodiment, a computation engine may offload work from a processor (e.g. a CPU) and efficiently perform computations such as those used in LSTM and other workloads at high performance. In an embodiment, the computation engine may perform computations on input vectors from input memories in the computation engine, and may accumulate results in an output memory within the computation engine. The input memories may be loaded with initial vector data from memory, incurring the memory latency that may be associated with reading the operands. Compute instructions may be performed on the operands, generating results in an output memory. One or more extract instructions may be supported to move data from the output memory to the input memory, permitting additional computation on the data in the output memory without moving the results to main memory.Type: GrantFiled: August 20, 2018Date of Patent: November 10, 2020Assignee: Apple Inc.Inventors: Eric Bainville, Jeffry E. Gonion, Ali Sazegari, Gerard R. Williams, III, Andrew J. Beaumont-Smith
-
Publication number: 20200348934Abstract: In an embodiment, a computation engine is configured to perform vector multiplications, producing either vector results or outer product (matrix) results. The instructions provided to the computation engine specify a matrix mode or a vector mode for the instructions. The computation engine performs the specified operation. The computation engine may perform numerous computations in parallel, in an embodiment. In an embodiment, the instructions may also specify an offset with the input memories, providing additional flexibility in the location of operands. More particularly, the computation engine may be configured to perform numerous multiplication operations in parallel and to accumulate results in a result memory, performing multiply-accumulate operations for each matrix/vector element in the targeted locations of the output memory.Type: ApplicationFiled: July 14, 2020Publication date: November 5, 2020Inventors: Eric Bainville, Jeffry E. Gonion, Ali Sazegari, PhD, Gerard R. Williams, III
-
Patent number: 10769065Abstract: Systems, apparatuses, and methods for efficiently moving data for storage and processing a compression unit within a processor includes multiple hardware lanes, selects two or more input words to compress, and for assigns them to two or more of the multiple hardware lanes. As each assigned input word is processed, each word is compared to an entry of a plurality of entries of a table. If it is determined that each of the assigned input words indexes the same entry of the table, the hardware lane with the oldest input word generates a single read request for the table entry and the hardware lane with the youngest input word generates a single write request for updating the table entry upon completing compression. Each hardware lane generates a compressed packet based on its assigned input word.Type: GrantFiled: June 10, 2019Date of Patent: September 8, 2020Assignee: Apple Inc.Inventors: Ali Sazegari, Charles E. Tucker, Jeffry E. Gonion, Gerard R. Williams, III, Chris Cheng-Chieh Lee
-
Publication number: 20200272464Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.Type: ApplicationFiled: March 13, 2020Publication date: August 27, 2020Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
-
Patent number: 10754649Abstract: In an embodiment, a computation engine is configured to perform vector multiplications, producing either vector results or outer product (matrix) results. The instructions provided to the computation engine specify a matrix mode or a vector mode for the instructions. The computation engine performs the specified operation. The computation engine may perform numerous computations in parallel, in an embodiment. In an embodiment, the instructions may also specify an offset with the input memories, providing additional flexibility in the location of operands. More particularly, the computation engine may be configured to perform numerous multiplication operations in parallel and to accumulate results in a result memory, performing multiply-accumulate operations for each matrix/vector element in the targeted locations of the output memory.Type: GrantFiled: July 24, 2018Date of Patent: August 25, 2020Assignee: Apple Inc.Inventors: Eric Bainville, Jeffry E. Gonion, Ali Sazegari, Gerard R. Williams, III
-
Publication number: 20200241876Abstract: In an embodiment, a processor (e.g. a CPU) may offload transcendental computation to a computation engine that may efficiently perform transcendental functions. The computation engine may implement a range instruction that may be included in a program being executed by the CPU. The CPU may dispatch the range instruction to the computation engine. The range instruction may take an input operand (that is to be evaluated in a transcendental function, for example) and may reference a range table that defines a set of ranges for the transcendental function. The range instruction may identify one of the set of ranges that includes the input operand. For example, the range instruction may output an interval number identifying which interval of an overall set of valid input values contains the input operand. In an embodiment, the range instruction may take an input vector operand and output a vector of interval identifiers.Type: ApplicationFiled: April 13, 2020Publication date: July 30, 2020Inventors: O-Cheng Chang, Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
-
Publication number: 20200225958Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.Type: ApplicationFiled: April 1, 2020Publication date: July 16, 2020Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari, PhD
-
Publication number: 20200192672Abstract: A system and method for efficiently protecting branch prediction information. In various embodiments, a computing system includes at least one processor with a branch predictor storing branch target addresses and security tags in a table. The security tag includes one or more components of machine context. When the branch predictor receives a portion of a first program counter of a first branch instruction, and hits on a first table entry during an access, the branch predictor reads out a first security tag. The branch predictor compares one or more components of machine context of the first security tag to one or more components of machine context of the first branch instruction. When there is at least one mismatch, the branch prediction information of the first table entry is not used. Additionally, there is no updating of any branch prediction training information of the first table entry.Type: ApplicationFiled: December 14, 2018Publication date: June 18, 2020Inventors: Jeffry E. Gonion, Ian D. Kountanis, Conrado Blasco, Steven Andrew Myers, Yannick L. Sierra
-
Publication number: 20200192673Abstract: Techniques are disclosed relating to protecting branch prediction information. In various embodiments, an integrated circuit includes branch prediction logic having a table that maintains a plurality of entries storing encrypted target address information for branch instructions. The branch prediction logic is configured to receive machine context information for a branch instruction having a target address being predicted by the branch prediction logic, the machine context information including a program counter associated with the branch instruction. The branch prediction logic is configured to use the machine context information to decrypt encrypted target address information stored in one of the plurality of entries identified based on the program counter.Type: ApplicationFiled: October 25, 2019Publication date: June 18, 2020Inventors: Steven A. Myers, Jeffry E. Gonion, Yannick L. Sierra, Thomas Icart
-
Patent number: 10642620Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.Type: GrantFiled: April 5, 2018Date of Patent: May 5, 2020Assignee: Apple Inc.Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
-
Patent number: 10592239Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.Type: GrantFiled: May 28, 2019Date of Patent: March 17, 2020Assignee: Apple Inc.Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
-
Publication number: 20200034145Abstract: In an embodiment, a computation engine is configured to perform vector multiplications, producing either vector results or outer product (matrix) results. The instructions provided to the computation engine specify a matrix mode or a vector mode for the instructions. The computation engine performs the specified operation. The computation engine may perform numerous computations in parallel, in an embodiment. In an embodiment, the instructions may also specify an offset with the input memories, providing additional flexibility in the location of operands. More particularly, the computation engine may be configured to perform numerous multiplication operations in parallel and to accumulate results in a result memory, performing multiply-accumulate operations for each matrix/vector element in the targeted locations of the output memory.Type: ApplicationFiled: July 24, 2018Publication date: January 30, 2020Inventors: Eric Bainville, Jeffry E. Gonion, Ali Sazegari, Gerard R. Williams, III
-
Publication number: 20190310854Abstract: In an embodiment, a computation engine may perform computations on input vectors having vector elements of a first precision and data type. The computation engine may convert the vector elements from the first precision to a second precision and may also interleave the vector elements as specified by an instruction issued by the processor to the computation engine. The interleave may be based on a ratio of a result precision and the second precision. An extract instruction may be supported to extract results from the computations and convert and deinterleave the vector elements to to provide a compact result in a desired order.Type: ApplicationFiled: April 5, 2018Publication date: October 10, 2019Inventors: Eric Bainville, Tal Uliel, Jeffry E. Gonion, Ali Sazegari, Erik K. Norden
-
Publication number: 20190310855Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.Type: ApplicationFiled: April 5, 2018Publication date: October 10, 2019Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
-
Publication number: 20190294541Abstract: Systems, apparatuses, and methods for efficiently moving data for storage and processing are described. In various embodiments, a compression unit within a processor includes multiple hardware lanes, selects two or more input words to compress, and for assigns them to two or more of the multiple hardware lanes. As each assigned input word is processed, each word is compared to an entry of a plurality of entries of a table. If it is determined that each of the assigned input words indexes the same entry of the table, the hardware lane with the oldest input word generates a single read request for the table entry and the hardware lane with the youngest input word generates a single write request for updating the table entry upon completing compression. Each hardware lane generates a compressed packet based on its assigned input word.Type: ApplicationFiled: June 10, 2019Publication date: September 26, 2019Inventors: Ali Sazegari, Charles E. Tucker, Jeffry E. Gonion, Gerard R. Williams, III, Chris Cheng-Chieh Lee
-
Publication number: 20190294441Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.Type: ApplicationFiled: May 28, 2019Publication date: September 26, 2019Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
-
Patent number: 10409600Abstract: In an embodiment, a processor includes hardware circuitry and/or supports instructions which may be used to detect that a return address or jump address has been modified since it was written to memory. In response to detecting the modification, the processor may be configured to signal an exception or otherwise initiate error handling to prevent execution at the modified address. In an embodiment, the processor may perform a cryptographic sign operation on the return address/jump address before writing the signed return address/jump address to memory and the signature may be verified before the to address is used as a return target or jump target. Security of the system may be improved by foiling ROP/JOP attacks.Type: GrantFiled: July 5, 2016Date of Patent: September 10, 2019Assignee: Apple Inc.Inventors: Yannick L. Sierra, Jeffry E. Gonion, Thomas Roche, Jerrold V. Hauck
-
Publication number: 20190250917Abstract: In an embodiment, a computation engine may offload a processor (e.g. a CPU) and efficiently perform transcendental functions. The computation engine may implement a range instruction that may be included in a program being executed by the CPU. The CPU may dispatch the range instruction to the computation engine. The range instruction may take an input operand (that is to be evaluated in a transcendental function, for example) and may reference a range table that defines a set of ranges for the transcendental function. The range instruction may identify one of the set of ranges that includes the input operand. For example, the range instruction may output an interval number identifying which interval of an overall set of valid input values contains the input operand. In an embodiment, the range instruction may take an input vector operand and output a vector of interval identifiers.Type: ApplicationFiled: February 14, 2018Publication date: August 15, 2019Inventors: O-Cheng Chang, Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
-
Patent number: 10346163Abstract: In an embodiment, a matrix computation engine is configured to perform matrix computations (e.g. matrix multiplications). The matrix computation engine may perform numerous matrix computations in parallel, in an embodiment. More particularly, the matrix computation engine may be configured to perform numerous multiplication operations in parallel on input matrix elements, generating resulting matrix elements. In an embodiment, the matrix computation engine may be configured to accumulate results in a result memory, performing multiply-accumulate operations for each matrix element of each matrix.Type: GrantFiled: November 1, 2017Date of Patent: July 9, 2019Assignee: Apple Inc.Inventors: Eric Bainville, Tal Uliel, Erik Norden, Jeffry E. Gonion, Ali Sazegari
-
Patent number: 10331558Abstract: Systems, apparatuses, and methods for efficiently moving data for storage and processing. A compression unit within a processor includes multiple hardware lanes, selects two or more input words to compress, and for assigns them to two or more of the multiple hardware lanes. As each assigned input word is processed, each word is compared to an entry of a plurality of entries of a table. If it is determined that each of the assigned input words indexes the same entry of the table, the hardware lane with the oldest input word generates a single read request for the table entry and the hardware lane with the youngest input word generates a single write request for updating the table entry upon completing compression. Each hardware lane generates a compressed packet based on its assigned input word.Type: GrantFiled: July 28, 2017Date of Patent: June 25, 2019Assignee: Apple Inc.Inventors: Ali Sazegari, Charles E. Tucker, Jeffry E. Gonion, Gerard R. Williams, III, Chris Cheng-Chieh Lee