Patents by Inventor Jose E. Moreira
Jose E. Moreira has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11900116Abstract: A system may determine that two instructions may be combined based on a processing power of the processor and a size of the instructions, fuse the two instructions into a pair, map the two instructions with a single register tag, write the register tag into a mapper with bits indicating that the register tag is for a first instruction of the two instructions, write the register tag into the mapper with bits indicating that the register tag is for a second instruction of the two instructions, write the fused instruction pair into an issue queue, issue the fused instruction pair to a vector-scalar transformation units (VSU), and execute the two instructions.Type: GrantFiled: September 29, 2021Date of Patent: February 13, 2024Assignee: International Business Machines CorporationInventors: Dung Q. Nguyen, Brian W. Thompto, Jose E. Moreira, Jessica Hui-Chun Tseng, Pratap C. Pattnaik, Kattamuri Ekanadham, Manoj Kumar
-
Patent number: 11868275Abstract: Aspects of the present disclosure relate to encrypted data processing (EDAP). A processor includes a register file configured to store ciphertext data, an instruction fetch and decode unit configured to fetch and decode instructions, and a functional unit configured to process the stored ciphertext data. The functional unit further includes a decryption module configured to decrypt ciphertext data from the register file to receive cleartext data using an encryption key stored within the functional unit. The functional unit further includes a local buffer configured to store the cleartext data. The functional unit further includes an arithmetic logical unit configured to generate cleartext computation results using the cleartext data The functional unit further includes an encryption module configured to encrypt the cleartext computation results to generate ciphertext computation results for storage back into the register file.Type: GrantFiled: June 24, 2021Date of Patent: January 9, 2024Assignee: International Business Machines CorporationInventors: Manoj Kumar, Gianfranco Bilardi, Kattamuri Ekanadham, Jose E. Moreira, Pratap C. Pattnaik, Jessica Hui-Chun Tseng
-
Patent number: 11836493Abstract: Embodiments for providing memory access operations for graph analytics by a processor are disclosed. An entire chunk of load and store instructions may be atomically and concurrently executed, where the entire chunk of the load and store instructions are delineated from a plurality of alternative load and store instructions.Type: GrantFiled: February 10, 2022Date of Patent: December 5, 2023Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Manoj Kumar, Gianfranco Bilardi, Kattamuri Ekanadham, Jose E. Moreira, Pratap C. Pattnaik, Jessica Hui-Chun Tseng
-
Publication number: 20230367597Abstract: A computer system, processor, and method for processing information is disclosed that includes at least one computer processor; a main register file associated with the at least one processor, the main register file having a plurality of entries for storing data, one or more write ports to write data to the main register file entries, and one or more read ports to read data from the main register file entries; one or more execution units including a dense math execution unit; and at least one accumulator register file having a plurality of entries for storing data. The results of the dense math execution unit in an aspect are written to the accumulator register file, preferably to the same accumulator register file entry multiple times, and the data from the accumulator register file is written to the main register file.Type: ApplicationFiled: July 28, 2023Publication date: November 16, 2023Inventors: Brian W. Thompto, Maarten J. Boersma, Andreas Wagner, Jose E. Moreira, Hung Q. Le, Silvia Melitta Mueller, Dung Q. Nguyen
-
Patent number: 11755320Abstract: Aspects include a compute array of a processor with mixed-precision numerical linear algebra support. A first precision and a first shape of a first input matrix and a second precision and a second shape of a second input matrix to the compute array are determined. A plurality of linear algebra operations is repeated in parallel within the compute array to update a result matrix in an accumulator register based on the first input matrix, the second input matrix, and a number of rank updates of the result matrix to store in the accumulator register.Type: GrantFiled: September 21, 2021Date of Patent: September 12, 2023Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Jose E. Moreira, Brett Olsson, Brian W. Thompto, Silvia Melitta Mueller, Andreas Wagner
-
Patent number: 11755325Abstract: A computer system, processor, and method for processing information is disclosed that includes at least one computer processor; a main register file associated with the at least one processor, the main register file having a plurality of entries for storing data, one or more write ports to write data to the main register file entries, and one or more read ports to read data from the main register file entries; one or more execution units including a dense math execution unit; and at least one accumulator register file having a plurality of entries for storing data. The results of the dense math execution unit in an aspect are written to the accumulator register file, preferably to the same accumulator register file entry multiple times, and the data from the accumulator register file is written to the main register file.Type: GrantFiled: August 27, 2021Date of Patent: September 12, 2023Assignee: International Business Machines CorporationInventors: Brian W. Thompto, Maarten J. Boersma, Andreas Wagner, Jose E. Moreira, Hung Q. Le, Silvia Melitta Mueller, Dung Q. Nguyen
-
Publication number: 20230251862Abstract: Embodiments for providing memory access operations for graph analytics by a processor are disclosed. An entire chunk of load and store instructions may be atomically and concurrently executed, where the entire chunk of the load and store instructions are delineated from a plurality of alternative load and store instructions.Type: ApplicationFiled: February 10, 2022Publication date: August 10, 2023Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Manoj KUMAR, Gianfranco BILARDI, Kattamuri EKANADHAM, Jose E. MOREIRA, Pratap C. PATTNAIK, Jessica Hui-Chun TSENG
-
Patent number: 11663009Abstract: A Reduced Instruction Set Computer (“RISC”) supporting large-word operations in a computing environment is disclosed. In one implementation, in response to receiving one or more control signals from a central processing unit (“CPU”), a set of operations are executed on a state of a special purpose execution unit (“SPU”) having a plurality of SPU registers, the SPU being associated with the CPU and the state of the SPU having word widths of one or more of the plurality of registers being greater in size than word widths of a plurality of CPU registers of a computing system and a set of state-master bits to synchronize the state of the SPU and a state of the CPU. The results of the set of operations are stored in the plurality of CPU registers or an alternative set of the plurality of SPU registers.Type: GrantFiled: October 14, 2021Date of Patent: May 30, 2023Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Sandhya Koteshwara, Kattamuri Ekanadham, Manoj Kumar, Jose E. Moreira, Pratap C. Pattnaik
-
Publication number: 20230124185Abstract: A Reduced Instruction Set Computer (“RISC”) supporting large-word operations in a computing environment is disclosed. In one implementation, in response to receiving one or more control signals from a central processing unit (“CPU”), a set of operations are executed on a state of a special purpose execution unit (“SPU”) having a plurality of SPU registers, the SPU being associated with the CPU and the state of the SPU having word widths of one or more of the plurality of registers being greater in size than word widths of a plurality of CPU registers of a computing system and a set of state-master bits to synchronize the state of the SPU and a state of the CPU. The results of the set of operations are stored in the plurality of CPU registers or an alternative set of the plurality of SPU registers.Type: ApplicationFiled: October 14, 2021Publication date: April 20, 2023Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Sandhya KOTESHWARA, Kattamuri EKANADHAM, Manoj KUMAR, Jose E. MOREIRA, Pratap C. PATTNAIK
-
Publication number: 20220414023Abstract: Aspects of the present disclosure relate to encrypted data processing (EDAP). A processor includes a register file configured to store ciphertext data, an instruction fetch and decode unit configured to fetch and decode instructions, and a functional unit configured to process the stored ciphertext data. The functional unit further includes a decryption module configured to decrypt ciphertext data from the register file to receive cleartext data using an encryption key stored within the functional unit. The functional unit further includes a local buffer configured to store the cleartext data. The functional unit further includes an arithmetic logical unit configured to generate cleartext computation results using the cleartext data The functional unit further includes an encryption module configured to encrypt the cleartext computation results to generate ciphertext computation results for storage back into the register file.Type: ApplicationFiled: June 24, 2021Publication date: December 29, 2022Inventors: Manoj Kumar, Gianfranco Bilardi, Kattamuri Ekanadham, Jose E. Moreira, Pratap C. Pattnaik, Jessica Hui-Chun Tseng
-
Publication number: 20220414270Abstract: Aspects of the present disclosure relate to encrypted data processing (EDAP). Encrypted data from a cache to be loaded into a register file can be accessed. The encrypted data can be decrypted to receive cleartext data. The cleartext data can be written to the register file. The cleartext data can be processed using at least one functional unit to receive cleartext computation results. The cleartext computation results can then be written back to the register file.Type: ApplicationFiled: June 24, 2021Publication date: December 29, 2022Inventors: Jessica Hui-Chun Tseng, Jose E. Moreira, Pratap C. Pattnaik, Manoj Kumar, Kattamuri Ekanadham, Gianfranco Bilardi
-
Publication number: 20220191020Abstract: A processor and method for processing information is disclosed that in response to encountering a function entry instruction while running an application, computes an entry hash value using a hash of three hash input parameters, wherein one of the input parameters is a secret key stored in the special purpose register; and in response to encountering a function exit instruction, computes an exit hash value using the same three input parameters and the same hash used when computing the entry hash value; and determines if the entry hash value is the same as the exit hash value.Type: ApplicationFiled: December 16, 2020Publication date: June 16, 2022Inventors: Jose E. Moreira, Arnold Flores, Debapriya Chatterjee, Kattamuri Ekanadham
-
Publication number: 20220188463Abstract: A computer system, processor, computer program product, and method for executing instructions in a software application that includes a processor that can be dynamically controlled, in response to a value set in a control register, to operate in either a secure mode or a performance mode. In the secure mode, the processor: upon encountering a secure mode entry instruction, computes an entry hash value using a hash function and stores the entry hash value; and upon encountering a secure mode exit instruction, computes an exit hash value, loads the entry hash value, and determines whether the entry hash value is the same as the exit hash value, and depending upon verification of the hash values can execute the return function or transfer control to the operating system. In the performance mode, the processor: executes both the secure mode entry instruction and the secure mode exit instruction as no-operations.Type: ApplicationFiled: December 16, 2020Publication date: June 16, 2022Inventors: Debapriya Chatterjee, Brian W. Thompto, Jose E. Moreira
-
Patent number: 11294685Abstract: Method and systems for creating a sequence of fused instructions. An instruction stream is obtained, and a window of instructions from the instruction stream is examined and one or more groups of instructions that satisfy one or more fusion rules are identified. One or more of the groups of instructions that satisfy the one or more fusion rules are fused and a maximal length data dependence chain in the instruction stream is analyzed by analyzing every node in a dependence graph in a selected window of instructions. Fusion of an instruction group is prevented based on the maximal length data dependence chain.Type: GrantFiled: June 4, 2019Date of Patent: April 5, 2022Assignee: International Business Machines CorporationInventors: Jessica Hui-Chun Tseng, Manoj Kumar, Kattamuri Ekanadham, Jose E. Moreira, Pratap C. Pattnaik
-
Patent number: 11281745Abstract: Methods and systems of matrix multiplication are described. In an example, a processor can multiply a first entry of a first vector of a first data array with a second vector of a second data array to generate a third vector of a third data array. The processor can store the third vector of the third data array in the second register file. The processor can multiply a second entry of the first vector with the second vector to generate a fourth vector of the third data array. The processor can store the fourth vector of the third data array in the second register file. The processor can combine vectors of the third data array that are stored in the second register file to produce the third data array.Type: GrantFiled: August 16, 2019Date of Patent: March 22, 2022Assignee: International Business Machines CorporationInventors: Bruce Fleischer, Jose E. Moreira, Joel A. Silberman
-
Publication number: 20220050682Abstract: A computer system, processor, and method for processing information is disclosed that includes at least one computer processor; a main register file associated with the at least one processor, the main register file having a plurality of entries for storing data, one or more write ports to write data to the main register file entries, and one or more read ports to read data from the main register file entries; one or more execution units including a dense math execution unit; and at least one accumulator register file having a plurality of entries for storing data. The results of the dense math execution unit in an aspect are written to the accumulator register file, preferably to the same accumulator register file entry multiple times, and the data from the accumulator register file is written to the main register file.Type: ApplicationFiled: August 27, 2021Publication date: February 17, 2022Inventors: Brian W. Thompto, Maarten J. Boersma, Andreas Wagner, Jose E. Moreira, Hung Q. Le, Silvia Melitta Mueller, Dung Q. Nguyen
-
Publication number: 20220012010Abstract: A data ordering device includes a plurality of inputs N and a plurality of outputs M. There is a sorting network coupled between the plurality of inputs N and the plurality of outputs M. There are one or more latches comprising a buffer coupled between each input of the plurality of inputs N and a corresponding input of the sorting network. There are one or more latches comprising a buffer coupled between each output of the plurality of outputs M and a corresponding output of the sorting network. There is an input for a control signal operative to initiate a sorting of data between the plurality of inputs N and the plurality of outputs M. The data ordering device is coupled to a core of a central processing unit.Type: ApplicationFiled: September 26, 2021Publication date: January 13, 2022Inventors: Manoj Kumar, Pratap C. Pattnaik, Kattamuri Ekanadham, Jessica Tseng, Jose E. Moreira
-
Publication number: 20220004386Abstract: Aspects include a compute array of a processor with mixed-precision numerical linear algebra support. A first precision and a first shape of a first input matrix and a second precision and a second shape of a second input matrix to the compute array are determined. A plurality of linear algebra operations is repeated in parallel within the compute array to update a result matrix in an accumulator register based on the first input matrix, the second input matrix, and a number of rank updates of the result matrix to store in the accumulator register.Type: ApplicationFiled: September 21, 2021Publication date: January 6, 2022Inventors: Jose E. Moreira, Brett Olsson, Brian W. Thompto, Silvia Melitta Mueller, Andreas Wagner
-
Patent number: 11188328Abstract: Aspects include a compute array of a processor with mixed-precision numerical linear algebra support. A first precision and a first shape of a first input matrix and a second precision and a second shape of a second input matrix to the compute array are determined. A number of rank updates of a result matrix to store in an accumulator register having a predetermined size are determined, where the number of rank updates is based on the first precision and the first shape of the first input matrix, the second precision and the second shape of the second input matrix, and the predetermined size of the accumulator register. A plurality of linear algebra operations is repeated in parallel within the compute array to update the result matrix in the accumulator register based on the first input matrix, the second input matrix, and the number of rank updates.Type: GrantFiled: December 12, 2019Date of Patent: November 30, 2021Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Jose E. Moreira, Brett Olsson, Brian W. Thompto, Silvia Melitta Mueller, Andreas Wagner
-
Patent number: 11182458Abstract: Embodiments of the present invention are directed to a new instruction set extension and a method for providing 3D lane predication for matrix operations. In a non-limiting embodiment of the invention, a first input matrix having m rows and k columns and a second input matrix having k rows and n columns are received by a compute array of a processor. A three-dimensional predicate mask having an M-bit row mask, an N-bit column mask, and a K-bit rank mask is generated. A result matrix of up to m rows, up to n columns, and up to k rank updates is determined based on the first input matrix, the second input matrix, and the predicate mask.Type: GrantFiled: December 12, 2019Date of Patent: November 23, 2021Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Brett Olsson, Brian W. Thompto, Jose E. Moreira, Silvia Melitta Mueller, Andreas Wagner