Patents by Inventor Om Ji Omer

Om Ji Omer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250021819
    Abstract: Systems, apparatus, articles of manufacture, and methods for quality and capacity-aware grouped query attention are disclosed. To accomplish such groupings, example instructions cause a machine to create a plurality of groups of query heads present in a key value cache using an evolutionary algorithm based on at least two objectives, quantify an amount of error introduced by a first group of query heads in the plurality of groups of query heads, and retain the query heads of the first group of query heads in a non-grouped arrangement when the error meets an error threshold.
    Type: Application
    Filed: September 27, 2024
    Publication date: January 16, 2025
    Applicant: Intel Corporation
    Inventors: Vinay Joshi, Om Ji Omer, Prashant Laddha, Shambhavi Sinha
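The grouping-with-fallback idea in this abstract can be sketched in a few lines. The sketch below (illustrative only; the patent uses an evolutionary, multi-objective search that is not reproduced here) merges each proposed group of key/value heads into their mean, quantifies the error that merge introduces, and retains the heads non-grouped when the error meets a threshold:

```python
import numpy as np

def group_error(kv_heads, group):
    """Error introduced by replacing a group of KV heads with their mean."""
    merged = kv_heads[group].mean(axis=0)
    return float(np.linalg.norm(kv_heads[group] - merged))

def apply_grouping(kv_heads, groups, error_threshold):
    """Merge each proposed group unless its merge error meets the threshold,
    in which case its heads are retained in a non-grouped arrangement."""
    result = []
    for group in groups:
        if group_error(kv_heads, group) >= error_threshold:
            result.extend([h] for h in group)  # keep heads ungrouped
        else:
            result.append(list(group))
    return result
```

Here `kv_heads` is a hypothetical array of per-head parameters; in practice the groups would come from the evolutionary search over quality and capacity objectives.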
  • Patent number: 12140696
    Abstract: According to various embodiments, a radar device is described comprising a processor configured to generate a scene comprising an object based on a plurality of receive wireless signals, generate a ground truth object parameter of the object and generate a dataset representative of the scene and a radar detector configured to determine an object parameter of the object using a machine learning algorithm and the dataset, determine an error value of the machine learning algorithm using a cost function, the object parameter, and the ground truth object parameter and adjust the machine learning algorithm values to reduce the error value.
    Type: Grant
    Filed: July 14, 2021
    Date of Patent: November 12, 2024
    Assignee: Intel Corporation
    Inventors: Chulong Chen, Wenling Margaret Huang, Saiveena Kesaraju, Ivan Simões Gaspar, Pradyumna S. Singh, Biji George, Dipan Kumar Mandal, Om Ji Omer, Sreenivas Subramoney, Yuval Amizur, Leor Banin, Hao Chen, Nir Dvorecki, Shengbo Xu
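The training loop described in this abstract (predict an object parameter, score it against generated ground truth with a cost function, adjust the model to reduce the error) follows the standard supervised pattern. A minimal sketch, with a linear model standing in for the machine learning algorithm and synthetic features standing in for the radar scene:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "scene": feature rows derived from received signals, plus the
# generated ground-truth object parameter (illustrative stand-ins only).
X = rng.normal(size=(64, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w

def cost(w):
    """Squared-error cost between predicted and ground-truth parameters."""
    return float(np.mean((X @ w - y) ** 2))

def train_step(w, lr=0.05):
    """Adjust the model values to reduce the error value (gradient step)."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

w = np.zeros(4)
for _ in range(200):
    w = train_step(w)
```

After the loop, `cost(w)` is far below the initial cost, mirroring the abstract's "adjust the machine learning algorithm values to reduce the error value."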
  • Publication number: 20240289168
    Abstract: Systems, apparatuses and methods may provide for technology that identifies a type of a first activation function, identifies a derivative level of the first activation function, and generates a first instruction based on the type of the first activation function and the derivative level of the first activation function. The technology also includes an accelerator having logic coupled to one or more substrates, the logic including a compute engine including a plurality of arithmetic operators, a multiplexer network coupled to the compute engine, and a controller coupled to the multiplexer network, the controller to detect the first instruction, decode the first instruction to identify the first activation function, and drive the multiplexer network to form first connections between two or more of the plurality of arithmetic operators in accordance with the first activation function, wherein the first connections are to cause the compute engine to conduct the first activation function.
    Type: Application
    Filed: November 10, 2023
    Publication date: August 29, 2024
    Inventors: Krishnan Ananthanarayanan, Martin Langhammer, Om Ji Omer, Bogdan Pasca, Kamlesh Pillai, Pramod Udupa
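A software analogue of the controller and multiplexer network can illustrate the decode step: the instruction's activation type and derivative level select how primitive arithmetic operators are connected. The activation names and the two-case encoding below are illustrative, not the patent's actual instruction format:

```python
def decode_activation(op_type, derivative_level):
    """Toy decode: map (activation type, derivative level) to a function
    composed from primitive operators (compare, select, multiply)."""
    if op_type == "relu":
        if derivative_level == 0:
            return lambda x: max(x, 0.0)              # compare + select
        return lambda x: 1.0 if x > 0.0 else 0.0      # first derivative
    if op_type == "square":
        if derivative_level == 0:
            return lambda x: x * x                    # multiplier reused
        return lambda x: 2.0 * x                      # first derivative
    raise ValueError(f"unknown activation: {op_type}")
```

In the hardware described, the same selection would instead drive multiplexers to form physical connections between arithmetic operators in the compute engine.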
  • Publication number: 20240201949
    Abstract: Systems, apparatuses and methods may provide for technology that includes a compute-in-memory (CiM) enabled memory array to conduct digital bit-serial multiply and accumulate (MAC) operations on multi-bit input data and weight data stored in the CiM enabled memory array, an adder tree coupled to the CiM enabled memory array, an accumulator coupled to the adder tree, and an input bit selection stage coupled to the CiM enabled memory array, wherein the input bit selection stage restricts serial bit selection on the multi-bit input data to non-zero values during the digital MAC operations.
    Type: Application
    Filed: February 28, 2024
    Publication date: June 20, 2024
    Inventors: Sagar Varma Sayyaparaju, Om Ji Omer, Sreenivas Subramoney
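The key saving in this scheme is that the serial bit selection visits only the non-zero bits of each input. A minimal software sketch of a digital bit-serial MAC with that restriction (the adder tree and accumulator collapse into a single running sum here):

```python
def bit_serial_mac(inputs, weights):
    """Consume each multi-bit input one set bit at a time: for set bit b of
    input x, add (w << b) into the accumulator. Zero bits are never
    selected, mirroring the input bit selection stage."""
    acc = 0
    for x, w in zip(inputs, weights):
        b = 0
        while x:
            if x & 1:           # only non-zero bit positions are selected
                acc += w << b   # shifted contribution into the adder tree
            x >>= 1
            b += 1
    return acc
```

An input of all zeros contributes no cycles at all, which is where the sparsity benefit comes from.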
  • Patent number: 11949414
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve in-memory multiply and accumulate operations. An example apparatus includes a first multiplexer in a subarray of memory, the first multiplexer to receive first values representative of a column of a lookup table (LUT) including entries to represent products of four-bit numbers and return second values from an intersection of a row and the column of the LUT based on a first element of a first operand; shift and adder logic in the subarray, the shift and adder logic to shift the second values based on at least one of the first element of the first operand or a first element of a second operand; and accumulation storage in the subarray, the accumulation storage to store at least the shifted second values.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: April 2, 2024
    Assignee: Intel Corporation
    Inventors: Gurpreet Singh Kalsi, Akshay Krishna Ramanathan, Kamlesh Pillai, Sreenivas Subramoney, Srivatsa Rangachar Srinivasa, Anirud Thyagharajan, Om Ji Omer, Saurabh Jain
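The arithmetic behind this abstract — a lookup table of four-bit products combined by shift-and-add logic — can be demonstrated with a plain nibble decomposition. This is a functional sketch of the math, not of the subarray layout:

```python
# 16x16 lookup table of all products of four-bit numbers,
# as stored in the memory subarray.
LUT = [[r * c for c in range(16)] for r in range(16)]

def lut_multiply_8bit(a, b):
    """Multiply two unsigned 8-bit operands using only 4-bit LUT lookups
    plus shift-and-add: a*b = (16*a_hi + a_lo) * (16*b_hi + b_lo)."""
    a_lo, a_hi = a & 0xF, a >> 4
    b_lo, b_hi = b & 0xF, b >> 4
    return (LUT[a_lo][b_lo]
            + (LUT[a_hi][b_lo] << 4)   # shift by element position
            + (LUT[a_lo][b_hi] << 4)
            + (LUT[a_hi][b_hi] << 8))
```

In the patent, the multiplexer selects LUT rows/columns from operand elements and the partial products accumulate in per-subarray storage; here the four lookups and shifts are simply summed.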
  • Patent number: 11875555
    Abstract: A computer model is trained to classify regions of a space (e.g., a pixel of an image or a voxel of a point cloud) according to a multi-label classification. To improve the model's accuracy, the model's self-confidence is determined with respect to its own predictions of regions in a training space. The self-confidence is determined based on the class predictions, such as a difference between the highest-predicted class and a second-highest-predicted class. When these are similar, it may reflect areas for potential improvement by focusing training on these low-confidence areas. Additional training may be performed by including modified training data in subsequent training iterations that focuses on low-confidence areas. As another example, additional training may be performed using the self-confidence to modify a classification loss used to refine parameters of the model.
    Type: Grant
    Filed: November 24, 2021
    Date of Patent: January 16, 2024
    Assignee: Intel Corporation
    Inventors: Anirud Thyagharajan, Prashant Laddha, Benjamin Ummenhofer, Om Ji Omer
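The self-confidence signal described here — the margin between the highest- and second-highest-predicted class — is easy to compute per region. A short sketch:

```python
import numpy as np

def self_confidence(probs):
    """Margin between the highest- and second-highest-predicted class for
    each region; small margins flag low-confidence regions for additional
    training. probs: (..., num_classes) array of class predictions."""
    top2 = np.sort(probs, axis=-1)[..., -2:]   # two largest per region
    return top2[..., 1] - top2[..., 0]

probs = np.array([[0.7, 0.2, 0.1],     # confident region
                  [0.4, 0.35, 0.25]])  # low-confidence region
margins = self_confidence(probs)
low_conf_mask = margins < 0.1          # candidate regions for focused training
```

Regions flagged by the mask could then drive the modified training data or the reweighted classification loss the abstract describes; the `0.1` cutoff is an arbitrary illustrative value.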
  • Publication number: 20240005135
    Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format; and a sparsity hardware circuit to reduce computing on zero values at the multiply-add circuit, wherein the processor to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.
    Type: Application
    Filed: April 18, 2023
    Publication date: January 4, 2024
    Applicant: Intel Corporation
    Inventors: Avishaii Abuhatzera, Om Ji Omer, Ritwika Chowdhury, Lance Hacking
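The re-encoding idea — one higher-precision signed number expressed as two lower-precision signed numbers, with zero pieces skipped — can be shown with a simplified integer analogue (the patent covers general precision formats; 8-bit-to-two-4-bit below is illustrative only):

```python
def reencode(x):
    """Split a signed 8-bit value into two signed 4-bit digits (hi, lo)
    with x == hi * 16 + lo."""
    lo = ((x + 8) & 0xF) - 8   # signed remainder in [-8, 7]
    hi = (x - lo) >> 4         # signed quotient; fits 4 bits for x in [-136, 119]
    return hi, lo

def low_precision_multiply(x, y):
    """Compute x * y using only the narrow digits of x: two low-precision
    multiplies plus a shift, with zero digits skipped (sparsity)."""
    hi, lo = reencode(x)
    acc = 0
    if hi:                     # reduce computing on zero values
        acc += (hi * y) << 4
    if lo:
        acc += lo * y
    return acc
```

The two narrow multiplies recombine exactly because `x == hi * 16 + lo`; the sparsity check simply drops a multiply whenever a digit is zero.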
  • Publication number: 20230333999
    Abstract: Systems, apparatuses and methods may provide for technology that includes a chip having a memory structure including compute hardware, a plurality of address decoders coupled to the compute hardware, and a hierarchical interconnect fabric coupled to the plurality of address decoders, and direct memory address (DMA) hardware positioned adjacent to one or more of the plurality of address decoders, wherein the DMA hardware is to conduct on-chip transfers of intermediate state data via the hierarchical interconnect fabric. Additionally, the chip may include logic to allocate address space in the chip to intermediate state data and store the intermediate state data to the allocated address space.
    Type: Application
    Filed: June 22, 2023
    Publication date: October 19, 2023
    Inventors: Om Ji Omer, Anirud Thyagharajan, Sreenivas Subramoney
  • Patent number: 11783170
    Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
    Type: Grant
    Filed: January 25, 2023
    Date of Patent: October 10, 2023
    Assignee: Intel Corporation
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
  • Patent number: 11714998
    Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format; and a sparsity hardware circuit to reduce computing on zero values at the multiply-add circuit, wherein the processor to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.
    Type: Grant
    Filed: June 23, 2020
    Date of Patent: August 1, 2023
    Assignee: Intel Corporation
    Inventors: Avishaii Abuhatzera, Om Ji Omer, Ritwika Chowdhury, Lance Hacking
  • Publication number: 20230205692
    Abstract: Apparatus and method for leveraging simultaneous multithreading for bulk compute operations. For example, one embodiment of a processor comprises: a plurality of cores including a first core to simultaneously process instructions of a plurality of threads; a cache hierarchy coupled to the first core and the memory, the cache hierarchy comprising a Level 1 (L1) cache, a Level 2 (L2) cache, and a Level 3 (L3) cache; and a plurality of compute units coupled to the first core including a first compute unit associated with the L1 cache, a second compute unit associated with the L2 cache, and a third compute unit associated with the L3 cache, wherein the first core is to offload instructions for execution by the compute units, the first core to offload instructions from a first thread to the first compute unit, instructions from a second thread to the second compute unit, and instructions from a third thread to the third compute unit.
    Type: Application
    Filed: December 23, 2021
    Publication date: June 29, 2023
    Inventors: Anant Nori, Rahul Bera, Shankar Balachandran, Joydeep Rakshit, Om Ji Omer, Sreenivas Subramoney, Avishaii Abuhatzera, Belliappa Kuttanna

  • Publication number: 20230169315
    Abstract: Systems, apparatuses and methods may provide for technology that generates a vector output based on a bitmask, wherein the vector output includes non-zero bit indices in a first portion of the vector output, and wherein the non-zero bit indices correspond to non-zero values in the bitmask. The technology may also generate an offset based on the bitmask, wherein the offset indicates a start position in the vector output for the non-zero bit indices.
    Type: Application
    Filed: September 1, 2022
    Publication date: June 1, 2023
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Om Ji Omer
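One plausible reading of this abstract is that set-bit indices from several bitmasks are packed into a single vector output, with each offset recording the start position of that bitmask's indices. A hedged sketch of that layout (the patent's exact encoding may differ):

```python
def compress_bitmasks(bitmasks, width):
    """Pack the set-bit indices of each bitmask into one vector output;
    offsets[i] is the start position in the vector for bitmask i's
    non-zero bit indices."""
    vector, offsets = [], []
    for mask in bitmasks:
        offsets.append(len(vector))   # start position for this mask's indices
        vector.extend(i for i in range(width) if (mask >> i) & 1)
    return vector, offsets
```

This kind of index packing is a common front end for sparse compute: downstream logic reads only the listed non-zero positions instead of scanning the full bitmask.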
  • Publication number: 20230169319
    Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
    Type: Application
    Filed: January 25, 2023
    Publication date: June 1, 2023
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
  • Patent number: 11620818
    Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: April 4, 2023
    Assignee: Intel Corporation
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
  • Publication number: 20220196798
    Abstract: According to various embodiments, a radar device is described comprising a processor configured to generate a scene comprising an object based on a plurality of receive wireless signals, generate a ground truth object parameter of the object and generate a dataset representative of the scene and a radar detector configured to determine an object parameter of the object using a machine learning algorithm and the dataset, determine an error value of the machine learning algorithm using a cost function, the object parameter, and the ground truth object parameter and adjust the machine learning algorithm values to reduce the error value.
    Type: Application
    Filed: July 14, 2021
    Publication date: June 23, 2022
    Inventors: Chulong Chen, Wenling Margaret Huang, Saiveena Kesaraju, Ivan Simões Gaspar, Pradyumna S. Singh, Biji George, Dipan Kumar Mandal, Om Ji Omer, Sreenivas Subramoney, Yuval Amizur, Leor Banin, Hao Chen, Nir Dvorecki, Shengbo Xu
  • Patent number: 11347828
    Abstract: A disclosed apparatus to multiply matrices includes a compute engine. The compute engine includes multipliers in a two dimensional array that has a plurality of array locations defined by columns and rows. The apparatus also includes a plurality of adders in columns. A broadcast interconnect between a cache and the multipliers broadcasts a first set of operand data elements to multipliers in the rows of the array. A unicast interconnect unicasts a second set of operands between a data buffer and the multipliers. The multipliers multiply the operands to generate a plurality of outputs, and the adders add the outputs generated by the multipliers.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: May 31, 2022
    Assignee: Intel Corporation
    Inventors: Biji George, Om Ji Omer, Dipan Kumar Mandal, Cormac Brick, Lance Hacking, Sreenivas Subramoney, Belliappa Kuttanna
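The dataflow in this abstract — one operand broadcast along rows of a 2D multiplier array, the other unicast per column, with column adders reducing the products — is the familiar outer-product formulation of matrix multiply. A functional sketch (NumPy stands in for the hardware array):

```python
import numpy as np

def broadcast_unicast_matmul(A, B):
    """A: (M, K), B: (K, N). For each step k, column A[:, k] is broadcast
    to every array column while row B[k, :] is unicast one value per
    column; the column adders accumulate the resulting products."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    acc = np.zeros((M, N))
    for k in range(K):
        acc += np.outer(A[:, k], B[k, :])   # all multipliers fire in parallel
    return acc
```

Each loop iteration models one broadcast/unicast step of the compute engine; the accumulator plays the role of the adder columns.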
  • Publication number: 20220148311
    Abstract: Systems, apparatuses and methods may provide for technology that identifies a plurality of segments based on semantic features and instance features associated with a scene, fuses the plurality of segments into a plurality of instances, and selects classification labels for the plurality of instances. In one example, the plurality of segments is fused into the plurality of instances via a learnable self-attention based network.
    Type: Application
    Filed: January 24, 2022
    Publication date: May 12, 2022
    Inventors: Anirud Thyagharajan, Prashant Laddha, Benjamin Ummenhofer, Om Ji Omer
  • Publication number: 20220114234
    Abstract: A matrix processing engine is provided for efficient matrix computation performed by a dense matrix compute circuit (performing SIMD operations) and a scalar computing core (performing SISD operations). These two processing components operate together to produce output data tiles by feeding results of the dense SIMD operations to the scalar computing core using thread packing and an in-line buffer for accumulating and packing the dense result data. This permits the scalar computing to spawn threads to operate on the dense results as available and without requiring partial or intermediate data read/writes between the dense and scalar computations.
    Type: Application
    Filed: December 22, 2021
    Publication date: April 14, 2022
    Applicant: Intel Corporation
    Inventors: Biji George, Sreenivas Subramoney, Om Ji Omer, Anoop Viswam
  • Publication number: 20220113974
    Abstract: A memory architecture includes processing circuits co-located with memory subarrays for performing computations within the memory architecture. The memory architecture includes a plurality of decoders in hierarchical levels that include a multicast capability for distributing data or compute operations to individual subarrays. The multicast may be configurable with respect to individual fan-outs at each hierarchical level. A computation workflow may be organized into a compute supertile representing one or more “supertiles” of input data to be processed in the compute supertile. The individual data tiles of the input data supertile may be used by multiple compute tiles executed by the processing circuits of the subarrays, and the data tiles multicast to the respective processing circuits for efficient data loading and parallel computation.
    Type: Application
    Filed: December 23, 2021
    Publication date: April 14, 2022
    Applicant: Intel Corporation
    Inventors: Om Ji Omer, Gurpreet Singh Kalsi, Anirud Thyagharajan, Saurabh Jain, Kamlesh R. Pillai, Sreenivas Subramoney, Avishaii Abuhatzera
  • Publication number: 20220084310
    Abstract: A computer model is trained to classify regions of a space (e.g., a pixel of an image or a voxel of a point cloud) according to a multi-label classification. To improve the model's accuracy, the model's self-confidence is determined with respect to its own predictions of regions in a training space. The self-confidence is determined based on the class predictions, such as a difference between the highest-predicted class and a second-highest-predicted class. When these are similar, it may reflect areas for potential improvement by focusing training on these low-confidence areas. Additional training may be performed by including modified training data in subsequent training iterations that focuses on low-confidence areas. As another example, additional training may be performed using the self-confidence to modify a classification loss used to refine parameters of the model.
    Type: Application
    Filed: November 24, 2021
    Publication date: March 17, 2022
    Applicant: Intel Corporation
    Inventors: Anirud Thyagharajan, Prashant Laddha, Benjamin Ummenhofer, Om Ji Omer