Patents by Inventor Om Ji Omer

Om Ji Omer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11949414
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve in-memory multiply and accumulate operations. An example apparatus includes a first multiplexer in a subarray of memory, the first multiplexer to receive first values representative of a column of a lookup table (LUT) including entries to represent products of four-bit numbers and return second values from an intersection of a row and the column of the LUT based on a first element of a first operand; shift and adder logic in the subarray, the shift and adder logic to shift the second values based on at least one of the first element of the first operand or a first element of a second operand; and accumulation storage in the subarray, the accumulation storage to store at least the shifted second values.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: April 2, 2024
    Assignee: Intel Corporation
    Inventors: Gurpreet Singh Kalsi, Akshay Krishna Ramanathan, Kamlesh Pillai, Sreenivas Subramoney, Srivatsa Rangachar Srinivasa, Anirud Thyagharajan, Om Ji Omer, Saurabh Jain
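
The lookup-table multiply described in this abstract can be illustrated with a short software sketch. It is not the patented circuit: the 8-bit unsigned operand width, the function names, and the loop order are assumptions made for illustration. Only the 16x16 table of four-bit products, the column and row selection, the shift, and the accumulation come from the abstract.

```python
# Hypothetical sketch; names and the 8-bit operand width are assumptions.
# 16x16 table holding the products of all pairs of four-bit numbers.
LUT = [[row * col for col in range(16)] for row in range(16)]


def lut_multiply(x: int, y: int) -> int:
    """Multiply two unsigned 8-bit integers using LUT reads, shifts, and adds."""
    if not (0 <= x < 256 and 0 <= y < 256):
        raise ValueError("operands must fit in 8 unsigned bits")
    acc = 0
    for y_pos in range(2):
        y_nibble = (y >> (4 * y_pos)) & 0xF
        # One four-bit element of y selects a column of the table.
        column = [LUT[row][y_nibble] for row in range(16)]
        for x_pos in range(2):
            x_nibble = (x >> (4 * x_pos)) & 0xF
            # Multiplexer step: a four-bit element of x picks the row entry.
            product = column[x_nibble]
            # Shift by the combined nibble positions, then accumulate.
            acc += product << (4 * (x_pos + y_pos))
    return acc


def lut_mac(xs, ys) -> int:
    """Multiply-accumulate over two equal-length sequences of 8-bit values."""
    return sum(lut_multiply(x, y) for x, y in zip(xs, ys))
```

Each 8-bit product is assembled from four table reads, so no multiplier is needed beyond the stored four-bit products.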
  • Patent number: 11875555
    Abstract: A computer model is trained to classify regions of a space (e.g., a pixel of an image or a voxel of a point cloud) according to a multi-label classification. To improve the model's accuracy, the model's self-confidence is determined with respect to its own predictions of regions in a training space. The self-confidence is determined based on the class predictions, such as a difference between the highest-predicted class and a second-highest-predicted class. When these are similar, it may reflect areas for potential improvement by focusing training on these low-confidence areas. Additional training may be performed by including modified training data in subsequent training iterations that focuses on low-confidence areas. As another example, additional training may be performed using the self-confidence to modify a classification loss used to refine parameters of the model.
    Type: Grant
    Filed: November 24, 2021
    Date of Patent: January 16, 2024
    Assignee: Intel Corporation
    Inventors: Anirud Thyagharajan, Prashant Laddha, Benjamin Ummenhofer, Om Ji Omer
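
As a rough illustration of the self-confidence measure in this abstract, the sketch below computes the difference between the highest and second-highest predicted class scores for each region and flags regions where that margin is small. The function names and the fixed threshold are assumptions; the abstract does not say how low-confidence areas are selected.

```python
# Hypothetical sketch; function names and the threshold rule are assumptions.
def self_confidence(class_scores):
    """Margin between the highest and second-highest predicted class scores."""
    if len(class_scores) < 2:
        raise ValueError("need at least two classes")
    top, second = sorted(class_scores, reverse=True)[:2]
    return top - second


def low_confidence_regions(predictions, threshold):
    """Indices of regions (e.g. pixels or voxels) whose margin is below threshold."""
    return [
        index
        for index, scores in enumerate(predictions)
        if self_confidence(scores) < threshold
    ]
```

The returned indices could then be used to pick training data to emphasize, or the margin itself could weight a classification loss, which are the two uses the abstract mentions.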
  • Publication number: 20240005135
    Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format; and a sparsity hardware circuit to reduce computing on zero values at the multiply-add circuit, wherein the processor to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.
    Type: Application
    Filed: April 18, 2023
    Publication date: January 4, 2024
    Applicant: Intel Corporation
    Inventors: Avishaii Abuhatzera, Om Ji Omer, Ritwika Chowdhury, Lance Hacking
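
One way to picture the re-encoding in this abstract is to split a signed 8-bit value into a signed high part and a signed 4-bit low part, so that small-magnitude values have a zero high part that can be skipped. The sketch below is an assumption-laden illustration, not the claimed encoding: the 8-bit and 4-bit widths, the rounding rule, and all names are hypothetical.

```python
# Hypothetical sketch; bit widths, rounding rule, and names are assumptions.
def reencode(x: int):
    """Split a signed 8-bit integer into (hi, lo) with x == 16 * hi + lo.

    lo always lies in the signed 4-bit range [-8, 7]. With this rule hi lies
    in [-8, 8], so inputs from 120 to 127 need one value beyond signed 4-bit.
    """
    if not -128 <= x <= 127:
        raise ValueError("x must fit in 8 signed bits")
    lo = (x + 8) % 16 - 8
    hi = (x - lo) // 16
    return hi, lo


def sparse_dot(xs, ws) -> int:
    """Dot product built from low-precision products, skipping zero high parts."""
    acc = 0
    for x, w in zip(xs, ws):
        hi, lo = reencode(x)
        acc += lo * w
        if hi != 0:  # sparsity in the higher-order part: skip the multiply
            acc += (hi * w) << 4
    return acc
```

For example, `reencode(5)` gives `(0, 5)`, so only one low-precision multiply is performed for that input.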
  • Publication number: 20230333999
    Abstract: Systems, apparatuses and methods may provide for technology that includes a chip having a memory structure including compute hardware, a plurality of address decoders coupled to the compute hardware, and a hierarchical interconnect fabric coupled to the plurality of address decoders, and direct memory address (DMA) hardware positioned adjacent to one or more of the plurality of address decoders, wherein the DMA hardware is to conduct on-chip transfers of intermediate state data via the hierarchical interconnect fabric. Additionally, the chip may include logic to allocate address space in the chip to intermediate state data and store the intermediate state data to the allocated address space.
    Type: Application
    Filed: June 22, 2023
    Publication date: October 19, 2023
    Inventors: Om Ji Omer, Anirud Thyagharajan, Sreenivas Subramoney
  • Patent number: 11783170
    Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
    Type: Grant
    Filed: January 25, 2023
    Date of Patent: October 10, 2023
    Assignee: Intel Corporation
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
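
The rulebook-driven, channel-wise multiply-accumulate in this abstract resembles how sparse convolutions are commonly computed in software. The sketch below assumes a rulebook is a list of (input index, output index) pairs per weight plane, and models the processing elements as strided slices of the input channels. Both choices, and every name, are assumptions rather than details taken from the patent.

```python
# Hypothetical sketch; the rulebook layout, PE partitioning, and names are assumptions.
def rulebook_mac(features, weights, rulebooks, num_outputs, num_pes=2):
    """Accumulate output features from per-weight-plane rulebooks.

    features:  list of input feature vectors, one per active voxel
    weights:   weights[plane][c_in][c_out]
    rulebooks: rulebooks[plane] is a list of (input_index, output_index) pairs
    """
    c_in = len(weights[0])
    c_out = len(weights[0][0])
    outputs = [[0] * c_out for _ in range(num_outputs)]
    for plane, pairs in enumerate(rulebooks):
        plane_weights = weights[plane]
        for in_idx, out_idx in pairs:
            for pe in range(num_pes):
                # Each processing element covers a strided subset of input
                # channels and produces a partial accumulation.
                partial = [0] * c_out
                for ci in range(pe, c_in, num_pes):
                    x = features[in_idx][ci]
                    for co in range(c_out):
                        partial[co] += x * plane_weights[ci][co]
                # Partial accumulations are summed into the output voxel.
                for co in range(c_out):
                    outputs[out_idx][co] += partial[co]
    return outputs
```

Grouping the work by weight plane means each plane's weights are loaded once and reused for every voxel pair in that plane's rulebook.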
  • Patent number: 11714998
    Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format; and a sparsity hardware circuit to reduce computing on zero values at the multiply-add circuit, wherein the processor to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.
    Type: Grant
    Filed: June 23, 2020
    Date of Patent: August 1, 2023
    Assignee: Intel Corporation
    Inventors: Avishaii Abuhatzera, Om Ji Omer, Ritwika Chowdhury, Lance Hacking
  • Publication number: 20230205692
    Abstract: Apparatus and method for leveraging simultaneous multithreading for bulk compute operations. For example, one embodiment of a processor comprises: a plurality of cores including a first core to simultaneously process instructions of a plurality of threads; a cache hierarchy coupled to the first core and the memory, the cache hierarchy comprising a Level 1 (L1) cache, a Level 2 (L2) cache, and a Level 3 (L3) cache; and a plurality of compute units coupled to the first core including a first compute unit associated with the L1 cache, a second compute unit associated with the L2 cache, and a third compute unit associated with the L3 cache, wherein the first core is to offload instructions for execution by the compute units, the first core to offload instructions from a first thread to the first compute unit, instructions from a second thread to the second compute unit, and instructions from a third thread to the third compute unit.
    Type: Application
    Filed: December 23, 2021
    Publication date: June 29, 2023
    Inventors: Anant Nori, Rahul Bera, Shankar Balachandran, Joydeep Rakshit, Om Ji Omer, Sreenivas Subramoney, Avishaii Abuhatzera, Belliappa Kuttanna
  • Publication number: 20230169315
    Abstract: Systems, apparatuses and methods may provide for technology that generates a vector output based on a bitmask, wherein the vector output includes non-zero bit indices in a first portion of the vector output, and wherein the non-zero bit indices correspond to non-zero values in the bitmask. The technology may also generate an offset based on the bitmask, wherein the offset indicates a start position in the vector output for the non-zero bit indices.
    Type: Application
    Filed: September 1, 2022
    Publication date: June 1, 2023
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Om Ji Omer
  • Publication number: 20230169319
    Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
    Type: Application
    Filed: January 25, 2023
    Publication date: June 1, 2023
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
  • Patent number: 11620818
    Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: April 4, 2023
    Assignee: Intel Corporation
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
  • Publication number: 20220196798
    Abstract: According to various embodiments, a radar device is described comprising a processor configured to generate a scene comprising an object based on a plurality of receive wireless signals, generate a ground truth object parameter of the object and generate a dataset representative of the scene and a radar detector configured to determine an object parameter of the object using a machine learning algorithm and the dataset, determine an error value of the machine learning algorithm using a cost function, the object parameter, and the ground truth object parameter and adjust the machine learning algorithm values to reduce the error value.
    Type: Application
    Filed: July 14, 2021
    Publication date: June 23, 2022
    Inventors: Chulong Chen, Wenling Margaret Huang, Saiveena Kesaraju, Ivan Simões Gaspar, Pradyumna S. Singh, Biji George, Dipan Kumar Mandal, Om Ji Omer, Sreenivas Subramoney, Yuval Amizur, Leor Banin, Hao Chen, Nir Dvorecki, Shengbo Xu
  • Patent number: 11347828
    Abstract: A disclosed apparatus to multiply matrices includes a compute engine. The compute engine includes multipliers in a two dimensional array that has a plurality of array locations defined by columns and rows. The apparatus also includes a plurality of adders in columns. A broadcast interconnect between a cache and the multipliers broadcasts a first set of operand data elements to multipliers in the rows of the array. A unicast interconnect unicasts a second set of operands between a data buffer and the multipliers. The multipliers multiply the operands to generate a plurality of outputs, and the adders add the outputs generated by the multipliers.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: May 31, 2022
    Assignee: Intel Corporation
    Inventors: Biji George, Om Ji Omer, Dipan Kumar Mandal, Cormac Brick, Lance Hacking, Sreenivas Subramoney, Belliappa Kuttanna
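
The data flow in this abstract can be mimicked in software to show how a grid of multipliers with column adders yields a matrix product. In the sketch below, array row k and column n hold one multiplier each; the mapping of matrix A to the broadcast path and matrix B to the unicast path is an assumption, as are all names.

```python
# Hypothetical sketch; the operand-to-interconnect mapping and names are assumptions.
def array_matmul(a, b):
    """Compute a @ b by modeling a k-by-n grid of multipliers with column adders."""
    k_dim = len(b)
    n_dim = len(b[0])
    result = []
    for a_row in a:
        if len(a_row) != k_dim:
            raise ValueError("inner dimensions do not match")
        # Broadcast: a_row[k] goes to every multiplier in array row k.
        # Unicast: b[k][n] goes only to the multiplier at (k, n).
        products = [
            [a_row[k] * b[k][n] for n in range(n_dim)]
            for k in range(k_dim)
        ]
        # Column adders: sum each array column to get one output element.
        result.append(
            [sum(products[k][n] for k in range(k_dim)) for n in range(n_dim)]
        )
    return result
```

Calling `array_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`, matching an ordinary matrix product.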
  • Publication number: 20220148311
    Abstract: Systems, apparatuses and methods may provide for technology that identifies a plurality of segments based on semantic features and instance features associated with a scene, fuses the plurality of segments into a plurality of instances, and selects classification labels for the plurality of instances. In one example, the plurality of segments is fused into the plurality of instances via a learnable self-attention based network.
    Type: Application
    Filed: January 24, 2022
    Publication date: May 12, 2022
    Inventors: Anirud Thyagharajan, Prashant Laddha, Benjamin Ummenhofer, Om Ji Omer
  • Publication number: 20220113974
    Abstract: A memory architecture includes processing circuits co-located with memory subarrays for performing computations within the memory architecture. The memory architecture includes a plurality of decoders in hierarchical levels that include a multicast capability for distributing data or compute operations to individual subarrays. The multicast may be configurable with respect to individual fan-outs at each hierarchical level. A computation workflow may be organized into a compute supertile representing one or more “supertiles” of input data to be processed in the compute supertile. The individual data tiles of the input data supertile may be used by multiple compute tiles executed by the processing circuits of the subarrays, and the data tiles multicast to the respective processing circuits for efficient data loading and parallel computation.
    Type: Application
    Filed: December 23, 2021
    Publication date: April 14, 2022
    Applicant: Intel Corporation
    Inventors: Om Ji Omer, Gurpreet Singh Kalsi, Anirud Thyagharajan, Saurabh Jain, Kamlesh R. Pillai, Sreenivas Subramoney, Avishaii Abuhatzera
  • Publication number: 20220114234
    Abstract: A matrix processing engine is provided for efficient matrix computation performed by a dense matrix compute circuit (performing SIMD operations) and a scalar computing core (performing SISD operations). These two processing components operate together to produce output data tiles by feeding results of the dense SIMD operations to the scalar computing core using thread packing and an in-line buffer for accumulating and packing the dense result data. This permits the scalar computing to spawn threads to operate on the dense results as available and without requiring partial or intermediate data read/writes between the dense and scalar computations.
    Type: Application
    Filed: December 22, 2021
    Publication date: April 14, 2022
    Applicant: Intel Corporation
    Inventors: Biji George, Sreenivas Subramoney, Om Ji Omer, Anoop Viswam
  • Publication number: 20220084310
    Abstract: A computer model is trained to classify regions of a space (e.g., a pixel of an image or a voxel of a point cloud) according to a multi-label classification. To improve the model's accuracy, the model's self-confidence is determined with respect to its own predictions of regions in a training space. The self-confidence is determined based on the class predictions, such as a difference between the highest-predicted class and a second-highest-predicted class. When these are similar, it may reflect areas for potential improvement by focusing training on these low-confidence areas. Additional training may be performed by including modified training data in subsequent training iterations that focuses on low-confidence areas. As another example, additional training may be performed using the self-confidence to modify a classification loss used to refine parameters of the model.
    Type: Application
    Filed: November 24, 2021
    Publication date: March 17, 2022
    Applicant: Intel Corporation
    Inventors: Anirud Thyagharajan, Prashant Laddha, Benjamin Ummenhofer, Om Ji Omer
  • Publication number: 20210111722
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve in-memory multiply and accumulate operations. An example apparatus includes a first multiplexer in a subarray of memory, the first multiplexer to receive first values representative of a column of a lookup table (LUT) including entries to represent products of four-bit numbers and return second values from an intersection of a row and the column of the LUT based on a first element of a first operand; shift and adder logic in the subarray, the shift and adder logic to shift the second values based on at least one of the first element of the first operand or a first element of a second operand; and accumulation storage in the subarray, the accumulation storage to store at least the shifted second values.
    Type: Application
    Filed: December 22, 2020
    Publication date: April 15, 2021
    Inventors: Gurpreet Singh Kalsi, Akshay Krishna Ramanathan, Kamlesh Pillai, Sreenivas Subramoney, Srivatsa Rangachar Srinivasa, Anirud Thyagharajan, Om Ji Omer, Saurabh Jain
  • Publication number: 20210110187
    Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
    Type: Application
    Filed: December 22, 2020
    Publication date: April 15, 2021
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
  • Publication number: 20210090328
    Abstract: Systems, apparatuses and methods provide technology for optimizing processing of sparse data, such as 3D pointcloud data sets. The technology may include generating a locality-aware rulebook based on an input unstructured sparse data set, such as a 3D pointcloud data set, the locality-aware rulebook storing spatial neighborhood information for active voxels in the input unstructured sparse data set, computing an average receptive field (ARF) value based on the locality aware rulebook, and determining, from a plurality of tile size and loop order combinations, a tile size and loop order combination for processing the unstructured sparse data based on the computed ARF value. The technology may also include providing the locality-aware rulebook and the tile size and loop order combination to a compute engine such as a neural network, the compute engine to process the unstructured sparse data using the locality aware rulebook and the tile size and loop order combination.
    Type: Application
    Filed: December 7, 2020
    Publication date: March 25, 2021
    Inventors: Prashant Laddha, Anirud Thyagharajan, Om Ji Omer, Sreenivas Subramoney
  • Publication number: 20200320375
    Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format; and a sparsity hardware circuit to reduce computing on zero values at the multiply-add circuit, wherein the processor to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.
    Type: Application
    Filed: June 23, 2020
    Publication date: October 8, 2020
    Applicant: Intel Corporation
    Inventors: Avishaii Abuhatzera, Om Ji Omer, Ritwika Chowdhury, Lance Hacking