Patents by Inventor Himanshu Kaul

Himanshu Kaul has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11868296
    Abstract: An apparatus includes a first port set that includes an input port and an output port. The apparatus further includes a plurality of second port sets. Each of the second port sets includes an input port coupled to the output port of the first port set and an output port coupled to the input port of the first port set. The plurality of second port sets are to each communicate at a first maximum bandwidth and the first port set is to communicate at a second maximum bandwidth that is higher than the first maximum bandwidth.
    Type: Grant
    Filed: March 22, 2022
    Date of Patent: January 9, 2024
    Assignee: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Gregory K. Chen
  • Patent number: 11720355
    Abstract: One embodiment provides a graphics processor comprising a memory controller and a graphics processing resource coupled with the memory controller. The graphics processing resource includes circuitry configured to execute an instruction to perform a matrix operation on first input including weight data and second input including input activation data, generate intermediate data based on a result of the matrix operation, quantize the intermediate data to a floating-point format determined based on a statistical distribution of first output data, and output, as second output data, quantized intermediate data in a determined floating-point format.
    Type: Grant
    Filed: June 7, 2022
    Date of Patent: August 8, 2023
    Assignee: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Publication number: 20230046506
    Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.
    Type: Application
    Filed: October 17, 2022
    Publication date: February 16, 2023
    Applicant: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Publication number: 20220357945
    Abstract: One embodiment provides a graphics processor comprising a memory controller and a graphics processing resource coupled with the memory controller. The graphics processing resource includes circuitry configured to execute an instruction to perform a matrix operation on first input including weight data and second input including input activation data, generate intermediate data based on a result of the matrix operation, quantize the intermediate data to a floating-point format determined based on a statistical distribution of first output data, and output, as second output data, quantized intermediate data in a determined floating-point format.
    Type: Application
    Filed: June 7, 2022
    Publication date: November 10, 2022
    Applicant: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Publication number: 20220214988
    Abstract: An apparatus includes a first port set that includes an input port and an output port. The apparatus further includes a plurality of second port sets. Each of the second port sets includes an input port coupled to the output port of the first port set and an output port coupled to the input port of the first port set. The plurality of second port sets are to each communicate at a first maximum bandwidth and the first port set is to communicate at a second maximum bandwidth that is higher than the first maximum bandwidth.
    Type: Application
    Filed: March 22, 2022
    Publication date: July 7, 2022
    Applicant: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Gregory K. Chen
  • Publication number: 20220188075
    Abstract: A FPMAC operation has two operands: an input operand and a weight operand. The operands may have a format of FP16, BF16, or INT8. Each operand is split into two portions. The two portions are stored in separate storage units. Then operands are transferred to register files of a PE, with each register file storing bits of an operand sequentially. The PE performs the FPMAC operation based on the operands. The PE may include an FPMAC unit configured to compute an individual partial sum of the PE. The PE may also include an FP adder to accumulate the individual partial sum with other data, such as an output from another PE or an output form another PE array. The FP adder may be fused with the FPMAC unit in a single circuit that can do speculative alignment and has separate critical paths for alignment and normalization.
    Type: Application
    Filed: March 7, 2022
    Publication date: June 16, 2022
    Applicant: Intel Corporation
    Inventors: Arnab Raha, Mark A. Anders, Raymond Jit-Hung Sung, Debabrata Mohapatra, Deepak Abraham Mathaikutty, Ram K. Krishnamurthy, Himanshu Kaul
  • Patent number: 11360767
    Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.
    Type: Grant
    Filed: July 6, 2021
    Date of Patent: June 14, 2022
    Assignee: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Patent number: 11321263
    Abstract: An apparatus includes a first port set that includes an input port and an output port. The apparatus further includes a plurality of second port sets. Each of the second port sets includes an input port coupled to the output port of the first port set and an output port coupled to the input port of the first port set. The plurality of second port sets are to each communicate at a first maximum bandwidth and the first port set is to communicate at a second maximum bandwidth that is higher than the first maximum bandwidth.
    Type: Grant
    Filed: December 17, 2014
    Date of Patent: May 3, 2022
    Assignee: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Gregory K. Chen
  • Patent number: 11288040
    Abstract: Systems, apparatuses and methods may provide for technology that conduct a first alignment between a plurality of floating-point numbers based on a first subset of exponent bits. The technology may also conduct, at least partially in parallel with the first alignment, a second alignment between the plurality of floating-point numbers based on a second subset of exponent bits, where the first subset of exponent bits are LSBs and the second subset of exponent bits are MSBs. In one example, technology adds the aligned plurality of floating-point numbers to one another. With regard to the second alignment, the technology may also identify individual exponents of a plurality of floating-point numbers, identify a maximum exponent across the individual exponents, and conduct a subtraction of the individual exponents from the maximum exponent, where the subtraction is conducted from MSB to LSB.
    Type: Grant
    Filed: June 7, 2019
    Date of Patent: March 29, 2022
    Assignee: Intel Corporation
    Inventors: Himanshu Kaul, Mark Anders
  • Publication number: 20220019431
    Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.
    Type: Application
    Filed: July 6, 2021
    Publication date: January 20, 2022
    Applicant: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Publication number: 20210397414
    Abstract: Systems, apparatuses and methods may provide for multi-precision multiply-accumulate (MAC) technology that includes a plurality of arithmetic blocks, wherein the plurality of arithmetic blocks each contain multiple multipliers, and wherein the logic is to combine multipliers one or more of within each arithmetic block or across multiple arithmetic blocks. In one example, one or more intermediate multipliers are of a size that is less than precisions supported by arithmetic blocks containing the one or more intermediate multipliers.
    Type: Application
    Filed: June 25, 2021
    Publication date: December 23, 2021
    Inventors: Arnab Raha, Mark A. Anders, Martin Power, Martin Langhammer, Himanshu Kaul, Debabrata Mohapatra, Gautham Chinya, Cormac Brick, Ram Krishnamurthy
  • Patent number: 11169799
    Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.
    Type: Grant
    Filed: June 5, 2019
    Date of Patent: November 9, 2021
    Assignee: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Patent number: 11080046
    Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.
    Type: Grant
    Filed: February 5, 2021
    Date of Patent: August 3, 2021
    Assignee: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Publication number: 20210182058
    Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.
    Type: Application
    Filed: February 5, 2021
    Publication date: June 17, 2021
    Applicant: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Publication number: 20210124579
    Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.
    Type: Application
    Filed: December 9, 2020
    Publication date: April 29, 2021
    Applicant: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Patent number: 10944402
    Abstract: Some embodiments include apparatuses having a first circuit path including drive units coupled in series between a first node and a first additional node, a second circuit path including drive units coupled in series between a second node and a second additional node, each drive unit of the driver units of the first circuit path and the second circuit path including an inverter, and a transmission gate circuit including an input node and an output node coupled to an input node and an output node, respectively, of the inverter; and control circuitry to provide control information to the transmission gate circuit of each of the driver units of the first circuit path and the second circuit path.
    Type: Grant
    Filed: February 14, 2020
    Date of Patent: March 9, 2021
    Assignee: Intel Corporation
    Inventors: SeongJong Kim, Mark A. Anders, Himanshu Kaul
  • Publication number: 20210043500
    Abstract: Embodiments disclosed herein include interconnect layers that include non-uniform interconnect heights and methods of forming such devices. In an embodiment, an interconnect layer comprises an interlayer dielectric (ILD), a first interconnect disposed in the ILD, wherein the first interconnect has a first height, and a second interconnect disposed in the ILD, wherein the second interconnect has a second height that is different than the first height.
    Type: Application
    Filed: August 7, 2019
    Publication date: February 11, 2021
    Inventors: Kevin Lai LIN, Mauro KOBRINSKY, Mark ANDERS, Himanshu KAUL, Ram KRISHNAMURTHY
  • Publication number: 20210043567
    Abstract: Embodiments disclosed herein include a semiconductor device with interconnects with non-uniform heights. In an embodiment, the semiconductor device comprises a semiconductor substrate, and a back end of line (BEOL) stack over the semiconductor substrate. In an embodiment, the BEOL stack comprises first interconnects and second interconnects in an interconnect layer of the BEOL stack. In an embodiment, the first interconnects have a first height and the second interconnects have a second height that is different than the first height.
    Type: Application
    Filed: August 7, 2019
    Publication date: February 11, 2021
    Inventors: Mark ANDERS, Himanshu KAUL, Ram KRISHNAMURTHY, Kevin Lai LIN, Mauro KOBRINSKY
  • Publication number: 20210012129
    Abstract: The present invention is related to a method of registering and identifying a user of a voice-controlled device. The method comprising: registering the user of the device, determining at least one of a facial, eye and body features of the user upon determining at least one characteristic of a voice input from the user is below a threshold level, determining an output upon comparing the determined at least one of the facial, eye and body features of the user with a prestored information associated with the user in a database and identifying the user in accordance with the determined output.
    Type: Application
    Filed: July 11, 2019
    Publication date: January 14, 2021
    Inventor: Himanshu Kaul
  • Publication number: 20210014418
    Abstract: The present invention is related to a method of generating a real time panoramic image comprising: capturing at least one media content by a rotatable camera, optionally of a voice-controlled device, wherein the rotatable camera, operable by a motor signal of a motor, is enabled to capture at least one media content at a given motor signal, extracting at least one feature of the at least one captured media content, associating the at least one captured media content with the given motor signal and the extracted at least one feature of the at least one captured media content and the real time stitching of the at least one captured media content based on the associated given motor signal and the associated extracted at least one feature to create the panoramic image.
    Type: Application
    Filed: July 11, 2019
    Publication date: January 14, 2021
    Inventor: Himanshu Kaul