Patents by Inventor Sergey Gribok

Sergey Gribok has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230026331
    Abstract: A circuit system for performing modular reduction of a modular multiplication includes multiplier circuits that receive a first subset of coefficients that are generated by summing partial products of a multiplication operation that is part of the modular multiplication. The multiplier circuits multiply the coefficients in the first subset by constants that equal remainders of divisions to generate products. Adder circuits add a second subset of the coefficients and segments of bits of the products that are aligned with respective ones of the second subset of the coefficients to generate sums.
    Type: Application
    Filed: September 23, 2022
    Publication date: January 26, 2023
    Applicant: Intel Corporation
    Inventors: Sergey Gribok, Bogdan Pasca, Martin Langhammer
  • Patent number: 11436399
    Abstract: A method for implementing a multiplier on a programmable logic device (PLD) is disclosed. Partial product bits of the multiplier are identified and how the partial product bits are to be summed to generate a final product from a multiplier and multiplicand are determined. Chains of PLD cells and cells in the chains of PLD cells for generating and summing the partial product bits are assigned. It is determined whether a bit in an assigned cell in an assigned chain of PLD cells is under-utilized. In response to determining that a bit is under-utilized, the assigning of the chains of PLD cells and cells for generating and summing the partial product bits are changed to improve an overall utilization of the chains of PLD cells and cells in the chains of PLD cells.
    Type: Grant
    Filed: December 12, 2018
    Date of Patent: September 6, 2022
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Sergey Gribok, Gregg William Baeckler
  • Publication number: 20220107783
    Abstract: A programmable device may be configured to support machine learning training operations using matrix multiplication circuitry. In some embodiments, the multiplication is implemented on a systolic array. The systolic array includes an array of processing elements, each of which includes hybrid floating-point dot-product circuitry.
    Type: Application
    Filed: December 16, 2021
    Publication date: April 7, 2022
    Inventors: Martin Langhammer, Bogdan Pasca, Sergey Gribok, Gregg William Baeckler, Andrei Hagiescu
  • Patent number: 11210063
    Abstract: A programmable device may be configured to support machine learning training operations using matrix multiplication circuitry implemented on a systolic array. The systolic array includes an array of processing elements, each of which includes hybrid floating-point dot-product circuitry. The hybrid dot-product circuitry has a hard data path that uses digital signal processing (DSP) blocks operating in floating-point mode and a hard/soft data path that uses DSP blocks operating in fixed-point mode operated in conjunction with general purpose soft logic. The hard/soft data path includes 2-element dot-product circuits that feed an adder tree. Results from the hard data path are combined with the adder tree using format conversion and normalization circuitry. Inputs to the hybrid dot-product circuitry may be in the BFLOAT16 format. The hard data path may be in the single precision format. The hard/soft data path uses a custom format that is similar to but different than BFLOAT16.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: December 28, 2021
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Bogdan Pasca, Sergey Gribok, Gregg William Baeckler, Andrei Hagiescu
  • Patent number: 11080019
    Abstract: A method for designing and configuring a system on a field programmable gate array (FPGA) is disclosed. A portion of the system that is implemented greater than a predetermined number of times is identified. A structural netlist that describes how to implement the portion of the system a plurality of times on the FPGA and that leverages a repetitive nature of implementing the portion is generated. The identifying and generating is performed prior to synthesizing and placing other portions of the system that are not implemented greater than the predetermined number of time. Synthesizing, placing, and routing the other portions of the system on the FPGA is performed in accordance with the structural netlist. The FPGA is configured with a configuration file that includes a design for the system that reflects the synthesizing, placing, and routing, wherein the configuring physically transforms resources on the FPGA to implement the system.
    Type: Grant
    Filed: June 29, 2018
    Date of Patent: August 3, 2021
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Gregg William Baeckler, Sergey Gribok
  • Patent number: 10871946
    Abstract: Integrated circuits with digital signal processing (DSP) blocks are provided. A DSP block may include one or more large multiplier circuits. A large multiplier circuit (e.g., an 18×18 or 18×19 multiplier circuit) may be used to support two or more smaller multiplication operations sharing one or two sets of multiplier operands, a complex multiplication, and a sum of two multiplications. If the multiplier products overflow and interfere with one another, correction operations can be performed. Partial products from two or more larger multiplier circuits can be used to combine decomposed partial products. A large multiplier circuit can also be used to support two floating-point mantissa multipliers.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: December 22, 2020
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Gregg William Baeckler, Sergey Gribok, Dmitry N. Denisenko, Bogdan Pasca
  • Patent number: 10867090
    Abstract: A method for designing a system on a target device is disclosed. The system is synthesized from a register transfer level description. The system is placed on the target device. The system is routed on the target device. A configuration file is generated that reflects the synthesizing, placing, and routing of the system for programming the target device. A modification for the system is identified. The configuration file is modified to effectuate the modification for the system without changing the placing and routing of the system.
    Type: Grant
    Filed: March 18, 2019
    Date of Patent: December 15, 2020
    Assignee: Intel Corporation
    Inventors: Gregg William Baeckler, Martin Langhammer, Sergey Gribok, Scott J. Weber, Gregory Steinke
  • Patent number: 10790829
    Abstract: Integrated circuits with programmable logic regions are provided. The programmable logic regions may be organized into smaller logic units sometimes referred to as a logic element. A logic element may include four lookup tables coupled to an adder carry chain. At least some of the lookup tables are configured to output combinatorial outputs, whereas the adder carry chain are used to output sum outputs. Both the combinatorial outputs and the sum outputs may be used simultaneously to support a multiplication operation, three or more logic operations, or arithmetic and combinatorial operations in parallel.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: September 29, 2020
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Sergey Gribok, Gregg William Baeckler
  • Patent number: 10732932
    Abstract: Integrated circuits with digital signal processing (DSP) blocks are provided. A DSP block may include one or more large multiplier circuits. A large multiplier circuit such as an 18×18 multiplier circuit may be used to support two or more smaller multiplication operations such as two 8×8 integer multiplications or two 9×9 integer multiplications. To implement the two 8×8 or 9×9 unsigned/signed multiplications, the 18×18 multiplier may be configured to support two 8×8 multiplications with one shared operand, two 6×6 multiplications without any shared operand, or two 7×7 multiplications without any shared operand. Any potential overlap of partial product terms may be subtracted out using correction logic. The multiplication of the remaining most significant bits can be computed using associated multiplier extension logic and appended to the other least significant bits using merging logic.
    Type: Grant
    Filed: December 21, 2018
    Date of Patent: August 4, 2020
    Assignee: Intel Corporation
    Inventors: Bogdan Pasca, Martin Langhammer, Sergey Gribok, Gregg William Baeckler
  • Patent number: 10715144
    Abstract: Integrated circuits with programmable logic regions are provided. The programmable logic regions may be organized into smaller logic units sometimes referred to as a logic cell. A logic cell may include four 4-input lookup tables (LUTs) coupled to an adder carry chain. Each of the four 4-input LUTs may include two 3-input LUTs and a selector multiplexer. The carry chain may include at three or more full adder circuits. The outputs of the 3-input LUTs may be directly connected to inputs of the full adder circuits in the carry chain. By providing at least the same or more number of full adder circuits as the total number of 4-input LUTs in the logic cell, the arithmetic density of the logic is enhanced.
    Type: Grant
    Filed: June 6, 2019
    Date of Patent: July 14, 2020
    Assignee: Intel Corporation
    Inventors: Sergey Gribok, Gregg Baeckler, Martin Langhammer
  • Publication number: 20200142671
    Abstract: Integrated circuits with digital signal processing (DSP) blocks are provided. A DSP block may include one or more large multiplier circuits. A large multiplier circuit such as an 18×18 multiplier circuit may be used to support two or more smaller multiplication operations such as two 8×8 integer multiplications or two 9×9 integer multiplications. To implement the two 8×8 or 9×9 unsigned/signed multiplications, the 18×18 multiplier may be configured to support two 8×8 multiplications with one shared operand, two 6×6 multiplications without any shared operand, or two 7×7 multiplications without any shared operand. Any potential overlap of partial product terms may be subtracted out using correction logic. The multiplication of the remaining most significant bits can be computed using associated multiplier extension logic and appended to the other least significant bits using merging logic.
    Type: Application
    Filed: December 21, 2018
    Publication date: May 7, 2020
    Applicant: Intel Corporation
    Inventors: Bogdan Pasca, Martin Langhammer, Sergey Gribok, Gregg William Baeckler
  • Publication number: 20200106442
    Abstract: Integrated circuits with programmable logic regions are provided. The programmable logic regions may be organized into smaller logic units sometimes referred to as a logic element. A logic element may include four lookup tables coupled to an adder carry chain. At least some of the lookup tables are configured to output combinatorial outputs, whereas the adder carry chain are used to output sum outputs. Both the combinatorial outputs and the sum outputs may be used simultaneously to support a multiplication operation, three or more logic operations, or arithmetic and combinatorial operations in parallel.
    Type: Application
    Filed: September 27, 2018
    Publication date: April 2, 2020
    Applicant: Intel Corporation
    Inventors: Martin Langhammer, Sergey Gribok, Gregg William Baeckler
  • Publication number: 20200026494
    Abstract: A programmable device may be configured to support machine learning training operations using matrix multiplication circuitry implemented on a systolic array. The systolic array includes an array of processing elements, each of which includes hybrid floating-point dot-product circuitry. The hybrid dot-product circuitry has a hard data path that uses digital signal processing (DSP) blocks operating in floating-point mode and a hard/soft data path that uses DSP blocks operating in fixed-point mode operated in conjunction with general purpose soft logic. The hard/soft data path includes 2-element dot-product circuits that feed an adder tree. Results from the hard data path are combined with the adder tree using format conversion and normalization circuitry. Inputs to the hybrid dot-product circuitry may be in the BFLOAT16 format. The hard data path may be in the single precision format. The hard/soft data path uses a custom format that is similar to but different than BFLOAT16.
    Type: Application
    Filed: September 27, 2019
    Publication date: January 23, 2020
    Applicant: Intel Corporation
    Inventors: Martin Langhammer, Bogdan Pasca, Sergey Gribok, Gregg William Baeckler, Andrei Hagiescu
  • Publication number: 20190288688
    Abstract: Integrated circuits with programmable logic regions are provided. The programmable logic regions may be organized into smaller logic units sometimes referred to as a logic cell. A logic cell may include four 4-input lookup tables (LUTs) coupled to an adder carry chain. Each of the four 4-input LUTs may include two 3-input LUTs and a selector multiplexer. The carry chain may include at three or more full adder circuits. The outputs of the 3-input LUTs may be directly connected to inputs of the full adder circuits in the carry chain. By providing at least the same or more number of full adder circuits as the total number of 4-input LUTs in the logic cell, the arithmetic density of the logic is enhanced.
    Type: Application
    Filed: June 6, 2019
    Publication date: September 19, 2019
    Applicant: Intel Corporation
    Inventors: Sergey Gribok, Gregg Baeckler, Martin Langhammer
  • Publication number: 20190213289
    Abstract: A method for designing a system on a target device is disclosed. The system is synthesized from a register transfer level description. The system is placed on the target device. The system is routed on the target device. A configuration file is generated that reflects the synthesizing, placing, and routing of the system for programming the target device. A modification for the system is identified. The configuration file is modified to effectuate the modification for the system without changing the placing and routing of the system.
    Type: Application
    Filed: March 18, 2019
    Publication date: July 11, 2019
    Inventors: Gregg William BAECKLER, Martin LANGHAMMER, Sergey GRIBOK, Scott J. WEBER, Gregory STEINKE
  • Publication number: 20190121927
    Abstract: A method for implementing a multiplier on a programmable logic device (PLD) is disclosed. Partial product bits of the multiplier are identified and how the partial product bits are to be summed to generate a final product from a multiplier and multiplicand are determined. Chains of PLD cells and cells in the chains of PLD cells for generating and summing the partial product bits are assigned. It is determined whether a bit in an assigned cell in an assigned chain of PLD cells is under-utilized. In response to determining that a bit is under-utilized, the assigning of the chains of PLD cells and cells for generating and summing the partial product bits are changed to improve an overall utilization of the chains of PLD cells and cells in the chains of PLD cells.
    Type: Application
    Filed: December 12, 2018
    Publication date: April 25, 2019
    Inventors: Martin LANGHAMMER, Sergey GRIBOK, Gregg William BAECKLER
  • Patent number: 10223488
    Abstract: A method for designing a system on a target device includes identifying components in a netlist that perform a division operation. The netlist is modified during synthesis to utilize other components to compute a result of the division operation by performing a multiplication operation.
    Type: Grant
    Filed: July 19, 2016
    Date of Patent: March 5, 2019
    Assignee: Altera Corporation
    Inventor: Sergey Gribok
  • Publication number: 20190042683
    Abstract: A method for designing and configuring a system on a field programmable gate array (FPGA) is disclosed. A portion of the system that is implemented greater than a predetermined number of times is identified. A structural netlist that describes how to implement the portion of the system a plurality of times on the FPGA and that leverages a repetitive nature of implementing the portion is generated. The identifying and generating is performed prior to synthesizing and placing other portions of the system that are not implemented greater than the predetermined number of time. Synthesizing, placing, and routing the other portions of the system on the FPGA is performed in accordance with the structural netlist. The FPGA is configured with a configuration file that includes a design for the system that reflects the synthesizing, placing, and routing, wherein the configuring physically transforms resources on the FPGA to implement the system.
    Type: Application
    Filed: June 29, 2018
    Publication date: February 7, 2019
    Inventors: Martin LANGHAMMER, Gregg William BAECKLER, Sergey GRIBOK
  • Publication number: 20190042198
    Abstract: Integrated circuits with digital signal processing (DSP) blocks are provided. A DSP block may include one or more large multiplier circuits. A large multiplier circuit (e.g., an 18×18 or 18×19 multiplier circuit) may be used to support two or more smaller multiplication operations sharing one or two sets of multiplier operands, a complex multiplication, and a sum of two multiplications. If the multiplier products overflow and interfere with one another, correction operations can be performed. Partial products from two or more larger multiplier circuits can be used to combine decomposed partial products. A large multiplier circuit can also be used to support two floating-point mantissa multipliers.
    Type: Application
    Filed: September 27, 2018
    Publication date: February 7, 2019
    Applicant: Intel Corporation
    Inventors: Martin Langhammer, Gregg William Baeckler, Sergey Gribok, Dmitry N. Denisenko, Bogdan Pasca
  • Patent number: 10102892
    Abstract: Unlike prior RAM-based shift register circuits, the presently-disclosed shift register circuit does not require control circuits to generate write and read address signals. Instead, the presently-disclosed shift register circuit utilizes a portion of RAM to store and provide the write and read address signals. The write and read addresses are output from the data output port of the RAM, and received by the write and read address ports of the RAM. Advantageously, the presently-disclosed shift register circuit requires less area to implement because the need for write and read control circuits is eliminated.
    Type: Grant
    Filed: June 1, 2017
    Date of Patent: October 16, 2018
    Assignee: Intel Corporation
    Inventor: Sergey Gribok