Patents by Inventor Ulf Hanebutte

Ulf Hanebutte has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11966857
    Abstract: A processing unit to support inference acceleration for machine learning (ML) comprises an inline post processing unit configured to accept and maintain one or more lookup tables for performing a tanh and/or sigmoid operation/function. The inline post processing unit is further configured to accept data from a set of registers configured to maintain output from a processing block instead of streaming the data from an on-chip memory (OCM), perform the tanh and/or sigmoid operation on each element of the data from the processing block on a per-element basis via the one or more lookup tables, and stream the post processing result of the per-element tanh and/or sigmoid operation back to the OCM after the tanh and/or sigmoid operation is complete.
    Type: Grant
    Filed: April 6, 2021
    Date of Patent: April 23, 2024
    Assignee: Marvell Asia Pte Ltd
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
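
As context for the lookup-table approach described in this abstract, here is a minimal Python sketch of a tanh evaluated per element through a precomputed table rather than computed directly. The table size, input range, and function names are illustrative assumptions, not details taken from the patent.

```python
import math

# Assumed table parameters: 256 entries covering [-4, 4]; outside this range
# tanh is effectively saturated at -1/+1, so clamping costs little accuracy.
TABLE_SIZE = 256
X_MIN, X_MAX = -4.0, 4.0
STEP = (X_MAX - X_MIN) / (TABLE_SIZE - 1)

# Precompute once; the patent's unit similarly maintains its tables on-chip.
TANH_LUT = [math.tanh(X_MIN + i * STEP) for i in range(TABLE_SIZE)]

def tanh_via_lut(x: float) -> float:
    """Approximate tanh(x) by nearest-entry table lookup."""
    if x <= X_MIN:
        return -1.0
    if x >= X_MAX:
        return 1.0
    return TANH_LUT[round((x - X_MIN) / STEP)]

# Apply per element to a block of values, mirroring the abstract's
# "on a per-element basis" wording.
block = [-5.0, -0.5, 0.0, 0.5, 5.0]
print([tanh_via_lut(v) for v in block])
```
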
  • Patent number: 11934965
    Abstract: A processing unit to support inference acceleration for machine learning (ML) comprises an inline post processing unit configured to accept and maintain one or more lookup tables for performing a tanh and/or sigmoid operation/function. The inline post processing unit is further configured to accept data from a set of registers configured to maintain output from a processing block instead of streaming the data from an on-chip memory (OCM), perform the tanh and/or sigmoid operation on each element of the data from the processing block on a per-element basis via the one or more lookup tables, and stream the post processing result of the per-element tanh and/or sigmoid operation back to the OCM after the tanh and/or sigmoid operation is complete.
    Type: Grant
    Filed: April 6, 2021
    Date of Patent: March 19, 2024
    Assignee: Marvell Asia Pte Ltd
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
  • Patent number: 11733983
    Abstract: A method includes receiving a high-level function in a high-level code of an application; identifying resources in a hardware to execute a set of low-level instructions that is generated from the high-level function in the high-level code; compiling the high-level function in the high-level code of the application into the set of low-level instructions to be executed on the hardware; and generating a plurality of structured metadata associated with allocation of resources in the hardware to execute the set of low-level instructions.
    Type: Grant
    Filed: September 8, 2022
    Date of Patent: August 22, 2023
    Assignee: Marvell Asia Pte Ltd
    Inventors: Senad Durakovic, Chien-Chun Chou, Ulf Hanebutte, Harri Hakkarainen
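
To make the flow in the abstract above concrete, here is a hedged Python sketch of compiling a high-level function into low-level instructions while emitting structured metadata about the hardware resources allocated to them. The instruction strings, resource model, and metadata fields are invented for illustration and are not taken from the patent.

```python
from dataclasses import dataclass, field

# Hypothetical hardware model: a few processing tiles with local memory.
@dataclass
class Hardware:
    num_tiles: int = 4
    ocm_bytes_per_tile: int = 1 << 20

@dataclass
class CompileResult:
    instructions: list = field(default_factory=list)
    metadata: list = field(default_factory=list)

def compile_function(op: str, shape: tuple, hw: Hardware) -> CompileResult:
    """Lower one high-level op and record structured metadata alongside it."""
    elems = 1
    for dim in shape:
        elems *= dim
    tiles = min(hw.num_tiles, max(1, elems // 1024))  # naive resource choice
    result = CompileResult()
    for t in range(tiles):
        result.instructions.append(f"{op.upper()}_TILE tile={t} elems={elems // tiles}")
        # Structured metadata tying each instruction to its allocated resources.
        result.metadata.append({"op": op, "tile": t, "ocm_bytes": (elems // tiles) * 4})
    return result

out = compile_function("matmul", (64, 64), Hardware())
print(out.instructions[0], out.metadata[0])
```
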
  • Publication number: 20230096994
    Abstract: A method of converting a data stored in a memory from a first format to a second format is disclosed. The method includes extending a number of bits in the data stored in a double data rate (DDR) memory by one bit to form an extended data. The method further includes determining whether the data stored in the DDR is signed or unsigned data. Moreover, responsive to determining that the data is signed, a sign value is added to the most significant bit of the extended data and the data is copied to lower order bits of the extended data. Responsive to determining that the data is unsigned, the data is copied to lower order bits of the extended data and the most significant bit is set to an unsigned value, e.g., zero. The extended data is stored in an on-chip memory (OCM) of a processing tile of a machine learning computer array.
    Type: Application
    Filed: December 6, 2022
    Publication date: March 30, 2023
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
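
The bit manipulation this abstract describes is ordinary sign extension by one bit. Below is a small Python sketch, assuming raw N-bit integer values: the data is copied into the lower-order bits of the widened value, and the new most significant bit receives the sign bit for signed data or zero for unsigned data. The function name and widths are illustrative.

```python
def extend_by_one_bit(value: int, width: int, signed: bool) -> int:
    """Widen a raw `width`-bit value to `width + 1` bits, as in the abstract."""
    assert 0 <= value < (1 << width), "value must fit in `width` bits"
    if signed:
        sign = (value >> (width - 1)) & 1  # existing sign bit
        return (sign << width) | value     # replicate it into the new MSB
    return value                           # unsigned: the new MSB stays zero

# 8-bit values widened to 9 bits.
print(bin(extend_by_one_bit(0b10000001, 8, signed=True)))   # 0b110000001
print(bin(extend_by_one_bit(0b10000001, 8, signed=False)))  # 0b10000001
```
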
  • Publication number: 20230015688
    Abstract: A method includes receiving a high-level function in a high-level code of an application; identifying resources in a hardware to execute a set of low-level instructions that is generated from the high-level function in the high-level code; compiling the high-level function in the high-level code of the application into the set of low-level instructions to be executed on the hardware; and generating a plurality of structured metadata associated with allocation of resources in the hardware to execute the set of low-level instructions.
    Type: Application
    Filed: September 8, 2022
    Publication date: January 19, 2023
    Inventors: Senad Durakovic, Chien-Chun Chou, Ulf Hanebutte, Harri Hakkarainen
  • Patent number: 11551148
    Abstract: A method of converting a data stored in a memory from a first format to a second format is disclosed. The method includes extending a number of bits in the data stored in a double data rate (DDR) memory by one bit to form an extended data. The method further includes determining whether the data stored in the DDR is signed or unsigned data. Moreover, responsive to determining that the data is signed, a sign value is added to the most significant bit of the extended data and the data is copied to lower order bits of the extended data. Responsive to determining that the data is unsigned, the data is copied to lower order bits of the extended data and the most significant bit is set to an unsigned value, e.g., zero. The extended data is stored in an on-chip memory (OCM) of a processing tile of a machine learning computer array.
    Type: Grant
    Filed: April 29, 2020
    Date of Patent: January 10, 2023
    Assignee: Marvell Asia Pte Ltd
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
  • Publication number: 20230004365
    Abstract: A system includes a compiler including a plurality of compiler blocks. The compiler blocks of the plurality of compiler blocks are composable. The compiler is configured to identify one or more resources in a hardware to execute a set of low-level instructions that is generated from a high-level function in a high-level code. The compiler is further configured to determine one or more processing operations to be performed that are associated with the high-level function in the high-level code. The determining of the one or more processing operations occurs based on architecture of the hardware. The compiler is configured to compile the high-level function in the high-level code of the application into the set of low-level instructions to be executed on the hardware.
    Type: Application
    Filed: March 2, 2022
    Publication date: January 5, 2023
    Inventors: Ulf Hanebutte, Senad Durakovic, Chien-Chun Chou, Fu-Hwa Wang, Mohana Tandyala
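
As a hedged illustration of what composable compiler blocks can look like, the sketch below treats each block as a function from one intermediate representation to another, so that the compiler is simply their composition. The toy IR and pass names are hypothetical and not from the patent.

```python
from typing import Callable, List

Program = List[str]                   # toy IR: a list of op strings
Block = Callable[[Program], Program]  # a compiler block maps IR to IR

def identify_resources(prog: Program) -> Program:
    return [f"{op} @tile0" for op in prog]   # hypothetical placement pass

def lower_to_isa(prog: Program) -> Program:
    return [f"ISA::{op}" for op in prog]     # hypothetical lowering pass

def compose(*blocks: Block) -> Block:
    """Composable blocks: chaining any sequence yields another block."""
    def pipeline(prog: Program) -> Program:
        for block in blocks:
            prog = block(prog)
        return prog
    return pipeline

compiler = compose(identify_resources, lower_to_isa)
print(compiler(["matmul", "tanh"]))  # ['ISA::matmul @tile0', 'ISA::tanh @tile0']
```
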
  • Patent number: 11494676
    Abstract: A processing unit to support inference acceleration for machine learning (ML) comprises an inline post processing unit configured to accept and maintain one or more lookup tables for performing each of one or more non-linear mathematical operations. The inline post processing unit is further configured to accept data from a set of registers maintaining output from a processing block instead of streaming the data from an on-chip memory (OCM), perform the one or more non-linear mathematical operations on elements of the data from the processing block via their corresponding lookup tables, and stream the post processing result of the one or more non-linear mathematical operations back to the OCM after the one or more non-linear mathematical operations are complete.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: November 8, 2022
    Assignee: Marvell Asia Pte Ltd
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
  • Patent number: 11467811
    Abstract: A method includes receiving a high-level function in a high-level code of an application. The method also includes identifying resources in a hardware to execute a set of low-level instructions that is generated from the high-level function in the high-level code. One or more processing operations to be performed, associated with the high-level function in the high-level code, are determined. The determining of the one or more processing operations occurs based on the architecture of the hardware. The high-level function in the high-level code of the application is compiled into the set of low-level instructions to be executed on the hardware. A plurality of structured metadata is generated that includes information associated with the determining of resources in the hardware and further includes information associated with the determining of the one or more processing operations.
    Type: Grant
    Filed: July 30, 2021
    Date of Patent: October 11, 2022
    Assignee: Marvell Asia Pte Ltd
    Inventors: Senad Durakovic, Chien-Chun Chou, Ulf Hanebutte, Harri Hakkarainen
  • Publication number: 20220188109
    Abstract: A method includes receiving an input data at a floating point arithmetic operating unit, wherein the floating point operating unit is configured to perform a floating point arithmetic operation on the input data. The method includes determining whether the received input data is a qnan (quiet not-a-number) or whether the received input data is an snan (signaling not-a-number) prior to performing the floating point arithmetic operation. The method also includes converting a value of the received input data to a modified value prior to performing the floating point arithmetic operation if the received input data is either qnan or snan, wherein the converting eliminates special handling associated with the floating point arithmetic operation on the input data being either qnan or snan.
    Type: Application
    Filed: March 4, 2022
    Publication date: June 16, 2022
    Inventors: Chia-Hsin Chen, Avinash Sodani, Ulf Hanebutte, Rishan Tan, Soumya Gollamudi
  • Publication number: 20220188111
    Abstract: A method includes receiving an input data at a floating point arithmetic operating unit, wherein the floating point operating unit is configured to perform a floating point arithmetic operation on the input data. The method also includes determining whether the received input data is positive infinity or negative infinity prior to performing the floating point arithmetic operation. The method further includes converting a value of the received input data to a modified value prior to performing the floating point arithmetic operation if the received input data is positive infinity or negative infinity.
    Type: Application
    Filed: March 4, 2022
    Publication date: June 16, 2022
    Inventors: Chia-Hsin Chen, Avinash Sodani, Ulf Hanebutte, Rishan Tan, Soumya Gollamudi
  • Publication number: 20220188110
    Abstract: A method includes receiving a first input data and a second input data at a floating point arithmetic operating unit, wherein the first input data and the second input data are associated with operands of a floating point arithmetic operation respectively, wherein the floating point operating unit is configured to perform a floating point arithmetic operation on the first input data and the second input data. The method further includes determining whether the first input data is a qnan (quiet not-a-number) or whether the first input data is an snan (signaling not-a-number) prior to performing the floating point arithmetic operation. A value of the first input data is converted to a modified value prior to performing the floating point arithmetic operation if the first input data is either qnan or snan, wherein the converting eliminates special handling associated with the floating point arithmetic operation on the first input data being either qnan or snan.
    Type: Application
    Filed: March 4, 2022
    Publication date: June 16, 2022
    Inventors: Chia-Hsin Chen, Avinash Sodani, Ulf Hanebutte, Rishan Tan, Soumya Gollamudi
  • Publication number: 20220188108
    Abstract: A method includes receiving an input data at a floating point arithmetic operating unit, wherein the floating point operating unit is configured to perform a floating point arithmetic operation on the input data to generate an output result. The method also includes determining whether the output result is going to cause a floating point hardware exception responsive to the floating point arithmetic operation on the input data. The method further includes converting a value of the output result to a modified value responsive to the determining that the output result is going to cause the floating point hardware exception, wherein the modified value eliminates the floating point hardware exception responsive to the floating point arithmetic operation on the input data.
    Type: Application
    Filed: March 4, 2022
    Publication date: June 16, 2022
    Inventors: Chia-Hsin Chen, Avinash Sodani, Ulf Hanebutte, Rishan Tan, Soumya Gollamudi
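
The four publications above (20220188108 through 20220188111) vary one theme: rewrite troublesome floating point inputs, or the output, around the arithmetic so the hardware never needs special-case handling for NaNs, infinities, or exceptions. Here is a combined Python sketch; the substitute values (zero for NaN, the largest finite float for infinities and overflow) are illustrative assumptions, not values specified by the publications.

```python
import math
import sys

MAX_FINITE = sys.float_info.max

def sanitize_input(x: float) -> float:
    """Replace NaN and infinities before the operation (assumed policy)."""
    if math.isnan(x):      # math.isnan covers quiet and signaling NaN alike
        return 0.0
    if math.isinf(x):
        return MAX_FINITE if x > 0 else -MAX_FINITE
    return x

def sanitized_mul(a: float, b: float) -> float:
    """Multiply with input sanitization plus an output overflow check."""
    result = sanitize_input(a) * sanitize_input(b)
    if math.isinf(result):  # the output would overflow: clamp it instead
        return MAX_FINITE if result > 0 else -MAX_FINITE
    return result

print(sanitized_mul(float("nan"), 2.0))  # 0.0 rather than nan
print(sanitized_mul(float("inf"), 2.0))  # a large finite value, no exception
```
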
  • Publication number: 20220129270
    Abstract: A method includes receiving a TopK instruction to sort the highest K elements of a vector data. A first K elements of the vector data are sorted and stored in a first register. Another element of the vector data is read, and it is determined whether that element has a value that is greater than, or is within, the range of values of the first K elements. A position of that element within the first K elements is determined if its value is within the range of values. A subset of the elements of the first K elements that are smaller than that element are shifted down after the position is determined. That element is inserted in the determined position after the shifting. The process is repeated for each remaining element of the vector data.
    Type: Application
    Filed: October 21, 2021
    Publication date: April 28, 2022
    Inventors: Avinash Sodani, Ulf Hanebutte
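
The abstract above describes an insertion-based TopK. A hedged Python sketch of that loop follows: keep the current K leaders sorted, and for each remaining element that falls within their range, find its position, shift the smaller leaders down, and insert it. The function name and data layout are illustrative.

```python
def topk(vector: list, k: int) -> list:
    """Return the k largest values of `vector`, sorted in descending order."""
    # Sort the first k elements and keep them as the current leaders.
    leaders = sorted(vector[:k], reverse=True)
    for x in vector[k:]:
        if x <= leaders[-1]:  # not within the leaders' range of values
            continue
        pos = 0               # find the element's position among the leaders
        while leaders[pos] >= x:
            pos += 1
        # Shift the smaller leaders down (dropping the old minimum), insert.
        leaders[pos + 1:] = leaders[pos:-1]
        leaders[pos] = x
    return leaders

print(topk([7, 2, 9, 4, 11, 5, 3], 3))  # [11, 9, 7]
```
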
  • Patent number: 11301247
    Abstract: A method includes receiving an input data at a FP arithmetic operating unit configured to perform a FP arithmetic operation on the input data. The method further includes determining whether the received input data generates a FP hardware exception responsive to the FP arithmetic operation on the input data, wherein the determining occurs prior to performing the FP arithmetic operation. The method also includes converting a value of the received input data to a modified value responsive to the determining that the received input data generates the FP hardware exception, wherein the converting eliminates generation of the FP hardware exception responsive to the FP arithmetic operation on the input data.
    Type: Grant
    Filed: April 30, 2020
    Date of Patent: April 12, 2022
    Assignee: Marvell Asia Pte Ltd
    Inventors: Chia-Hsin Chen, Avinash Sodani, Ulf Hanebutte, Rishan Tan, Soumya Gollamudi
  • Patent number: 11256517
    Abstract: A programmable hardware system for machine learning (ML) includes a core and an inference engine. The core receives commands from a host. The commands are in a first instruction set architecture (ISA) format. The core divides the commands into a first set of performance-critical operations, in the first ISA format, and a second set of performance non-critical operations, in the first ISA format. The core executes the second set to perform the performance non-critical operations of the ML operations and streams the first set to the inference engine. The inference engine generates a stream of the first set of commands in a second ISA format based on the first set of commands in the first ISA format. The first set of commands in the second ISA format programs components within the inference engine to execute the ML operations to infer data.
    Type: Grant
    Filed: December 19, 2018
    Date of Patent: February 22, 2022
    Assignee: Marvell Asia Pte Ltd
    Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen
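
As a hedged illustration of the division of labor in the abstract above: a core sorts incoming first-ISA commands into performance-critical and non-critical sets, executes the non-critical set itself, and streams the critical set to an inference engine that re-emits it in a second ISA. The command names and the criticality rule below are invented for illustration.

```python
# Hypothetical rule: matrix/vector ops are performance-critical, the rest not.
CRITICAL_OPS = {"matmul", "conv", "tanh"}

def inference_engine(stream: list) -> list:
    """Translate streamed commands into a second (engine-native) ISA."""
    return [f"ISA2::{cmd.upper().replace(' ', '_')}" for cmd in stream]

def core_dispatch(commands: list) -> list:
    """Split first-ISA commands; run the non-critical ones, stream the rest."""
    critical, non_critical = [], []
    for cmd in commands:
        (critical if cmd.split()[0] in CRITICAL_OPS else non_critical).append(cmd)
    for cmd in non_critical:
        print(f"core executes: {cmd}")  # the performance non-critical path
    return inference_engine(critical)   # stream the performance-critical set

program = ["matmul a b", "sync", "tanh a", "loadconst r0"]
print(core_dispatch(program))
```
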
  • Publication number: 20210342734
    Abstract: A method of converting a data stored in a memory from a first format to a second format is disclosed. The method includes extending a number of bits in the data stored in a double data rate (DDR) memory by one bit to form an extended data. The method further includes determining whether the data stored in the DDR is signed or unsigned data. Moreover, responsive to determining that the data is signed, a sign value is added to the most significant bit of the extended data and the data is copied to lower order bits of the extended data. Responsive to determining that the data is unsigned, the data is copied to lower order bits of the extended data and the most significant bit is set to an unsigned value, e.g., zero. The extended data is stored in an on-chip memory (OCM) of a processing tile of a machine learning computer array.
    Type: Application
    Filed: April 29, 2020
    Publication date: November 4, 2021
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
  • Publication number: 20210248497
    Abstract: A processing unit to support inference acceleration for machine learning (ML) comprises an inline post processing unit configured to accept and maintain one or more lookup tables for performing a tanh and/or sigmoid operation/function. The inline post processing unit is further configured to accept data from a set of registers configured to maintain output from a processing block instead of streaming the data from an on-chip memory (OCM), perform the tanh and/or sigmoid operation on each element of the data from the processing block on a per-element basis via the one or more lookup tables, and stream the post processing result of the per-element tanh and/or sigmoid operation back to the OCM after the tanh and/or sigmoid operation is complete.
    Type: Application
    Filed: April 6, 2021
    Publication date: August 12, 2021
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
  • Patent number: 11086633
    Abstract: A programmable hardware system for machine learning (ML) includes a core and an inference engine. The core receives commands from a host. The commands are in a first instruction set architecture (ISA) format. The core divides the commands into a first set of performance-critical operations, in the first ISA format, and a second set of performance non-critical operations, in the first ISA format. The core executes the second set to perform the performance non-critical operations of the ML operations and streams the first set to the inference engine. The inference engine generates a stream of the first set of commands in a second ISA format based on the first set of commands in the first ISA format. The first set of commands in the second ISA format programs components within the inference engine to execute the ML operations to infer data.
    Type: Grant
    Filed: December 19, 2018
    Date of Patent: August 10, 2021
    Assignee: Marvell Asia Pte Ltd
    Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen
  • Publication number: 20210209492
    Abstract: A processing unit to support inference acceleration for machine learning (ML) comprises an inline post processing unit configured to accept and maintain one or more lookup tables for performing each of one or more non-linear mathematical operations. The inline post processing unit is further configured to accept data from a set of registers maintaining output from a processing block instead of streaming the data from an on-chip memory (OCM), perform the one or more non-linear mathematical operations on elements of the data from the processing block via their corresponding lookup tables, and stream the post processing result of the one or more non-linear mathematical operations back to the OCM after the one or more non-linear mathematical operations are complete.
    Type: Application
    Filed: December 23, 2020
    Publication date: July 8, 2021
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen