Patents by Inventor Ulf Hanebutte

Ulf Hanebutte has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11966857
    Abstract: A processing unit to support inference acceleration for machine learning (ML) comprises an inline post processing unit configured to accept and maintain one or more lookup tables for performing a tanh and/or sigmoid operation/function. The inline post processing unit is further configured to accept data from a set of registers configured to maintain output from a processing block instead of streaming the data from an on-chip memory (OCM), perform the tanh and/or sigmoid operation on each element of the data from the processing block on a per-element basis via the one or more lookup tables, and stream the post processing result of the per-element tanh and/or sigmoid operation back to the OCM after the tanh and/or sigmoid operation is complete.
    Type: Grant
    Filed: April 6, 2021
    Date of Patent: April 23, 2024
    Assignee: Marvell Asia Pte Ltd
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
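
As context for the lookup-table approach described in this abstract, here is a minimal Python sketch of a tanh evaluated per element through a precomputed table rather than computed directly. The table size, input range, and function names are illustrative assumptions, not details taken from the patent.

```python
import math

# Assumed table parameters: 256 entries covering [-4, 4]; outside this range
# tanh is effectively saturated at -1/+1, so clamping costs little accuracy.
TABLE_SIZE = 256
X_MIN, X_MAX = -4.0, 4.0
STEP = (X_MAX - X_MIN) / (TABLE_SIZE - 1)

# Precompute once; the patent's unit similarly maintains its tables on-chip.
TANH_LUT = [math.tanh(X_MIN + i * STEP) for i in range(TABLE_SIZE)]

def tanh_via_lut(x: float) -> float:
    """Approximate tanh(x) by nearest-entry table lookup."""
    if x <= X_MIN:
        return -1.0
    if x >= X_MAX:
        return 1.0
    return TANH_LUT[round((x - X_MIN) / STEP)]

# Apply per element to a block of values, mirroring the abstract's
# "on a per-element basis" wording.
block = [-5.0, -0.5, 0.0, 0.5, 5.0]
print([tanh_via_lut(v) for v in block])
```
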
  • Patent number: 11934965
    Abstract: A processing unit to support inference acceleration for machine learning (ML) comprises an inline post processing unit configured to accept and maintain one or more lookup tables for performing a tanh and/or sigmoid operation/function. The inline post processing unit is further configured to accept data from a set of registers configured to maintain output from a processing block instead of streaming the data from an on-chip memory (OCM), perform the tanh and/or sigmoid operation on each element of the data from the processing block on a per-element basis via the one or more lookup tables, and stream the post processing result of the per-element tanh and/or sigmoid operation back to the OCM after the tanh and/or sigmoid operation is complete.
    Type: Grant
    Filed: April 6, 2021
    Date of Patent: March 19, 2024
    Assignee: Marvell Asia Pte Ltd
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
  • Patent number: 11733983
    Abstract: A method includes receiving a high-level function in a high-level code of an application; identifying resources in a hardware to execute a set of low-level instructions that is generated from the high-level function in the high-level code; compiling the high-level function in the high-level code of the application into the set of low-level instructions to be executed on the hardware; and generating a plurality of structured metadata associated with allocation of resources in the hardware to execute the set of low-level instructions.
    Type: Grant
    Filed: September 8, 2022
    Date of Patent: August 22, 2023
    Assignee: Marvell Asia Pte Ltd
    Inventors: Senad Durakovic, Chien-Chun Chou, Ulf Hanebutte, Harri Hakkarainen
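
To make the flow in the abstract above concrete, here is a hedged Python sketch of compiling a high-level function into low-level instructions while emitting structured metadata about the hardware resources allocated to them. The instruction strings, resource model, and metadata fields are invented for illustration and are not taken from the patent.

```python
from dataclasses import dataclass, field

# Hypothetical hardware model: a few processing tiles with local memory.
@dataclass
class Hardware:
    num_tiles: int = 4
    ocm_bytes_per_tile: int = 1 << 20

@dataclass
class CompileResult:
    instructions: list = field(default_factory=list)
    metadata: list = field(default_factory=list)

def compile_function(op: str, shape: tuple, hw: Hardware) -> CompileResult:
    """Lower one high-level op and record structured metadata alongside it."""
    elems = 1
    for dim in shape:
        elems *= dim
    tiles = min(hw.num_tiles, max(1, elems // 1024))  # naive resource choice
    result = CompileResult()
    for t in range(tiles):
        result.instructions.append(f"{op.upper()}_TILE tile={t} elems={elems // tiles}")
        # Structured metadata tying each instruction to its allocated resources.
        result.metadata.append({"op": op, "tile": t, "ocm_bytes": (elems // tiles) * 4})
    return result

out = compile_function("matmul", (64, 64), Hardware())
print(out.instructions[0], out.metadata[0])
```
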
  • Publication number: 20230096994
    Abstract: A method of converting a data stored in a memory from a first format to a second format is disclosed. The method includes extending a number of bits in the data stored in a double data rate (DDR) memory by one bit to form an extended data. The method further includes determining whether the data stored in the DDR is signed or unsigned data. Moreover, responsive to determining that the data is signed, a sign value is added to the most significant bit of the extended data and the data is copied to lower order bits of the extended data. Responsive to determining that the data is unsigned, the data is copied to lower order bits of the extended data and the most significant bit is set to an unsigned value, e.g., zero. The extended data is stored in an on-chip memory (OCM) of a processing tile of a machine learning computer array.
    Type: Application
    Filed: December 6, 2022
    Publication date: March 30, 2023
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
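
The bit manipulation this abstract describes is ordinary sign extension by one bit. Below is a small Python sketch, assuming raw N-bit integer values: the data is copied into the lower-order bits of the widened value, and the new most significant bit receives the sign bit for signed data or zero for unsigned data. The function name and widths are illustrative.

```python
def extend_by_one_bit(value: int, width: int, signed: bool) -> int:
    """Widen a raw `width`-bit value to `width + 1` bits, as in the abstract."""
    assert 0 <= value < (1 << width), "value must fit in `width` bits"
    if signed:
        sign = (value >> (width - 1)) & 1  # existing sign bit
        return (sign << width) | value     # replicate it into the new MSB
    return value                           # unsigned: the new MSB stays zero

# 8-bit values widened to 9 bits.
print(bin(extend_by_one_bit(0b10000001, 8, signed=True)))   # 0b110000001
print(bin(extend_by_one_bit(0b10000001, 8, signed=False)))  # 0b10000001
```
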
  • Publication number: 20230015688
    Abstract: A method includes receiving a high-level function in a high-level code of an application; identifying resources in a hardware to execute a set of low-level instructions that is generated from the high-level function in the high-level code; compiling the high-level function in the high-level code of the application into the set of low-level instructions to be executed on the hardware; and generating a plurality of structured metadata associated with allocation of resources in the hardware to execute the set of low-level instructions.
    Type: Application
    Filed: September 8, 2022
    Publication date: January 19, 2023
    Inventors: Senad Durakovic, Chien-Chun Chou, Ulf Hanebutte, Harri Hakkarainen
  • Patent number: 11551148
    Abstract: A method of converting a data stored in a memory from a first format to a second format is disclosed. The method includes extending a number of bits in the data stored in a double data rate (DDR) memory by one bit to form an extended data. The method further includes determining whether the data stored in the DDR is signed or unsigned data. Moreover, responsive to determining that the data is signed, a sign value is added to the most significant bit of the extended data and the data is copied to lower order bits of the extended data. Responsive to determining that the data is unsigned, the data is copied to lower order bits of the extended data and the most significant bit is set to an unsigned value, e.g., zero. The extended data is stored in an on-chip memory (OCM) of a processing tile of a machine learning computer array.
    Type: Grant
    Filed: April 29, 2020
    Date of Patent: January 10, 2023
    Assignee: Marvell Asia Pte Ltd
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
  • Publication number: 20230004365
    Abstract: A system includes a compiler including a plurality of compiler blocks. The compiler blocks of the plurality of compiler blocks are composable. The compiler is configured to identify one or more resources in a hardware to execute a set of low-level instructions that is generated from a high-level function in a high-level code. The compiler is further configured to determine one or more processing operations to be performed that are associated with the high-level function in the high-level code. The determining of the one or more processing operations occurs based on architecture of the hardware. The compiler is configured to compile the high-level function in the high-level code of the application into the set of low-level instructions to be executed on the hardware.
    Type: Application
    Filed: March 2, 2022
    Publication date: January 5, 2023
    Inventors: Ulf Hanebutte, Senad Durakovic, Chien-Chun Chou, Fu-Hwa Wang, Mohana Tandyala
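
As a hedged illustration of what composable compiler blocks can look like, the sketch below treats each block as a function from one intermediate representation to another, so that the compiler is simply their composition. The toy IR and pass names are hypothetical and not from the patent.

```python
from typing import Callable, List

Program = List[str]                   # toy IR: a list of op strings
Block = Callable[[Program], Program]  # a compiler block maps IR to IR

def identify_resources(prog: Program) -> Program:
    return [f"{op} @tile0" for op in prog]   # hypothetical placement pass

def lower_to_isa(prog: Program) -> Program:
    return [f"ISA::{op}" for op in prog]     # hypothetical lowering pass

def compose(*blocks: Block) -> Block:
    """Composable blocks: chaining any sequence yields another block."""
    def pipeline(prog: Program) -> Program:
        for block in blocks:
            prog = block(prog)
        return prog
    return pipeline

compiler = compose(identify_resources, lower_to_isa)
print(compiler(["matmul", "tanh"]))  # ['ISA::matmul @tile0', 'ISA::tanh @tile0']
```
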
  • Patent number: 11494676
    Abstract: A processing unit to support inference acceleration for machine learning (ML) comprises an inline post processing unit configured to accept and maintain one or more lookup tables for performing each of one or more non-linear mathematical operations. The inline post processing unit is further configured to accept data from a set of registers maintaining output from a processing block instead of streaming the data from an on-chip memory (OCM), perform the one or more non-linear mathematical operations on elements of the data from the processing block via their corresponding lookup tables, and stream the post processing result of the one or more non-linear mathematical operations back to the OCM after the one or more non-linear mathematical operations are complete.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: November 8, 2022
    Assignee: Marvell Asia Pte Ltd
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
  • Patent number: 11467811
    Abstract: A method includes receiving a high-level function in a high-level code of an application. The method also includes identifying resources in a hardware to execute a set of low-level instructions that is generated from the high-level function in the high-level code. One or more processing operations to be performed, associated with the high-level function in the high-level code, are determined. The determining of the one or more processing operations occurs based on the architecture of the hardware. The high-level function in the high-level code of the application is compiled into the set of low-level instructions to be executed on the hardware. A plurality of structured metadata is generated that includes information associated with the determining of resources in the hardware and further includes information associated with the determining of the one or more processing operations.
    Type: Grant
    Filed: July 30, 2021
    Date of Patent: October 11, 2022
    Assignee: Marvell Asia Pte Ltd
    Inventors: Senad Durakovic, Chien-Chun Chou, Ulf Hanebutte, Harri Hakkarainen
  • Publication number: 20220188109
    Abstract: A method includes receiving an input data at a floating point arithmetic operating unit, wherein the floating point operating unit is configured to perform a floating point arithmetic operation on the input data. The method includes determining whether the received input data is a qnan (quiet not-a-number) or whether the received input data is an snan (signaling not-a-number) prior to performing the floating point arithmetic operation. The method also includes converting a value of the received input data to a modified value prior to performing the floating point arithmetic operation if the received input data is either qnan or snan, wherein the converting eliminates special handling associated with the floating point arithmetic operation on the input data being either qnan or snan.
    Type: Application
    Filed: March 4, 2022
    Publication date: June 16, 2022
    Inventors: Chia-Hsin Chen, Avinash Sodani, Ulf Hanebutte, Rishan Tan, Soumya Gollamudi
  • Publication number: 20220188111
    Abstract: A method includes receiving an input data at a floating point arithmetic operating unit, wherein the floating point operating unit is configured to perform a floating point arithmetic operation on the input data. The method also includes determining whether the received input data is positive infinity or negative infinity prior to performing the floating point arithmetic operation. The method further includes converting a value of the received input data to a modified value prior to performing the floating point arithmetic operation if the received input data is positive infinity or negative infinity.
    Type: Application
    Filed: March 4, 2022
    Publication date: June 16, 2022
    Inventors: Chia-Hsin Chen, Avinash Sodani, Ulf Hanebutte, Rishan Tan, Soumya Gollamudi
  • Publication number: 20220188110
    Abstract: A method includes receiving a first input data and a second input data at a floating point arithmetic operating unit, wherein the first input data and the second input data are associated with operands of a floating point arithmetic operation respectively, wherein the floating point operating unit is configured to perform a floating point arithmetic operation on the first input data and the second input data. The method further includes determining whether the first input data is a qnan (quiet not-a-number) or whether the first input data is an snan (signaling not-a-number) prior to performing the floating point arithmetic operation. A value of the first input data is converted to a modified value prior to performing the floating point arithmetic operation if the first input data is either qnan or snan, wherein the converting eliminates special handling associated with the floating point arithmetic operation on the first input data being either qnan or snan.
    Type: Application
    Filed: March 4, 2022
    Publication date: June 16, 2022
    Inventors: Chia-Hsin Chen, Avinash Sodani, Ulf Hanebutte, Rishan Tan, Soumya Gollamudi
  • Publication number: 20220188108
    Abstract: A method includes receiving an input data at a floating point arithmetic operating unit, wherein the floating point operating unit is configured to perform a floating point arithmetic operation on the input data to generate an output result. The method also includes determining whether the output result is going to cause a floating point hardware exception responsive to the floating point arithmetic operation on the input data. The method further includes converting a value of the output result to a modified value responsive to the determining that the output result is going to cause the floating point hardware exception, wherein the modified value eliminates the floating point hardware exception responsive to the floating point arithmetic operation on the input data.
    Type: Application
    Filed: March 4, 2022
    Publication date: June 16, 2022
    Inventors: Chia-Hsin Chen, Avinash Sodani, Ulf Hanebutte, Rishan Tan, Soumya Gollamudi
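
The four publications above (20220188108 through 20220188111) vary one theme: rewrite troublesome floating point inputs, or the output, around the arithmetic so the hardware never needs special-case handling for NaNs, infinities, or exceptions. Here is a combined Python sketch; the substitute values (zero for NaN, the largest finite float for infinities and overflow) are illustrative assumptions, not values specified by the publications.

```python
import math
import sys

MAX_FINITE = sys.float_info.max

def sanitize_input(x: float) -> float:
    """Replace NaN and infinities before the operation (assumed policy)."""
    if math.isnan(x):      # math.isnan covers quiet and signaling NaN alike
        return 0.0
    if math.isinf(x):
        return MAX_FINITE if x > 0 else -MAX_FINITE
    return x

def sanitized_mul(a: float, b: float) -> float:
    """Multiply with input sanitization plus an output overflow check."""
    result = sanitize_input(a) * sanitize_input(b)
    if math.isinf(result):  # the output would overflow: clamp it instead
        return MAX_FINITE if result > 0 else -MAX_FINITE
    return result

print(sanitized_mul(float("nan"), 2.0))  # 0.0 rather than nan
print(sanitized_mul(float("inf"), 2.0))  # a large finite value, no exception
```
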
  • Publication number: 20220129270
    Abstract: A method includes receiving a TopK instruction to sort the highest K elements of a vector data. A first K elements of the vector data are sorted and stored in a first register. Another element of the vector data is read, and it is determined whether that element has a value that is greater than, or is within, the range of values of the first K elements. A position of that element within the first K elements is determined if its value is within the range of values. A subset of the elements of the first K elements that are smaller than that element are shifted down after the position is determined. That element is inserted in the determined position after the shifting. The process is repeated for each remaining element of the vector data.
    Type: Application
    Filed: October 21, 2021
    Publication date: April 28, 2022
    Inventors: Avinash Sodani, Ulf Hanebutte
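
The abstract above describes an insertion-based TopK. A hedged Python sketch of that loop follows: keep the current K leaders sorted, and for each remaining element that falls within their range, find its position, shift the smaller leaders down, and insert it. The function name and data layout are illustrative.

```python
def topk(vector: list, k: int) -> list:
    """Return the k largest values of `vector`, sorted in descending order."""
    # Sort the first k elements and keep them as the current leaders.
    leaders = sorted(vector[:k], reverse=True)
    for x in vector[k:]:
        if x <= leaders[-1]:  # not within the leaders' range of values
            continue
        pos = 0               # find the element's position among the leaders
        while leaders[pos] >= x:
            pos += 1
        # Shift the smaller leaders down (dropping the old minimum), insert.
        leaders[pos + 1:] = leaders[pos:-1]
        leaders[pos] = x
    return leaders

print(topk([7, 2, 9, 4, 11, 5, 3], 3))  # [11, 9, 7]
```
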
  • Patent number: 11301247
    Abstract: A method includes receiving an input data at a FP arithmetic operating unit configured to perform a FP arithmetic operation on the input data. The method further includes determining whether the received input data generates a FP hardware exception responsive to the FP arithmetic operation on the input data, wherein the determining occurs prior to performing the FP arithmetic operation. The method also includes converting a value of the received input data to a modified value responsive to the determining that the received input data generates the FP hardware exception, wherein the converting eliminates generation of the FP hardware exception responsive to the FP arithmetic operation on the input data.
    Type: Grant
    Filed: April 30, 2020
    Date of Patent: April 12, 2022
    Assignee: Marvell Asia Pte Ltd
    Inventors: Chia-Hsin Chen, Avinash Sodani, Ulf Hanebutte, Rishan Tan, Soumya Gollamudi
  • Patent number: 11256517
    Abstract: A programmable hardware system for machine learning (ML) includes a core and an inference engine. The core receives commands from a host. The commands are in a first instruction set architecture (ISA) format. The core divides the commands into a first set of performance-critical operations, in the first ISA format, and a second set of performance non-critical operations, in the first ISA format. The core executes the second set to perform the performance non-critical operations of the ML operations and streams the first set to the inference engine. The inference engine generates a stream of the first set of commands in a second ISA format based on the first set of commands in the first ISA format. The first set of commands in the second ISA format programs components within the inference engine to execute the ML operations to infer data.
    Type: Grant
    Filed: December 19, 2018
    Date of Patent: February 22, 2022
    Assignee: Marvell Asia Pte Ltd
    Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen
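
As a hedged illustration of the division of labor in the abstract above: a core sorts incoming first-ISA commands into performance-critical and non-critical sets, executes the non-critical set itself, and streams the critical set to an inference engine that re-emits it in a second ISA. The command names and the criticality rule below are invented for illustration.

```python
# Hypothetical rule: matrix/vector ops are performance-critical, the rest not.
CRITICAL_OPS = {"matmul", "conv", "tanh"}

def inference_engine(stream: list) -> list:
    """Translate streamed commands into a second (engine-native) ISA."""
    return [f"ISA2::{cmd.upper().replace(' ', '_')}" for cmd in stream]

def core_dispatch(commands: list) -> list:
    """Split first-ISA commands; run the non-critical ones, stream the rest."""
    critical, non_critical = [], []
    for cmd in commands:
        (critical if cmd.split()[0] in CRITICAL_OPS else non_critical).append(cmd)
    for cmd in non_critical:
        print(f"core executes: {cmd}")  # the performance non-critical path
    return inference_engine(critical)   # stream the performance-critical set

program = ["matmul a b", "sync", "tanh a", "loadconst r0"]
print(core_dispatch(program))
```
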
  • Publication number: 20210342734
    Abstract: A method of converting a data stored in a memory from a first format to a second format is disclosed. The method includes extending a number of bits in the data stored in a double data rate (DDR) memory by one bit to form an extended data. The method further includes determining whether the data stored in the DDR is signed or unsigned data. Moreover, responsive to determining that the data is signed, a sign value is added to the most significant bit of the extended data and the data is copied to lower order bits of the extended data. Responsive to determining that the data is unsigned, the data is copied to lower order bits of the extended data and the most significant bit is set to an unsigned value, e.g., zero. The extended data is stored in an on-chip memory (OCM) of a processing tile of a machine learning computer array.
    Type: Application
    Filed: April 29, 2020
    Publication date: November 4, 2021
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
  • Publication number: 20210248497
    Abstract: A processing unit to support inference acceleration for machine learning (ML) comprises an inline post processing unit configured to accept and maintain one or more lookup tables for performing a tanh and/or sigmoid operation/function. The inline post processing unit is further configured to accept data from a set of registers configured to maintain output from a processing block instead of streaming the data from an on-chip memory (OCM), perform the tanh and/or sigmoid operation on each element of the data from the processing block on a per-element basis via the one or more lookup tables, and stream the post processing result of the per-element tanh and/or sigmoid operation back to the OCM after the tanh and/or sigmoid operation is complete.
    Type: Application
    Filed: April 6, 2021
    Publication date: August 12, 2021
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
  • Patent number: 11086633
    Abstract: A programmable hardware system for machine learning (ML) includes a core and an inference engine. The core receives commands from a host. The commands are in a first instruction set architecture (ISA) format. The core divides the commands into a first set of performance-critical operations, in the first ISA format, and a second set of performance non-critical operations, in the first ISA format. The core executes the second set to perform the performance non-critical operations of the ML operations and streams the first set to the inference engine. The inference engine generates a stream of the first set of commands in a second ISA format based on the first set of commands in the first ISA format. The first set of commands in the second ISA format programs components within the inference engine to execute the ML operations to infer data.
    Type: Grant
    Filed: December 19, 2018
    Date of Patent: August 10, 2021
    Assignee: Marvell Asia Pte Ltd
    Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen
  • Publication number: 20210209492
    Abstract: A processing unit to support inference acceleration for machine learning (ML) comprises an inline post processing unit configured to accept and maintain one or more lookup tables for performing each of one or more non-linear mathematical operations. The inline post processing unit is further configured to accept data from a set of registers maintaining output from a processing block instead of streaming the data from an on-chip memory (OCM), perform the one or more non-linear mathematical operations on elements of the data from the processing block via their corresponding lookup tables, and stream the post processing result of the one or more non-linear mathematical operations back to the OCM after the one or more non-linear mathematical operations are complete.
    Type: Application
    Filed: December 23, 2020
    Publication date: July 8, 2021
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen