Patents by Inventor Tomasz Czajkowski

Tomasz Czajkowski has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Memory-Size- and Bandwidth-Efficient Method for Feeding Systolic Array Matrix Multipliers

Publication number: 20230359695

Abstract: Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing systolic array generic matrix multiplier (SGEMM) in integrated circuits is provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. A bandwidth of external memory may be reduced by a factor of reduction based on interleaving of the matrix data via a feeding pattern of the column feeder array and the row feeder array.

Type: Application

Filed: July 17, 2023

Publication date: November 9, 2023

Inventors: Jack Z. Yinger, Andrew Ling, Tomasz Czajkowski, Davor Capalija, Eriko Nurvitadhi, Deborah Marr

Memory-Size- and Bandwidth-Efficient Method for Feeding Systolic Array Matrix Multipliers

Publication number: 20230064381

Abstract: Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing systolic array generic matrix multiplier (SGEMM) in integrated circuits is provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. A bandwidth of external memory may be reduced by a factor of reduction based on interleaving of the matrix data via a feeding pattern of the column feeder array and the row feeder array.

Type: Application

Filed: May 9, 2022

Publication date: March 2, 2023

Inventors: Jack Z. Yinger, Andrew Ling, Tomasz Czajkowski, Davor Capalija, Eriko Nurvitadhi, Deborah Marr

Memory-size- and bandwidth-efficient method for feeding systolic array matrix multipliers

Patent number: 11328037

Abstract: Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing systolic array generic matrix multiplier (SGEMM) in integrated circuits is provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. A bandwidth of external memory may be reduced by a factor of reduction based on interleaving of the matrix data via a feeding pattern of the column feeder array and the row feeder array.

Type: Grant

Filed: July 7, 2017

Date of Patent: May 10, 2022

Assignee: Intel Corporation

Inventors: Jack Z. Yinger, Andrew Ling, Tomasz Czajkowski, Davor Capalija, Eriko Nurvitadhi, Deborah Marr

Toggle rate reduction in high level programming implementations

Patent number: 11256836

Abstract: Power dissipation in integrated circuits may be reduced by efficient implementation of high level programming on the integrated circuits. As the high level programming logic is implemented on the integrated circuits, data inputs are disabled based upon branches and/or data that is not used by the high level programming.

Type: Grant

Filed: December 14, 2017

Date of Patent: February 22, 2022

Assignee: Intel Corporation

Inventor: Tomasz Czajkowski

M/A for compiling parallel program having barrier synchronization for programmable hardware

Patent number: 10599404

Abstract: A method of compiling program code includes determining if the program code controls a programmable logic device to execute other program code. The program code is a parallel program having a barrier function call for a group of threads. If it is determined that program code is to control the programmable logic device, then the program code is transformed by replacing the barrier function call with control logic inserted into the program code such that the transformed program code remains a parallel program and maintains synchronization among the group of threads. A compiler system that compiles program code with a barrier function call for a group of threads is also described.

Type: Grant

Filed: June 1, 2012

Date of Patent: March 24, 2020

Assignee: Altera Corporation

Inventors: David Neto, Deshanand Singh, Tomasz Czajkowski, John Stuart Freeman, Tian Yi David Han

TOGGLE RATE REDUCTION IN HIGH LEVEL PROGRAMMING IMPLEMENTATIONS

Publication number: 20190042673

Abstract: Power dissipation in integrated circuits may be reduced by efficient implementation of high level programming on the integrated circuits. As the high level programming logic is implemented on the integrated circuits, data inputs are disabled based upon branches and/or data that is not used by the high level programming.

Type: Application

Filed: December 14, 2017

Publication date: February 7, 2019

Inventor: Tomasz Czajkowski

Memory-Size- and Bandwidth-Efficient Method for Feeding Systolic Array Matrix Multipliers

Publication number: 20190012295

Abstract: Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing systolic array generic matrix multiplier (SGEMM) in integrated circuits is provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. A bandwidth of external memory may be reduced by a factor of reduction based on interleaving of the matrix data via a feeding pattern of the column feeder array and the row feeder array.

Type: Application

Filed: July 7, 2017

Publication date: January 10, 2019

Inventors: Jack Z. Yinger, Andrew Ling, Tomasz Czajkowski, Davor Capalija, Eriko Nurvitadhi, Deborah Marr

Fused floating-point arithmetic circuitry

Patent number: 9904514

Abstract: An integrated circuit may be provided with a specialized processing block that performs floating-point addition and subtraction operations. For this purpose, the specialized processing block includes a fused adder and subtractor stage with an adder circuit and a subtractor circuit. The adder and subtractor circuits share an alignment stage for aligning the mantissas of incoming floating-point numbers and provide a simplified normalization stage with one right shifter and one left shifter. The specialized processing blocks may be arranged in rows or columns such that an input of a first specialized processing block is directly coupled to an output of a second specialized processing block and an input of the second specialized processing block is directly coupled to an output of the first specialized processing block.

Type: Grant

Filed: October 6, 2015

Date of Patent: February 27, 2018

Assignee: Altera Corporation

Inventor: Tomasz Czajkowski

Floating-point adder circuitry

Patent number: 9639326

Abstract: An integrated circuit is provided that performs floating-point addition or subtraction operations involving at least three floating-point numbers. The floating-point numbers are pre-processed by dynamically extending the number of mantissa bits, determining the floating-point number with the biggest exponent, and shifting the mantissa of the other floating-point numbers to the right. Each extended mantissa has at least twice the number of bits of the mantissa entering the floating-point operation. The exact bit extension is dependent on the number of floating-point numbers to be added. The mantissas of all floating-point numbers with an exponent smaller than the biggest exponent are shifted to the right. The number of right shift bits is dependent on the difference between the biggest exponent and the respective floating-point exponent.

Type: Grant

Filed: June 14, 2016

Date of Patent: May 2, 2017

Assignee: Altera Corporation

Inventor: Tomasz Czajkowski
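
Illustrative sketch (not part of the patent record): the pre-processing described in the abstract above can be modeled in software. This is only a rough model under stated assumptions, not the patented circuit. The function name `aligned_sum`, the 24-bit mantissa width, and the fixed doubling of the mantissa width are all assumptions made for illustration. The abstract says the exact extension depends on the number of operands, which this sketch does not model.

```python
import math

def aligned_sum(values, mantissa_bits=24):
    """Add several floats by aligning every mantissa to the largest exponent.

    Hypothetical software model: mantissa_bits and the fixed 2x extension
    are assumptions, not details taken from the patent.
    """
    # Split each value into an integer mantissa and an exponent.
    parts = []
    for v in values:
        frac, exp = math.frexp(v)                  # v == frac * 2**exp
        mant = int(frac * (1 << mantissa_bits))    # truncate to mantissa_bits bits
        parts.append((mant, exp))

    # Determine the operand with the biggest exponent.
    max_exp = max(exp for _, exp in parts)

    total = 0
    for mant, exp in parts:
        # Extend the mantissa to twice its width so shifted-out bits survive.
        extended = mant << mantissa_bits
        # Shift right by the distance to the biggest exponent.
        total += extended >> (max_exp - exp)

    # Rescale the integer sum back to a float.
    return math.ldexp(total, max_exp - 2 * mantissa_bits)

print(aligned_sum([1.5, -0.25, 0.125]))
```

All operands are summed in a single integer addition after alignment, which is the property the abstract emphasizes. A real design would also need rounding and normalization, which this model leaves to `math.ldexp`.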

Repartitioning and reordering of multiple threads into subsets based on possible access conflict, for sequential access to groups of memory banks in a shared memory

Patent number: 9626218

Abstract: Circuitry for dynamically ordering the execution of multiple threads in parallel is presented. The circuitry may include a control circuit that controls the execution of multiple subsets of threads using multiple processing units in parallel. Each of the plurality of processing units may be associated with an adjustable order thread issuer that may receive a subset of threads and an order in which to execute the subset of threads from the control circuit. The adjustable order thread issuer may manage the processing unit by providing each thread from the subset of threads for execution to the processing unit in the specified order. The adjustable order thread issuer may adjust the order in which threads are issued in an effort to optimize shared resource usage and thus improve the performance of a multithreaded application.

Type: Grant

Filed: March 10, 2014

Date of Patent: April 18, 2017

Assignee: Altera Corporation

Inventors: Dmitry Denisenko, Tomasz Czajkowski

FLOATING-POINT ADDER CIRCUITRY

Publication number: 20160291934

Abstract: An integrated circuit is provided that performs floating-point addition or subtraction operations involving at least three floating-point numbers. The floating-point numbers are pre-processed by dynamically extending the number of mantissa bits, determining the floating-point number with the biggest exponent, and shifting the mantissa of the other floating-point numbers to the right. Each extended mantissa has at least twice the number of bits of the mantissa entering the floating-point operation. The exact bit extension is dependent on the number of floating-point numbers to be added. The mantissas of all floating-point numbers with an exponent smaller than the biggest exponent are shifted to the right. The number of right shift bits is dependent on the difference between the biggest exponent and the respective floating-point exponent.

Type: Application

Filed: June 14, 2016

Publication date: October 6, 2016

Inventor: Tomasz Czajkowski

Multi-cycle resource sharing

Patent number: 9430425

Abstract: Systems and methods for resource sharing of pipelined circuitry of an integrated circuit (IC) are provided. For example, in one embodiment, a method for sharing a functional unit of an integrated circuit (IC) includes receiving two or more threads configured to access the functional unit through two or more data entry points associated with corresponding data exit points configured to receive processed thread data. The method further includes arbitrating the processing of the two or more threads by the functional unit to obtain the processed thread data. To arbitrate, the exit points that cannot receive additional data are determined. Threads are only received from data entry points with corresponding data exit points that can receive additional data. The processed output data is provided to a corresponding exit point.

Type: Grant

Filed: December 27, 2012

Date of Patent: August 30, 2016

Assignee: Altera Corporation

Inventor: Tomasz Czajkowski
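
Illustrative sketch (not part of the patent record): the arbitration rule in the abstract above, accepting work only from entry points whose exit points can still receive data, can be modeled roughly as follows. The function `arbitrate`, the use of queues, the `capacity` parameter, and the fixed-priority scan order are assumptions made for illustration and are not taken from the patent.

```python
from collections import deque

def arbitrate(entries, exits, capacity, unit):
    """Run one arbitration step over paired entry/exit queues.

    Hypothetical model: entries[i] feeds the shared functional unit and
    exits[i] receives its results. Returns the index served, or None.
    """
    for i, entry in enumerate(entries):
        exit_has_room = len(exits[i]) < capacity
        # Only accept a thread whose exit point can receive more data.
        if entry and exit_has_room:
            exits[i].append(unit(entry.popleft()))
            return i
    return None

entries = [deque([1, 2]), deque([10])]
exits = [deque([99]), deque()]        # exit 0 is already full at capacity 1
served = arbitrate(entries, exits, capacity=1, unit=lambda x: x * 2)
print(served, list(exits[1]))
```

Because entry 0 is skipped while its exit is full, the shared unit never holds a result that it cannot deliver, so it stays available to the other threads.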

Floating-point adder circuitry

Patent number: 9405728

Abstract: An integrated circuit is provided that performs floating-point addition or subtraction operations involving at least three floating-point numbers. The floating-point numbers are pre-processed by dynamically extending the number of mantissa bits, determining the floating-point number with the biggest exponent, and shifting the mantissa of the other floating-point numbers to the right. Each extended mantissa has at least twice the number of bits of the mantissa entering the floating-point operation. The exact bit extension is dependent on the number of floating-point numbers to be added. The mantissas of all floating-point numbers with an exponent smaller than the biggest exponent are shifted to the right. The number of right shift bits is dependent on the difference between the biggest exponent and the respective floating-point exponent.

Type: Grant

Filed: September 5, 2013

Date of Patent: August 2, 2016

Assignee: Altera Corporation

Inventor: Tomasz Czajkowski

Workgroup handling in pipelined circuits

Patent number: 9135087

Abstract: Systems and methods for limiting resource usage of a kernel of an integrated circuit are provided. For example, in one embodiment a method for limiting a number of workgroups that may simultaneously access a kernel of an integrated circuit (IC) includes determining a threshold number of workgroups that may access the kernel simultaneously. A thread of execution is received. The thread of execution is allowed to access the kernel when the threshold number of workgroups would not be exceeded by the thread of execution accessing the kernel.

Type: Grant

Filed: December 27, 2012

Date of Patent: September 15, 2015

Assignee: Altera Corporation

Inventors: Tomasz Czajkowski, John Freeman, Peter Yiannacouras
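
Illustrative sketch (not part of the patent record): the admission rule in the abstract above lets a thread enter the kernel only if doing so would not push the number of active workgroups past a threshold. A rough software model follows. The class `WorkgroupLimiter`, its method names, and the per-workgroup thread counting are assumptions made for illustration, not the patented mechanism.

```python
class WorkgroupLimiter:
    """Hypothetical model of a gate limiting concurrently active workgroups."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.active = {}   # workgroup id -> threads currently inside the kernel

    def try_enter(self, workgroup_id):
        # A thread from an already-active workgroup never raises the count.
        if workgroup_id in self.active:
            self.active[workgroup_id] += 1
            return True
        # A new workgroup is admitted only while below the threshold.
        if len(self.active) < self.threshold:
            self.active[workgroup_id] = 1
            return True
        return False

    def leave(self, workgroup_id):
        # Retire one thread and free the slot when the workgroup drains.
        self.active[workgroup_id] -= 1
        if self.active[workgroup_id] == 0:
            del self.active[workgroup_id]

limiter = WorkgroupLimiter(threshold=2)
print([limiter.try_enter(wg) for wg in (0, 1, 0, 2)])
```

In this model a rejected thread would simply retry later. How stalled threads are held in hardware is outside the scope of the sketch.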

FLOATING-POINT ADDER CIRCUITRY

Publication number: 20150067010

Abstract: An integrated circuit is provided that performs floating-point addition or subtraction operations involving at least three floating-point numbers. The floating-point numbers are pre-processed by dynamically extending the number of mantissa bits, determining the floating-point number with the biggest exponent, and shifting the mantissa of the other floating-point numbers to the right. Each extended mantissa has at least twice the number of bits of the mantissa entering the floating-point operation. The exact bit extension is dependent on the number of floating-point numbers to be added. The mantissas of all floating-point numbers with an exponent smaller than the biggest exponent are shifted to the right. The number of right shift bits is dependent on the difference between the biggest exponent and the respective floating-point exponent.

Type: Application

Filed: September 5, 2013

Publication date: March 5, 2015

Applicant: Altera Corporation

Inventor: Tomasz Czajkowski