Patents by Inventor Tomasz Czajkowski
Tomasz Czajkowski has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230359695
Abstract: Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing systolic array generic matrix multiplier (SGEMM) in integrated circuits are provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. A bandwidth of external memory may be reduced by a factor of reduction based on interleaving of the matrix data via a feeding pattern of the column feeder array and the row feeder array.
Type: Application
Filed: July 17, 2023
Publication date: November 9, 2023
Inventors: Jack Z. Yinger, Andrew Ling, Tomasz Czajkowski, Davor Capalija, Eriko Nurvitadhi, Deborah Marr
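The abstract names the building blocks (a PE array plus row and column feeder arrays) but not the exact dataflow. The following is a minimal Python sketch assuming a textbook output-stationary systolic array. Row feeder i pushes row i of A in from the left, skewed by i cycles. Column feeder j pushes column j of B in from the top, skewed by j cycles. Each PE multiply-accumulates and forwards its operands right and down. The function name systolic_matmul is hypothetical, and the sketch does not model the interleaved feeding pattern that the patent credits with reducing external-memory bandwidth.

```python
def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing C = A @ B.

    Hypothetical sketch: PE(i, j) keeps accumulator C[i][j]; A operands move
    right one PE per cycle, B operands move down one PE per cycle.
    """
    n, k, m = len(A), len(A[0]), len(B[0])
    assert len(B) == k, "inner dimensions must match"
    acc = [[0] * m for _ in range(n)]
    a_reg = [[None] * m for _ in range(n)]  # A operand currently held by each PE
    b_reg = [[None] * m for _ in range(n)]  # B operand currently held by each PE
    for t in range(k + n + m - 2):          # enough cycles for the last operand pair
        new_a = [[None] * m for _ in range(n)]
        new_b = [[None] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:
                    # Row feeder i supplies A[i][t - i] (skewed by i cycles).
                    s = t - i
                    new_a[i][j] = A[i][s] if 0 <= s < k else None
                else:
                    new_a[i][j] = a_reg[i][j - 1]   # shift in from the left PE
                if i == 0:
                    # Column feeder j supplies B[t - j][j] (skewed by j cycles).
                    s = t - j
                    new_b[i][j] = B[s][j] if 0 <= s < k else None
                else:
                    new_b[i][j] = b_reg[i - 1][j]   # shift in from the PE above
        a_reg, b_reg = new_a, new_b
        # Every PE that holds a matching operand pair multiply-accumulates.
        for i in range(n):
            for j in range(m):
                if a_reg[i][j] is not None and b_reg[i][j] is not None:
                    acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc
```

Because of the skew, PE(i, j) always sees A[i][s] and B[s][j] with the same s in the same cycle, so every accumulator collects exactly the k products of its dot product.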
-
Publication number: 20230064381
Abstract: Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing systolic array generic matrix multiplier (SGEMM) in integrated circuits are provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. A bandwidth of external memory may be reduced by a factor of reduction based on interleaving of the matrix data via a feeding pattern of the column feeder array and the row feeder array.
Type: Application
Filed: May 9, 2022
Publication date: March 2, 2023
Inventors: Jack Z. Yinger, Andrew Ling, Tomasz Czajkowski, Davor Capalija, Eriko Nurvitadhi, Deborah Marr
-
Patent number: 11328037
Abstract: Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing systolic array generic matrix multiplier (SGEMM) in integrated circuits are provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. A bandwidth of external memory may be reduced by a factor of reduction based on interleaving of the matrix data via a feeding pattern of the column feeder array and the row feeder array.
Type: Grant
Filed: July 7, 2017
Date of Patent: May 10, 2022
Assignee: Intel Corporation
Inventors: Jack Z. Yinger, Andrew Ling, Tomasz Czajkowski, Davor Capalija, Eriko Nurvitadhi, Deborah Marr
-
Patent number: 11256836
Abstract: Power dissipation in integrated circuits may be reduced by efficient implementation of high level programming on the integrated circuits. As the high level programming logic is implemented on the integrated circuits, data inputs are disabled based upon branches and/or data that is not used by the high level programming.
Type: Grant
Filed: December 14, 2017
Date of Patent: February 22, 2022
Assignee: Intel Corporation
Inventor: Tomasz Czajkowski
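The abstract does not say how the data inputs are disabled. One common way to picture it is to give each branch's logic its own input register. Only the register feeding the taken branch is enabled, and the other holds its old value, so it does not switch. The Python sketch below assumes exactly that. It uses register bit toggles as a rough stand-in for dynamic power. The branch bodies (x + 1 and x * 3) and all function names are hypothetical.

```python
def count_toggles(prev, new):
    """Bits that flip between two register values (a rough dynamic-power proxy)."""
    return bin(prev ^ new).count("1")


def simulate_branch_datapath(stream, gate_unused_inputs):
    """Evaluate the hypothetical `y = x + 1 if c else x * 3` over (c, x) pairs.

    Each branch has its own input register. Without gating, both registers
    load x every cycle. With gating, only the register feeding the taken
    branch is enabled. Returns (outputs, total_register_toggles).
    """
    reg_then = reg_else = 0
    toggles = 0
    outputs = []
    for c, x in stream:
        load_then = c or not gate_unused_inputs
        load_else = (not c) or not gate_unused_inputs
        if load_then:
            toggles += count_toggles(reg_then, x)
            reg_then = x
        if load_else:
            toggles += count_toggles(reg_else, x)
            reg_else = x
        outputs.append(reg_then + 1 if c else reg_else * 3)
    return outputs, toggles
```

For the alternating stream [(True, 0b1010), (False, 0b0101), (True, 0b1010), (False, 0b0101)], both modes produce [11, 15, 11, 15]. The ungated datapath toggles 28 register bits, while the gated one toggles only 4.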
-
Patent number: 10599404
Abstract: A method of compiling program code includes determining if the program code controls a programmable logic device to execute other program code. The program code is a parallel program having a barrier function call for a group of threads. If it is determined that program code is to control the programmable logic device, then the program code is transformed by replacing the barrier function call with control logic inserted into the program code such that the transformed program code remains a parallel program and maintains synchronization among the group of threads. A compiler system that compiles program code with a barrier function call for a group of threads is also described.
Type: Grant
Filed: June 1, 2012
Date of Patent: March 24, 2020
Assignee: Altera Corporation
Inventors: David Neto, Deshanand Singh, Tomasz Czajkowski, John Stuart Freeman, Tian Yi David Han
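The abstract says only that the barrier call is replaced by control logic that still keeps the group synchronized. One plausible form of that logic, assumed here rather than taken from the patent, is a FIFO that buffers arriving threads. It releases the whole group, in arrival order, once the last member arrives. The class BarrierUnit is a hypothetical Python model of that idea.

```python
from collections import deque


class BarrierUnit:
    """Hypothetical barrier control logic for work-groups of `group_size` threads.

    Arriving threads wait in a FIFO. None is released until every thread of
    the group has arrived, so no thread runs past the barrier early.
    """

    def __init__(self, group_size):
        self.group_size = group_size
        self.waiting = deque()

    def arrive(self, thread_id):
        """Accept one thread; return the threads released this step (maybe none)."""
        self.waiting.append(thread_id)
        if len(self.waiting) < self.group_size:
            return []
        released = list(self.waiting)   # whole group, in arrival order
        self.waiting.clear()            # ready for the next group
        return released
```

With a group size of 3, arrivals 0 and 2 return nothing, and arrival 1 releases [0, 2, 1].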
-
Publication number: 20190042673
Abstract: Power dissipation in integrated circuits may be reduced by efficient implementation of high level programming on the integrated circuits. As the high level programming logic is implemented on the integrated circuits, data inputs are disabled based upon branches and/or data that is not used by the high level programming.
Type: Application
Filed: December 14, 2017
Publication date: February 7, 2019
Inventor: Tomasz Czajkowski
-
Publication number: 20190012295
Abstract: Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing systolic array generic matrix multiplier (SGEMM) in integrated circuits are provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. A bandwidth of external memory may be reduced by a factor of reduction based on interleaving of the matrix data via a feeding pattern of the column feeder array and the row feeder array.
Type: Application
Filed: July 7, 2017
Publication date: January 10, 2019
Inventors: Jack Z. Yinger, Andrew Ling, Tomasz Czajkowski, Davor Capalija, Eriko Nurvitadhi, Deborah Marr
-
Patent number: 9904514
Abstract: An integrated circuit may be provided with a specialized processing block that performs floating-point addition and subtraction operations. For this purpose, the specialized processing block includes a fused adder and subtractor stage with an adder circuit and a subtractor circuit. The adder and subtractor circuits share an alignment stage for aligning the mantissas of incoming floating-point numbers and provide a simplified normalization stage with one right shifter and one left shifter. The specialized processing blocks may be arranged in rows or columns such that an input of a first specialized processing block is directly coupled to an output of a second specialized processing block and an input of the second specialized processing block is directly coupled to an output of the first specialized processing block.
Type: Grant
Filed: October 6, 2015
Date of Patent: February 27, 2018
Assignee: Altera Corporation
Inventor: Tomasz Czajkowski
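The point of the fused stage is that a + b and a - b can share one alignment step. After alignment, the sum can overflow by at most one bit, so one right shift renormalizes it. The difference can lose leading bits, so one left shift renormalizes it. The Python sketch below shows this on positive numbers encoded as (exponent, mantissa) integer pairs with an assumed 8-bit mantissa (implicit 1 included). It truncates instead of rounding. The encoding, M_BITS, and every function name are hypothetical.

```python
M_BITS = 8  # hypothetical mantissa width, implicit leading 1 included


def to_float(exp, mant):
    """Value of (exp, mant): mant * 2**(exp - (M_BITS - 1))."""
    return mant * 2.0 ** (exp - (M_BITS - 1))


def normalize(exp, mant):
    """Bring a nonnegative mantissa back into [2**(M_BITS-1), 2**M_BITS)."""
    if mant == 0:
        return 0, 0
    if mant >= 1 << M_BITS:               # sum overflowed: the single right shift
        return exp + 1, mant >> 1
    shift = M_BITS - mant.bit_length()    # difference lost bits: the left shift
    return exp - shift, mant << shift


def fused_add_sub(a, b):
    """Return (a + b, (sign, |a - b|)) for positive normalized (exp, mant) pairs."""
    # For normalized positives, tuple order (exp, then mant) equals value order.
    big, small = (a, b) if a >= b else (b, a)
    # Shared alignment stage: shift the smaller mantissa right exactly once.
    aligned = small[1] >> (big[0] - small[0])
    total = normalize(big[0], big[1] + aligned)
    diff = normalize(big[0], big[1] - aligned)
    sign = 1 if a >= b else -1
    return total, (sign, diff)
```

With a = (0, 192), which is 1.5, and b = (-1, 192), which is 0.75, the shared aligned mantissa 96 yields a sum of 2.25 and a difference of +0.75.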
-
Patent number: 9639326
Abstract: An integrated circuit is provided that performs floating-point addition or subtraction operations involving at least three floating-point numbers. The floating-point numbers are pre-processed by dynamically extending the number of mantissa bits, determining the floating-point number with the biggest exponent, and shifting the mantissa of the other floating-point numbers to the right. Each extended mantissa has at least twice the number of bits of the mantissa entering the floating-point operation. The exact bit extension is dependent on the number of floating-point numbers to be added. The mantissas of all floating-point numbers with an exponent smaller than the biggest exponent are shifted to the right. The number of right shift bits is dependent on the difference between the biggest exponent and the respective floating-point exponent.
Type: Grant
Filed: June 14, 2016
Date of Patent: May 2, 2017
Assignee: Altera Corporation
Inventor: Tomasz Czajkowski
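The pre-processing steps in this abstract map onto a short Python sketch. The sketch assumes that each mantissa is extended to 2 * M_BITS bits plus ceil(log2 n) headroom bits for n operands. That matches "at least twice the number of bits" and makes the extension depend on the operand count, but the patent's exact widths are not stated. Operands are hypothetical (sign, exponent, mantissa) triples with sign 1 meaning negative. Bits shifted out during alignment are truncated, and no final rounding is done.

```python
M_BITS = 8  # hypothetical mantissa width, implicit leading 1 included


def multi_add(operands):
    """Add several (sign, exp, mant) floats with a single alignment pass.

    Value of one operand: (-1)**sign * mant * 2**(exp - (M_BITS - 1)).
    """
    n = len(operands)
    headroom = (n - 1).bit_length()        # ceil(log2(n)) carry bits
    ext_width = 2 * M_BITS + headroom      # extension grows with operand count
    e_max = max(e for _, e, _ in operands) # operand with the biggest exponent
    acc = 0
    for sign, e, m in operands:
        ext = m << M_BITS                  # double the mantissa width
        ext >>= e_max - e                  # right shift by the exponent difference
        acc += -ext if sign else ext
    assert abs(acc) < 1 << ext_width       # headroom absorbs all carries
    # Extended mantissas carry (2 * M_BITS - 1) fractional bits.
    return acc * 2.0 ** (e_max - (2 * M_BITS - 1))
```

Adding 1.5, 0.75, and 0.25, encoded as (0, 0, 192), (0, -1, 192), and (0, -2, 128), gives 2.5. Flipping the sign of the middle operand gives 1.0.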
-
Patent number: 9626218
Abstract: Circuitry for dynamically ordering the execution of multiple threads in parallel is presented. The circuitry may include a control circuit that controls the execution of multiple subsets of threads using multiple processing units in parallel. Each of the plurality of processing units may be associated with an adjustable order thread issuer that may receive a subset of threads and an order in which to execute the subset of threads from the control circuit. The adjustable order thread issuer may manage the processing unit by providing each thread from the subset of threads for execution to the processing unit in the specified order. The adjustable order thread issuer may adjust the order in which threads are issued in an effort to optimize shared resource usage and thus improve the performance of a multithreaded application.
Type: Grant
Filed: March 10, 2014
Date of Patent: April 18, 2017
Assignee: Altera Corporation
Inventors: Dmitry Denisenko, Tomasz Czajkowski
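An adjustable order thread issuer, as described here, receives a subset of threads plus an issue order and can change the order of threads it has not yet issued. The Python sketch below models that interface. The reordering policy, grouping threads by the memory bank they touch, is a hypothetical example of "optimizing shared resource usage", not a policy stated in the patent. Thread records are assumed (thread_id, bank) tuples.

```python
from collections import deque


class AdjustableOrderIssuer:
    """Hypothetical issuer feeding one processing unit in a controllable order."""

    def __init__(self, threads, order):
        # `order` is a permutation of indices into `threads`, set by the control circuit.
        if sorted(order) != list(range(len(threads))):
            raise ValueError("order must be a permutation of thread indices")
        self.threads = list(threads)
        self.pending = deque(order)

    def reorder(self, key):
        """Re-sort only the not-yet-issued threads (stable sort keeps ties in order)."""
        self.pending = deque(sorted(self.pending, key=lambda i: key(self.threads[i])))

    def issue(self):
        """Return the next thread for the processing unit, or None when done."""
        return self.threads[self.pending.popleft()] if self.pending else None
```

For threads [(0, 1), (1, 0), (2, 1), (3, 0)] in order [0, 1, 2, 3], the first issue returns (0, 1). Calling reorder(lambda t: t[1]) then issues (1, 0), (3, 0), (2, 1), so the two bank-0 threads run back to back.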
-
Publication number: 20160291934
Abstract: An integrated circuit is provided that performs floating-point addition or subtraction operations involving at least three floating-point numbers. The floating-point numbers are pre-processed by dynamically extending the number of mantissa bits, determining the floating-point number with the biggest exponent, and shifting the mantissa of the other floating-point numbers to the right. Each extended mantissa has at least twice the number of bits of the mantissa entering the floating-point operation. The exact bit extension is dependent on the number of floating-point numbers to be added. The mantissas of all floating-point numbers with an exponent smaller than the biggest exponent are shifted to the right. The number of right shift bits is dependent on the difference between the biggest exponent and the respective floating-point exponent.
Type: Application
Filed: June 14, 2016
Publication date: October 6, 2016
Inventor: Tomasz Czajkowski
-
Patent number: 9430425
Abstract: Systems and methods for resource sharing of pipelined circuitry of an integrated circuit (IC) are provided. For example, in one embodiment, a method for sharing a functional unit of an integrated circuit (IC) includes receiving two or more threads configured to access the functional unit through two or more data entry points associated with corresponding data exit points configured to receive processed thread data. The method further includes arbitrating the processing of the two or more threads by the functional unit to obtain the processed thread data. To arbitrate, the exit points that cannot receive additional data are determined. Threads are only received from data entry points with corresponding data exit points that can receive additional data. The processed output data is provided to a corresponding exit point.
Type: Grant
Filed: December 27, 2012
Date of Patent: August 30, 2016
Assignee: Altera Corporation
Inventor: Tomasz Czajkowski
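The arbitration rule in this abstract can be modeled in a few lines of Python: never accept a thread from an entry point whose paired exit point cannot take more data. The sketch below assumes bounded FIFOs for the exit points and a round-robin choice among eligible entry points. It also collapses the pipelined functional unit into a single-cycle operation. The round-robin policy, the FIFO capacity, and the class name are all assumptions.

```python
from collections import deque


class SharedUnit:
    """Hypothetical functional unit shared by several entry/exit point pairs."""

    def __init__(self, num_ports, exit_capacity, op):
        self.entries = [deque() for _ in range(num_ports)]  # data entry points
        self.exits = [deque() for _ in range(num_ports)]    # paired data exit points
        self.capacity = exit_capacity
        self.op = op
        self.next_port = 0  # round-robin pointer (assumed fairness policy)

    def step(self):
        """Process at most one datum; return the port served, or None if none eligible."""
        n = len(self.entries)
        for k in range(n):
            p = (self.next_port + k) % n
            # Skip ports with no work or whose exit point cannot receive more data.
            if self.entries[p] and len(self.exits[p]) < self.capacity:
                self.exits[p].append(self.op(self.entries[p].popleft()))
                self.next_port = (p + 1) % n
                return p
        return None
```

With an exit capacity of 1, a full exit FIFO on port 0 stalls only port 0, and the other ports keep using the shared unit.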
-
Patent number: 9405728
Abstract: An integrated circuit is provided that performs floating-point addition or subtraction operations involving at least three floating-point numbers. The floating-point numbers are pre-processed by dynamically extending the number of mantissa bits, determining the floating-point number with the biggest exponent, and shifting the mantissa of the other floating-point numbers to the right. Each extended mantissa has at least twice the number of bits of the mantissa entering the floating-point operation. The exact bit extension is dependent on the number of floating-point numbers to be added. The mantissas of all floating-point numbers with an exponent smaller than the biggest exponent are shifted to the right. The number of right shift bits is dependent on the difference between the biggest exponent and the respective floating-point exponent.
Type: Grant
Filed: September 5, 2013
Date of Patent: August 2, 2016
Assignee: Altera Corporation
Inventor: Tomasz Czajkowski
-
Patent number: 9135087
Abstract: Systems and methods for limiting resource usage of a kernel of an integrated circuit are provided. For example, in one embodiment a method for limiting a number of workgroups that may simultaneously access a kernel of an integrated circuit (IC) includes determining a threshold number of workgroups that may access the kernel simultaneously. A thread of execution is received. The thread of execution is allowed to access the kernel when the threshold number of workgroups would not be exceeded by the thread of execution accessing the kernel.
Type: Grant
Filed: December 27, 2012
Date of Patent: September 15, 2015
Assignee: Altera Corporation
Inventors: Tomasz Czajkowski, John Freeman, Peter Yiannacouras
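The admission check described here is simple. A thread may enter if its workgroup is already active in the kernel, or if admitting a new workgroup keeps the active count at or below the threshold. The Python sketch below models that check. It assumes, for illustration only, that a workgroup's slot is freed as soon as none of its threads are inside the kernel. The class and method names are hypothetical.

```python
class WorkgroupLimiter:
    """Hypothetical admission control for a kernel shared by many workgroups."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.active = {}  # workgroup id -> threads of that group currently inside

    def try_enter(self, group_id):
        """Admit the thread and return True if the threshold is not exceeded."""
        if group_id not in self.active and len(self.active) >= self.threshold:
            return False  # a new workgroup would exceed the limit; thread must wait
        self.active[group_id] = self.active.get(group_id, 0) + 1
        return True

    def leave(self, group_id):
        """Record a finished thread; free the slot once its group has drained (assumed)."""
        self.active[group_id] -= 1
        if self.active[group_id] == 0:
            del self.active[group_id]
```

With a threshold of 2, groups 0 and 1 are admitted and a third group is refused until one of them drains.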
-
Publication number: 20150067010
Abstract: An integrated circuit is provided that performs floating-point addition or subtraction operations involving at least three floating-point numbers. The floating-point numbers are pre-processed by dynamically extending the number of mantissa bits, determining the floating-point number with the biggest exponent, and shifting the mantissa of the other floating-point numbers to the right. Each extended mantissa has at least twice the number of bits of the mantissa entering the floating-point operation. The exact bit extension is dependent on the number of floating-point numbers to be added. The mantissas of all floating-point numbers with an exponent smaller than the biggest exponent are shifted to the right. The number of right shift bits is dependent on the difference between the biggest exponent and the respective floating-point exponent.
Type: Application
Filed: September 5, 2013
Publication date: March 5, 2015
Applicant: Altera Corporation
Inventor: Tomasz Czajkowski