Patents by Inventor Bruce Fleischer

Bruce Fleischer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

REDUCED PRECISION BASED PROGRAMMABLE AND SIMD DATAFLOW ARCHITECTURE

Publication number: 20200401413

Abstract: Various embodiments are provided for using a reduced precision based programmable and single instruction multiple data (SIMD) dataflow architecture in a computing environment. One or more instructions between a plurality of execution units (EUs) operating in parallel within each one of a plurality of execution elements (EEs).

Type: Application

Filed: June 20, 2019

Publication date: December 24, 2020

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Kailash GOPALAKRISHNAN, Sunil SHUKLA, Jungwook CHOI, Silvia MUELLER, Bruce FLEISCHER, Vijayalakshmi SRINIVASAN, Ankur AGRAWAL, Jinwook OH
ULTRA-LOW PRECISION FLOATING-POINT FUSED MULTIPLY-ACCUMULATE UNIT

Publication number: 20200387351

Abstract: Embodiments for implementing a fused multiply-multiply-accumulate (“FMMA”) unit by one or more processors in a computing system. Mantissas for two products, an exponent difference of the two products serving as an alignment shift amount for a product of the two products having a smallest exponent, and an alignment shift amount for an addend relative to an alternative product of the two product having a larger exponent may be determined in parallel. The addend may be aligned relative to the alternative product having the larger exponent. The product having the smallest exponent may be aligned relative to the alternative product having the larger exponent according to the alignment shift amount.

Type: Application

Filed: June 5, 2019

Publication date: December 10, 2020

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Ankur AGRAWAL, Silvia MUELLER, Kailash GOPALAKRISHNAN, Bruce FLEISCHER, Balaram SINHAROY, Mingu KANG
FACILITATING DATA PROCESSING USING SIMD REDUCTION OPERATIONS ACROSS SIMD LANES

Publication number: 20200364056

Abstract: Various embodiments are provided for facilitating data processing by one or more processors in a computing system. An instruction to be executed may be obtained. The instruction is a single instruction multiple data (SIMD) reduction operation of an operand vector with a plurality of vector elements. The SIMD reduction operation may be executed to produce a result vector with a plurality of alternative vector elements. One or more reduction functions may be performed on each of a pair of vector elements from the plurality of vector elements of the operand vector and a result of the one or more reduction functions may be placed in a corresponding vector element of the result vector.

Type: Application

Filed: May 14, 2019

Publication date: November 19, 2020

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Bruce FLEISCHER, Kailash GOPALAKRISHNAN, Jinwook OH, Sunil SHUKLA, Silvia MUELLER
Programmable data delivery by load and store agents on a processing chip interfacing with on-chip memory components and directing data to external memory components

Patent number: 10838868

Abstract: Embodiments for implementing a communicating memory between a plurality of computing components are provided. In one embodiment, an apparatus comprises a plurality of memory components residing on a processing chip, the plurality of memory components interconnected between a plurality of processing elements of at least one processing core of the processing chip and at least one external memory component external to the processing chip. The apparatus further comprises a plurality of load agents and a plurality of store agents on the processing chip, each interfacing with the plurality of memory components. Each of the plurality of load agents and the plurality of store agents execute an independent program specifying a destination of data transacted between the plurality of memory components, the at least one external memory component, and the plurality of processing elements.

Type: Grant

Filed: March 7, 2019

Date of Patent: November 17, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Chia-Yu Chen, Jungwook Choi, Brian Curran, Bruce Fleischer, Kailash Gopalakrishan, Jinwook Oh, Sunil K Shukla, Vijayalakshmi Srinivasan, Swagath Venkataramani
REUSING AN OPERAND IN AN INSTRUCTION SET ARCHITECTURE (ISA)

Publication number: 20200356371

Abstract: Various embodiments are provided reusing an operand in an instruction set architecture (ISA) by one or more processors in a computing system. An instruction may specify that an operand register for a selected operand retain operand data used by a previous instruction. The operand data in the operand register may be reused by the instruction.

Type: Application

Filed: May 8, 2019

Publication date: November 12, 2020

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Bruce FLEISCHER, Sunil SHUKLA, Vijayalakshmi SRINIVASAN, Jungwook CHOI
BINARY FLOATING-POINT MULTIPLY AND SCALE OPERATION FOR COMPUTE-INTENSIVE NUMERICAL APPLICATIONS AND APPARATUSES

Publication number: 20200310755

Abstract: Techniques facilitating binary floating-point multiply and scale operation for compute-intensive numerical applications and apparatuses are provided. An embodiment relates to a system that can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a receiver component that receives an instruction to perform a multiply and scale operation of the first floating point operand value, the second floating point operand value, and the integer operand value, wherein the multiplication component obtains the floating-point product in response to the instruction to perform the multiply and scale operation. The multiplication can be performed as a single instruction.

Type: Application

Filed: March 25, 2019

Publication date: October 1, 2020

Inventors: Silvia Melitta Mueller, Bruce Fleischer, Ankur Agrawal, Kailash Gopalakrishnan
INSTRUCTION INITIALIZATION IN A DATAFLOW ARCHITECTURE

Publication number: 20200304598

Abstract: Various embodiments are provided for implementing instruction initialization in a dataflow architecture in a computing environment. A data packet may be transmitted from a selected node to one or more of a plurality of nodes using one or more existing data paths in an initialization network. A determination operation is performed to determine whether one or more of a plurality of nodes is a target node intended for the data packet. Those of the plurality of nodes determined to be a target node initialize one or more components of the target node using the data packet. The data packet may be forwarded by each of the one or more of a plurality of nodes to a subsequent node in the initialization network.

Type: Application

Filed: March 19, 2019

Publication date: September 24, 2020

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Brian CURRAN, Bruce FLEISCHER, Kailash GOPALAKRISHNAN, Sunil K SHUKLA
PROGRAMMABLE DATA DELIVERY TO A SYSTEM OF SHARED PROCESSING ELEMENTS WITH SHARED MEMORY

Publication number: 20200285579

Abstract: Embodiments for implementing a communicating memory between a plurality of computing components are provided. In one embodiment, an apparatus comprises a plurality of memory components residing on a processing chip, the plurality of memory components interconnected between a plurality of processing elements of at least one processing core of the processing chip and at least one external memory component external to the processing chip. The apparatus further comprises a plurality of load agents and a plurality of store agents on the processing chip, each interfacing with the plurality of memory components. Each of the plurality of load agents and the plurality of store agents execute an independent program specifying a destination of data transacted between the plurality of memory components, the at least one external memory component, and the plurality of processing elements.

Type: Application

Filed: March 7, 2019

Publication date: September 10, 2020

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Chia-Yu CHEN, Jungwook CHOI, Brian CURRAN, Bruce FLEISCHER, Kailash GOPALAKRISHAN, Jinwook OH, Sunil K. SHUKLA, Vijayalakshmi SRINIVASAN, Swagath VENKATARAMANI
ENHANCED LOW PRECISION BINARY FLOATING-POINT FORMATTING

Publication number: 20200233642

Abstract: Techniques for operating on and calculating binary floating-point numbers using an enhanced floating-point number format are presented. The enhanced format can comprise a single sign bit, six bits for the exponent, and nine bits for the fraction. Using six bits for the exponent can provide an enhanced exponent range that facilitates desirably fast convergence of computing-intensive algorithms and low error rates for computing-intensive applications. The enhanced format can employ a specified definition for the lowest binade that enables the lowest binade to be used for zero and normal numbers; and a specified definition for the highest binade that enables it to be structured to have one data point used for a merged Not-a-Number (NaN)/infinity symbol and remaining data points used for finite numbers. The signs of zero and merged NaN/infinity can be “don't care” terms. The enhanced format employs only one rounding mode, which is for rounding toward nearest up.

Type: Application

Filed: April 6, 2020

Publication date: July 23, 2020

Inventors: Silvia Melitta Mueller, Ankur Agrawal, Bruce Fleischer, Kailash Gopalakrishnan, Dongsoo Lee
Enhanced low precision binary floating-point formatting

Patent number: 10656913

Abstract: Techniques for operating on and calculating binary floating-point numbers using an enhanced floating-point number format are presented. The enhanced format can comprise a single sign bit, six bits for the exponent, and nine bits for the fraction. Using six bits for the exponent can provide an enhanced exponent range that facilitates desirably fast convergence of computing-intensive algorithms and low error rates for computing-intensive applications. The enhanced format can employ a specified definition for the lowest binade that enables the lowest binade to be used for zero and normal numbers; and a specified definition for the highest binade that enables it to be structured to have one data point used for a merged Not-a-Number (NaN)/infinity symbol and remaining data points used for finite numbers. The signs of zero and merged NaN/infinity can be “don't care” terms. The enhanced format employs only one rounding mode, which is for rounding toward nearest up.

Type: Grant

Filed: June 5, 2018

Date of Patent: May 19, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Silvia Melitta Mueller, Ankur Agrawal, Bruce Fleischer, Kailash Gopalakrishnan, Dongsoo Lee
Processor and memory transparent convolutional lowering and auto zero padding for deep neural network implementations

Patent number: 10565285

Abstract: A convolutional lowering component (CoLor component) between processor and memory units (or within a memory hierarchy) maps location in a lowered matrix to an equivalent location in a non-lowered matrix and provides auto zero padding in computational heavy convolutional layers. An identification component identifies processing components that execute computations in deep neural networks (DNNs) in which convolutions are realized as general matrix to matrix multiplications (GEMM) operations, and identifies a subset of the processing components that store deep neural network (DNN) features in a non-lowered form component that determines output for successively larger neural networks of a set. An address translation component translates address requests, generated by the subset of processing components to a memory subsystem, from a lowered index form to a non-lowered index form.

Type: Grant

Filed: December 18, 2017

Date of Patent: February 18, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jungwook Choi, Bruce Fleischer, Vijayalakshmi Srinivasan, Swagath Venkataramani
ENHANCED LOW PRECISION BINARY FLOATING-POINT FORMATTING

Publication number: 20190369960

Abstract: Techniques for operating on and calculating binary floating-point numbers using an enhanced floating-point number format are presented. The enhanced format can comprise a single sign bit, six bits for the exponent, and nine bits for the fraction. Using six bits for the exponent can provide an enhanced exponent range that facilitates desirably fast convergence of computing-intensive algorithms and low error rates for computing-intensive applications. The enhanced format can employ a specified definition for the lowest binade that enables the lowest binade to be used for zero and normal numbers; and a specified definition for the highest binade that enables it to be structured to have one data point used for a merged Not-a-Number (NaN)/infinity symbol and remaining data points used for finite numbers. The signs of zero and merged NaN/infinity can be “don't care” terms. The enhanced format employs only one rounding mode, which is for rounding toward nearest up.

Type: Application

Filed: June 5, 2018

Publication date: December 5, 2019

Inventors: Silvia Melitta Mueller, Ankur Agrawal, Bruce Fleischer, Kailash Gopalakrishnan, Dongsoo Lee
PROCESSOR AND MEMORY TRANSPARENT CONVOLUTIONAL LOWERING AND AUTO ZERO PADDING FOR DEEP NEURAL NETWORK IMPLEMENTATIONS

Publication number: 20190188240

Abstract: A convolutional lowering component (CoLor component) between processor and memory units (or within a memory hierarchy) maps location in a lowered matrix to an equivalent location in a non-lowered matrix and provides auto zero padding in computational heavy convolutional layers. An identification component identifies processing components that execute computations in deep neural networks (DNNs) in which convolutions are realized as general matrix to matrix multiplications (GEMM) operations, and identifies a subset of the processing components that store deep neural network (DNN) features in a non-lowered form component that determines output for successively larger neural networks of a set. An address translation component translates address requests, generated by the subset of processing components to a memory subsystem, from a lowered index form to a non-lowered index form.

Type: Application

Filed: December 18, 2017

Publication date: June 20, 2019

Inventors: Jungwook Choi, Bruce Fleischer, Vijayalakshmi Srinivasan, Swagath Venkataramani
DYNAMIC FUSION BASED ON OPERAND SIZE

Publication number: 20190179639

Abstract: Aspects of the invention include receiving, by a processor, a plurality of instructions at an instruction pipeline. The processor can further determine an operand bit field size for each of the received plurality of instructions. The processor can further compare the operand bit field size of at least a subset of the received instructions to a predetermined threshold. The processor can further fuse at least two of the received instructions that have an operand bit field size that meets the predetermined threshold. The processor can further perform an execution stage within the instruction pipeline to execute the received instructions, including the fused instructions.

Type: Application

Filed: December 7, 2017

Publication date: June 13, 2019

Inventors: Maarten J. Boersma, Bruce Fleischer, Robert A. Philhower, Balaram Sinharoy
Zero detect in partial sums while adding

Publication number: 20060184603

Abstract: The present invention relates to a method and circuit for performing multiply-operations in an arithmetic unit of a computer processor. In a multiplier thereof, zero detection of the resulting product bit string (22) is needed for a proper setting of condition code and overflow status information. Zero detection according to prior art decreases the calculation speed in the multiplier. In order to provide a method and respective electronic circuit, wherein the zero detection is earlier completed, it is proposed to use a leading zero anticipation (LZA) hardware—i.e., an LZA circuit (40), which exists usually anyway in floating point processor adders for calculating the number of leading zeros for operand normalization purposes—for performing a zero detection of the product by aid of the partial results (16, 17) emerging at the output of the Wallace tree of the multiplier.

Type: Application

Filed: February 11, 2005

Publication date: August 17, 2006

Inventors: Son Trong, Mark Erle, Bruce Fleischer, Juergen Haess, Michael Kelly, Klaus Kroener, Martin Schmookler, Eric Schwarz
System and method for a fused multiply-add dataflow with early feedback prior to rounding

Publication number: 20060179096

Abstract: A system for performing floating point arithmetic operations including an input register adapted for receiving an operand. The system also includes computer instructions for performing single precision incrementing of the operand in response to determining that the operand is single precision, that the operand requires the incrementing based on the results of a previous operation and that the previous operation did not perform the incrementing. The operand was created in the previous operation. The system further includes instructions for performing double precision incrementing of the operand in response to determining that the operand is double precision, that the operand requires the incrementing based on the results of the previous operation and that the previous operation did not perform the incrementing.

Type: Application

Filed: February 10, 2005

Publication date: August 10, 2006

Applicant: International Business Machines Corporation

Inventors: Bruce Fleischer, Juergen Haess, Michael Kroener, Robert Montoye, Martin Schmookler, Eric Schwarz, Son Dao-Trong
System and method for performing decimal to binary conversion

Publication number: 20060179091

Abstract: A method for converting from binary to decimal. The method includes receiving a binary coded decimal (BCD) number made up of one or more sets of three digits. A running sum and a running carry are set to zero. The following steps are performed for each set of three digits in the BCD number in order from the set of three digits containing the three most significant digits of the BCD number to the set of three digits containing the three least significant digits of the BCD number. The steps include: creating six partial products based on the set of three digits, the running sum and the running carry; combining the six partial products into two partial products; and storing the two partial products in the running sum and the running carry. After the loop has been performed for each set of three digits in the BCD number, the running sum and the running carry are combined into a final binary result.

Type: Application

Filed: February 9, 2005

Publication date: August 10, 2006

Applicant: International Business Machines Corporation

Inventors: Steven Carlough, Bruce Fleischer, Wen Li, Eric Schwarz
System and method for a floating point unit with feedback prior to normalization and rounding

Publication number: 20060179097

Abstract: A system for performing floating point arithmetic operations including an input register adapted for receiving an operand. The system also includes a mechanism for performing a shift or masking operation in response to determining that the operand is in an un-normalized format. The system also includes instructions for performing single precision incrementing of the operand in response to determining that the operand is single precision, that the operand requires the incrementing based on the results of a previous operation and that the previous operation did not perform the incrementing. The operand was created in the previous operation. The system further includes instructions for performing double precision incrementing of the operand in response to determining that the operand is double precision, that the operand requires the incrementing based on the results of the previous operation and that the previous operation did not perform the incrementing.

Type: Application

Filed: February 9, 2005

Publication date: August 10, 2006

Applicant: International Business Machines Corporation

Inventors: Bruce Fleischer, Juergen Haess, Michael Kroener, Martin Schmookler, Eric Schwarz, Son Dao-Trong
System and method for converting binary to decimal

Publication number: 20060179090

Abstract: A method for converting from binary to decimal. The method includes receiving a binary number, the binary number including one or more sets of bits. An accumulated sum is set to zero. The accumulated sum is in a binary coded decimal (BCD) format. The following loop is repeated for each set of bits in the binary number in order from the set of bits containing the most significant bit of the binary number to the set of bits containing the least significant bit of the binary number: the accumulated sum is converted into a 5,1 code format resulting in an interim sum. The loop also includes repeating for each next bit in the set in order from the most significant bit to the least significant bit in the set: doubling the interim sum; and replacing the least significant bit of the interim sum with the next bit. The last step in the loop includes converting the interim sum into the BCD format and storing the results of the converting in the accumulated sum.

Type: Application

Filed: February 9, 2005

Publication date: August 10, 2006

Applicant: International Business Machines Corporation

Inventors: Steven Carlough, Bruce Fleischer, Wen Li, Eric Schwarz

prev 1 2