Patents by Inventor Silvia Mueller

Silvia Mueller has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Ultra-low precision floating-point fused multiply-accumulate unit

Patent number: 11455142

Abstract: Embodiments for implementing a fused multiply-multiply-accumulate (“FMMA”) unit by one or more processors in a computing system. Mantissas for two products, an exponent difference of the two products serving as an alignment shift amount for a product of the two products having a smallest exponent, and an alignment shift amount for an addend relative to an alternative product of the two products having a larger exponent may be determined in parallel. The addend may be aligned relative to the alternative product having the larger exponent. The product having the smallest exponent may be aligned relative to the alternative product having the larger exponent according to the alignment shift amount.
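
The alignment described in this abstract can be illustrated with a minimal sketch. It is not the patented circuit: the `(mantissa, exponent)` encoding, the function name `fmma`, and the choice to left-shift onto a common grid (so the arithmetic stays exact) are assumptions made for illustration only.

```python
def fmma(a, b, c, d, addend):
    """Return a*b + c*d + addend for operands given as (mantissa, exponent).

    Each pair stands for mantissa * 2**exponent (a simplified, hypothetical
    encoding without sign bits, rounding, or special values).
    """
    # Mantissas and exponents of the two products.
    p1 = (a[0] * b[0], a[1] + b[1])
    p2 = (c[0] * d[0], c[1] + d[1])

    # The product with the larger exponent is the alignment reference.
    big, small = (p1, p2) if p1[1] >= p2[1] else (p2, p1)

    # Both shift amounts depend only on exponents, so hardware could
    # compute them in parallel with the mantissa products.
    product_shift = big[1] - small[1]   # for the smaller-exponent product
    addend_shift = big[1] - addend[1]   # for the addend (may be negative)

    # Put all three terms on one grid. Hardware would right-shift into a
    # wide datapath; left-shifting to the smallest exponent keeps this exact.
    base = min(small[1], addend[1])
    total = sum(m << (e - base) for m, e in (big, small, addend))
    return total, base, product_shift, addend_shift
```

For example, `fmma((3, 2), (5, 0), (1, -1), (3, 0), (7, 0))` evaluates 12·5 + 0.5·3 + 7 and returns the sum as an integer scaled by `2**base`, together with the two shift amounts.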

Type: Grant

Filed: June 5, 2019

Date of Patent: September 27, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Ankur Agrawal, Silvia Mueller, Kailash Gopalakrishnan, Bruce Fleischer, Balaram Sinharoy, Mingu Kang
Reduced precision based programmable and SIMD dataflow architecture

Patent number: 11347517

Abstract: A reduced precision based programmable and single instruction multiple data (SIMD) dataflow architecture includes reduced precision execution units, with a majority of the execution units operating at reduced precision and a minority capable of operating at higher precision. The execution units operate in parallel within a programmable execution element to share instruction fetch, decode, and issue pipelines and operate on the same instruction in lock-step to minimize instruction-related overhead.

Type: Grant

Filed: June 20, 2019

Date of Patent: May 31, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Kailash Gopalakrishnan, Sunil Shukla, Jungwook Choi, Silvia Mueller, Bruce Fleischer, Vijayalakshmi Srinivasan, Ankur Agrawal, Jinwook Oh
Facilitating data processing using SIMD reduction operations across SIMD lanes

Patent number: 11216281

Abstract: Various embodiments are provided for facilitating data processing by one or more processors in a computing system. An instruction to be executed may be obtained. The instruction is a single instruction multiple data (SIMD) reduction operation of an operand vector with a plurality of vector elements. The SIMD reduction operation may be executed to produce a result vector with a plurality of alternative vector elements. One or more reduction functions may be performed on each of a pair of vector elements from the plurality of vector elements of the operand vector and a result of the one or more reduction functions may be placed in a corresponding vector element of the result vector.
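
A pairwise reduction across lanes can be sketched as follows. The pairing of adjacent lanes, the half-length result vector, and the name `simd_pairwise_reduce` are assumptions for illustration; the abstract does not fix these details.

```python
from operator import add


def simd_pairwise_reduce(vec, fn=add):
    """Apply a reduction function to each adjacent pair of lanes.

    Assumption: lanes 2i and 2i+1 form a pair, and the result of the
    pair lands in element i of the result vector.
    """
    if len(vec) % 2:
        raise ValueError("expected an even number of lanes")
    return [fn(vec[i], vec[i + 1]) for i in range(0, len(vec), 2)]
```

Calling `simd_pairwise_reduce([1, 2, 3, 4, 5, 6, 7, 8])` sums each pair; passing `fn=max` swaps in a different reduction function without changing the lane pairing.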

Type: Grant

Filed: May 14, 2019

Date of Patent: January 4, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Bruce Fleischer, Kailash Gopalakrishnan, Jinwook Oh, Sunil Shukla, Silvia Mueller
Reduced precision based programmable and SIMD dataflow architecture

Publication number: 20200401413

Abstract: Various embodiments are provided for using a reduced precision based programmable and single instruction multiple data (SIMD) dataflow architecture in a computing environment. One or more instructions may be shared between a plurality of execution units (EUs) operating in parallel within each one of a plurality of execution elements (EEs).

Type: Application

Filed: June 20, 2019

Publication date: December 24, 2020

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Kailash Gopalakrishnan, Sunil Shukla, Jungwook Choi, Silvia Mueller, Bruce Fleischer, Vijayalakshmi Srinivasan, Ankur Agrawal, Jinwook Oh
Ultra-low precision floating-point fused multiply-accumulate unit

Publication number: 20200387351

Abstract: Embodiments for implementing a fused multiply-multiply-accumulate (“FMMA”) unit by one or more processors in a computing system. Mantissas for two products, an exponent difference of the two products serving as an alignment shift amount for a product of the two products having a smallest exponent, and an alignment shift amount for an addend relative to an alternative product of the two products having a larger exponent may be determined in parallel. The addend may be aligned relative to the alternative product having the larger exponent. The product having the smallest exponent may be aligned relative to the alternative product having the larger exponent according to the alignment shift amount.

Type: Application

Filed: June 5, 2019

Publication date: December 10, 2020

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Ankur Agrawal, Silvia Mueller, Kailash Gopalakrishnan, Bruce Fleischer, Balaram Sinharoy, Mingu Kang
Facilitating data processing using SIMD reduction operations across SIMD lanes

Publication number: 20200364056

Abstract: Various embodiments are provided for facilitating data processing by one or more processors in a computing system. An instruction to be executed may be obtained. The instruction is a single instruction multiple data (SIMD) reduction operation of an operand vector with a plurality of vector elements. The SIMD reduction operation may be executed to produce a result vector with a plurality of alternative vector elements. One or more reduction functions may be performed on each of a pair of vector elements from the plurality of vector elements of the operand vector and a result of the one or more reduction functions may be placed in a corresponding vector element of the result vector.

Type: Application

Filed: May 14, 2019

Publication date: November 19, 2020

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Bruce Fleischer, Kailash Gopalakrishnan, Jinwook Oh, Sunil Shukla, Silvia Mueller
Byte Execution Unit for Carrying Out Byte Instructions in a Processor

Publication number: 20070061553

Abstract: A disclosed byte execution unit receives byte instruction information and two operands, and performs an operation specified by the byte instruction information upon one or both of the operands, thereby producing a result. The byte instruction specifies either a count ones in bytes operation, an average bytes operation, an absolute differences of bytes operation, or a sum bytes into halfwords operation. In one embodiment, the byte execution unit includes multiple byte units. Each byte unit includes multiple population counters, two compressor units, adder input multiplexer logic, adder logic, and result multiplexer logic. A data processing system is described including a processor coupled to a memory system. The processor includes the byte execution unit. The memory system includes a byte instruction, wherein the byte instruction specifies either the count ones in bytes operation, the average bytes operation, the absolute differences of bytes operation, or the sum bytes into halfwords operation.
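
The four byte operations named in this abstract can be modeled in software as a rough sketch. The 32-bit operand width, the round-up behavior of the average, the placement of the two halfword sums, and all function names are assumptions; the abstract specifies none of them.

```python
def _unpack(word, count=4):
    """Split a word into bytes, least significant byte first."""
    return [(word >> (8 * i)) & 0xFF for i in range(count)]


def _pack(values, width=8):
    """Pack fields of `width` bits, least significant field first."""
    out = 0
    for i, value in enumerate(values):
        out |= (value & ((1 << width) - 1)) << (width * i)
    return out


def count_ones_in_bytes(a):
    # Population count of each byte, stored in the same byte position.
    return _pack([bin(x).count("1") for x in _unpack(a)])


def average_bytes(a, b):
    # Assumption: the average rounds up, i.e. (x + y + 1) >> 1.
    return _pack([(x + y + 1) >> 1 for x, y in zip(_unpack(a), _unpack(b))])


def abs_diff_bytes(a, b):
    return _pack([abs(x - y) for x, y in zip(_unpack(a), _unpack(b))])


def sum_bytes_into_halfwords(a, b):
    # Assumption: sum of b's bytes goes to the low halfword, a's to the high.
    return _pack([sum(_unpack(b)), sum(_unpack(a))], width=16)
```

Each function treats its operands as four independent byte lanes, which mirrors the abstract's description of multiple byte units working side by side.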

Type: Application

Filed: November 1, 2006

Publication date: March 15, 2007

Inventors: Sang Dhong, Hwa-Joon Oh, Brad Michael, Silvia Mueller, Kevin Tran
Leading-Zero Counter and Method to Count Leading Zeros

Publication number: 20070050435

Abstract: The present invention relates to a circuit comprising a Leading Zero Counter (LZC) sub-circuit driving a second sub-circuit, like a shifter or arbiter. Shifter circuits or arbiter circuits operating with fewer stages than before have a smaller delay since every stage can select between more than two inputs. This reduces the overall delay of the shifter, arbiter, etc. But for state-of-the-art binary LZC circuits this requires a complex recoding between LZC and shifter circuit. In order to provide an improved leading zero circuit having an output which allows a simpler control of a post-connected sub-circuit having two or more stages and having at least one stage with three or more inputs, it is proposed to provide LZC circuitry providing an output consisting of two or more unary encoded substrings. This removes the requirement for a recoder between LZC and shifter.
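
The idea of an LZC output that drives multi-input shifter stages directly can be sketched in software. Here "unary encoded substring" is interpreted as a one-hot digit, and the 16-bit width, radix-4 stages, and function names are likewise assumptions for illustration.

```python
def lzc_one_hot(value, width=16, radix=4):
    """Return the leading-zero count as two one-hot digits (coarse, fine)."""
    if not 0 < value < (1 << width):
        raise ValueError("value must be nonzero and fit in `width` bits")
    leading_zeros = width - value.bit_length()
    coarse, fine = divmod(leading_zeros, radix)
    one_hot = lambda digit: [int(i == digit) for i in range(radix)]
    return one_hot(coarse), one_hot(fine)


def normalize_shift(value, coarse_sel, fine_sel, width=16, radix=4):
    """Two-stage left shifter; each stage is a radix-input mux."""
    mask = (1 << width) - 1
    # Stage 1 candidates: shifts by 0, radix, 2*radix, ...
    stage1 = [(value << (radix * i)) & mask for i in range(radix)]
    picked = stage1[coarse_sel.index(1)]
    # Stage 2 candidates: shifts by 0 .. radix-1.
    stage2 = [(picked << i) & mask for i in range(radix)]
    return stage2[fine_sel.index(1)]
```

Because each digit already has one active line per mux input, the select lines need no binary-to-one-hot recoding between the counter and the shifter in this model.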

Type: Application

Filed: July 25, 2006

Publication date: March 1, 2007

Inventors: Christian Jacobi, Silvia Mueller, Jochen Preiss, Kai Weber
Method and Processor for Performing a Floating-Point Instruction Within a Processor

Publication number: 20070038693

Abstract: The invention relates to a method for performing floating-point instructions within a processor of a data processing system, wherein an input of said floating-point instruction comprises a normal or a denormal floating-point number. Said method comprises the steps of storing said floating-point number; normalizing said floating-point number by counting the leading zeros of the mantissa, shifting the fraction part to the left by the number of leading zeros, and simultaneously decrementing the exponent by one for every position that the fraction part is shifted to the left, wherein if the input is a normal floating-point number the normalization is done after counting no leading zeros of the mantissa; executing a floating-point instruction, wherein said normalized floating-point number is utilized as input for the floating-point instruction; and storing a floating-point result. Furthermore, a processor to be used to perform said method is described.
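
The normalization step can be sketched as below. The 24-bit mantissa width, the integer encoding of the mantissa, and the name `normalize` are assumptions for illustration rather than details taken from the application.

```python
def normalize(fraction, exponent, width=24):
    """Left-justify a mantissa and adjust the exponent to compensate.

    `fraction` is the mantissa as an integer of at most `width` bits.
    A normal input already has its top bit set, so it passes through
    with a leading-zero count of 0.
    """
    if not 0 < fraction < (1 << width):
        raise ValueError("fraction must be nonzero and fit in `width` bits")
    leading_zeros = width - fraction.bit_length()
    # Shift left by the count and decrement the exponent by the same amount.
    return fraction << leading_zeros, exponent - leading_zeros, leading_zeros
```

The value represented is unchanged, since every left shift of the mantissa is offset by one decrement of the exponent.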

Type: Application

Filed: August 3, 2006

Publication date: February 15, 2007

Inventors: Christian Jacobi, Matthias Klein, Silvia Mueller, Matthias Pflanz, Jochen Preiss
Processor having efficient function estimate instructions

Publication number: 20060259745

Abstract: A preferred embodiment of the present invention provides a method, computer program product, and processor design for supporting high-precision floating-point function estimates that are split into two instructions each: a low-precision table lookup instruction and a linear interpolation instruction. Estimates of different functions can be implemented using this scheme: a separate table-lookup instruction is provided for each different function, while only a single interpolation instruction is needed, since the single interpolation instruction can perform the interpolation step for any of the functions to be estimated. Thus, a preferred embodiment of the present invention incurs significantly less overhead than would specialized hardware, while still maintaining a uniform FPU latency, which allows for much simpler control logic.
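
The split into a per-function lookup and a shared interpolation step can be sketched as follows. The table size, the input range of 1.0 to 2.0, the `(base, slope)` table format, and all names are assumptions made for illustration.

```python
import math


def build_table(fn, entries=32, lo=1.0, hi=2.0):
    """Precompute (base, slope) per interval for one function."""
    step = (hi - lo) / entries
    table = []
    for i in range(entries):
        x0 = lo + i * step
        y0, y1 = fn(x0), fn(x0 + step)
        table.append((y0, (y1 - y0) / step))
    return table


def table_lookup(table, x, lo=1.0, hi=2.0):
    """Function-specific step: fetch base, slope, and offset into interval."""
    step = (hi - lo) / len(table)
    index = min(int((x - lo) / step), len(table) - 1)
    base, slope = table[index]
    return base, slope, x - (lo + index * step)


def interpolate(base, slope, delta):
    """Shared step: the same linear interpolation serves every function."""
    return base + slope * delta


# One table per estimated function; the interpolation code is reused.
RECIP = build_table(lambda x: 1.0 / x)
RSQRT = build_table(lambda x: 1.0 / math.sqrt(x))
```

An estimate is then `interpolate(*table_lookup(RECIP, x))`; switching to `RSQRT` changes only the lookup, which is the property the abstract emphasizes.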

Type: Application

Filed: May 12, 2005

Publication date: November 16, 2006

Inventors: Sang Dhong, Gordon Fossum, Harm Hofstee, Brad Michael, Silvia Mueller, Hwa-Joon Oh
Floating point unit with fused multiply add and method for calculating a result with a floating point unit

Publication number: 20060184601

Abstract: The invention proposes a Floating Point Unit (1) with fused multiply add, with one addend operand (eb, fb) and two multiplicand operands (ea, fa; ec, fc), with a shift amount logic (2) which based on the exponents of the operands (ea, eb and ec) computes an alignment shift amount, with an alignment logic (3) which uses the alignment shift amount to align the fraction (fb) of the addend operand, with a multiply logic (4) which multiplies the fractions of the multiplicand operands (fa, fc), with an adder logic (5) which adds the outputs of the alignment logic (3) and the multiply logic (4), with a normalization logic (6) which normalizes the output of the adder logic (5), which is characterized in that a leading zero logic (7) is provided which computes the number of leading zeros of the fraction of the addend operand (fb), and that a compare logic (8) is provided which based on the number of leading zeros and the alignment shift amount computes select signals that indicate whether the most significant bits of t

Type: Application

Filed: February 11, 2005

Publication date: August 17, 2006

Inventors: Son Trong, Juergen Haess, Christian Jacobi, Klaus Kroener, Silvia Mueller, Jochen Preiss
Leakage current reduction system and method

Publication number: 20060101315

Abstract: An apparatus, a method and a computer program are provided to reduce leakage current in a processor. Traditionally, extra logic is employed to reduce leakage currents. However, reducing leakage current without sacrificing fine grain operations and speed can be difficult. Achieving such a goal can be accomplished by incorporating a multiplexer (mux) into the scan-in path of scan registers so that units or sub-units of the processor can be powered down individually. Additionally, the muxes are not incorporated into time paths, so speed can be preserved.

Type: Application

Filed: November 5, 2004

Publication date: May 11, 2006

Applicant: International Business Machines Corporation

Inventors: Sang Dhong, Hwa-Joon Oh, Silvia Mueller, Joel Silberman
Using a leading-sign anticipator circuit for detecting sticky-bit information

Publication number: 20060101108

Abstract: A method, an apparatus, and a computer program are provided to more efficiently generate a sticky bit in a Floating Point Design. Traditionally, separate ORing logic or OR trees were employed to compress the stick outputs of a normalization shifter into at least one sticky bit. However, this design has power consumption and area costs associated with it. To overcome these disadvantages, the OR trees of Leading Zero Counters (CLZs) are employed in conjunction with the Edge Vector logic of a Leading Sign Anticipator and an additional OR gate to determine the sticky bit.

Type: Application

Filed: November 5, 2004

Publication date: May 11, 2006

Applicants: International Business Machines Corporation, Sony Computer Entertainment Inc.

Inventors: Sang Dhong, Christian Jacobi, Silvia Mueller, Hwa-Joon Oh, Yonetaro Totsuka
Apparatus for controlling rounding modes in single instruction multiple data (SIMD) floating-point units

Publication number: 20060101107

Abstract: An apparatus for controlling rounding modes in a single instruction multiple data (SIMD) floating-point unit is disclosed. The SIMD floating-point unit includes a floating-point status-and-control register (FPSCR) having a first rounding mode bit field and a second rounding mode bit field. The SIMD floating-point unit also includes means for generating a first slice and a second slice. During a floating-point operation, the SIMD floating-point unit concurrently performs a first rounding operation on the first slice and a second rounding operation on the second slice according to a bit in the first rounding mode bit field and a bit in the second rounding mode bit field within the FPSCR, respectively.

Type: Application

Filed: November 5, 2004

Publication date: May 11, 2006

Applicant: International Business Machines Corporation

Inventors: Sang Dhong, Harm Hofstee, Christian Jacobi, Silvia Mueller, Hwa-Joon Oh
Alignment shifter supporting multiple precisions

Publication number: 20060031272

Abstract: An apparatus, a method, and a computer program are provided for fully utilizing a double precision Floating Point (FP) alignment shifter. In conventional FP adders, and other FP computational units, double precision FP alignment shifters are utilized to perform both double and single precision alignment shifts. However, when a conventional double precision FP alignment shifter is utilized for a single precision calculation, half of the available capacity of the double precision FP alignment shifter is wasted. Therefore, to better utilize the capacity of double precision FP alignment shifter, a modified alignment shifter is utilized that can perform either an alignment shift for a double precision calculation or two simultaneous (or nearly simultaneous) alignment shifts for two single precision calculations.
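
A dual-mode alignment shifter can be modeled as below. The 64-bit datapath, the split into two 32-bit halves, and the function signature are assumptions for illustration; real mantissa widths differ.

```python
def align_shift(value, shift_hi, shift_lo=None, width=64):
    """Right-shift once at full width, or twice at half width.

    With `shift_lo` omitted, the datapath performs one full-width shift
    (double-precision mode). With both amounts given, the upper and lower
    halves are shifted independently (two single-precision operands).
    """
    mask = (1 << width) - 1
    if shift_lo is None:
        return (value & mask) >> shift_hi
    half = width // 2
    half_mask = (1 << half) - 1
    # Bits shifted out of the upper half must not leak into the lower half.
    upper = ((value >> half) & half_mask) >> shift_hi
    lower = (value & half_mask) >> shift_lo
    return (upper << half) | lower
```

The only behavioral difference between the modes in this sketch is the boundary between the halves, which is what lets one shifter serve both cases.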

Type: Application

Filed: August 5, 2004

Publication date: February 9, 2006

Applicant: International Business Machines Corporation

Inventors: Sang Dhong, Hwa-Joon Oh, Silvia Mueller
Apparatus and method for reducing the latency of sum-addressed shifters

Publication number: 20060026223

Abstract: The present invention provides for calculating a shift amount as a function of a plurality of numbers. At least one decoder and the at least one adder are coupled in parallel. A shifter is configured to compute a value in a plurality of shift stages, and wherein a bit group of the shift amount is employable to affect at least one of the plurality of shift stages, thereby decreasing processing time.

Type: Application

Filed: July 29, 2004

Publication date: February 2, 2006

Applicant: International Business Machines Corporation

Inventors: Sang Dhong, Christian Jacobi, Silvia Mueller, Hiroo Nishikawa, Hwa-Joon Oh
Protecting one-hot logic against short-circuits during power-on

Publication number: 20060012399

Abstract: A method, a computer program, and an apparatus are provided to protect transmission gates in a multiplexer (mux). Because transmission gates are much faster than the more conventional AND-OR arrays, transmission gates are being used more often in muxes in high-speed circuitry. However, transmission gates have a significant problem in that short circuits are possible in situations where there is not a one-hot select signal. Therefore, to eliminate the problem, logic gates are utilized specifically during Power-On Reset (POR) to force a one-hot selection to prevent any possible short circuits.
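
The effect of forcing a one-hot selection during reset can be shown with a small behavioral model. Which input is forced on during POR, and the names `por_guard` and `mux`, are assumptions; the abstract does not specify them.

```python
def por_guard(select, por_active):
    """Return the select lines actually applied to the transmission gates."""
    if por_active:
        # Assumption: input 0 is the one forced on during power-on reset.
        return [1] + [0] * (len(select) - 1)
    return list(select)


def mux(inputs, select):
    """Behavioral mux that rejects selections a real gate mux cannot tolerate."""
    if sum(select) != 1:
        raise ValueError("select is not one-hot: gates could short inputs together")
    return inputs[select.index(1)]
```

During reset, latches may hold arbitrary values such as `[1, 1, 0]`; passing them through `por_guard(..., True)` yields a legal selection regardless of their state.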

Type: Application

Filed: July 15, 2004

Publication date: January 19, 2006

Applicant: International Business Machines Corporation

Inventors: Sang Dhong, Christian Jacobi, Hwa-Joon Oh, Silvia Mueller
Fast operand formatting for a high performance multiply-add floating point-unit

Publication number: 20050228844

Abstract: Disclosed are a floating point execution unit, and a method of operating a floating point unit, to perform multiply/add operations using a plurality of operands from an instruction having a plurality of operand positions. The floating point unit comprises a multiplier for calculating a product of two of the operands, and an aligner for combining said product and a third of the operands. A first data path is used to supply to the multiplier operands from a first and a second of the operand positions of the instruction, and a second data path is used to supply the third operand to the aligner. The floating point unit further comprises a multiplexer on the second data path for selecting, for use by the aligner, either the operand from the second operand position or the operand from the third operand position of the instruction.

Type: Application

Filed: April 8, 2004

Publication date: October 13, 2005

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Sang Dhong, Silvia Mueller, Hiroo Nishikawa, Hwa-Joon Oh
High speed adder design for a multiply-add based floating point unit

Publication number: 20050131981

Abstract: An apparatus and computer program product are provided for improving a high-speed adder for Floating-Point Units (FPU) in a given computer system. The improved adder utilizes a compound incrementer, a compound adder, a carry network, an adder control/selector, and a series of multiplexers (muxes). The carry network performs the end-around-carry function simultaneously with and independently of other required functions, optimizing the functioning of the adder. Also, a minimum number of muxes is used to reduce mux delays.
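
End-around carry in general can be illustrated with one's-complement subtraction. This is a generic textbook sketch, not the adder in the application; the 8-bit width and the name `subtract_end_around` are assumptions.

```python
def subtract_end_around(a, b, width=8):
    """Compute a - b using one's-complement addition with end-around carry."""
    mask = (1 << width) - 1
    total = (a & mask) + (~b & mask)   # a plus the one's complement of b
    carry_out = total >> width

    # A compound adder produces both candidates; the carry picks one.
    sum0 = total & mask                # used when there is no carry out
    sum1 = (total + 1) & mask          # carry wrapped around into the LSB

    if carry_out:
        return sum1                    # a > b: positive difference
    return -(~sum0 & mask)             # a <= b: complement gives magnitude
```

Having `sum0` and `sum1` ready before the carry is known is what allows the carry computation to proceed independently of the additions in this model.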

Type: Application

Filed: December 11, 2003

Publication date: June 16, 2005

Applicant: International Business Machines Corporation

Inventors: Sang Dhong, Silvia Mueller, Hwa-Joon Oh
High performance implementation of exponent adjustment in a floating point design

Publication number: 20050114422

Abstract: A floating point unit (FPU) which generates a correction signal and an inverted leading zero signal. Exponent logic is configured to generate an exponent value, a first incremented exponent value, and a second incremented exponent value. Exponent adjust and rounding logic is configured to receive the exponent value, the first incremented exponent value, and the second incremented exponent value. The exponent adjust and rounding logic is further configured to add the inverted leading zero signal to the first incremented exponent value and the second incremented exponent value, thereby producing an exponent output value, a first incremented exponent output value, and a second incremented exponent output value. Either the exponent output value, the first incremented exponent output value, or the second incremented exponent output value is then selected.

Type: Application

Filed: November 20, 2003

Publication date: May 26, 2005

Applicant: International Business Machines Corporation

Inventors: Sang Dhong, Silvia Mueller, Hwa-Joon Oh, Kevin Tran
