Patents by Inventor Jungwook CHOI

Jungwook CHOI has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

MIXED PRECISION CAPABLE HARDWARE FOR TUNING A MACHINE LEARNING MODEL

Publication number: 20210064372

Abstract: An apparatus includes a memory and a processor coupled to the memory. The processor includes first and second sets of arithmetic units having first and second precision for floating-point computations, the second precision being lower than the first precision. The processor is configured to obtain a machine learning model trained in the first precision, to utilize the second set of arithmetic units to perform inference on input data, to utilize the first set of arithmetic units to generate feedback for updating parameters of the second set of arithmetic units based on the inference performed on the input data by the second set of arithmetic units, to tune parameters of the second set of arithmetic units based at least in part on the feedback generated by the first set of arithmetic units, and to utilize the second set of arithmetic units with the tuned parameters to generate inference results.

Type: Application

Filed: September 3, 2019

Publication date: March 4, 2021

Inventors: Xiao Sun, Chia-Yu Chen, Naigang Wang, Jungwook Choi, Kailash Gopalakrishnan
FORMATION FAILURE RESILIENT NEUROMORPHIC DEVICE

Publication number: 20210064974

Abstract: A neuromorphic device includes a plurality of first control lines, a plurality of second control lines and a matrix of resistive processing unit cells. Each resistive processing unit cell is electrically connected with one of the first control lines and one of the second control lines. A given resistive processing unit cell includes a first resistive device and a second resistive device. The first resistive device is a positively weighted resistive device and the second resistive device is a negatively weighted resistive device.

Type: Application

Filed: August 30, 2019

Publication date: March 4, 2021

Inventors: Youngseok Kim, Jungwook Choi, Seyoung Kim, Chun-Chen Yeh
SYSTEM-AWARE SELECTIVE QUANTIZATION FOR PERFORMANCE OPTIMIZED DISTRIBUTED DEEP LEARNING

Publication number: 20210064954

Abstract: A convolutional neural network includes a front layer, a back layer, and a plurality of other layers that are connected between the front layer and the back layer. One of the other layers is a transition layer. A first precision is assigned to activations of neurons from the front layer back to the transition layer and a second precision is assigned to activations of the neurons from the transition layer back to the back layer. A third precision is assigned to weights of inputs to neurons from the front layer back to the transition layer and a fourth precision is assigned to weights of inputs to the neurons from the transition layer back to the back layer. In some embodiments the layers forward of the transition layer have a different convolutional kernel than the layers rearward of the transition layer.

Type: Application

Filed: August 27, 2019

Publication date: March 4, 2021

Inventors: Jungwook Choi, Swagath Venkataramani, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan
MACHINE LEARNING HARDWARE HAVING REDUCED PRECISION PARAMETER COMPONENTS FOR EFFICIENT PARAMETER UPDATE

Publication number: 20210064985

Abstract: An apparatus for training and inferencing a neural network includes circuitry that is configured to generate a first weight having a first format including a first number of bits based at least in part on a second weight having a second format including a second number of bits and a residual having a third format including a third number of bits. The second number of bits and the third number of bits are each less than the first number of bits. The circuitry is further configured to update the second weight based at least in part on the first weight and to update the residual based at least in part on the updated second weight and the first weight. The circuitry is further configured to update the first weight based at least in part on the updated second weight and the updated residual.
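
As an illustration only, the update cycle described in this abstract can be sketched in a few lines. The uniform quantizer, the step sizes, and the plain gradient step are assumptions made for the sketch and are not taken from the application; `quantize` and `update_weight` are hypothetical names.

```python
def quantize(x: float, step: float) -> float:
    """Round x to the nearest multiple of step (assumed uniform quantizer)."""
    return round(x / step) * step


def update_weight(w2, r, grad, lr, coarse_step, fine_step):
    """One update cycle over a low-precision weight w2 and a residual r."""
    # First weight: wider value generated from the second weight and residual.
    w1 = w2 + r
    # Assumed plain SGD step applied to the wider value.
    w1 = w1 - lr * grad
    # Update the second weight based on the first weight.
    w2_new = quantize(w1, coarse_step)
    # Update the residual based on the updated second weight and first weight.
    r_new = quantize(w1 - w2_new, fine_step)
    # Update the first weight from the updated second weight and residual.
    w1_new = w2_new + r_new
    return w2_new, r_new, w1_new


w2_new, r_new, w1_new = update_weight(
    w2=0.5, r=0.0625, grad=1.0, lr=0.125, coarse_step=0.25, fine_step=0.0625
)
```

In this toy run the coarse weight alone would lose the small update, while the residual keeps it, so the reconstructed value stays at 0.4375.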

Type: Application

Filed: September 3, 2019

Publication date: March 4, 2021

Inventors: Xiao Sun, Jungwook Choi, Naigang Wang, Chia-Yu Chen, Kailash Gopalakrishnan
REDUCED PRECISION BASED PROGRAMMABLE AND SIMD DATAFLOW ARCHITECTURE

Publication number: 20200401413

Abstract: Various embodiments are provided for using a reduced precision based programmable and single instruction multiple data (SIMD) dataflow architecture in a computing environment. One or more instructions between a plurality of execution units (EUs) operating in parallel within each one of a plurality of execution elements (EEs).

Type: Application

Filed: June 20, 2019

Publication date: December 24, 2020

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Kailash GOPALAKRISHNAN, Sunil SHUKLA, Jungwook CHOI, Silvia MUELLER, Bruce FLEISCHER, Vijayalakshmi SRINIVASAN, Ankur AGRAWAL, Jinwook OH
Programmable data delivery by load and store agents on a processing chip interfacing with on-chip memory components and directing data to external memory components

Patent number: 10838868

Abstract: Embodiments for implementing a communicating memory between a plurality of computing components are provided. In one embodiment, an apparatus comprises a plurality of memory components residing on a processing chip, the plurality of memory components interconnected between a plurality of processing elements of at least one processing core of the processing chip and at least one external memory component external to the processing chip. The apparatus further comprises a plurality of load agents and a plurality of store agents on the processing chip, each interfacing with the plurality of memory components. Each of the plurality of load agents and the plurality of store agents execute an independent program specifying a destination of data transacted between the plurality of memory components, the at least one external memory component, and the plurality of processing elements.

Type: Grant

Filed: March 7, 2019

Date of Patent: November 17, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Chia-Yu Chen, Jungwook Choi, Brian Curran, Bruce Fleischer, Kailash Gopalakrishan, Jinwook Oh, Sunil K Shukla, Vijayalakshmi Srinivasan, Swagath Venkataramani
REUSING AN OPERAND IN AN INSTRUCTION SET ARCHITECTURE (ISA)

Publication number: 20200356371

Abstract: Various embodiments are provided for reusing an operand in an instruction set architecture (ISA) by one or more processors in a computing system. An instruction may specify that an operand register for a selected operand retain operand data used by a previous instruction. The operand data in the operand register may be reused by the instruction.

Type: Application

Filed: May 8, 2019

Publication date: November 12, 2020

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Bruce FLEISCHER, Sunil SHUKLA, Vijayalakshmi SRINIVASAN, Jungwook CHOI
DYNAMICALLY RESIZING MINIBATCH IN NEURAL NETWORK EXECUTION

Publication number: 20200311536

Abstract: A minibatch in a neural network execution may be dynamically resized based on on-chip memory. For example, a size of the minibatch is configured such that the minibatch fits within on-chip memory. The size of the minibatch may be resized for a sequence of layers in the neural network execution. A next layer's execution can commence responsive to the resized minibatch being completed in a previous layer without having to wait for all of the minibatch to be completed in the previous layer.
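
A minimal sketch of the sizing idea in this abstract, assuming a very simple memory model in which each layer needs a fixed number of bytes per sample. The function names, the per-layer byte counts, and the 128 KiB budget are hypothetical and are not taken from the application.

```python
def fit_minibatch(requested: int, bytes_per_sample: int, on_chip_bytes: int) -> int:
    """Largest minibatch size, capped at `requested`, that fits on chip."""
    if bytes_per_sample <= 0:
        raise ValueError("bytes_per_sample must be positive")
    # Always process at least one sample, even if it does not fit.
    return max(1, min(requested, on_chip_bytes // bytes_per_sample))


def plan_layers(requested: int, per_layer_bytes: list[int], on_chip_bytes: int) -> list[int]:
    """Choose a (possibly resized) minibatch size for each layer in sequence."""
    return [fit_minibatch(requested, b, on_chip_bytes) for b in per_layer_bytes]


# Three layers with different per-sample footprints and 128 KiB of on-chip memory.
sizes = plan_layers(64, [1024, 4096, 512], 128 * 1024)
```

Here the second layer has the largest footprint, so its minibatch is cut to 32 while the other two layers keep the requested 64.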

Type: Application

Filed: March 25, 2019

Publication date: October 1, 2020

Inventors: Swagath Venkataramani, Vijayalakshmi Srinivasan, Jungwook Choi
PROGRAMMABLE DATA DELIVERY TO A SYSTEM OF SHARED PROCESSING ELEMENTS WITH SHARED MEMORY

Publication number: 20200285579

Abstract: Embodiments for implementing a communicating memory between a plurality of computing components are provided. In one embodiment, an apparatus comprises a plurality of memory components residing on a processing chip, the plurality of memory components interconnected between a plurality of processing elements of at least one processing core of the processing chip and at least one external memory component external to the processing chip. The apparatus further comprises a plurality of load agents and a plurality of store agents on the processing chip, each interfacing with the plurality of memory components. Each of the plurality of load agents and the plurality of store agents execute an independent program specifying a destination of data transacted between the plurality of memory components, the at least one external memory component, and the plurality of processing elements.

Type: Application

Filed: March 7, 2019

Publication date: September 10, 2020

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Chia-Yu CHEN, Jungwook CHOI, Brian CURRAN, Bruce FLEISCHER, Kailash GOPALAKRISHAN, Jinwook OH, Sunil K. SHUKLA, Vijayalakshmi SRINIVASAN, Swagath VENKATARAMANI
Matrix multiplication on a systolic array

Patent number: 10769238

Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
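
The following functional sketch may help picture the data flow this abstract describes: each processing element holds one value of the first matrix in a register, values of the second matrix stream past, and partial sums of the third matrix stream out. It assumes a weight-stationary arrangement and does not model cycle-level timing or bit-level operation; all names are hypothetical.

```python
def systolic_matmul(weights, activations):
    """Compute activations x weights by passing partial sums down each column."""
    rows, cols = len(weights), len(weights[0])
    # Populate one register per processing element with a value of the first matrix.
    registers = [[weights[i][j] for j in range(cols)] for i in range(rows)]
    outputs = []
    for vector in activations:          # one streamed row of the second matrix
        partial = [0] * cols            # partial sums entering the top of each column
        for i in range(rows):           # partial sums move from PE row i to row i + 1
            for j in range(cols):
                partial[j] += registers[i][j] * vector[i]
        outputs.append(partial)         # sums streamed out of the bottom row
    return outputs


result = systolic_matmul([[1, 2], [3, 4]], [[1, 0], [0, 1], [1, 1]])
```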

Type: Grant

Filed: September 19, 2019

Date of Patent: September 8, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
HYBRID FLOATING POINT REPRESENTATION FOR DEEP LEARNING ACCELERATION

Publication number: 20200249910

Abstract: In an embodiment, a method includes configuring a specialized circuit for floating point computations using numbers represented by a hybrid format, wherein the hybrid format includes a first format and a second format. In the embodiment, the method includes operating the further configured specialized circuit to store an approximation of a numeric value in the first format during a forward pass for training a deep learning network. In the embodiment, the method includes operating the further configured specialized circuit to store an approximation of a second numeric value in the second format during a backward pass for training the deep learning network.

Type: Application

Filed: February 6, 2019

Publication date: August 6, 2020

Applicant: International Business Machines Corporation

Inventors: NAIGANG WANG, Jungwook Choi, Kailash Gopalakrishnan, Ankur Agrawal, Silvia Melitta Mueller
ADVERSARIAL INPUT IDENTIFICATION USING REDUCED PRECISION DEEP NEURAL NETWORKS

Publication number: 20200226459

Abstract: A processor receives input data and provides the input data to a first neural network including a first neural network model. The first neural network model has a first numerical precision level. A first feature vector is generated from the input data using the first neural network. The input data is provided to a second neural network including a second neural network model. The second neural network model has a second numerical precision level different from the first numerical precision level. A second feature vector is generated from the input data using the second neural network. A difference metric is computed between the first feature vector and the second feature vector. The difference metric is indicative of whether the input data includes adversarial data.
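
A toy sketch of the comparison in this abstract, assuming a single linear layer as the "network", a uniform quantizer for the lower-precision copy, and Euclidean distance as the difference metric. None of these choices comes from the application, and how the metric is thresholded is left open.

```python
import math


def quantize(x: float, step: float) -> float:
    """Round x to the nearest multiple of step (assumed reduced-precision model)."""
    return round(x / step) * step


def features(weights, x):
    """Feature vector of a one-layer linear model."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def difference_metric(a, b) -> float:
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


W_full = [[0.31, -0.72], [0.55, 0.18]]                         # first precision level
W_low = [[quantize(w, 0.25) for w in row] for row in W_full]   # second, lower level
x = [1.0, 2.0]

d = difference_metric(features(W_full, x), features(W_low, x))
```

A deployment would compare `d` against a threshold calibrated on known clean inputs; that calibration step is not shown here.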

Type: Application

Filed: January 11, 2019

Publication date: July 16, 2020

Applicant: International Business Machines Corporation

Inventors: Chia-Yu Chen, Pin-Yu Chen, Pierce I-Jen Chuang, Richard Chen, Jungwook Choi, Kailash Gopalakrishnan
Deep learning accelerator architecture with chunking GEMM

Patent number: 10657442

Abstract: A compute matrix is configured to include a set of compute units, each compute unit including a multiplier and an accumulator, each of the multiplier and the accumulator formed using at least one floating point unit (FPU). An accumulator array is configured to include a set of external accumulators. The compute matrix is operated to produce a chunk dot-product using a first chunk of a first input vector and a first chunk of a second input vector. The accumulator array is operated to output a dot-product of the first input vector and the second input vector using the chunk dot-product.
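
The chunked accumulation in this abstract can be pictured with the sketch below, which splits both input vectors into chunks, forms a dot-product per chunk, and sums the chunk results in a separate accumulator. The chunk size and function name are hypothetical, and ordinary Python floats stand in for the floating point units.

```python
def chunked_dot(a, b, chunk_size: int) -> float:
    """Dot product computed as a sum of per-chunk dot products."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    total = 0.0                                   # external accumulator
    for start in range(0, len(a), chunk_size):
        chunk_sum = 0.0                           # accumulator inside the compute unit
        for x, y in zip(a[start:start + chunk_size], b[start:start + chunk_size]):
            chunk_sum += x * y                    # multiply, then accumulate
        total += chunk_sum                        # fold the chunk dot-product in
    return total


value = chunked_dot([1, 2, 3, 4, 5], [1, 1, 1, 1, 1], chunk_size=2)
```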

Type: Grant

Filed: April 19, 2018

Date of Patent: May 19, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Naigang Wang, Jungwook Choi, Kailash Gopalakrishnan, Daniel Brand
Method to Map Convolutional Layers of Deep Neural Network on a Plurality of Processing Elements with SIMD Execution Units, Private Memories, and Connected as a 2D Systolic Processor Array

Publication number: 20200134105

Abstract: A method for improving performance of a predefined Deep Neural Network (DNN) convolution processing on a computing device includes inputting parameters, as input data into a processor on a computer that formalizes a design space exploration of a convolution mapping, on a predefined computer architecture that will execute the predefined convolution processing. The parameters are predefined as guided by a specification for the predefined convolution processing to be implemented by the convolution mapping and by a microarchitectural specification for the processor that will execute the predefined convolution processing. The processor calculates performance metrics for executing the predefined convolution processing on the computing device, as functions of the predefined parameters, as proxy estimates of performance of different possible design choices to implement the predefined convolution processing.

Type: Application

Filed: October 31, 2018

Publication date: April 30, 2020

Inventors: Chia-Yu CHEN, Jungwook Choi, Kailash Gopalakrishnan, Vijayalakshmi Srinivasan, Swagath Venkataramani, Jintao Zhang
Very low precision floating point representation for deep learning acceleration

Patent number: 10592208

Abstract: A specialized circuit is configured for floating point computations using numbers represented by a very low precision format (VLP format). The VLP format includes less than sixteen bits and is apportioned into a sign bit, exponent bits (e), and mantissa bits (p). The configured specialized circuit is operated to store an approximation of a numeric value in the VLP format, where the approximation is represented as a function of a multiple of a fraction, where the fraction is an inverse of a number of discrete values that can be represented using only the mantissa bits.
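
To make the bit layout concrete, here is a sketch of decoding a value with one sign bit, e exponent bits, and p mantissa bits, where the mantissa counts multiples of the fraction 1/2^p. The implicit leading one, the exponent bias, and the lack of special values (zero, infinity, NaN, subnormals) are IEEE-style assumptions made for the sketch, not details taken from the patent.

```python
def decode_vlp(bits: int, e: int, p: int, bias: int) -> float:
    """Decode a sign | exponent | mantissa bit pattern into a float."""
    if 1 + e + p >= 16:
        raise ValueError("a VLP format uses fewer than sixteen bits")
    mantissa = bits & ((1 << p) - 1)
    exponent = (bits >> p) & ((1 << e) - 1)
    sign = (bits >> (p + e)) & 1
    # The fraction is the inverse of the number of values p mantissa bits can hold.
    fraction = 1.0 / (1 << p)
    # Assumed IEEE-style normal number: implicit leading 1 and a biased exponent.
    magnitude = (1.0 + mantissa * fraction) * 2.0 ** (exponent - bias)
    return -magnitude if sign else magnitude


# Hypothetical 8-bit layout: 1 sign bit, 5 exponent bits, 2 mantissa bits, bias 15.
a = decode_vlp(0b0_01111_10, e=5, p=2, bias=15)
b = decode_vlp(0b1_10000_01, e=5, p=2, bias=15)
```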

Type: Grant

Filed: May 7, 2018

Date of Patent: March 17, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Naigang Wang, Kailash Gopalakrishnan, Jungwook Choi, Silvia M. Mueller, Ankur Agrawal, Daniel Brand
Processor and memory transparent convolutional lowering and auto zero padding for deep neural network implementations

Patent number: 10565285

Abstract: A convolutional lowering component (CoLor component) between processor and memory units (or within a memory hierarchy) maps location in a lowered matrix to an equivalent location in a non-lowered matrix and provides auto zero padding in computational heavy convolutional layers. An identification component identifies processing components that execute computations in deep neural networks (DNNs) in which convolutions are realized as general matrix to matrix multiplications (GEMM) operations, and identifies a subset of the processing components that store deep neural network (DNN) features in a non-lowered form component that determines output for successively larger neural networks of a set. An address translation component translates address requests, generated by the subset of processing components to a memory subsystem, from a lowered index form to a non-lowered index form.
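
The address translation with automatic zero padding can be sketched as follows. The sketch assumes an im2col-style lowered matrix in which each row is one output position and each column is one (channel, kernel row, kernel column) triple; that layout, the square kernel, and the function names are assumptions and may differ from the patented component.

```python
def lowered_to_feature_index(row, col, *, in_h, in_w, k, stride=1, pad=0):
    """Map a lowered-matrix location to (channel, y, x), or None if it is padding."""
    out_w = (in_w + 2 * pad - k) // stride + 1
    oy, ox = divmod(row, out_w)          # output position addressed by this row
    c, rem = divmod(col, k * k)          # input channel addressed by this column
    ky, kx = divmod(rem, k)              # offset inside the k x k kernel window
    y = oy * stride + ky - pad
    x = ox * stride + kx - pad
    if 0 <= y < in_h and 0 <= x < in_w:
        return (c, y, x)
    return None                          # location falls in the zero-padded border


def read_lowered(feature, row, col, **geometry):
    """Read a lowered-matrix element directly from the non-lowered feature map."""
    index = lowered_to_feature_index(row, col, **geometry)
    if index is None:
        return 0                         # auto zero padding, nothing is fetched
    c, y, x = index
    return feature[c][y][x]


feature = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]    # one channel, 3 x 3
geometry = dict(in_h=3, in_w=3, k=3, stride=1, pad=1)
corner = read_lowered(feature, 0, 0, **geometry)
center = read_lowered(feature, 0, 4, **geometry)
last = read_lowered(feature, 4, 8, **geometry)
```

The point of the sketch is that the lowered matrix is never materialized: every access is translated on the fly, and padded locations return zero without touching memory.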

Type: Grant

Filed: December 18, 2017

Date of Patent: February 18, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jungwook Choi, Bruce Fleischer, Vijayalakshmi Srinivasan, Swagath Venkataramani
MATRIX MULTIPLICATION ON A SYSTOLIC ARRAY

Publication number: 20200012706

Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.

Type: Application

Filed: September 19, 2019

Publication date: January 9, 2020

Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
LOW PRECISION DEEP NEURAL NETWORK ENABLED BY COMPENSATION INSTRUCTIONS

Publication number: 20200005125

Abstract: A compensated deep neural network (compensated-DNN) is provided. A first vector having a set of components and a second vector having a set of corresponding components are received. A component of the first vector includes a first quantized value and a first compensation instruction, and a corresponding component of the second vector includes a second quantized value and a second compensation instruction. The first quantized value is multiplied with the second quantized value to compute a raw product value. The raw product value is compensated for a quantization error according to the first and second compensation instructions to produce a compensated product value. The compensated product value is added into an accumulated value for the dot product. The accumulated value is converted into an output vector of the dot product. The output vector includes an output quantized value and an output compensation instruction.

Type: Application

Filed: June 27, 2018

Publication date: January 2, 2020

Inventors: Swagath Venkataramani, Shubham Jain, Vijayalakshmi Srinivasan, Jungwook Choi, Leland Chang
STATISTICS-AWARE WEIGHT QUANTIZATION

Publication number: 20190385050

Abstract: Techniques for statistics-aware weight quantization are presented. To facilitate reducing the bit precision of weights, for a set of weights, a quantizer management component can estimate a quantization scale value to apply to a weight as a linear or non-linear function of the mean of a square of a weight value of the weight and the mean of an absolute value of the weight value, wherein the quantization scale value is determined to have a smaller quantization error than all, or at least almost all, other quantization errors associated with other quantization scale values. A quantizer component applies the quantization scale value to symmetrically and/or uniformly quantize weights of a layer of the set of weights to generate quantized weights, the weights being quantized using rounding. The respective quantized weights can be used to facilitate training and inference of a deep learning system.
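
A minimal sketch of the two steps in this abstract: estimating a scale from the mean of the squared weights and the mean of their absolute values, then quantizing symmetrically and uniformly with rounding. The linear form and the coefficients `c1` and `c2` are placeholders; the application does not give values here, and in practice they would be fitted per bit width.

```python
import math


def estimate_scale(weights, c1: float, c2: float) -> float:
    """Scale as an assumed linear function of sqrt(E[w^2]) and E[|w|]."""
    mean_sq = sum(w * w for w in weights) / len(weights)
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    return c1 * math.sqrt(mean_sq) + c2 * mean_abs


def quantize_weights(weights, scale: float, bits: int):
    """Symmetric, uniform quantization to `bits` bits using rounding."""
    if bits < 2:
        raise ValueError("bits must be at least 2")
    levels = 2 ** (bits - 1) - 1          # positive levels on each side of zero
    step = scale / levels
    quantized = []
    for w in weights:
        clipped = max(-scale, min(scale, w))
        # Note: Python's round() resolves exact ties to the nearest even integer.
        quantized.append(round(clipped / step) * step)
    return quantized


layer = [0.9, -0.2, 0.4, -1.3]
q = quantize_weights(layer, scale=1.0, bits=2)
alpha = estimate_scale([3.0, -4.0], c1=0.0, c2=1.0)
```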

Type: Application

Filed: June 13, 2018

Publication date: December 19, 2019

Inventors: Zhuo Wang, Jungwook Choi, Kailash Gopalakrishnan, Pierce I-Jen Chuang
Matrix multiplication on a systolic array

Patent number: 10489484

Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.

Type: Grant

Filed: April 11, 2019

Date of Patent: November 26, 2019

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang