Patents by Inventor Jungwook CHOI

Jungwook CHOI has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210064976
    Abstract: An apparatus includes circuitry for a neural network that is configured to perform forward propagation neural network operations on floating point numbers having a first n-bit floating point format. The first n-bit floating point format has a configuration consisting of a sign bit, m exponent bits and p mantissa bits where m is greater than p. The circuitry is further configured to perform backward propagation neural network operations on floating point numbers having a second n-bit floating point format that is different than the first n-bit floating point format. The second n-bit floating point format has a configuration consisting of a sign bit, q exponent bits and r mantissa bits where q is greater than m and r is less than p.
    Type: Application
    Filed: September 3, 2019
    Publication date: March 4, 2021
    Inventors: Xiao Sun, Jungwook Choi, Naigang Wang, Chia-Yu Chen, Kailash Gopalakrishnan
  • Publication number: 20210064985
    Abstract: An apparatus for training and inferencing a neural network includes circuitry that is configured to generate a first weight having a first format including a first number of bits based at least in part on a second weight having a second format including a second number of bits and a residual having a third format including a third number of bits. The second number of bits and the third number of bits are each less than the first number of bits. The circuitry is further configured to update the second weight based at least in part on the first weight and to update the residual based at least in part on the updated second weight and the first weight. The circuitry is further configured to update the first weight based at least in part on the updated second weight and the updated residual.
    Type: Application
    Filed: September 3, 2019
    Publication date: March 4, 2021
    Inventors: Xiao Sun, Jungwook Choi, Naigang Wang, Chia-Yu Chen, Kailash Gopalakrishnan
  • Publication number: 20210064974
    Abstract: A neuromorphic device includes a plurality of first control lines, a plurality of second control lines and a matrix of resistive processing unit cells. Each resistive processing unit cell is electrically connected with one of the first control lines and one of the second control lines. A given resistive processing unit cell includes a first resistive device and a second resistive device. The first resistive device is a positively weighted resistive device and the second resistive device is a negatively weighted resistive device.
    Type: Application
    Filed: August 30, 2019
    Publication date: March 4, 2021
    Inventors: Youngseok Kim, Jungwook Choi, Seyoung Kim, Chun-Chen Yeh
  • Publication number: 20210064372
    Abstract: An apparatus includes a memory and a processor coupled to the memory. The processor includes first and second sets of arithmetic units having first and second precision for floating-point computations, the second precision being lower than the first precision. The processor is configured to obtain a machine learning model trained in the first precision, to utilize the second set of arithmetic units to perform inference on input data, to utilize the first set of arithmetic units to generate feedback for updating parameters of the second set of arithmetic units based on the inference performed on the input data by the second set of arithmetic units, to tune parameters of the second set of arithmetic units based at least in part on the feedback generated by the first set of arithmetic units, and to utilize the second set of arithmetic units with the tuned parameters to generate inference results.
    Type: Application
    Filed: September 3, 2019
    Publication date: March 4, 2021
    Inventors: Xiao Sun, Chia-Yu Chen, Naigang Wang, Jungwook Choi, Kailash Gopalakrishnan
  • Publication number: 20210064954
    Abstract: A convolutional neural network includes a front layer, a back layer, and a plurality of other layers that are connected between the front layer and the back layer. One of the other layers is a transition layer. A first precision is assigned to activations of neurons from the front layer back to the transition layer and a second precision is assigned to activations of the neurons from the transition layer back to the back layer. A third precision is assigned to weights of inputs to neurons from the front layer back to the transition layer and a fourth precision is assigned to weights of inputs to the neurons from the transition layer back to the back layer. In some embodiments the layers forward of the transition layer have a different convolutional kernel than the layers rearward of the transition layer.
    Type: Application
    Filed: August 27, 2019
    Publication date: March 4, 2021
    Inventors: Jungwook Choi, Swagath Venkataramani, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan
  • Publication number: 20200401413
    Abstract: Various embodiments are provided for using a reduced precision based programmable and single instruction multiple data (SIMD) dataflow architecture in a computing environment. One or more instructions between a plurality of execution units (EUs) operating in parallel within each one of a plurality of execution elements (EEs).
    Type: Application
    Filed: June 20, 2019
    Publication date: December 24, 2020
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Kailash GOPALAKRISHNAN, Sunil SHUKLA, Jungwook CHOI, Silvia MUELLER, Bruce FLEISCHER, Vijayalakshmi SRINIVASAN, Ankur AGRAWAL, Jinwook OH
  • Patent number: 10838868
    Abstract: Embodiments for implementing a communicating memory between a plurality of computing components are provided. In one embodiment, an apparatus comprises a plurality of memory components residing on a processing chip, the plurality of memory components interconnected between a plurality of processing elements of at least one processing core of the processing chip and at least one external memory component external to the processing chip. The apparatus further comprises a plurality of load agents and a plurality of store agents on the processing chip, each interfacing with the plurality of memory components. Each of the plurality of load agents and the plurality of store agents execute an independent program specifying a destination of data transacted between the plurality of memory components, the at least one external memory component, and the plurality of processing elements.
    Type: Grant
    Filed: March 7, 2019
    Date of Patent: November 17, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Chia-Yu Chen, Jungwook Choi, Brian Curran, Bruce Fleischer, Kailash Gopalakrishan, Jinwook Oh, Sunil K Shukla, Vijayalakshmi Srinivasan, Swagath Venkataramani
  • Publication number: 20200356371
    Abstract: Various embodiments are provided reusing an operand in an instruction set architecture (ISA) by one or more processors in a computing system. An instruction may specify that an operand register for a selected operand retain operand data used by a previous instruction. The operand data in the operand register may be reused by the instruction.
    Type: Application
    Filed: May 8, 2019
    Publication date: November 12, 2020
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bruce FLEISCHER, Sunil SHUKLA, Vijayalakshmi SRINIVASAN, Jungwook CHOI
  • Publication number: 20200311536
    Abstract: A minibatch in a neural network execution may be dynamically resized based on on-chip memory. For example, a size of the minibatch is configured such that the minibatch fits within on-chip memory. The size of the minibatch may be resized for a sequence of layers in the neural network execution. A next layer's execution can commence responsive to the resized minibatch being completed in a previous layer without having to wait for all of the minibatch to be completed in the previous layer.
    Type: Application
    Filed: March 25, 2019
    Publication date: October 1, 2020
    Inventors: Swagath Venkataramani, Vijayalakshmi Srinivasan, Jungwook Choi
  • Publication number: 20200285579
    Abstract: Embodiments for implementing a communicating memory between a plurality of computing components are provided. In one embodiment, an apparatus comprises a plurality of memory components residing on a processing chip, the plurality of memory components interconnected between a plurality of processing elements of at least one processing core of the processing chip and at least one external memory component external to the processing chip. The apparatus further comprises a plurality of load agents and a plurality of store agents on the processing chip, each interfacing with the plurality of memory components. Each of the plurality of load agents and the plurality of store agents execute an independent program specifying a destination of data transacted between the plurality of memory components, the at least one external memory component, and the plurality of processing elements.
    Type: Application
    Filed: March 7, 2019
    Publication date: September 10, 2020
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Chia-Yu CHEN, Jungwook CHOI, Brian CURRAN, Bruce FLEISCHER, Kailash GOPALAKRISHAN, Jinwook OH, Sunil K. SHUKLA, Vijayalakshmi SRINIVASAN, Swagath VENKATARAMANI
  • Patent number: 10769238
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Grant
    Filed: September 19, 2019
    Date of Patent: September 8, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Publication number: 20200249910
    Abstract: In an embodiment, a method includes configuring a specialized circuit for floating point computations using numbers represented by a hybrid format, wherein the hybrid format includes a first format and a second format. In the embodiment, the method includes operating the further configured specialized circuit to store an approximation of a numeric value in the first format during a forward pass for training a deep learning network. In the embodiment, the method includes operating the further configured specialized circuit to store an approximation of a second numeric value in the second format during a backward pass for training the deep learning network.
    Type: Application
    Filed: February 6, 2019
    Publication date: August 6, 2020
    Applicant: International Business Machines Corporation
    Inventors: NAIGANG WANG, Jungwook Choi, Kailash Gopalakrishnan, Ankur Agrawal, Silvia Melitta Mueller
  • Publication number: 20200226459
    Abstract: A processor receives input data and provides the input data to a first neural network including a first neural network model. The first neural network model has a first numerical precision level. A first feature vector is generated from the input data using the first neural network. The input data is provided to a second neural network including a second neural network model. The second neural network model has a second numerical precision level different from the first numerical precession level. A second feature vector is generated from the input data using the second neural network. A difference metric is computed between the first feature vector and the second feature vector. The difference metric is indicative of whether the input data includes adversarial data.
    Type: Application
    Filed: January 11, 2019
    Publication date: July 16, 2020
    Applicant: International Business Machines Corporation
    Inventors: Chia-Yu Chen, Pin-Yu Chen, Pierce I-Jen Chuang, Richard Chen, Jungwook Choi, Kailash Gopalakrishnan
  • Patent number: 10657442
    Abstract: A compute matrix is configured to include a set of compute units, each compute unit including a multiplier and an accumulator, each of the multiplier and the accumulator formed using at least one floating point unit (FPU). An accumulator array is configured to include a set of external accumulators. The compute matrix is operated to produce a chunk dot-product using a first chunk of a first input vector and a first chunk of a second input vector. The accumulator array is operated to output a dot-product of the first input vector and the second input vector using the chunk dot-product.
    Type: Grant
    Filed: April 19, 2018
    Date of Patent: May 19, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Naigang Wang, Jungwook Choi, Kailash Gopalakrishnan, Daniel Brand
  • Publication number: 20200134105
    Abstract: A method for improving performance of a predefined Deep Neural Network (DNN) convolution processing on a computing device includes inputting parameters, as input data into a processor on a computer that formalizes a design space exploration of a convolution mapping, on a predefined computer architecture that will execute the predefined convolution processing. The parameters are predefined as guided by a specification for the predefined convolution processing to be implemented by the convolution mapping and by a microarchitectural specification for the processor that will execute the predefined convolution processing. The processor calculates performance metrics for executing the predefined convolution processing on the computing device, as functions of the predefined parameters, as proxy estimates of performance of different possible design choices to implement the predefined convolution processing.
    Type: Application
    Filed: October 31, 2018
    Publication date: April 30, 2020
    Inventors: Chia-Yu CHEN, Jungwook Choi, Kailash Gopalakrishnan, Vijayalakshmi Srinivasan, Swagath Venkataramani, Jintao Zhang
  • Patent number: 10592208
    Abstract: A specialized circuit is configured for floating point computations using numbers represented by a very low precision format (VLP format). The VLP format includes less than sixteen bits and is apportion into a sign bit, exponent bits (e), and mantissa bits (p). The configured specialized circuit is operated to store an approximation of a numeric value in the VLP format, where the approximation is represented as a function of a multiple of a fraction, where the fraction is an inverse of a number of discrete values that can be represented using only the mantissa bits.
    Type: Grant
    Filed: May 7, 2018
    Date of Patent: March 17, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Naigang Wang, Kailash Gopalakrishnan, Jungwook Choi, Silvia M. Mueller, Ankur Agrawal, Daniel Brand
  • Patent number: 10565285
    Abstract: A convolutional lowering component (CoLor component) between processor and memory units (or within a memory hierarchy) maps location in a lowered matrix to an equivalent location in a non-lowered matrix and provides auto zero padding in computational heavy convolutional layers. An identification component identifies processing components that execute computations in deep neural networks (DNNs) in which convolutions are realized as general matrix to matrix multiplications (GEMM) operations, and identifies a subset of the processing components that store deep neural network (DNN) features in a non-lowered form component that determines output for successively larger neural networks of a set. An address translation component translates address requests, generated by the subset of processing components to a memory subsystem, from a lowered index form to a non-lowered index form.
    Type: Grant
    Filed: December 18, 2017
    Date of Patent: February 18, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jungwook Choi, Bruce Fleischer, Vijayalakshmi Srinivasan, Swagath Venkataramani
  • Publication number: 20200012706
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Application
    Filed: September 19, 2019
    Publication date: January 9, 2020
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Publication number: 20200005125
    Abstract: A compensated deep neural network (compensated-DNN) is provided. A first vector having a set of components and a second vector having a set of corresponding components are received. A component of the first vector includes a first quantized value and a first compensation instruction, and a corresponding component of the second vector includes a second quantized value and a second compensation instruction. The first quantized value is multiplied with the second quantized value to compute a raw product value. The raw product value is compensated for a quantization error according to the first and second compensation instructions to produce a compensated product value. The compensated product value is added into an accumulated value for the dot product. The accumulated value is converted into an output vector of the dot product. The output vector includes an output quantized value and an output compensation instruction.
    Type: Application
    Filed: June 27, 2018
    Publication date: January 2, 2020
    Inventors: Swagath Venkataramani, Shubham Jain, Vijayalakshmi Srinivasan, Jungwook Choi, Leland Chang
  • Publication number: 20190385050
    Abstract: Techniques for statistics-aware weight quantization are presented. To facilitate reducing the bit precision of weights, for a set of weights, a quantizer management component can estimate a quantization scale value to apply to a weight as a linear or non-linear function of the mean of a square of a weight value of the weight and the mean of an absolute value of the weight value, wherein the quantization scale value is determined to have a smaller quantization error than all, or at least almost all, other quantization errors associated with other quantization scale values. A quantizer component applies the quantization scale value to symmetrically and/or uniformly quantize weights of a layer of the set of weights to generate quantized weights, the weights being quantized using rounding. The respective quantized weights can be used to facilitate training and inference of a deep learning system.
    Type: Application
    Filed: June 13, 2018
    Publication date: December 19, 2019
    Inventors: Zhuo Wang, Jungwook Choi, Kailash Gopalakrishnan, Pierce I-Jen Chuang