Patents by Inventor Xiaochen PENG

Xiaochen PENG has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20260086770
    Abstract: In one or more aspects, a processing device for numerical data quantization includes processing circuitry configured to determine a maximum exponent from a set of exponents of a set of digital representations of a set of numbers, obtain a set of scaled exponents based on the maximum exponent, and perform one of: (i) obtain a set of quantized significands based on a set of mantissas of the set of digital representations and the set of scaled exponents, or (ii) obtain a set of quantized mantissas based on the set of mantissas. The processing circuitry is configured to output a set of quantized digital representations of the set of numbers, based on the set of quantized significands, or based on the set of quantized mantissas and the set of scaled exponents; and to output a biased exponent scaling factor based on the maximum exponent.
    Type: Application
    Filed: October 25, 2024
    Publication date: March 26, 2026
    Applicant: Taiwan Semiconductor Manufacturing Company, Ltd.
    Inventors: Xiaochen PENG, Brian CRAFTON, Murat Kerem AKARVARDAR, Ashwin Sanjay LELE, Bo ZHANG, Win-San KHWA
  • Patent number: 12580011
    Abstract: A memory circuit includes a compute in-memory (CIM) array. The CIM array includes a memory cell array configured to store a first set of data. The first set of data including a first set of weights or a second set of data. The first set of data being exponent portions of corresponding floating point numbers. The second set of data being a compressed version of the first set of weights. The first set of weights having a first data length, and the second set of data having a second data length less than the first data length. The CIM array further includes a decoder coupled to the memory cell array, and being configured to generate a first set of output signals in response to a first set of input signals, the first set of data and a flag signal.
    Type: Grant
    Filed: June 11, 2024
    Date of Patent: March 17, 2026
    Assignee: TAIWAN SEMICONDUCTOR MANUFACTURING COMPANY, LTD.
    Inventors: Brian Crafton, Xiaochen Peng, Murat Kerem Akarvardar
  • Publication number: 20250378860
    Abstract: A memory circuit includes a compute in-memory (CIM) array. The CIM array includes a memory cell array configured to store a first set of data. The first set of data including a first set of weights or a second set of data. The first set of data being exponent portions of corresponding floating point numbers. The second set of data being a compressed version of the first set of weights. The first set of weights having a first data length, and the second set of data having a second data length less than the first data length. The CIM array further includes a decoder coupled to the memory cell array, and being configured to generate a first set of output signals in response to a first set of input signals, the first set of data and a flag signal.
    Type: Application
    Filed: June 11, 2024
    Publication date: December 11, 2025
    Inventors: Brian CRAFTON, Xiaochen PENG, Murat Kerem AKARVARDAR
  • Publication number: 20250370714
    Abstract: Systems, devices, circuits, and methods of operating said systems, devices, and circuits are disclosed. In one aspect, a system includes an input buffer circuit storing a set of data values for a convolution operation and a plurality of multiply-accumulate (MAC) circuits. A first MAC circuit of the plurality of MAC circuits can retrieve the set of data values for the convolution operation and generate a first output by applying a first weight value stored at the first MAC circuit to a first data value of the set of data values. The first MAC circuit can provide the first data value to a second MAC circuit of the plurality of MAC circuits. The first MAC circuit can generate a plurality of second outputs by applying a second weight value and a third weight value stored at the first MAC circuit to a second data value of the set of data values.
    Type: Application
    Filed: May 28, 2024
    Publication date: December 4, 2025
    Applicant: Taiwan Semiconductor Manufacturing Company, Ltd.
    Inventors: Xiaochen Peng, Murat Kerem Akarvardar
  • Publication number: 20250328313
    Abstract: In some embodiments, computing a sum of floating-point numbers, such as in multiply-accumulate operations, includes aligning the mantissas of the floating-point numbers by adjusting at least a subset of the mantissas so that the exponents of the floating-point numbers are the same. After the alignment, the most significant portion of each mantissa is rounded depending on the remainder of the mantissa, for example the most significant bit of the remainder. The mantissas are then truncated to the rounded most significant portions. The truncated mantissas are then summed. The mantissas being aligned can be products of mantissas of respective inputs and weights. The sum of the rounded portions in such cases is a result of multiply-accumulate operations, with a reduced bit width.
    Type: Application
    Filed: April 19, 2024
    Publication date: October 23, 2025
    Inventors: Xiaochen PENG, Brian Crafton, Murat Kerem Akarvardar
  • Publication number: 20250258710
    Abstract: An artificial intelligence (AI) accelerator device may include a plurality of on-chip mini buffers that are associated with a processing element (PE) array. Each mini buffer is associated with a subset of rows or a subset of columns of the PE array. Partitioning an on-chip buffer of the AI accelerator device into the mini buffers described herein may reduce the size and complexity of the on-chip buffer. The reduced size of the on-chip buffer may reduce the wire routing complexity of the on-chip buffer, which may reduce latency and may reduce access energy for the AI accelerator device. This may increase the operating efficiency and/or may increase the performance of the AI accelerator device. Moreover, the mini buffers may increase the overall bandwidth that is available for the mini buffers to transfer data to and from the PE array.
    Type: Application
    Filed: April 4, 2025
    Publication date: August 14, 2025
    Inventors: Xiaoyu SUN, Xiaochen PENG, Murat Kerem AKARVARDAR
  • Publication number: 20250224923
    Abstract: In some embodiments, a computing method includes, for pairs of a first and second floating-point numbers, each having a respective mantissa and exponent, supplying to a respective one of multiply circuits the mantissas of a subset of the pairs of first and second floating-point numbers, the subset of the plurality of pairs of first and second floating-point numbers each having a respective sum of the exponents of the first and second floating-point numbers, respectively, meeting a predetermined criterion, such as the sum being smaller than a predetermined threshold value; generating, using each of the plurality of multiply circuits, a product of the mantissas of the respective pair of first and second floating-point numbers; accumulating the product mantissas to generate a product mantissa partial sum; combining the product mantissa partial sum and maximum product exponent to generate an output floating point number; and for each of the remaining pairs of first and second floating-point numbers: withholding th
    Type: Application
    Filed: May 6, 2024
    Publication date: July 10, 2025
    Inventors: Xiaochen PENG, Brian Crafton, Murat Kerem Akarvardar, Hidehiro Fujiwara, Haruki Mori
  • Publication number: 20250224922
    Abstract: In some embodiments, a computing method includes, for a set of products, each of a respective pair of a first and a second floating-point operands, each having a respective mantissa and exponent, aligning the mantissas of the first operands based on a maximum exponent of the first operands to generate a shared exponent; modifying the mantissas of the first operands based on the shared exponent to generate respective adjusted mantissas of the first operands; generating mantissa products, each based on the mantissa of a respective one of the second operands and a respective one of the adjusted first mantissas retrieved from the memory device; summing the mantissa products to generate a mantissa product partial sum; and combining the shared exponent and the mantissa product partial sum. The adjusted mantissas of the first operands can be saved in, and retrieved from, a memory device for the mantissa product generation.
    Type: Application
    Filed: April 24, 2024
    Publication date: July 10, 2025
    Inventors: Win-San KHWA, Hung-Hsi HSU, Xiaochen PENG, Murat Kerem Akarvardar, Meng-Fan Chang
  • Patent number: 12293229
    Abstract: An artificial intelligence (AI) accelerator device may include a plurality of on-chip mini buffers that are associated with a processing element (PE) array. Each mini buffer is associated with a subset of rows or a subset of columns of the PE array. Partitioning an on-chip buffer of the AI accelerator device into the mini buffers described herein may reduce the size and complexity of the on-chip buffer. The reduced size of the on-chip buffer may reduce the wire routing complexity of the on-chip buffer, which may reduce latency and may reduce access energy for the AI accelerator device. This may increase the operating efficiency and/or may increase the performance of the AI accelerator device. Moreover, the mini buffers may increase the overall bandwidth that is available for the mini buffers to transfer data to and from the PE array.
    Type: Grant
    Filed: August 31, 2022
    Date of Patent: May 6, 2025
    Assignee: Taiwan Semiconductor Manufacturing Company, Ltd.
    Inventors: Xiaoyu Sun, Xiaochen Peng, Murat Kerem Akarvardar
  • Publication number: 20250124956
    Abstract: A 3D memory device is provided. The 3D memory device includes a first logic base layer, a second layer, and a third layer. The first logic base layer comprises a first type DEMUX, a plurality of second type DEMUXs coupled to the first type DEMUX, a first type MUX, and a plurality of second type MUXs coupled to the first type MUX. The second layer comprises a first group of memory units. Each of the first group of memory units is respectively coupled to a corresponding DEMUX of the plurality of second type DEMUXs and a corresponding MUX of the plurality of second type MUXs. The third layer comprises a second group of memory units. Each of the second group of memory units is respectively coupled to a corresponding DEMUX of the plurality of second type DEMUXs and a corresponding MUX of the plurality of second type MUXs.
    Type: Application
    Filed: December 19, 2024
    Publication date: April 17, 2025
    Inventors: MURAT KEREM AKARVARDAR, XIAOCHEN PENG
  • Patent number: 12205665
    Abstract: A 3D memory device is provided. The 3D memory device includes a first logic base layer, a second layer, and a third layer. The first logic base layer comprises a first type DEMUX, a plurality of second type DEMUXs coupled to the first type DEMUX, a first type MUX, and a plurality of second type MUXs coupled to the first type MUX. The second layer comprises a first group of memory units. Each of the first group of memory units is respectively coupled to a corresponding DEMUX of the plurality of second type DEMUXs and a corresponding MUX of the plurality of second type MUXs. The third layer comprises a second group of memory units. Each of the second group of memory units is respectively coupled to a corresponding DEMUX of the plurality of second type DEMUXs and a corresponding MUX of the plurality of second type MUXs.
    Type: Grant
    Filed: January 17, 2023
    Date of Patent: January 21, 2025
    Assignee: TAIWAN SEMICONDUCTOR MANUFACTURING COMPANY LTD.
    Inventors: Murat Kerem Akarvardar, Xiaochen Peng
  • Publication number: 20240242071
    Abstract: The present disclosure provides an accelerator circuit, a semiconductor device, and a method for accelerating convolution in a convolutional neural network. The accelerator circuit includes a plurality of sub processing-element (PE) arrays, and each of the plurality of sub PE arrays includes a plurality of processing elements. The processing elements in each of the plurality of sub PE arrays implement a standard convolutional layer during a first configuration applied to the accelerator circuit, and implement a depth-wise convolutional layer during a second configuration applied to the accelerator circuit.
    Type: Application
    Filed: January 18, 2023
    Publication date: July 18, 2024
    Inventors: XIAOCHEN PENG, MURAT KEREM AKARVARDAR, XIAOYU SUN
  • Publication number: 20240203463
    Abstract: A 3D memory device is provided. The 3D memory device includes a first logic base layer, a second layer, and a third layer. The first logic base layer comprises a first type DEMUX, a plurality of second type DEMUXs coupled to the first type DEMUX, a first type MUX, and a plurality of second type MUXs coupled to the first type MUX. The second layer comprises a first group of memory units. Each of the first group of memory units is respectively coupled to a corresponding DEMUX of the plurality of second type DEMUXs and a corresponding MUX of the plurality of second type MUXs. The third layer comprises a second group of memory units. Each of the second group of memory units is respectively coupled to a corresponding DEMUX of the plurality of second type DEMUXs and a corresponding MUX of the plurality of second type MUXs.
    Type: Application
    Filed: January 17, 2023
    Publication date: June 20, 2024
    Inventors: MURAT KEREM AKARVARDAR, XIAOCHEN PENG
  • Publication number: 20240069971
    Abstract: An artificial intelligence (AI) accelerator device may include a plurality of on-chip mini buffers that are associated with a processing element (PE) array. Each mini buffer is associated with a subset of rows or a subset of columns of the PE array. Partitioning an on-chip buffer of the AI accelerator device into the mini buffers described herein may reduce the size and complexity of the on-chip buffer. The reduced size of the on-chip buffer may reduce the wire routing complexity of the on-chip buffer, which may reduce latency and may reduce access energy for the AI accelerator device. This may increase the operating efficiency and/or may increase the performance of the AI accelerator device. Moreover, the mini buffers may increase the overall bandwidth that is available for the mini buffers to transfer data to and from the PE array.
    Type: Application
    Filed: August 31, 2022
    Publication date: February 29, 2024
    Inventors: Xiaoyu SUN, Xiaochen PENG, Murat Kerem AKARVARDAR
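
The abstract of publication 20250328313 describes summing floating-point values by aligning mantissas to a common exponent, rounding the most significant portion of each mantissa on the most significant bit of its remainder, truncating, and then adding. The sketch below is one possible software reading of those steps, not the claimed circuit. The function name `aligned_rounded_sum`, the parameters `mant_bits` and `keep_bits`, the sign-magnitude integer encoding, and the choice of the largest exponent as the common exponent are all assumptions made for illustration.

```python
def aligned_rounded_sum(terms, mant_bits, keep_bits):
    """Sum values given as (mantissa, exponent) pairs, value = mantissa * 2**exponent.

    Hypothetical model: mantissas are signed integers whose magnitude fits in
    mant_bits bits; only the top keep_bits bits of each aligned mantissa are kept.
    Returns (mantissa_sum, exponent) with result = mantissa_sum * 2**exponent.
    """
    if not terms:
        raise ValueError("terms must not be empty")
    if not 0 < keep_bits <= mant_bits:
        raise ValueError("keep_bits must be in 1..mant_bits")

    # Assumption: align every mantissa to the largest exponent in the set.
    max_exp = max(exp for _, exp in terms)
    drop = mant_bits - keep_bits  # low bits discarded after alignment

    total = 0
    for mant, exp in terms:
        sign = -1 if mant < 0 else 1
        mag = abs(mant)
        if mag >= 1 << mant_bits:
            raise ValueError("mantissa does not fit in mant_bits")

        # Alignment shift plus truncation, applied as a single right shift.
        cut = (max_exp - exp) + drop
        top = mag >> cut
        # Round on the most significant bit of the discarded remainder.
        round_up = (mag >> (cut - 1)) & 1 if cut > 0 else 0
        total += sign * (top + round_up)

    return total, max_exp + drop


mantissa_sum, exponent = aligned_rounded_sum(
    [(0b1011, 0), (0b1101, 2)], mant_bits=4, keep_bits=2
)
print(mantissa_sum * 2**exponent)  # 64; the exact sum is 11 + 52 = 63
```

Because each term is reduced to `keep_bits` bits before the addition, the adder only needs to handle narrow operands, which is the reduced bit width the abstract refers to. The cost is a small rounding error, as the example shows.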