Patents by Inventor Xiaochen PENG
Xiaochen PENG has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20260086770
Abstract: In one or more aspects, a processing device for numerical data quantization includes processing circuitry configured to determine a maximum exponent from a set of exponents of a set of digital representations of a set of numbers, obtain a set of scaled exponents based on the maximum exponent, and perform one of: (i) obtain a set of quantized significands based on a set of mantissas of the set of digital representations and the set of scaled exponents, or (ii) obtain a set of quantized mantissas based on the set of mantissas. The processing circuitry is configured to output a set of quantized digital representations of the set of numbers, based on the set of quantized significands, or based on the set of quantized mantissas and the set of scaled exponents; and to output a biased exponent scaling factor based on the maximum exponent.
Type: Application
Filed: October 25, 2024
Publication date: March 26, 2026
Applicant: Taiwan Semiconductor Manufacturing Company, Ltd.
Inventors: Xiaochen PENG, Brian CRAFTON, Murat Kerem AKARVARDAR, Ashwin Sanjay LELE, Bo ZHANG, Win-San KHWA
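The quantization flow in this abstract (find the maximum exponent, scale each exponent against it, quantize the significands, emit a biased scaling factor) can be illustrated with a small software sketch. This is only one possible reading, not the claimed circuit: the function names, the `mantissa_bits` width, the `bias` value of 127, and the choice to define the scaled exponent as the exponent minus the maximum exponent are all assumptions made for illustration.

```python
import math


def quantize_block(values, mantissa_bits=4, bias=127):
    """Quantize a block of floats against their shared maximum exponent.

    Hypothetical helper: mantissa_bits and bias are illustrative choices.
    """
    # math.frexp returns (m, e) with v == m * 2**e and 0.5 <= |m| < 1
    # (or m == 0.0, e == 0 when v is zero).
    parts = [math.frexp(v) for v in values]
    max_exp = max((e for m, e in parts if m != 0.0), default=0)
    scale = 1 << mantissa_bits
    significands = []
    for m, e in parts:
        scaled_exp = e - max_exp  # assumed definition of the scaled exponent
        # Shift the mantissa by the exponent gap, then round to an integer.
        significands.append(round(math.ldexp(m, scaled_exp) * scale))
    # The shared scaling factor is stored with a bias added.
    return significands, max_exp + bias


def dequantize_block(significands, biased_exp, mantissa_bits=4, bias=127):
    """Rebuild approximate floats from the quantized block."""
    exp = biased_exp - bias
    return [math.ldexp(s, exp - mantissa_bits) for s in significands]


if __name__ == "__main__":
    q, scale_factor = quantize_block([1.0, 0.5, 0.25, -3.0])
    print(q, scale_factor)
    print(dequantize_block(q, scale_factor))
```

With a single shared exponent, each value needs only a short integer significand, at the cost of precision for values much smaller than the largest one in the block.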
-
Patent number: 12580011
Abstract: A memory circuit includes a compute in-memory (CIM) array. The CIM array includes a memory cell array configured to store a first set of data. The first set of data including a first set of weights or a second set of data. The first set of data being exponent portions of corresponding floating point numbers. The second set of data being a compressed version of the first set of weights. The first set of weights having a first data length, and the second set of data having a second data length less than the first data length. The CIM array further includes a decoder coupled to the memory cell array, and being configured to generate a first set of output signals in response to a first set of input signals, the first set of data and a flag signal.
Type: Grant
Filed: June 11, 2024
Date of Patent: March 17, 2026
Assignee: TAIWAN SEMICONDUCTOR MANUFACTURING COMPANY, LTD.
Inventors: Brian Crafton, Xiaochen Peng, Murat Kerem Akarvardar
-
Publication number: 20250378860
Abstract: A memory circuit includes a compute in-memory (CIM) array. The CIM array includes a memory cell array configured to store a first set of data. The first set of data including a first set of weights or a second set of data. The first set of data being exponent portions of corresponding floating point numbers. The second set of data being a compressed version of the first set of weights. The first set of weights having a first data length, and the second set of data having a second data length less than the first data length. The CIM array further includes a decoder coupled to the memory cell array, and being configured to generate a first set of output signals in response to a first set of input signals, the first set of data and a flag signal.
Type: Application
Filed: June 11, 2024
Publication date: December 11, 2025
Inventors: Brian CRAFTON, Xiaochen PENG, Murat Kerem AKARVARDAR
-
Publication number: 20250370714
Abstract: Systems, devices, circuits, and methods of operating said systems, devices, and circuits are disclosed. In one aspect, a system includes an input buffer circuit storing a set of data values for a convolution operation and a plurality of multiply-accumulate (MAC) circuits. A first MAC circuit of the plurality of MAC circuits can retrieve the set of data values for the convolution operation and generate a first output by applying a first weight value stored at the first MAC circuit to a first data value of the set of data values. The first MAC circuit can provide the first data value to a second MAC circuit of the plurality of MAC circuits. The first MAC circuit can generate a plurality of second outputs by applying a second weight value and a third weight value stored at the first MAC circuit to a second data value of the set of data values.
Type: Application
Filed: May 28, 2024
Publication date: December 4, 2025
Applicant: Taiwan Semiconductor Manufacturing Company, Ltd.
Inventors: Xiaochen Peng, Murat Kerem Akarvardar
-
Publication number: 20250328313
Abstract: In some embodiments, computing a sum of floating-point numbers, such as in multiply-accumulate operations, includes aligning the mantissas of the floating point number by adjusting at least a subset of the mantissas so that the exponents of the floating-point numbers are the same. After the alignment, the most significant portion of each mantissa is rounded depending on the remainder of the mantissa, for example the most significant bit of the remainder. The mantissas are then truncated to the rounded most significant portions. The truncated mantissas are then summed. The mantissas being aligned can be products of mantissas of respective inputs and weights. The sum of the rounded portions in such cases are a result of multiply-accumulate operations, with a reduced bit width.
Type: Application
Filed: April 19, 2024
Publication date: October 23, 2025
Inventors: Xiaochen PENG, Brian Crafton, Murat Kerem Akarvardar
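The align, round, truncate, and sum sequence described in this abstract can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each term is modeled as a hypothetical `(mantissa, exponent)` pair of Python integers with value `mantissa * 2**exponent`, and `drop_bits` (the number of low-order bits discarded after alignment) is an invented parameter.

```python
def aligned_rounded_sum(terms, drop_bits=4):
    """Sum (mantissa, exponent) terms after aligning and rounding each mantissa.

    Returns (total, exponent) such that the result approximates
    total * 2**exponent. All names here are illustrative assumptions.
    """
    # Align every term to the largest exponent in the set.
    max_exp = max(e for _, e in terms)
    total = 0
    for m, e in terms:
        # Bits shifted out by alignment plus the bits dropped on purpose.
        shift = (max_exp - e) + drop_bits
        if shift == 0:
            rounded = m
        else:
            kept = m >> shift  # most significant portion
            # Most significant bit of the discarded remainder decides rounding.
            round_bit = (m >> (shift - 1)) & 1
            rounded = kept + round_bit
        total += rounded
    return total, max_exp + drop_bits


if __name__ == "__main__":
    terms = [(0b10110000, 0), (0b11111000, -2)]
    total, exp = aligned_rounded_sum(terms)
    print(total, exp, total * 2 ** exp)
```

Rounding before truncation keeps the narrow adder's result closer to the exact sum than plain truncation would; in the usage above the exact value is 238 and the reduced-width result is 240.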
-
Publication number: 20250258710
Abstract: An artificial intelligence (AI) accelerator device may include a plurality of on-chip mini buffers that are associated with a processing element (PE) array. Each mini buffer is associated with a subset of rows or a subset of columns of the PE array. Partitioning an on-chip buffer of the AI accelerator device into the mini buffers described herein may reduce the size and complexity of the on-chip buffer. The reduced size of the on-chip buffer may reduce the wire routing complexity of the on-chip buffer, which may reduce latency and may reduce access energy for the AI accelerator device. This may increase the operating efficiency and/or may increase the performance of the AI accelerator device. Moreover, the mini buffers may increase the overall bandwidth that is available for the mini buffers to transfer data to and from the PE array.
Type: Application
Filed: April 4, 2025
Publication date: August 14, 2025
Inventors: Xiaoyu SUN, Xiaochen PENG, Murat Kerem AKARVARDAR
-
Publication number: 20250224923
Abstract: In some embodiments, a computing method includes, for pairs of a first and second floating-point numbers, each having a respective mantissa and exponent, supplying to a respective one of multiply circuits the mantissas of a subset of the pairs of first and second floating-point number, the subset of the plurality of pairs of first and second floating-point numbers each having a respective sum of the exponents of the first and second floating-point numbers, respectively, meeting a predetermined criterion, such as the sum being smaller than a predetermined threshold value; generating, using each of the plurality of multiply circuits, a product of the mantissas of the respective pair of first and second floating-point numbers; accumulating the product mantissas to generate a product mantissa partial sum; combining the product mantissa partial sum and maximum product exponent to generate an output floating point number; and for each of the remaining pairs of first and second floating-point numbers: withholding th...
Type: Application
Filed: May 6, 2024
Publication date: July 10, 2025
Inventors: Xiaochen PENG, Brian Crafton, Murat Kerem Akarvardar, Hidehiro Fujiwara, Haruki Mori
-
Publication number: 20250224922
Abstract: In some embodiments, a computing method includes, for a set of products, each of a respective pair of a first and a second floating-point operands, each having a respective mantissa and exponent, aligning the mantissas of the first operands based on a maximum exponent of the first operands to generate a shared exponent; modifying the mantissas of the first operands based on the shared exponent to generate respective adjusted mantissas of the first operands; generating mantissa products, each based on the mantissa of a respective one of the second operands and a respective one of the adjusted first mantissas retrieved from the memory device; summing the mantissas products to generate a mantissa product partial sum; and combining the shared exponent and the product mantissa partial sum. The adjusted mantissas of the first operands can be saved in, and retrieved from, a memory device for the mantissa product generation.
Type: Application
Filed: April 24, 2024
Publication date: July 10, 2025
Inventors: Win-San KHWA, Hung-Hsi HSU, Xiaochen PENG, Murat Kerem Akarvardar, Meng-Fan Chang
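A rough software analogue of this method is sketched below: the first operands are pre-aligned once to a shared exponent and stored as integers, then reused for integer multiply-accumulate against the second operands' mantissas. The abstract does not say how the second operands' exponents are handled, so the sketch assumes they already share one exponent (`x_exp`). The function names and the `mantissa_bits` width are likewise hypothetical.

```python
import math


def prealign(first_operands, mantissa_bits=8):
    """Return (adjusted integer mantissas, shared exponent) for a list of floats.

    Illustrative stand-in for the values that would be saved in memory.
    """
    parts = [math.frexp(w) for w in first_operands]
    # The shared exponent is the maximum exponent among nonzero operands.
    shared_exp = max((e for m, e in parts if m != 0.0), default=0)
    scale = 1 << mantissa_bits
    adjusted = [round(math.ldexp(m, e - shared_exp) * scale) for m, e in parts]
    return adjusted, shared_exp


def dot_with_shared_exponent(adjusted, shared_exp, x_mantissas,
                             x_exp=0, mantissa_bits=8):
    """Integer multiply-accumulate, then a single exponent combination."""
    # Mantissa products are summed as plain integers.
    partial_sum = sum(a * x for a, x in zip(adjusted, x_mantissas))
    # Assumption: the second operands share the single exponent x_exp.
    return math.ldexp(partial_sum, shared_exp - mantissa_bits + x_exp)


if __name__ == "__main__":
    adjusted, shared_exp = prealign([1.5, 0.25, -2.0])
    print(adjusted, shared_exp)
    print(dot_with_shared_exponent(adjusted, shared_exp, [2, 4, 1]))
```

Because the alignment is done ahead of time, the inner loop needs no per-product exponent handling; the exponent is applied once to the accumulated partial sum.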
-
Patent number: 12293229
Abstract: An artificial intelligence (AI) accelerator device may include a plurality of on-chip mini buffers that are associated with a processing element (PE) array. Each mini buffer is associated with a subset of rows or a subset of columns of the PE array. Partitioning an on-chip buffer of the AI accelerator device into the mini buffers described herein may reduce the size and complexity of the on-chip buffer. The reduced size of the on-chip buffer may reduce the wire routing complexity of the on-chip buffer, which may reduce latency and may reduce access energy for the AI accelerator device. This may increase the operating efficiency and/or may increase the performance of the AI accelerator device. Moreover, the mini buffers may increase the overall bandwidth that is available for the mini buffers to transfer data to and from the PE array.
Type: Grant
Filed: August 31, 2022
Date of Patent: May 6, 2025
Assignee: Taiwan Semiconductor Manufacturing Company, Ltd.
Inventors: Xiaoyu Sun, Xiaochen Peng, Murat Kerem Akarvardar
-
Publication number: 20250124956
Abstract: A 3D memory device is provided. The 3D memory device includes a first logic base layer, a second layer, and a third layer. The first logic base layer comprises a first type DEMUX, a plurality of second type DEMUXs coupled to the first type DEMUX, a first type MUX, and a plurality of second type MUXs coupled to the first type MUX. The second layer comprises a first group of memory units. Each of the first group of memory units is respectively coupled to a corresponding DEMUX of the plurality of second type DEMUXs and a corresponding MUX of the plurality of second type MUXs. The third layer comprises a second group of memory units. Each of the second group of memory units is respectively coupled to a corresponding DEMUX of the plurality of second type DEMUXs and a corresponding MUX of the plurality of second type MUXs.
Type: Application
Filed: December 19, 2024
Publication date: April 17, 2025
Inventors: MURAT KEREM AKARVARDAR, XIAOCHEN PENG
-
Patent number: 12205665
Abstract: A 3D memory device is provided. The 3D memory device includes a first logic base layer, a second layer, and a third layer. The first logic base layer comprises a first type DEMUX, a plurality of second type DEMUXs coupled to the first type DEMUX, a first type MUX, and a plurality of second type MUXs coupled to the first type MUX. The second layer comprises a first group of memory units. Each of the first group of memory units is respectively coupled to a corresponding DEMUX of the plurality of second type DEMUXs and a corresponding MUX of the plurality of second type MUXs. The third layer comprises a second group of memory units. Each of the second group of memory units is respectively coupled to a corresponding DEMUX of the plurality of second type DEMUXs and a corresponding MUX of the plurality of second type MUXs.
Type: Grant
Filed: January 17, 2023
Date of Patent: January 21, 2025
Assignee: TAIWAN SEMICONDUCTOR MANUFACTURING COMPANY LTD.
Inventors: Murat Kerem Akarvardar, Xiaochen Peng
-
Publication number: 20240242071
Abstract: The present disclosure provides an accelerator circuit, a semiconductor device, and a method for accelerating convolution in a convolutional neural network. The accelerator circuit includes a plurality of sub processing-element (PE) arrays, and each of the plurality of sub PE arrays includes a plurality of processing elements. The processing elements in each of the plurality of sub PE arrays implement a standard convolutional layer during a first configuration applied to the accelerator circuit, and implement a depth-wise convolutional layer during a second configuration applied to the accelerator circuit.
Type: Application
Filed: January 18, 2023
Publication date: July 18, 2024
Inventors: XIAOCHEN PENG, MURAT KEREM AKARVARDAR, XIAOYU SUN
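For readers unfamiliar with the two layer types named in this abstract, the sketch below contrasts them in plain Python. It shows only the arithmetic difference between a standard and a depth-wise convolution; it does not model the sub PE arrays or their configurations. The 1D shapes, the function names, and the use of cross-correlation without kernel flipping (the usual CNN convention) are simplifying assumptions.

```python
def conv1d(signal, kernel):
    """Valid 1D cross-correlation of one channel with one kernel."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def standard_conv(x, kernels):
    """x: [C][L]; kernels: [C_out][C][K].

    Each output channel sums contributions from every input channel.
    """
    out = []
    for per_channel in kernels:
        partials = [conv1d(ch, k) for ch, k in zip(x, per_channel)]
        out.append([sum(col) for col in zip(*partials)])
    return out


def depthwise_conv(x, kernels):
    """x: [C][L]; kernels: [C][K].

    Each input channel is filtered on its own; channels are never mixed.
    """
    return [conv1d(ch, k) for ch, k in zip(x, kernels)]


if __name__ == "__main__":
    x = [[1, 2, 3], [4, 5, 6]]
    print(standard_conv(x, [[[1, 1], [1, -1]]]))
    print(depthwise_conv(x, [[1, 1], [1, -1]]))
```

The standard layer reduces across input channels, while the depth-wise layer does not, which is why the two map differently onto a fixed array of multiply-accumulate units.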
-
Publication number: 20240203463
Abstract: A 3D memory device is provided. The 3D memory device includes a first logic base layer, a second layer, and a third layer. The first logic base layer comprises a first type DEMUX, a plurality of second type DEMUXs coupled to the first type DEMUX, a first type MUX, and a plurality of second type MUXs coupled to the first type MUX. The second layer comprises a first group of memory units. Each of the first group of memory units is respectively coupled to a corresponding DEMUX of the plurality of second type DEMUXs and a corresponding MUX of the plurality of second type MUXs. The third layer comprises a second group of memory units. Each of the second group of memory units is respectively coupled to a corresponding DEMUX of the plurality of second type DEMUXs and a corresponding MUX of the plurality of second type MUXs.
Type: Application
Filed: January 17, 2023
Publication date: June 20, 2024
Inventors: MURAT KEREM AKARVARDAR, XIAOCHEN PENG
-
Publication number: 20240069971
Abstract: An artificial intelligence (AI) accelerator device may include a plurality of on-chip mini buffers that are associated with a processing element (PE) array. Each mini buffer is associated with a subset of rows or a subset of columns of the PE array. Partitioning an on-chip buffer of the AI accelerator device into the mini buffers described herein may reduce the size and complexity of the on-chip buffer. The reduced size of the on-chip buffer may reduce the wire routing complexity of the on-chip buffer, which may reduce latency and may reduce access energy for the AI accelerator device. This may increase the operating efficiency and/or may increase the performance of the AI accelerator device. Moreover, the mini buffers may increase the overall bandwidth that is available for the mini buffers to transfer data to and from the PE array.
Type: Application
Filed: August 31, 2022
Publication date: February 29, 2024
Inventors: Xiaoyu SUN, Xiaochen PENG, Murat Kerem AKARVARDAR