Patents by Inventor Yajun Ha

Yajun Ha has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

LAYOUT METHOD AND APPLICATION OF SCALABLE MULTI-DIE NETWORK-ON-CHIP FPGA ARCHITECTURE

Publication number: 20240143883

Abstract: A layout method for a scalable multi-die network-on-chip FPGA architecture is provided. An application of the aforementioned layout method for the scalable multi-die network-on-chip FPGA architecture is further provided. A scalable multi-die FPGA architecture based on network-on-chip and a corresponding hierarchical recursive layout algorithm are provided, aiming to directly map a register transfer level dataflow design generated by existing high-level synthesis onto the provided interconnection architecture. The layout method can exploit the potential for hierarchical topology and make more efficient use of dedicated interconnection resources, such as cross-die nets, network-on-chips, and high-speed transceivers.

Type: Application

Filed: May 31, 2023

Publication date: May 2, 2024

Applicant: SHANGHAITECH UNIVERSITY

Inventors: Jianwen LUO, Yajun HA
DUAL-SIX-TRANSISTOR (D6T) IN-MEMORY COMPUTING (IMC) ACCELERATOR SUPPORTING ALWAYS-LINEAR DISCHARGE AND REDUCING DIGITAL STEPS

Publication number: 20240135989

Abstract: A dual-six-transistor (D6T) in-memory computing (IMC) accelerator supporting always-linear discharge and reducing digital steps is provided. In the IMC accelerator, three effective techniques are proposed: (1) A D6T bitcell can reliably run at 0.4 V and enter a standby mode at 0.26 V, to support parallel processing of dual decoupled ports. (2) An always-linear discharge and convolution mechanism (ALDCM) not only reduces a voltage of a bit line (BL), but also keeps linear calculation throughout an entire voltage range of the BL. (3) A bypass of a bias voltage time converter (BVTC) reduces digital steps, but still keeps high energy efficiency and computing density at a low voltage. A measurement result of the IMC accelerator shows that the IMC accelerator achieves an average energy efficiency of 8918 TOPS/W (8b×8b), and an average computing density of 38.6 TOPS/mm2 (8b×8b) in a 55 nm CMOS technology.

Type: Application

Filed: October 8, 2023

Publication date: April 25, 2024

Applicant: SHANGHAITECH UNIVERSITY

Inventors: Hongtu ZHANG, Yuhao SHU, Yajun HA
ENERGY-EFFICIENT POINT CLOUD FEATURE EXTRACTION METHOD BASED ON FIELD-PROGRAMMABLE GATE ARRAY (FPGA) AND APPLICATION THEREOF

Publication number: 20240127466

Abstract: An energy-efficient point cloud feature extraction method based on a field-programmable gate array (FPGA) is mapped onto the FPGA for running. The energy-efficient point cloud feature extraction method based on the FPGA is applied to point cloud feature extraction in unmanned driving; or an intelligent robot. Compared with an existing technical solution, the energy-efficient point cloud feature extraction method based on the FPGA has following innovative points: a low-complexity projection method for organizing unordered and sparse point clouds, a high-parallel method for extracting a coarse-grained feature point, and a high-parallel method for selecting a fine-grained feature point.

Type: Application

Filed: September 19, 2023

Publication date: April 18, 2024

Applicant: SHANGHAITECH UNIVERSITY

Inventors: Hao SUN, Yajun HA
MAX-FLOW/MIN-CUT SOLUTION ALGORITHM FOR EARLY TERMINATING PUSH-RELABEL ALGORITHM

Publication number: 20240112443

Abstract: A max-flow/min-cut solution algorithm for early terminating a push-relabel algorithm is provided. The max-flow/min-cut solution algorithm is used for an application that does not require an exact maximum flow, and includes: defining an early termination condition of the push-relabel algorithm by a separation condition and a stable condition; determining that the separation condition is satisfied if there is no source node s, s?S, in the set T at any time in an operation process of the push-relabel algorithm; determining that the stable condition is satisfied if there is no active node in the set T; and terminating the push-relabel algorithm if both the separation condition and the stability condition are satisfied. The early termination technique is proposed to greatly reduce redundant computations and ensure that the algorithm terminates correctly in all cases.

Type: Application

Filed: September 22, 2021

Publication date: April 4, 2024

Applicant: SHANGHAITECH UNIVERSITY

Inventors: Xinzhe LIU, Guangyao YAN, Yajun HA
Ripple push method for graph cut

Patent number: 11934459

Abstract: A ripple push method for a graph cut includes: obtaining an excess flow ef(v) of a current node v; traversing four edges connecting the current node v in top, bottom, left and right directions, and determining whether each of the four edges is a pushable edge; calculating, according to different weight functions, a maximum push value of each of the four edges by efw=ef(v)*W, where W denotes a weight function; and traversing the four edges, recording a pushable flow of each of the four edges, and pushing out a calculated flow. The ripple push method explores different push weight functions, and significantly improves the actual parallelism of the push-relabel algorithm.

Type: Grant

Filed: September 22, 2021

Date of Patent: March 19, 2024

Assignee: SHANGHAITECH UNIVERSITY

Inventors: Guangyao Yan, Xinzhe Liu, Yajun Ha, Hui Wang
Pure integer quantization method for lightweight neural network (LNN)

Patent number: 11934954

Abstract: A pure integer quantization method for a lightweight neural network (LNN) is provided. The method includes the following steps: acquiring a maximum value of each pixel in each of the channels of the feature map of a current layer; dividing a value of each pixel in each of the channels of the feature map by a t-th power of the maximum value, t?[0,1]; multiplying a weight in each of the channels by the maximum value of each pixel in each of the channels of the corresponding feature map; and convolving the processed feature map with the processed weight to acquire the feature map of a next layer. The algorithm is verified on SkyNet and MobileNet respectively, and lossless INT8 quantization on SkyNet and maximum quantization accuracy so far on MobileNetv2 are achieved.

Type: Grant

Filed: September 22, 2021

Date of Patent: March 19, 2024

Assignee: SHANGHAITECH UNIVERSITY

Inventors: Weixiong Jiang, Yajun Ha
Adaptive stereo matching optimization method and apparatus, device and storage medium

Patent number: 11875523

Abstract: The present disclosure provides an adaptive stereo matching optimization method, apparatus, and device, and a storage medium. The method includes: acquiring images of at least two perspectives of the same target scene, accordingly obtaining, through calculation, disparity value ranges corresponding to pixels in the target scene; and obtaining optimized depth value ranges by adjusting the disparity value ranges of the pixels in the target scene in real time through an adaptive stereo matching model; adjusting an execution cycle in the adaptive stereo matching model in real time through a DVFS algorithm according to a resource constraint condition of the processing system; and/or training on a plurality of scene image data sets through a convolutional neural network, so that the specific function parameters in the adaptive stereo matching model are correspondingly adjusted in real time according to the acquired different scene images.

Type: Grant

Filed: September 20, 2019

Date of Patent: January 16, 2024

Assignee: ShanghaiTech University

Inventors: Fupeng Chen, Heng Yu, Yajun Ha
Enhanced dynamic random access memory (eDRAM)-based computing-in-memory (CIM) convolutional neural network (CNN) accelerator

Patent number: 11875244

Abstract: An enhanced dynamic random access memory (eDRAM)-based computing-in-memory (CIM) convolutional neural network (CNN) accelerator comprises four P2ARAM blocks, where each of the P2ARAM blocks includes a 5T1C ping-pong eDRAM bit cell array composed of 64×16 5T1C ping-pong eDRAM bit cells. In each of the P2ARAM blocks, 64×2 digital time converters convert a 4-bit activation value into different pulse widths from a row direction and input the pulse widths into the 5T1C ping-pong eDRAM bit cell array for calculation. A total of 16×2 convolution results are output in a column direction of the 5T1C ping-pong eDRAM bit cell array. The CNN accelerator uses the 5T1C ping-pong eDRAM bit cells to perform multi-bit storage and convolution in parallel. An S2M-ADC scheme is proposed to allot an area of an input sampling capacitor of an ABL to sign-numerical SAR ADC units of a C-DAC array without adding area overhead.

Type: Grant

Filed: August 5, 2022

Date of Patent: January 16, 2024

Assignee: SHANGHAITECH UNIVERSITY

Inventors: Hongtu Zhang, Yuhao Shu, Yajun Ha
Normal distributions transform (NDT) method for LiDAR point cloud localization in unmanned driving

Patent number: 11845466

Abstract: A normal distributions transform (NDT) method for LiDAR point cloud localization in unmanned driving is provided. The method proposes a non-recursive, memory-efficient data structure occupation-aware-voxel-structure (OAVS), which speeds up each search operation. Compared with a tree-based structure, the proposed data structure OAVS is easy to parallelize and consumes only about 1/10 of memory. Based on the data structure OAVS, the method proposes a semantic-assisted OAVS-based (SEO)-NDT algorithm, which significantly reduces the number of search operations, redefines a parameter affecting the number of search operations, and removes a redundant search operation. In addition, the method proposes a streaming field-programmable gate array (FPGA) accelerator architecture, which further improves the real-time and energy-saving performance of the SEO-NDT algorithm. The method meets the real-time and high-precision requirements of smart vehicles for three-dimensional (3D) lidar localization.

Type: Grant

Filed: September 22, 2021

Date of Patent: December 19, 2023

Assignee: SHANGHAITECH UNIVERSITY

Inventors: Qi Deng, Hao Sun, Yajun Ha, Hui Wang
Full-path circuit delay measurement device for field-programmable gate array (FPGA) and measurement method

Patent number: 11762015

Abstract: A full-path circuit delay measurement device for a field-programmable gate array (FPGA) and a measurement method are provided. The measurement device includes two shadow registers and a phase-shifted clock, where the two shadow registers take an output of a measured combinational logic circuit as a clock and sample the phase-shifted clock SCLK as data; the two shadow registers are respectively triggered on rising and falling edges of the output of the measured combinational logic circuit to sample the phase-shifted clock; outputs of the two shadow registers are delivered by an OR gate as an input into a synchronization register; a clock of the synchronization register serves as a clock MCLK of the measured combinational logic circuit; an output of the synchronization register serves as that of the circuit delay measurement device; the phase-shifted clock SCLK and the clock MCLK of the measured combinational logic circuit have the same frequency.

Type: Grant

Filed: September 22, 2021

Date of Patent: September 19, 2023

Assignee: SHANGHAITECH UNIVERSITY

Inventors: Weixiong Jiang, Yajun Ha
High-energy-efficiency binary neural network accelerator applicable to artificial intelligence internet of things

Patent number: 11762700

Abstract: A high-energy-efficiency binary neural network accelerator applicable to artificial intelligence Internet of Things is provided. 0.3-0.6V sub/near threshold 10T1C multiplication bit units with series capacitors are configured for charge domain binary convolution. An anti-process deviation differential voltage amplification array between bit lines and DACs is configured for robust pre-amplification in 0.3V batch standardized operations. A lazy bit line reset scheme further reduces energy, and inference accuracy losses can be ignored. Therefore, a binary neural network accelerator chip based on in-memory computation achieves peak energy efficiency of 18.5 POPS/W and 6.06 POPS/W, which are respectively improved by 21× and 135× compared with previous macro and system work [9, 11].

Type: Grant

Filed: January 19, 2023

Date of Patent: September 19, 2023

Assignee: SHANGHAITECH UNIVERSITY

Inventors: Hongtu Zhang, Yuhao Shu, Yajun Ha
FULL-PATH CIRCUIT DELAY MEASUREMENT DEVICE FOR FIELD-PROGRAMMABLE GATE ARRAY (FPGA) AND MEASUREMENT METHOD

Publication number: 20230194602

Abstract: A full-path circuit delay measurement device for a field-programmable gate array (FPGA) and a measurement method are provided. The measurement device includes two shadow registers and a phase-shifted clock, where the two shadow registers take an output of a measured combinational logic circuit as a clock and sample the phase-shifted clock SCLK as data; the two shadow registers are respectively triggered on rising and falling edges of the output of the measured combinational logic circuit to sample the phase-shifted clock; outputs of the two shadow registers are delivered by an OR gate as an input into a synchronization register; a clock of the synchronization register serves as a clock MCLK of the measured combinational logic circuit; an output of the synchronization register serves as that of the circuit delay measurement device; the phase-shifted clock SCLK and the clock MCLK of the measured combinational logic circuit have the same frequency.

Type: Application

Filed: September 22, 2021

Publication date: June 22, 2023

Applicant: SHANGHAITECH UNIVERSITY

Inventors: Weixiong JIANG, Yajun HA
PURE INTEGER QUANTIZATION METHOD FOR LIGHTWEIGHT NEURAL NETWORK (LNN)

Publication number: 20230196095

Abstract: A pure integer quantization method for a lightweight neural network (LNN) is provided. The method includes the following steps: acquiring a maximum value of each pixel in each of the channels of the feature map of a current layer; dividing a value of each pixel in each of the channels of the feature map by a t-th power of the maximum value, t?[0,1]; multiplying a weight in each of the channels by the maximum value of each pixel in each of the channels of the corresponding feature map; and convolving the processed feature map with the processed weight to acquire the feature map of a next layer. The algorithm is verified on SkyNet and MobileNet respectively, and lossless INT8 quantization on SkyNet and maximum quantization accuracy so far on MobileNetv2 are achieved.

Type: Application

Filed: September 22, 2021

Publication date: June 22, 2023

Applicant: SHANGHAITECH UNIVERSITY

Inventors: Weixiong JIANG, Yajun HA
NORMAL DISTRIBUTIONS TRANSFORM (NDT) METHOD FOR LIDAR POINT CLOUD LOCALIZATION IN UNMANNED DRIVING

Publication number: 20230192123

Abstract: A normal distributions transform (NDT) method for LiDAR point cloud localization in unmanned driving is provided. The method proposes a non-recursive, memory-efficient data structure occupation-aware-voxel-structure (OAVS), which speeds up each search operation. Compared with a tree-based structure, the proposed data structure OAVS is easy to parallelize and consumes only about 1/10 of memory. Based on the data structure OAVS, the method proposes a semantic-assisted OAVS-based (SEO)-NDT algorithm, which significantly reduces the number of search operations, redefines a parameter affecting the number of search operations, and removes a redundant search operation. In addition, the method proposes a streaming field-programmable gate array (FPGA) accelerator architecture, which further improves the real-time and energy-saving performance of the SEO-NDT algorithm. The method meets the real-time and high-precision requirements of smart vehicles for three-dimensional (3D) lidar localization.

Type: Application

Filed: September 22, 2021

Publication date: June 22, 2023

Applicant: SHANGHAITECH UNIVERSITY

Inventors: Qi DENG, Hao SUN, Yajun HA, Hui WANG
ENHANCED DYNAMIC RANDOM ACCESS MEMORY (EDRAM)-BASED COMPUTING-IN-MEMORY (CIM) CONVOLUTIONAL NEURAL NETWORK (CNN) ACCELERATOR

Publication number: 20230196079

Abstract: An enhanced dynamic random access memory (eDRAM)-based computing-in-memory (CIM) convolutional neural network (CNN) accelerator comprises four P2ARAM blocks, where each of the P2ARAM blocks includes a 5T1C ping-pong eDRAM bit cell array composed of 64×16 5T1C ping-pong eDRAM bit cells. In each of the P2ARAM blocks, 64×2 digital time converters convert a 4-bit activation value into different pulse widths from a row direction and input the pulse widths into the 5T1C ping-pong eDRAM bit cell array for calculation. A total of 16×2 convolution results are output in a column direction of the 5T1C ping-pong eDRAM bit cell array. The CNN accelerator uses the 5T1C ping-pong eDRAM bit cells to perform multi-bit storage and convolution in parallel. An S2M-ADC scheme is proposed to allot an area of an input sampling capacitor of an ABL to sign-numerical SAR ADC units of a C-DAC array without adding area overhead.

Type: Application

Filed: August 5, 2022

Publication date: June 22, 2023

Applicant: SHANGHAITECH UNIVERSITY

Inventors: Hongtu ZHANG, Yuhao SHU, Yajun HA
RIPPLE PUSH METHOD FOR GRAPH CUT

Publication number: 20230195793

Abstract: A ripple push method for a graph cut includes: obtaining an excess flow ef(v) of a current node v; traversing four edges connecting the current node v in top, bottom, left and right directions, and determining whether each of the four edges is a pushable edge; calculating, according to different weight functions, a maximum push value of each of the four edges by efw=ef(v)*W, where W denotes a weight function; and traversing the four edges, recording a pushable flow of each of the four edges, and pushing out a calculated flow. The ripple push method explores different push weight functions, and significantly improves the actual parallelism of the push-relabel algorithm.

Type: Application

Filed: September 22, 2021

Publication date: June 22, 2023

Applicant: SHANGHAITECH UNIVERSITY

Inventors: Guangyao YAN, Xinzhe LIU, Yajun HA, Hui WANG
STATIC RANDOM-ACCESS MEMORY (SRAM) CELL FOR HIGH-SPEED CONTENT-ADDRESSABLE MEMORY AND IN-MEMORY BOOLEAN LOGIC OPERATION

Publication number: 20230197154

Abstract: A static random-access memory (SRAM) cell for high-speed content-addressable memory (CAM) and in-memory Boolean logic operations includes a standard 6T-SRAM and two additional PMOS access transistors, where read word lines of the two positive-channel metal oxide semiconductor (PMOS) access transistors P1 and P2 are RWLR and RWLL respectively, and under the control thereof, a differential read port RBL/RBL is formed. The SRAM cell is suitable for multi-row address selection, and typically applied to in-memory high-speed CAM and in-memory Boolean logic operations. Due to PMOS device characteristics, the structure design of the SRAM cell can avoid read disturbance generated by an in-memory SRAM, and ensure that the SRAM can perform in-memory CAM and in-memory Boolean logic operations stably at a high speed. In addition, this SRAM-based IMC solution supports commercial CMOS technology, and has an opportunity to leverage a large number of existing on-chip SRAM caches.

Type: Application

Filed: September 22, 2021

Publication date: June 22, 2023

Applicant: SHANGHAITECH UNIVERSITY

Inventors: Jian CHEN, Yajun HA
Method for Disseminating Scaling Information and Application Thereof in VLSI Implementation of Fixed-Point FFT

Publication number: 20230179315

Abstract: Example embodiments relate to methods for disseminating scaling information and applications thereof in very large scale integration (VLSI) implementations of fixed-point fast Fourier transforms (FFTs). One embodiment includes a method for disseminating scaling information in a system. The system includes a linear decomposable transformation process and an inverse process of the linear decomposable transformation process. The inverse process of the linear decomposable transformation process is defined, in time or space, as an inverse linear decomposable transformation process. The linear decomposable transformation process is separated from the inverse linear decomposable transformation process. The linear decomposable transformation process or the inverse linear decomposable transformation process is able to be performed first and is defined as a linear decomposable transformation I. The other remaining process is performed subsequently and is defined as a linear decomposable transformation II.

Type: Application

Filed: October 26, 2022

Publication date: June 8, 2023

Inventors: Xinzhe Liu, Raees Kizhakkumkara Muhamad, Dessislava Nikolova, Yajun Ha, Francky Catthoor, Fupeng Chen, Peter Schelkens, David Blinder
Optimized reconfiguration algorithm based on dynamic voltage and frequency scaling

Patent number: 11537774

Abstract: An optimized reconfiguration algorithm based on dynamic voltage and frequency scaling (DVFS) is provided, which mainly has the following contributions. The optimized reconfiguration algorithm based on DVFS proposes a DVFS-based reconfiguration method, which schedules user tasks according to a degree of parallelism (DOP) of the user tasks so as to reconfigure more parallel user tasks, thereby achieving higher reliability. The optimized reconfiguration algorithm based on DVFS proposes a K-means-based heuristic approximation algorithm, which minimizes the delay of the DVFS-based reconfiguration scheduling algorithm. The optimized reconfiguration algorithm based on DVFS proposes a K-means-based method, which reduces memory overhead caused by DVFS-based reconfiguration scheduling. The optimized reconfiguration algorithm based on DVFS improves the reliability of a field programmable gate array (FPGA) system and minimizes the area overhead of a hardware circuit.

Type: Grant

Filed: June 9, 2021

Date of Patent: December 27, 2022

Assignee: SHANGHAITECH UNIVERSITY

Inventors: Rui Li, Yajun Ha
OPTIMIZED RECONFIGURATION ALGORITHM BASED ON DYNAMIC VOLTAGE AND FREQUENCY SCALING

Publication number: 20220309217

Abstract: An optimized reconfiguration algorithm based on dynamic voltage and frequency scaling (DVFS) is provided, which mainly has the following contributions. The optimized reconfiguration algorithm based on DVFS proposes a DVFS-based reconfiguration method, which schedules user tasks according to a degree of parallelism (DOP) of the user tasks so as to reconfigure more parallel user tasks, thereby achieving higher reliability. The optimized reconfiguration algorithm based on DVFS proposes a K-means-based heuristic approximation algorithm, which minimizes the delay of the DVFS-based reconfiguration scheduling algorithm. The optimized reconfiguration algorithm based on DVFS proposes a K-means-based method, which reduces memory overhead caused by DVFS-based reconfiguration scheduling. The optimized reconfiguration algorithm based on DVFS improves the reliability of a field programmable gate array (FPGA) system and minimizes the area overhead of a hardware circuit.

Type: Application

Filed: June 9, 2021

Publication date: September 29, 2022

Applicant: SHANGHAITECH UNIVERSITY

Inventors: Rui LI, Yajun HA

1 2 next