Patents by Inventor Yajun Ha

Yajun Ha has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230196079
    Abstract: An enhanced dynamic random access memory (eDRAM)-based computing-in-memory (CIM) convolutional neural network (CNN) accelerator comprises four P2ARAM blocks, where each of the P2ARAM blocks includes a 5T1C ping-pong eDRAM bit cell array composed of 64×16 5T1C ping-pong eDRAM bit cells. In each of the P2ARAM blocks, 64×2 digital time converters convert a 4-bit activation value into different pulse widths from a row direction and input the pulse widths into the 5T1C ping-pong eDRAM bit cell array for calculation. A total of 16×2 convolution results are output in a column direction of the 5T1C ping-pong eDRAM bit cell array. The CNN accelerator uses the 5T1C ping-pong eDRAM bit cells to perform multi-bit storage and convolution in parallel. An S2M-ADC scheme is proposed to allot an area of an input sampling capacitor of an ABL to sign-numerical SAR ADC units of a C-DAC array without adding area overhead.
    Type: Application
    Filed: August 5, 2022
    Publication date: June 22, 2023
    Applicant: SHANGHAITECH UNIVERSITY
    Inventors: Hongtu ZHANG, Yuhao SHU, Yajun HA
  • Publication number: 20230194602
    Abstract: A full-path circuit delay measurement device for a field-programmable gate array (FPGA) and a measurement method are provided. The measurement device includes two shadow registers and a phase-shifted clock, where the two shadow registers take an output of a measured combinational logic circuit as a clock and sample the phase-shifted clock SCLK as data; the two shadow registers are respectively triggered on rising and falling edges of the output of the measured combinational logic circuit to sample the phase-shifted clock; outputs of the two shadow registers are delivered by an OR gate as an input into a synchronization register; a clock of the synchronization register serves as a clock MCLK of the measured combinational logic circuit; an output of the synchronization register serves as that of the circuit delay measurement device; the phase-shifted clock SCLK and the clock MCLK of the measured combinational logic circuit have the same frequency.
    Type: Application
    Filed: September 22, 2021
    Publication date: June 22, 2023
    Applicant: SHANGHAITECH UNIVERSITY
    Inventors: Weixiong JIANG, Yajun HA
  • Publication number: 20230195793
    Abstract: A ripple push method for a graph cut includes: obtaining an excess flow ef(v) of a current node v; traversing four edges connecting the current node v in top, bottom, left and right directions, and determining whether each of the four edges is a pushable edge; calculating, according to different weight functions, a maximum push value of each of the four edges by efw=ef(v)*W, where W denotes a weight function; and traversing the four edges, recording a pushable flow of each of the four edges, and pushing out a calculated flow. The ripple push method explores different push weight functions, and significantly improves the actual parallelism of the push-relabel algorithm.
    Type: Application
    Filed: September 22, 2021
    Publication date: June 22, 2023
    Applicant: SHANGHAITECH UNIVERSITY
    Inventors: Guangyao YAN, Xinzhe LIU, Yajun HA, Hui WANG
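    Illustrative sketch for publication 20230195793: the steps named in the abstract (read the excess flow ef(v), check the four grid edges, compute efw = ef(v)*W, then push) could look roughly like the following. The admissibility test (positive residual capacity and a neighbour exactly one height level lower) is the usual push-relabel rule and is an assumption here, as are the `uniform_weight` function, the capping of each push at the edge's residual capacity, and all data layouts and names. The abstract does not specify any of them.

```python
# Hypothetical sketch of one "ripple push" step on a 4-connected grid graph.
# Data layout (assumed): excess and height are 2D lists indexed [row][col];
# residual maps (from_node, to_node) -> remaining capacity.

DIRECTIONS = {"top": (-1, 0), "bottom": (1, 0), "left": (0, -1), "right": (0, 1)}


def uniform_weight(u, pushable):
    """Assumed weight function W: split the excess evenly over pushable edges."""
    return 1.0 / len(pushable)


def ripple_push(v, excess, height, residual, weight_fn):
    """Push the excess of node v along every pushable edge in one pass."""
    rows, cols = len(excess), len(excess[0])
    r, c = v
    ef = excess[r][c]  # excess flow ef(v) of the current node

    # Traverse the four edges and keep the pushable ones
    # (assumed rule: residual capacity left and neighbour one level lower).
    pushable = []
    for dr, dc in DIRECTIONS.values():
        nr, nc = r + dr, c + dc
        if not (0 <= nr < rows and 0 <= nc < cols):
            continue
        has_capacity = residual.get((v, (nr, nc)), 0) > 0
        if has_capacity and height[r][c] == height[nr][nc] + 1:
            pushable.append((nr, nc))

    # Maximum push value per edge: efw = ef(v) * W, capped by the capacity.
    pushed = {}
    for u in pushable:
        efw = ef * weight_fn(u, pushable)
        pushed[u] = min(efw, residual[(v, u)])

    # Push out the recorded flows and update the residual graph.
    for u, flow in pushed.items():
        residual[(v, u)] -= flow
        residual[(u, v)] = residual.get((u, v), 0) + flow
        excess[r][c] -= flow
        excess[u[0]][u[1]] += flow
    return pushed
```

    Swapping `uniform_weight` for another function is how different push weight functions could be explored in this sketch.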
  • Publication number: 20230192123
    Abstract: A normal distributions transform (NDT) method for LiDAR point cloud localization in unmanned driving is provided. The method proposes a non-recursive, memory-efficient data structure occupation-aware-voxel-structure (OAVS), which speeds up each search operation. Compared with a tree-based structure, the proposed data structure OAVS is easy to parallelize and consumes only about 1/10 of memory. Based on the data structure OAVS, the method proposes a semantic-assisted OAVS-based (SEO)-NDT algorithm, which significantly reduces the number of search operations, redefines a parameter affecting the number of search operations, and removes a redundant search operation. In addition, the method proposes a streaming field-programmable gate array (FPGA) accelerator architecture, which further improves the real-time and energy-saving performance of the SEO-NDT algorithm. The method meets the real-time and high-precision requirements of smart vehicles for three-dimensional (3D) lidar localization.
    Type: Application
    Filed: September 22, 2021
    Publication date: June 22, 2023
    Applicant: SHANGHAITECH UNIVERSITY
    Inventors: Qi DENG, Hao SUN, Yajun HA, Hui WANG
  • Publication number: 20230197154
    Abstract: A static random-access memory (SRAM) cell for high-speed content-addressable memory (CAM) and in-memory Boolean logic operations includes a standard 6T-SRAM and two additional positive-channel metal oxide semiconductor (PMOS) access transistors, where read word lines of the two PMOS access transistors P1 and P2 are RWLR and RWLL respectively, and under the control thereof, a differential read port RBL/RBL is formed. The SRAM cell is suitable for multi-row address selection, and typically applied to in-memory high-speed CAM and in-memory Boolean logic operations. Due to PMOS device characteristics, the structure design of the SRAM cell can avoid read disturbance generated by an in-memory SRAM, and ensure that the SRAM can perform in-memory CAM and in-memory Boolean logic operations stably at a high speed. In addition, this SRAM-based IMC solution supports commercial CMOS technology, and has an opportunity to leverage a large number of existing on-chip SRAM caches.
    Type: Application
    Filed: September 22, 2021
    Publication date: June 22, 2023
    Applicant: SHANGHAITECH UNIVERSITY
    Inventors: Jian CHEN, Yajun HA
  • Publication number: 20230196095
    Abstract: A pure integer quantization method for a lightweight neural network (LNN) is provided. The method includes the following steps: acquiring a maximum value of each pixel in each of the channels of the feature map of a current layer; dividing a value of each pixel in each of the channels of the feature map by a t-th power of the maximum value, t∈[0,1]; multiplying a weight in each of the channels by the maximum value of each pixel in each of the channels of the corresponding feature map; and convolving the processed feature map with the processed weight to acquire the feature map of a next layer. The algorithm is verified on SkyNet and MobileNet respectively, and lossless INT8 quantization on SkyNet and maximum quantization accuracy so far on MobileNetv2 are achieved.
    Type: Application
    Filed: September 22, 2021
    Publication date: June 22, 2023
    Applicant: SHANGHAITECH UNIVERSITY
    Inventors: Weixiong JIANG, Yajun HA
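    Illustrative sketch for publication 20230196095: the channel-wise rescaling described in the abstract can be tried on a toy pointwise (1×1) convolution. Several choices below are assumptions rather than the patented method: the maximum is taken over absolute values, the weight is multiplied by the same t-th power that divides the feature map (so the convolution result is unchanged, whereas the abstract words this step as multiplying by the maximum value), and the function names and list-based layout are hypothetical. No integer rounding is modelled.

```python
# Hypothetical sketch: rescale each input channel by (channel max) ** t and
# compensate in the weights, then check a pointwise convolution is unchanged.

def equalize_channels(feature_map, weights, t):
    """feature_map: list of channels (lists of floats); weights: one per channel."""
    scaled_map, scaled_weights = [], []
    for channel, w in zip(feature_map, weights):
        peak = max(abs(x) for x in channel)       # per-channel maximum (assumed abs)
        scale = peak ** t if peak > 0 else 1.0    # t in [0, 1]
        scaled_map.append([x / scale for x in channel])
        scaled_weights.append(w * scale)          # assumed: same factor as the divisor
    return scaled_map, scaled_weights


def pointwise_conv(feature_map, weights):
    """1x1 convolution with a single output channel."""
    n = len(feature_map[0])
    return [sum(ch[i] * w for ch, w in zip(feature_map, weights)) for i in range(n)]
```

    With t = 1 every channel is normalized to a peak of 1, and with t = 0 nothing changes; intermediate values split the dynamic range between activations and weights.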
  • Publication number: 20230179315
    Abstract: Example embodiments relate to methods for disseminating scaling information and applications thereof in very large scale integration (VLSI) implementations of fixed-point fast Fourier transforms (FFTs). One embodiment includes a method for disseminating scaling information in a system. The system includes a linear decomposable transformation process and an inverse process of the linear decomposable transformation process. The inverse process of the linear decomposable transformation process is defined, in time or space, as an inverse linear decomposable transformation process. The linear decomposable transformation process is separated from the inverse linear decomposable transformation process. The linear decomposable transformation process or the inverse linear decomposable transformation process is able to be performed first and is defined as a linear decomposable transformation I. The other remaining process is performed subsequently and is defined as a linear decomposable transformation II.
    Type: Application
    Filed: October 26, 2022
    Publication date: June 8, 2023
    Inventors: Xinzhe Liu, Raees Kizhakkumkara Muhamad, Dessislava Nikolova, Yajun Ha, Francky Catthoor, Fupeng Chen, Peter Schelkens, David Blinder
  • Patent number: 11537774
    Abstract: An optimized reconfiguration algorithm based on dynamic voltage and frequency scaling (DVFS) is provided, which mainly has the following contributions. The optimized reconfiguration algorithm based on DVFS proposes a DVFS-based reconfiguration method, which schedules user tasks according to a degree of parallelism (DOP) of the user tasks so as to reconfigure more parallel user tasks, thereby achieving higher reliability. The optimized reconfiguration algorithm based on DVFS proposes a K-means-based heuristic approximation algorithm, which minimizes the delay of the DVFS-based reconfiguration scheduling algorithm. The optimized reconfiguration algorithm based on DVFS proposes a K-means-based method, which reduces memory overhead caused by DVFS-based reconfiguration scheduling. The optimized reconfiguration algorithm based on DVFS improves the reliability of a field programmable gate array (FPGA) system and minimizes the area overhead of a hardware circuit.
    Type: Grant
    Filed: June 9, 2021
    Date of Patent: December 27, 2022
    Assignee: SHANGHAITECH UNIVERSITY
    Inventors: Rui Li, Yajun Ha
  • Publication number: 20220309217
    Abstract: An optimized reconfiguration algorithm based on dynamic voltage and frequency scaling (DVFS) is provided, which mainly has the following contributions. The optimized reconfiguration algorithm based on DVFS proposes a DVFS-based reconfiguration method, which schedules user tasks according to a degree of parallelism (DOP) of the user tasks so as to reconfigure more parallel user tasks, thereby achieving higher reliability. The optimized reconfiguration algorithm based on DVFS proposes a K-means-based heuristic approximation algorithm, which minimizes the delay of the DVFS-based reconfiguration scheduling algorithm. The optimized reconfiguration algorithm based on DVFS proposes a K-means-based method, which reduces memory overhead caused by DVFS-based reconfiguration scheduling. The optimized reconfiguration algorithm based on DVFS improves the reliability of a field programmable gate array (FPGA) system and minimizes the area overhead of a hardware circuit.
    Type: Application
    Filed: June 9, 2021
    Publication date: September 29, 2022
    Applicant: SHANGHAITECH UNIVERSITY
    Inventors: Rui LI, Yajun HA
  • Patent number: 11430200
    Abstract: An efficient K-nearest neighbor search algorithm for three-dimensional (3D) lidar point cloud in unmanned driving and a use of the foregoing K-nearest neighbor search algorithm in a point cloud map matching process in the unmanned driving are provided. A novel data structure for fast K-nearest neighbor search is used, such that each voxel or sub-voxel includes a proper quantity of points to reduce redundant search. The novel K-nearest neighbor search algorithm is based on a double segmentation voxel structure (DSVS) and a field programmable gate array (FPGA). By means of the novel K-nearest neighbor search algorithm, nearest neighbors are searched for only in a neighboring expected area near a search point, thereby reducing search of redundant points. In addition, an optimized data transmission and access policy is used, which makes the algorithm a better fit for the characteristics of the FPGA.
    Type: Grant
    Filed: June 9, 2021
    Date of Patent: August 30, 2022
    Assignee: SHANGHAITECH UNIVERSITY
    Inventors: Hao Sun, Yajun Ha
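    Illustrative sketch for patent 11430200: the idea of searching for nearest neighbors only in an area near the search point can be shown with a plain uniform voxel grid. This is not the double segmentation voxel structure (DSVS) from the abstract and has no FPGA aspects; the single-level grid, the 3×3×3 search neighbourhood, and all names are assumptions. Because only nearby voxels are inspected, the result may contain fewer than k points and is approximate when a true neighbor lies outside that neighbourhood.

```python
import math
from collections import defaultdict


def build_voxel_grid(points, voxel_size):
    """Hash each 3D point into the voxel that contains it."""
    grid = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / voxel_size) for c in p)
        grid[key].append(p)
    return grid


def knn_in_neighbourhood(grid, query, k, voxel_size):
    """Return up to k nearest points, looking only at the 27 voxels around query."""
    cx, cy, cz = (math.floor(c / voxel_size) for c in query)
    candidates = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                # .get avoids creating empty voxels in the defaultdict
                candidates.extend(grid.get((cx + dx, cy + dy, cz + dz), []))
    candidates.sort(key=lambda p: math.dist(p, query))
    return candidates[:k]
```

    Points in distant voxels are never touched, which is the source of the reduced redundant search in this simplified setting.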
  • Publication number: 20220148281
    Abstract: An efficient K-nearest neighbor search algorithm for three-dimensional (3D) lidar point cloud in unmanned driving and a use of the foregoing K-nearest neighbor search algorithm in a point cloud map matching process in the unmanned driving are provided. A novel data structure for fast K-nearest neighbor search is used, such that each voxel or sub-voxel includes a proper quantity of points to reduce redundant search. The novel K-nearest neighbor search algorithm is based on a double segmentation voxel structure (DSVS) and a field programmable gate array (FPGA). By means of the novel K-nearest neighbor search algorithm, nearest neighbors are searched for only in a neighboring expected area near a search point, thereby reducing search of redundant points. In addition, an optimized data transmission and access policy is used, which makes the algorithm a better fit for the characteristics of the FPGA.
    Type: Application
    Filed: June 9, 2021
    Publication date: May 12, 2022
    Applicant: SHANGHAITECH UNIVERSITY
    Inventors: Hao SUN, Yajun HA
  • Publication number: 20210390725
    Abstract: The present disclosure provides an adaptive stereo matching optimization method, apparatus, and device, and a storage medium. The method includes: acquiring images of at least two perspectives of the same target scene, accordingly obtaining, through calculation, disparity value ranges corresponding to pixels in the target scene; and obtaining optimized depth value ranges by adjusting the disparity value ranges of the pixels in the target scene in real time through an adaptive stereo matching model; adjusting an execution cycle in the adaptive stereo matching model in real time through a DVFS algorithm according to a resource constraint condition of the processing system; and/or training on a plurality of scene image data sets through a convolutional neural network, so that the specific function parameters in the adaptive stereo matching model are correspondingly adjusted in real time according to the acquired different scene images.
    Type: Application
    Filed: September 20, 2019
    Publication date: December 16, 2021
    Applicant: ShanghaiTech University
    Inventors: Fupeng CHEN, Heng YU, Yajun HA
  • Patent number: 11100979
    Abstract: A low-power SRAM memory cell includes five word lines and four bit lines. The five word lines are a first word line, a second word line, a third word line, a fourth word line and a fifth word line. The four bit lines are a first bit line, a second bit line, a third bit line, and a fourth bit line. During the operation process of calculating a binary 10×11, the first word line is 1, the second word line is 0, the third word line is 0, the fourth word line is 1, the high bit stored in the bit cell is 1, and the low bit is 1. The voltage value of the fifth word line is 0.73 volt. At this time, the first bit line, the second bit line, and the third bit line do not discharge, while the fourth bit line discharges.
    Type: Grant
    Filed: June 17, 2020
    Date of Patent: August 24, 2021
    Assignee: SHANGHAITECH UNIVERSITY
    Inventors: Yuqi Wang, Yajun Ha
  • Patent number: 11094071
    Abstract: An efficient parallel computing method for a box filter includes: step 1, with respect to a given degree of parallelism N and a radius r of the filter kernel, establishing a first architecture provided without an extra register and a second architecture provided with the extra register; step 2, building a first adder tree for the first architecture and a second adder tree for the second architecture, respectively; step 3, searching the first adder tree and the second adder tree from top to bottom, calculating the pixel average corresponding to each filter kernel by using the first adder tree and the second adder tree, respectively, and counting resources required to be consumed by the first architecture and the second architecture, respectively; and step 4, selecting the architecture that consumes fewer resources from the first architecture and the second architecture for computing the box filter.
    Type: Grant
    Filed: June 17, 2020
    Date of Patent: August 17, 2021
    Assignee: SHANGHAITECH UNIVERSITY
    Inventors: Xinzhe Liu, Fupeng Chen, Yajun Ha
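    Illustrative sketch for patent 11094071: for orientation, the quantity each filter kernel produces is the pixel average over a window of radius r, computed for N adjacent kernels. The 1D running-sum version below only shows that computation and how neighbouring windows share most of their additions. It does not model the two adder-tree architectures or the resource counting in the abstract, and the function name and the choice of a 1D signal are assumptions.

```python
def box_filter_1d(pixels, r, n):
    """Average of n adjacent windows of width 2*r + 1, starting at index 0.

    Requires len(pixels) >= n + 2*r so that every window is fully covered.
    """
    width = 2 * r + 1
    if len(pixels) < n + 2 * r:
        raise ValueError("not enough pixels for n full windows")

    window_sum = sum(pixels[:width])      # first window summed directly
    averages = [window_sum / width]
    for i in range(1, n):
        # Slide by one pixel: add the entering pixel, drop the leaving one.
        window_sum += pixels[i + width - 1] - pixels[i - 1]
        averages.append(window_sum / width)
    return averages
```

    Here n plays the role of the degree of parallelism N only in the sense of how many kernel outputs are produced; a hardware design would compute them concurrently rather than in a loop.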
  • Publication number: 20210249069
    Abstract: A low-power SRAM memory cell includes five word lines and four bit lines. The five word lines are a first word line, a second word line, a third word line, a fourth word line and a fifth word line. The four bit lines are a first bit line, a second bit line, a third bit line, and a fourth bit line. During the operation process of calculating a binary 10×11, the first word line is 1, the second word line is 0, the third word line is 0, the fourth word line is 1, the high bit stored in the bit cell is 1, and the low bit is 1. The voltage value of the fifth word line is 0.73 volt. At this time, the first bit line, the second bit line, and the third bit line do not discharge, while the fourth bit line discharges.
    Type: Application
    Filed: June 17, 2020
    Publication date: August 12, 2021
    Applicant: SHANGHAITECH UNIVERSITY
    Inventors: Yuqi WANG, Yajun HA
  • Publication number: 20210248764
    Abstract: An efficient parallel computing method for a box filter includes: step 1, with respect to a given degree of parallelism N and a radius r of the filter kernel, establishing a first architecture provided without an extra register and a second architecture provided with the extra register; step 2, building a first adder tree for the first architecture and a second adder tree for the second architecture, respectively; step 3, searching the first adder tree and the second adder tree from top to bottom, calculating the pixel average corresponding to each filter kernel by using the first adder tree and the second adder tree, respectively, and counting resources required to be consumed by the first architecture and the second architecture, respectively; and step 4, selecting the architecture that consumes fewer resources from the first architecture and the second architecture for computing the box filter.
    Type: Application
    Filed: June 17, 2020
    Publication date: August 12, 2021
    Applicant: SHANGHAITECH UNIVERSITY
    Inventors: Xinzhe LIU, Fupeng CHEN, Yajun HA
  • Patent number: 7150011
    Abstract: The invention relates to methods and apparatus suitable for executing a service or application at a client peer or client side, having a client specific device or client specific platform, with a reconfigurable architecture, said service or application being provided from a service peer or a service side. In a first aspect of the invention, the method comprises transmitting to the client peer from the server peer an abstract bytecode. The abstract bytecode is generated at the service peer by performing a compilation of an application. The abstract bytecode includes hardware bytecode and software bytecode. At the client peer, the abstract bytecode is transformed into native bytecode for the client specific device.
    Type: Grant
    Filed: June 20, 2001
    Date of Patent: December 12, 2006
    Assignee: Interuniversitair Microelektronica Centrum (IMEC)
    Inventors: Yajun Ha, Patrick Schaumont, Serge Vernalde, Marc Engels
  • Publication number: 20020059456
    Abstract: The invention relates to methods and apparatus suitable for executing a service or application at a client peer or client side, having a client specific device or client specific platform, with a reconfigurable architecture, said service or application being provided from a service peer or a service side. In a first aspect of the invention, the method comprises transmitting to the client peer from the server peer an abstract bytecode. The abstract bytecode is generated at the service peer by performing a compilation of an application. The abstract bytecode includes hardware bytecode and software bytecode. At the client peer, the abstract bytecode is transformed into native bytecode for the client specific device.
    Type: Application
    Filed: June 20, 2001
    Publication date: May 16, 2002
    Inventors: Yajun Ha, Patrick Schaumont, Serge Vernalde, Marc Engels