Patents by Inventor QIANLI DI

QIANLI DI has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10713172
    Abstract: A cache memory for a processor including an arbiter, a tag array and a request queue. The arbiter arbitrates among multiple memory access requests and provides a selected memory access request. The tag array has a first read port receiving the selected memory access request and has a second read port receiving a prefetch request from a prefetcher. The tag array makes a hit or miss determination of whether data requested by the selected memory access request or the prefetch request is stored in a corresponding data array. The request queue has a first write port for receiving the selected memory access request when it misses in the tag array, and has a second write port for receiving the prefetch request when it misses in the tag array. The additional read and write ports provide a separate and independent pipeline path for handing prefetch requests.
    Type: Grant
    Filed: November 13, 2017
    Date of Patent: July 14, 2020
    Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.
    Inventors: Qianli Di, Weili Li
  • Patent number: 10437599
    Abstract: A processor that reduces pipeline stall including a front end, a load queue, a scheduler, and a load buffer. The front end issues instructions while a first full indication is not provided, but otherwise stalls issuing instructions. The load queue stores issued load instruction entries including information needed to execute the issued load instruction. The load queue provides a second full indication when full. The scheduler dispatches issued instructions for execution except for stalled load instructions, such as when not yet been stored in the load queue. The load buffer transfers issued load instructions to the load queue when the load queue is not full. When the load queue is full, the load buffer temporarily buffers issued load instructions until the load queue is no longer full. The load buffer allows more accurate load queue full determination, and allows processing to continue even when the load queue is full.
    Type: Grant
    Filed: November 13, 2017
    Date of Patent: October 8, 2019
    Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.
    Inventor: Qianli Di
  • Patent number: 10310859
    Abstract: A system and method of performing speculative parallel execution of a cache line unaligned load instruction including speculatively predicting whether a load instruction is unaligned with a cache memory, marking the load instruction as unaligned and issuing the instruction to a scheduler, dispatching the unaligned load instruction in parallel to first and second load pipelines, determining corresponding addresses for both load pipelines to retrieve data from first and second cache lines incorporating the target load data, and merging the data retrieved from both load pipelines. Prediction may be based on matching an instruction pointer of a previous iteration of the load instruction that was qualified as actually unaligned. Prediction may be further based on using a last address and a skip stride to predict a data stride between consecutive iterations of the load instruction. The addresses for both loads are selected to incorporate the target load data.
    Type: Grant
    Filed: December 8, 2015
    Date of Patent: June 4, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: Qianli Di, Junjie Zhang
  • Patent number: 10140128
    Abstract: A parallelized multiple dispatch ordered queue including an ordered queue, qualify logic, ordered select logic, and dispatch logic. The ordered queue stores candidates in order from oldest to youngest into multiple entries. The ordered queue is divided into N groups in which an i'th group includes every i'th entry of every N entries of the ordered queue, wherein i is an integer less than or equal to N. The qualify logic determines whether any candidate is ready to be dispatched. The ordered select logic respectively determines the oldest candidate in each group that is ready to be dispatched. The dispatch logic dispatches the oldest ready candidates in parallel. The shift logic shifts the stored candidates in the ordered queue to fill any vacant entries between remaining ones of the stored candidates without changing an order of the remaining ones of the stored candidates in the ordered queue. The ordered queue may have any size or depth and N is any suitable integer determining the number of candidates (e.g.
    Type: Grant
    Filed: March 10, 2015
    Date of Patent: November 27, 2018
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: Qianli Di, Jianbin Wang, Weili Li, Xiaoyuan Yu, Xin Yu Gao
  • Publication number: 20180307492
    Abstract: A processor that reduces pipeline stall including a front end, a load queue, a scheduler, and a load buffer. The front end issues instructions while a first full indication is not provided, but otherwise stalls issuing instructions. The load queue stores issued load instruction entries including information needed to execute the issued load instruction. The load queue provides a second full indication when full. The scheduler dispatches issued instructions for execution except for stalled load instructions, such as when not yet been stored in the load queue. The load buffer transfers issued load instructions to the load queue when the load queue is not full. When the load queue is full, the load buffer temporarily buffers issued load instructions until the load queue is no longer full. The load buffer allows more accurate load queue full determination, and allows processing to continue even when the load queue is full.
    Type: Application
    Filed: November 13, 2017
    Publication date: October 25, 2018
    Inventor: Qianli DI
  • Publication number: 20180307608
    Abstract: A cache memory for a processor including an arbiter, a tag array and a request queue. The arbiter arbitrates among multiple memory access requests and provides a selected memory access request. The tag array has a first read port receiving the selected memory access request and has a second read port receiving a prefetch request from a prefetcher. The tag array makes a hit or miss determination of whether data requested by the selected memory access request or the prefetch request is stored in a corresponding data array. The request queue has a first write port for receiving the selected memory access request when it misses in the tag array, and has a second write port for receiving the prefetch request when it misses in the tag array. The additional read and write ports provide a separate and independent pipeline path for handing prefetch requests.
    Type: Application
    Filed: November 13, 2017
    Publication date: October 25, 2018
    Inventors: Qianli DI, Weili LI
  • Publication number: 20180300134
    Abstract: A processor that is capable of executing cache line unaligned load instructions includes a scheduler, a memory execution unit, and a merge unit. When the memory execution unit detects an unaligned load dispatched by the scheduler, it stalls the scheduler and inserts a second load instruction into the memory execution unit after the unaligned load instruction. Execution of the unaligned load returns first partial data from a first cache line, and execution of the second load instruction returns second partial data from the next sequential cache line. The merge unit merges the partial data to provide result data to the next pipeline stage. The scheduler may be stalled for only one cycle sufficient to insert the second load instruction just after the unaligned load instruction.
    Type: Application
    Filed: November 13, 2017
    Publication date: October 18, 2018
    Inventor: Qianli DI
  • Patent number: 9928070
    Abstract: A microprocessor with a fused reservation stations (RS) structure including a primary RS, a secondary RS, and a bypass system. The primary RS has an input for receiving issued instructions, has a push output for pushing the issued instructions to the secondary RS, and has at least one bypass output for dispatching issued instructions that are ready for dispatch. The secondary RS has an input coupled to the push output of the primary RS and has at least one dispatch output. The bypass system selects between the bypass output of the primary RS and at least one dispatch output of the secondary RS for dispatching selected issued instructions. The primary and secondary RS may each be selected from different RS structure types. A unify RS provides a suitable primary RS, and the secondary RS may include multiple queues. The bypass output enables direct dispatch from the primary RS.
    Type: Grant
    Filed: October 14, 2015
    Date of Patent: March 27, 2018
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD
    Inventors: Qianli Di, Xiaoyuan Yu
  • Publication number: 20170139718
    Abstract: A system and method of performing speculative parallel execution of a cache line unaligned load instruction including speculatively predicting whether a load instruction is unaligned with a cache memory, marking the load instruction as unaligned and issuing the instruction to a scheduler, dispatching the unaligned load instruction in parallel to first and second load pipelines, determining corresponding addresses for both load pipelines to retrieve data from first and second cache lines incorporating the target load data, and merging the data retrieved from both load pipelines. Prediction may be based on matching an instruction pointer of a previous iteration of the load instruction that was qualified as actually unaligned. Prediction may be further based on using a last address and a skip stride to predict a data stride between consecutive iterations of the load instruction. The addresses for both loads are selected to incorporate the target load data.
    Type: Application
    Filed: December 8, 2015
    Publication date: May 18, 2017
    Inventors: QIANLI DI, JUNJIE ZHANG
  • Publication number: 20170090934
    Abstract: A microprocessor with a fused reservation stations (RS) structure including a primary RS, a secondary RS, and a bypass system. The primary RS has an input for receiving issued instructions, has a push output for pushing the issued instructions to the secondary RS, and has at least one bypass output for dispatching issued instructions that are ready for dispatch. The secondary RS has an input coupled to the push output of the primary RS and has at least one dispatch output. The bypass system selects between the bypass output of the primary RS and at least one dispatch output of the secondary RS for dispatching selected issued instructions. The primary and secondary RS may each be selected from different RS structure types. A unify RS provides a suitable primary RS, and the secondary RS may include multiple queues. The bypass output enables direct dispatch from the primary RS.
    Type: Application
    Filed: October 14, 2015
    Publication date: March 30, 2017
    Inventors: QIANLI DI, XIAOYUAN YU
  • Publication number: 20160328237
    Abstract: A load-store collision detection system for a speculative out of order processing engine which includes a scheduler that dispatches instructions to multiple instruction pipelines. The instruction pipelines include a load pipeline that provides a load valid signal when a speculatively dispatched load instruction is executing. The load-store collision detection system includes comparator logic, broadcast logic, and kill logic. The comparator logic asserts a clear signal when a virtual address of the speculatively dispatched load instruction matches at least one store instruction virtual address of a previously dispatched store instruction whose corresponding store data is not ready yet. The broadcast logic broadcasts the load valid signal to the scheduler to enable dispatch of any instructions dependent upon the speculatively dispatched load instruction. The kill logic invalidates the load valid signal when the clear signal is asserted to avoid a load-store collision that reduces processing performance.
    Type: Application
    Filed: May 22, 2015
    Publication date: November 10, 2016
    Inventors: QIANLI DI, JIANBIN WANG, XIN YU GAO
  • Publication number: 20160259648
    Abstract: A parallelized multiple dispatch ordered queue including an ordered queue, qualify logic, ordered select logic, and dispatch logic. The ordered queue stores candidates in order from oldest to youngest into multiple entries. The ordered queue is divided into N groups in which an i'th group includes every i'th entry of every N entries of the ordered queue, wherein i is an integer less than or equal to N. The qualify logic determines whether any candidate is ready to be dispatched. The ordered select logic respectively determines the oldest candidate in each group that is ready to be dispatched. The dispatch logic dispatches the oldest ready candidates in parallel. The shift logic shifts the stored candidates in the ordered queue to fill any vacant entries between remaining ones of the stored candidates without changing an order of the remaining ones of the stored candidates in the ordered queue. The ordered queue may have any size or depth and N is any suitable integer determining the number of candidates (e.g.
    Type: Application
    Filed: March 10, 2015
    Publication date: September 8, 2016
    Inventors: QIANLI DI, JIANBIN WANG, WEILI LI, XIAOYUAN YU, XIN YU GAO