Patents by Inventor QIANLI DI

QIANLI DI has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Processor cache with independent pipeline to expedite prefetch request

Patent number: 10713172

Abstract: A cache memory for a processor including an arbiter, a tag array and a request queue. The arbiter arbitrates among multiple memory access requests and provides a selected memory access request. The tag array has a first read port receiving the selected memory access request and has a second read port receiving a prefetch request from a prefetcher. The tag array makes a hit or miss determination of whether data requested by the selected memory access request or the prefetch request is stored in a corresponding data array. The request queue has a first write port for receiving the selected memory access request when it misses in the tag array, and has a second write port for receiving the prefetch request when it misses in the tag array. The additional read and write ports provide a separate and independent pipeline path for handling prefetch requests.

Type: Grant

Filed: November 13, 2017

Date of Patent: July 14, 2020

Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.

Inventors: Qianli Di, Weili Li
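
The dual-port arrangement described in the abstract above can be illustrated with a small behavioral model. This is only a sketch in Python, not the patented hardware: the class names, the fixed-priority arbiter, and the use of a set as the tag array are all assumptions made for illustration.

```python
from collections import deque


class TagArray:
    """Hypothetical tag array: a set of cached line addresses."""

    def __init__(self, cached_lines):
        self.lines = set(cached_lines)

    def lookup(self, line_addr):
        # Returns True on a hit, False on a miss.
        return line_addr in self.lines


class Cache:
    """Behavioral model with separate demand and prefetch paths."""

    def __init__(self, cached_lines):
        self.tags = TagArray(cached_lines)
        self.request_queue = deque()

    def cycle(self, demand_requests, prefetch_request=None):
        results = {}
        # Arbiter (assumed policy): pick the first pending demand request.
        selected = demand_requests[0] if demand_requests else None
        if selected is not None:
            hit = self.tags.lookup(selected)  # first read port
            results["demand"] = (selected, hit)
            if not hit:
                # first write port of the request queue
                self.request_queue.append(("demand", selected))
        if prefetch_request is not None:
            hit = self.tags.lookup(prefetch_request)  # second read port
            results["prefetch"] = (prefetch_request, hit)
            if not hit:
                # second write port of the request queue
                self.request_queue.append(("prefetch", prefetch_request))
        return results
```

In this model the prefetch lookup never competes with the arbiter: a demand request and a prefetch request are both resolved in the same call to `cycle`.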

System and method of reducing processor pipeline stall caused by full load queue

Patent number: 10437599

Abstract: A processor that reduces pipeline stall including a front end, a load queue, a scheduler, and a load buffer. The front end issues instructions while a first full indication is not provided, but otherwise stalls issuing instructions. The load queue stores issued load instruction entries including information needed to execute the issued load instruction. The load queue provides a second full indication when full. The scheduler dispatches issued instructions for execution except for stalled load instructions, such as those that have not yet been stored in the load queue. The load buffer transfers issued load instructions to the load queue when the load queue is not full. When the load queue is full, the load buffer temporarily buffers issued load instructions until the load queue is no longer full. The load buffer allows more accurate load queue full determination, and allows processing to continue even when the load queue is full.

Type: Grant

Filed: November 13, 2017

Date of Patent: October 8, 2019

Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.

Inventor: Qianli Di
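
The buffering behavior in the abstract above can be sketched as a small Python model. The abstract does not say which unit raises the first full indication; this sketch assumes it is raised when the load buffer itself has no room. The class and method names are hypothetical.

```python
from collections import deque


class LoadPath:
    """Behavioral model of a load buffer placed in front of a load queue."""

    def __init__(self, queue_size, buffer_size):
        self.queue_size = queue_size
        self.buffer_size = buffer_size
        self.load_queue = []
        self.load_buffer = deque()

    def load_queue_full(self):
        # "Second full indication" in the abstract.
        return len(self.load_queue) >= self.queue_size

    def front_end_stalled(self):
        # "First full indication" (assumed here to mean: buffer has no room).
        return len(self.load_buffer) >= self.buffer_size

    def drain(self):
        # Move buffered loads into the load queue while it has free entries.
        while self.load_buffer and not self.load_queue_full():
            self.load_queue.append(self.load_buffer.popleft())

    def issue_load(self, load):
        # Returns False when the front end must stall instead of issuing.
        if self.front_end_stalled():
            return False
        self.load_buffer.append(load)
        self.drain()
        return True

    def retire_load(self):
        # Completing the oldest load frees an entry for a buffered load.
        self.load_queue.pop(0)
        self.drain()
```

With a two-entry load queue and a one-entry buffer, a third load is still accepted after the queue fills, and the front end stalls only on the fourth.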

System and method of speculative parallel execution of cache line unaligned load instructions

Patent number: 10310859

Abstract: A system and method of performing speculative parallel execution of a cache line unaligned load instruction including speculatively predicting whether a load instruction is unaligned with a cache memory, marking the load instruction as unaligned and issuing the instruction to a scheduler, dispatching the unaligned load instruction in parallel to first and second load pipelines, determining corresponding addresses for both load pipelines to retrieve data from first and second cache lines incorporating the target load data, and merging the data retrieved from both load pipelines. Prediction may be based on matching an instruction pointer of a previous iteration of the load instruction that was qualified as actually unaligned. Prediction may be further based on using a last address and a skip stride to predict a data stride between consecutive iterations of the load instruction. The addresses for both loads are selected to incorporate the target load data.

Type: Grant

Filed: December 8, 2015

Date of Patent: June 4, 2019

Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.

Inventors: Qianli Di, Junjie Zhang
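
The predict, split, and merge steps in the abstract above can be sketched in Python. This is a simplified illustration under stated assumptions: a 64-byte cache line, a predictor that only remembers instruction pointers previously seen to be unaligned (the stride-based prediction is omitted), and two partial loads rather than whatever address selection the patent actually uses. All names are hypothetical.

```python
LINE = 64  # assumed cache line size in bytes


def is_unaligned(addr, size, line=LINE):
    # A load is cache line unaligned when it spans two cache lines.
    return addr // line != (addr + size - 1) // line


def split_load(addr, size, line=LINE):
    # Returns (address, length) pairs for the two load pipelines.
    boundary = (addr // line + 1) * line
    first_len = boundary - addr
    return (addr, first_len), (boundary, size - first_len)


def unaligned_load(memory, addr, size):
    (a1, n1), (a2, n2) = split_load(addr, size)
    low = bytes(memory[a1:a1 + n1])   # first load pipeline
    high = bytes(memory[a2:a2 + n2])  # second load pipeline
    return low + high                 # merge step


class UnalignedPredictor:
    """Predicts by instruction pointer of earlier unaligned iterations."""

    def __init__(self):
        self.seen = set()

    def predict(self, ip):
        return ip in self.seen

    def update(self, ip, addr, size):
        # Record loads that were qualified as actually unaligned.
        if is_unaligned(addr, size):
            self.seen.add(ip)
```

A load of 8 bytes at address 60 crosses the boundary at 64, so it is split into 4 bytes from each line and merged back into the original 8 bytes.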

Parallelized multiple dispatch system and method for ordered queue arbitration

Patent number: 10140128

Abstract: A parallelized multiple dispatch ordered queue including an ordered queue, qualify logic, ordered select logic, and dispatch logic. The ordered queue stores candidates in order from oldest to youngest into multiple entries. The ordered queue is divided into N groups in which an i'th group includes every i'th entry of every N entries of the ordered queue, wherein i is an integer less than or equal to N. The qualify logic determines whether any candidate is ready to be dispatched. The ordered select logic respectively determines the oldest candidate in each group that is ready to be dispatched. The dispatch logic dispatches the oldest ready candidates in parallel. The shift logic shifts the stored candidates in the ordered queue to fill any vacant entries between remaining ones of the stored candidates without changing an order of the remaining ones of the stored candidates in the ordered queue. The ordered queue may have any size or depth and N is any suitable integer determining the number of candidates (e.g.

Type: Grant

Filed: March 10, 2015

Date of Patent: November 27, 2018

Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.

Inventors: Qianli Di, Jianbin Wang, Weili Li, Xiaoyuan Yu, Xin Yu Gao
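
The grouping and selection scheme in the abstract above can be sketched in Python. This is an illustrative model, not the patented logic: it assumes that entry `j` of the queue belongs to group `j % n_groups`, and the function and parameter names are hypothetical.

```python
def dispatch_cycle(queue, ready, n_groups):
    """Select the oldest ready candidate in each group, then compact.

    queue    -- list of candidates, oldest first
    ready    -- predicate playing the role of the qualify logic
    n_groups -- number of interleaved groups (N in the abstract)
    """
    picks = []
    for group in range(n_groups):
        # Ordered select logic: scan this group's entries oldest first.
        for idx in range(group, len(queue), n_groups):
            if ready(queue[idx]):
                picks.append(idx)
                break
    # Dispatch logic: the picked candidates leave the queue together.
    dispatched = [queue[i] for i in picks]
    picked = set(picks)
    # Shift logic: close the gaps without reordering what remains.
    remaining = [c for i, c in enumerate(queue) if i not in picked]
    return dispatched, remaining
```

Because each group is scanned independently, up to `n_groups` candidates can be dispatched in one cycle, and the remaining candidates keep their relative age order.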

SYSTEM AND METHOD OF REDUCING PROCESSOR PIPELINE STALL CAUSED BY FULL LOAD QUEUE

Publication number: 20180307492

Abstract: A processor that reduces pipeline stall including a front end, a load queue, a scheduler, and a load buffer. The front end issues instructions while a first full indication is not provided, but otherwise stalls issuing instructions. The load queue stores issued load instruction entries including information needed to execute the issued load instruction. The load queue provides a second full indication when full. The scheduler dispatches issued instructions for execution except for stalled load instructions, such as those that have not yet been stored in the load queue. The load buffer transfers issued load instructions to the load queue when the load queue is not full. When the load queue is full, the load buffer temporarily buffers issued load instructions until the load queue is no longer full. The load buffer allows more accurate load queue full determination, and allows processing to continue even when the load queue is full.

Type: Application

Filed: November 13, 2017

Publication date: October 25, 2018

Inventor: Qianli DI

PROCESSOR CACHE WITH INDEPENDENT PIPELINE TO EXPEDITE PREFETCH REQUEST

Publication number: 20180307608

Abstract: A cache memory for a processor including an arbiter, a tag array and a request queue. The arbiter arbitrates among multiple memory access requests and provides a selected memory access request. The tag array has a first read port receiving the selected memory access request and has a second read port receiving a prefetch request from a prefetcher. The tag array makes a hit or miss determination of whether data requested by the selected memory access request or the prefetch request is stored in a corresponding data array. The request queue has a first write port for receiving the selected memory access request when it misses in the tag array, and has a second write port for receiving the prefetch request when it misses in the tag array. The additional read and write ports provide a separate and independent pipeline path for handling prefetch requests.

Type: Application

Filed: November 13, 2017

Publication date: October 25, 2018

Inventors: Qianli DI, Weili LI

SYSTEM AND METHOD OF EXECUTING CACHE LINE UNALIGNED LOAD INSTRUCTIONS

Publication number: 20180300134

Abstract: A processor that is capable of executing cache line unaligned load instructions includes a scheduler, a memory execution unit, and a merge unit. When the memory execution unit detects an unaligned load dispatched by the scheduler, it stalls the scheduler and inserts a second load instruction into the memory execution unit after the unaligned load instruction. Execution of the unaligned load returns first partial data from a first cache line, and execution of the second load instruction returns second partial data from the next sequential cache line. The merge unit merges the partial data to provide result data to the next pipeline stage. The scheduler may be stalled for only one cycle sufficient to insert the second load instruction just after the unaligned load instruction.

Type: Application

Filed: November 13, 2017

Publication date: October 18, 2018

Inventor: Qianli DI
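
The detect, stall, insert, and merge sequence in the abstract above can be sketched in Python. This is a rough behavioral model under stated assumptions: a 64-byte cache line, one stall cycle per unaligned load, and loads represented as `(address, size)` pairs. The function name and return shape are hypothetical.

```python
LINE_BYTES = 64  # assumed cache line size in bytes


def execute_loads(memory, loads):
    """Run (addr, size) loads in order; return (results, stall_cycles)."""
    results = []
    stall_cycles = 0
    for addr, size in loads:
        end = addr + size
        boundary = (addr // LINE_BYTES + 1) * LINE_BYTES
        if end <= boundary:
            # Aligned case: the load stays within one cache line.
            results.append(bytes(memory[addr:end]))
            continue
        # Unaligned case: stall the scheduler for one cycle and insert
        # a second load for the next sequential cache line.
        stall_cycles += 1
        first_partial = bytes(memory[addr:boundary])
        second_partial = bytes(memory[boundary:end])
        # Merge unit: combine the partial data into the result.
        results.append(first_partial + second_partial)
    return results, stall_cycles
```

Unlike the speculative scheme listed earlier on this page, nothing is predicted here: the cost of an unaligned load in this model is the single stall cycle needed to insert the second load.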

Microprocessor with a reservation stations structure including primary and secondary reservation stations and a bypass system

Patent number: 9928070

Abstract: A microprocessor with a fused reservation stations (RS) structure including a primary RS, a secondary RS, and a bypass system. The primary RS has an input for receiving issued instructions, has a push output for pushing the issued instructions to the secondary RS, and has at least one bypass output for dispatching issued instructions that are ready for dispatch. The secondary RS has an input coupled to the push output of the primary RS and has at least one dispatch output. The bypass system selects between the bypass output of the primary RS and at least one dispatch output of the secondary RS for dispatching selected issued instructions. The primary and secondary RS may each be selected from different RS structure types. A unify RS provides a suitable primary RS, and the secondary RS may include multiple queues. The bypass output enables direct dispatch from the primary RS.

Type: Grant

Filed: October 14, 2015

Date of Patent: March 27, 2018

Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.

Inventors: Qianli Di, Xiaoyuan Yu
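
The primary, secondary, and bypass arrangement in the abstract above can be sketched in Python. This is a simplified model with one dispatch per cycle. The selection policy (prefer a ready instruction from the secondary RS, otherwise take the bypass output) is an assumption, as are all names.

```python
class FusedRS:
    """Behavioral model of a primary RS feeding a secondary RS."""

    def __init__(self):
        self.primary = []
        self.secondary = []

    def issue(self, instr):
        # Issued instructions always enter the primary RS first.
        self.primary.append(instr)

    def cycle(self, is_ready):
        # Dispatch output of the secondary RS: its oldest ready entry.
        from_secondary = next((i for i in self.secondary if is_ready(i)), None)
        # Bypass output of the primary RS: its oldest ready entry.
        from_bypass = next((i for i in self.primary if is_ready(i)), None)
        # Bypass system (assumed policy): secondary wins when both are ready.
        if from_secondary is not None:
            self.secondary.remove(from_secondary)
            chosen = from_secondary
        elif from_bypass is not None:
            self.primary.remove(from_bypass)
            chosen = from_bypass
        else:
            chosen = None
        # Push output: whatever remains in the primary moves to the secondary.
        self.secondary.extend(self.primary)
        self.primary.clear()
        return chosen
```

An instruction that is ready in the cycle it is issued can be dispatched straight from the primary RS without first waiting in the secondary RS.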

SYSTEM AND METHOD OF SPECULATIVE PARALLEL EXECUTION OF CACHE LINE UNALIGNED LOAD INSTRUCTIONS

Publication number: 20170139718

Abstract: A system and method of performing speculative parallel execution of a cache line unaligned load instruction including speculatively predicting whether a load instruction is unaligned with a cache memory, marking the load instruction as unaligned and issuing the instruction to a scheduler, dispatching the unaligned load instruction in parallel to first and second load pipelines, determining corresponding addresses for both load pipelines to retrieve data from first and second cache lines incorporating the target load data, and merging the data retrieved from both load pipelines. Prediction may be based on matching an instruction pointer of a previous iteration of the load instruction that was qualified as actually unaligned. Prediction may be further based on using a last address and a skip stride to predict a data stride between consecutive iterations of the load instruction. The addresses for both loads are selected to incorporate the target load data.

Type: Application

Filed: December 8, 2015

Publication date: May 18, 2017

Inventors: QIANLI DI, JUNJIE ZHANG

MICROPROCESSOR WITH FUSED RESERVATION STATIONS STRUCTURE

Publication number: 20170090934

Abstract: A microprocessor with a fused reservation stations (RS) structure including a primary RS, a secondary RS, and a bypass system. The primary RS has an input for receiving issued instructions, has a push output for pushing the issued instructions to the secondary RS, and has at least one bypass output for dispatching issued instructions that are ready for dispatch. The secondary RS has an input coupled to the push output of the primary RS and has at least one dispatch output. The bypass system selects between the bypass output of the primary RS and at least one dispatch output of the secondary RS for dispatching selected issued instructions. The primary and secondary RS may each be selected from different RS structure types. A unify RS provides a suitable primary RS, and the secondary RS may include multiple queues. The bypass output enables direct dispatch from the primary RS.

Type: Application

Filed: October 14, 2015

Publication date: March 30, 2017

Inventors: QIANLI DI, XIAOYUAN YU

SYSTEM AND METHOD TO REDUCE LOAD-STORE COLLISION PENALTY IN SPECULATIVE OUT OF ORDER ENGINE

Publication number: 20160328237

Abstract: A load-store collision detection system for a speculative out of order processing engine which includes a scheduler that dispatches instructions to multiple instruction pipelines. The instruction pipelines include a load pipeline that provides a load valid signal when a speculatively dispatched load instruction is executing. The load-store collision detection system includes comparator logic, broadcast logic, and kill logic. The comparator logic asserts a clear signal when a virtual address of the speculatively dispatched load instruction matches at least one store instruction virtual address of a previously dispatched store instruction whose corresponding store data is not ready yet. The broadcast logic broadcasts the load valid signal to the scheduler to enable dispatch of any instructions dependent upon the speculatively dispatched load instruction. The kill logic invalidates the load valid signal when the clear signal is asserted to avoid a load-store collision that reduces processing performance.

Type: Application

Filed: May 22, 2015

Publication date: November 10, 2016

Inventors: QIANLI DI, JIANBIN WANG, XIN YU GAO
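
The comparator, kill, and broadcast logic in the abstract above can be sketched in Python. This is a combinational illustration only. The `PendingStore` record and the function names are hypothetical, and exact full-address matching is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PendingStore:
    vaddr: int        # virtual address of a previously dispatched store
    data_ready: bool  # True once the store data is available


def clear_signal(load_vaddr, stores):
    # Comparator logic: assert clear when the load address matches a
    # store whose data is not ready yet.
    return any(s.vaddr == load_vaddr and not s.data_ready for s in stores)


def broadcast_load_valid(load_valid, load_vaddr, stores):
    # Kill logic: invalidate load valid when clear is asserted.
    # The returned value is what would be broadcast to the scheduler.
    return load_valid and not clear_signal(load_vaddr, stores)
```

When the broadcast value is False, instructions that depend on the load are not woken up, which avoids dispatching them with data that a pending store has yet to produce.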

PARALLELIZED MULTIPLE DISPATCH SYSTEM AND METHOD FOR ORDERED QUEUE ARBITRATION

Publication number: 20160259648

Abstract: A parallelized multiple dispatch ordered queue including an ordered queue, qualify logic, ordered select logic, and dispatch logic. The ordered queue stores candidates in order from oldest to youngest into multiple entries. The ordered queue is divided into N groups in which an i'th group includes every i'th entry of every N entries of the ordered queue, wherein i is an integer less than or equal to N. The qualify logic determines whether any candidate is ready to be dispatched. The ordered select logic respectively determines the oldest candidate in each group that is ready to be dispatched. The dispatch logic dispatches the oldest ready candidates in parallel. The shift logic shifts the stored candidates in the ordered queue to fill any vacant entries between remaining ones of the stored candidates without changing an order of the remaining ones of the stored candidates in the ordered queue. The ordered queue may have any size or depth and N is any suitable integer determining the number of candidates (e.g.

Type: Application

Filed: March 10, 2015

Publication date: September 8, 2016

Inventors: QIANLI DI, JIANBIN WANG, WEILI LI, XIAOYUAN YU, XIN YU GAO