Patents by Inventor Qifan Zhang

Qifan Zhang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9535698
    Abstract: A clock-less asynchronous processing circuit or system having a plurality of pipelined processing stages utilizes self-clocked generators to tune the delay needed in each of the processing stages to complete the processing cycle. Because different processing stages may require different amounts of time to complete processing or may require different delays depending on the processing required in a particular stage, the self-clocked generators may be tuned to each stage's necessary delay(s) or may be programmably configured.
    Type: Grant
    Filed: September 8, 2014
    Date of Patent: January 3, 2017
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Tao Huang, Qifan Zhang, Wuxian Shi, Yiqun Ge, Wen Tong
  • Patent number: 9495316
    Abstract: Embodiments are provided for an asynchronous processor with a Hierarchical Token System. The asynchronous processor includes a set of primary processing units configured to gate and pass a set of tokens in a predefined order of a primary token system. The asynchronous processor further includes a set of secondary units configured to gate and pass a second set of tokens in a second predefined order of a secondary token system. The set of tokens of the primary token system includes a token consumed in the set of primary processing units and designated for triggering the secondary token system in the set of secondary units.
    Type: Grant
    Filed: September 8, 2014
    Date of Patent: November 15, 2016
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Yiqun Ge, Wuxian Shi, Qifan Zhang, Tao Huang, Wen Tong
  • Patent number: 9489200
    Abstract: A clock-less asynchronous processing circuit or system is configured to operation in a plurality of modes. In an initialization mode (e.g., reset, initialization, boot up), a self-clocked generator associated with the asynchronous circuit is configured to generate an active complete signal (to latch output processed data) within a first period of time after receiving a trigger signal. In a normal mode, the self-clocked generator is configured to generate the active complete signal within a second period of time after receiving the trigger signal. In one embodiment, during the initialization mode, the asynchronous circuit latches the output slower than when in the normal mode.
    Type: Grant
    Filed: September 8, 2014
    Date of Patent: November 8, 2016
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Tao Huang, Qifan Zhang, Wuxian Shi, Yiqun Ge, Wen Tong
  • Patent number: 9325520
    Abstract: Embodiments are provided for adding a token jump logic to an asynchronous processor with token passing. The token jump logic allows token forward jumps and token backward jumps over a cascade of token processing logics in the processor. An embodiment method includes determining, using a token jump logic coupled to a cascade of token processing logics, whether to administer a token forward jump or a token backward jump of a token signal passing through the token processing logics. The token forward jump and token backward jump allow the token signal to skip one or more token processing logics in the cascade. The method further includes monitoring, for each of the token processing logics, a polarity status of a token sense logic, and inverting the polarity status according to the determination at the token jump logic.
    Type: Grant
    Filed: July 7, 2014
    Date of Patent: April 26, 2016
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Wuxian Shi, Yiqun Ge, Qifan Zhang, Tao Huang, Wen Tong
  • Publication number: 20150082006
    Abstract: Embodiments are provided for an asynchronous processor with an asynchronous Instruction fetch, decode, and issue unit. The asynchronous processor comprises an execution unit for asynchronous execution of a plurality of instructions, and a fetch, decode and issue unit configured for asynchronous decoding of the instructions. The fetch, decode and issue unit comprises a plurality of resources supporting functions of the fetch, decode and issue unit, and a plurality of decoders arranged in a predefined order for passing a plurality of tokens. The tokens control access of the decoders to the resources and allow the decoders exclusive access to the resources.
    Type: Application
    Filed: September 4, 2014
    Publication date: March 19, 2015
    Inventors: Yiqun Ge, Wuxian Shi, Qifan Zhang, Tao Huang, Wen Tong
  • Publication number: 20150074382
    Abstract: Embodiments are provided for adding a token jump logic to an asynchronous processor with token passing. The token jump logic allows token forward jumps and token backward jumps over a cascade of token processing logics in the processor. An embodiment method includes determining, using a token jump logic coupled to a cascade of token processing logics, whether to administer a token forward jump or a token backward jump of a token signal passing through the token processing logics. The token forward jump and token backward jump allow the token signal to skip one or more token processing logics in the cascade. The method further includes monitoring, for each of the token processing logics, a polarity status of a token sense logic, and inverting the polarity status according to the determination at the token jump logic.
    Type: Application
    Filed: July 7, 2014
    Publication date: March 12, 2015
    Inventors: Wuxian Shi, Yiqun Ge, Qifan Zhang, Tao Huang, Wen Tong
  • Publication number: 20150074378
    Abstract: Embodiments are provided for an asynchronous processor with heterogeneous processors. In an embodiment, the apparatus for an asynchronous processor comprises a memory configured to cache instructions, and a first unit (XU) configured to processing a first instruction of the instructions. The apparatus also comprises a second XU having less restricted access than the first XU to a resource of the asynchronous processor and configured to process a second instruction of the instructions. The second instruction requires access to the resource. The apparatus further comprises a feedback engine configured to decode the first instruction and the second instruction, and issue the first instruction to the first XU, and a scheduler configured to send the second instruction to the second XU.
    Type: Application
    Filed: September 8, 2014
    Publication date: March 12, 2015
    Inventors: Yiqun Ge, Wuxian Shi, Qifan Zhang, Tao Huang, Wen Tong
  • Publication number: 20150074445
    Abstract: A clock-less asynchronous processing circuit or system having a plurality of pipelined processing stages utilizes self-clocked generators to tune the delay needed in each of the processing stages to complete the processing cycle. Because different processing stages may require different amounts of time to complete processing or may require different delays depending on the processing required in a particular stage, the self-clocked generators may be tuned to each stage's necessary delay(s) or may be programmably configured.
    Type: Application
    Filed: September 8, 2014
    Publication date: March 12, 2015
    Inventors: Tao Huang, Qifan Zhang, Wuxian Shi, Yiqun Ge, Wen Tong
  • Publication number: 20150074376
    Abstract: Embodiments are provided for an asynchronous processor using master and assisted tokens. In an embodiment, an apparatus for an asynchronous processor comprises a memory to cache a plurality of instructions, a feedback engine to decode the instructions from the memory, and a plurality of XUs coupled to the feedback engine and arranged in a token ring architecture. Each one of the XUs is configured to receive an instruction of the instructions form the feedback engine, and receive a master token associated with a resource and further receive an assisted token for the master token. Upon determining that the assisted token and the master token are received in an abnormal order, the XU is configured to detect an operation status for the instruction in association with the assisted token, and upon determining a needed action in accordance with the operation status and the assisted token, perform the needed action.
    Type: Application
    Filed: September 8, 2014
    Publication date: March 12, 2015
    Inventors: Yiqun Ge, Wuxian Shi, Qifan Zhang, Tao Huang, Wen Tong
  • Publication number: 20150074380
    Abstract: A clock-less asynchronous processor comprising a plurality of parallel asynchronous processing logic circuits, each processing logic circuit configured to generate an instruction execution result. The processor comprises an asynchronous instruction dispatch unit coupled to each processing logic circuit, the instruction dispatch unit configured to receive multiple instructions from memory and dispatch individual instructions to each of the processing logic circuits. The processor comprises a crossbar coupled to an output of each processing logic circuit and to the dispatch unit, the crossbar configured to store the instruction execution results.
    Type: Application
    Filed: September 8, 2014
    Publication date: March 12, 2015
    Inventors: Tao Huang, Yiqun Ge, Qifan Zhang, Wuxian Shi, Wen Tong
  • Publication number: 20150074787
    Abstract: Embodiments are provided for an asynchronous processor with a Hierarchical Token System. The asynchronous processor includes a set of primary processing units configured to gate and pass a set of tokens in a predefined order of a primary token system. The asynchronous processor further includes a set of secondary units configured to gate and pass a second set of tokens in a second predefined order of a secondary token system. The set of tokens of the primary token system includes a token consumed in the set of primary processing units and designated for triggering the secondary token system in the set of secondary units.
    Type: Application
    Filed: September 8, 2014
    Publication date: March 12, 2015
    Inventors: Yiqun Ge, Wuxian Shi, Qifan Zhang, Tao Huang, Wen Tong
  • Publication number: 20150074446
    Abstract: A clock-less asynchronous processing circuit or system utilizes a self-clocked generator to adjust the processing delay (latency) needed/allowed to the processing cycle in the circuit/system. The timing of the self-clocked generator is dynamically adjustable depending on various parameters. These parameters may include processing instruction, opcode information, type of processing to be performed by the circuit/system, or overall desired processing performance. The latency may also be adjusted to change processing performance, including power consumption, speed etc.
    Type: Application
    Filed: September 8, 2014
    Publication date: March 12, 2015
    Inventors: Wen Tong, Yiqun Ge, Qifan Zhang, Wuxian Shi, Huang Tao
  • Publication number: 20150074443
    Abstract: A clock-less asynchronous processing circuit or system is configured to operation in a plurality of modes. In an initialization mode (e.g., reset, initialization, boot up), a self-clocked generator associated with the asynchronous circuit is configured to generate an active complete signal (to latch output processed data) within a first period of time after receiving a trigger signal. In a normal mode, the self-clocked generator is configured to generate the active complete signal within a second period of time after receiving the trigger signal. In one embodiment, during the initialization mode, the asynchronous circuit latches the output slower than when in the normal mode.
    Type: Application
    Filed: September 8, 2014
    Publication date: March 12, 2015
    Inventors: Tao Huang, Qifan Zhang, Wuxian Shi, Yiqun Ge, Wen Tong
  • Publication number: 20150074374
    Abstract: An asynchronous processing system comprising an asynchronous scalar processor and an asynchronous vector processor coupled to the scalar processor. The asynchronous scalar processor is configured to perform processing functions on input data and to output instructions. The asynchronous vector processor is configured to perform processing functions in response to a very long instruction word (VLIW) received from the scalar processor. The VLIW comprises a first portion and a second portion, at least the first portion comprising a vector instruction.
    Type: Application
    Filed: September 8, 2014
    Publication date: March 12, 2015
    Inventors: Qifan Zhang, Wuxian Shi, Yiqun Ge, Tao Huang, Wen Tong
  • Publication number: 20150074377
    Abstract: Embodiments are provided for an asynchronous processor with pipelined arithmetic and logic unit. The asynchronous processor includes a non-transitory memory for storing instructions and a plurality of instruction execution units (XUs) arranged in a ring architecture for passing tokens. Each one of the XUs comprises a logic circuit configured to fetch a first instruction from the non-transitory memory, and execute the first instruction. The logic circuit is also configured to fetch a second instruction from the non-transitory memory, and execute the second instruction, regardless whether the one of the XUs holds a token for writing the first instruction. The logic circuit is further configured to write the first instruction to the non-transitory memory after fetching the second instruction.
    Type: Application
    Filed: September 4, 2014
    Publication date: March 12, 2015
    Inventors: Wuxian Shi, Yiqun Ge, Qifan Zhang, Tao Huang, Wen Tong
  • Publication number: 20150074379
    Abstract: Embodiments are provided for an asynchronous processor with token-based very long instruction word architecture. The asynchronous processor comprises a memory configured to cache a plurality of instructions, a feedback engine configured to receive the instructions in bundles of instructions at a time (referred to as very long instruction word) and to decode the instructions, and a crossbar bus configured to transfer calculation information and results of the asynchronous processor. The apparatus further comprises a plurality of sets of execution units (XUs) between the feedback engine and the crossbar bus. Each set of the sets of XUs comprises a plurality of XUs arranged in series and configured to process a bundle of instructions received at the each set from the feedback engine.
    Type: Application
    Filed: September 8, 2014
    Publication date: March 12, 2015
    Inventors: Yiqun Ge, Wuxian Shi, Qifan Zhang, Tao Huang, Wen Tong
  • Publication number: 20150074680
    Abstract: A method of operating a clock-less asynchronous processing system comprising a plurality of successive asynchronous processing components. The method comprises providing a first token signal path in the plurality of processing components to allow propagation of a token through the processing components. Possession of the token by one of the processing components enables the processing component to conduct a transaction with a resource component that is shared among the processing components. The method comprises propagating the token from one processing component to another processing component along the token signal path.
    Type: Application
    Filed: September 8, 2014
    Publication date: March 12, 2015
    Inventors: Qifan Zhang, Yiqun Ge, Wuxian Shi, Tao Huang, Wen Tong
  • Publication number: 20150074353
    Abstract: Embodiments are provided for an asynchronous processor with multiple threading. The asynchronous processor includes a program counter (PC) logic and instruction cache unit comprising a plurality of PC logics configured to perform branch prediction and loop predication for a plurality of threads of instructions, and determine target PC addresses for caching the plurality of threads. The processor further comprises an instruction memory configured to cache the plurality of threads in accordance with the target PC addresses from the PC logic and instruction cache unit. The processor further includes a multi-threading (MT) scheduling unit configured to schedule and merge instruction flows for the plurality of threads from the instruction memory into a single combined thread of instructions. Additionally, a MT register window register is included to map operands in the plurality of threads to a plurality of corresponding register windows in a register file.
    Type: Application
    Filed: September 3, 2014
    Publication date: March 12, 2015
    Inventors: Yiqun Ge, Wuxian Shi, Qifan Zhang, Tao Huang, Wen Tong
  • Publication number: 20140247908
    Abstract: A method and a system is provided for Coordinate Rotation Digital Computer (CORDIC) based matrix inversion of input digital signal streams from multiple antennas using an bi-directional ring-bus architecture. The bi-directional ring bus includes a first ring bus having signals flow in a clockwise direction, and a second ring bus having signals flow in a counter-clockwise direction. An I/O controller is coupled to the first and the second ring bus, respectively. A plurality of processing elements (PEs), where each of the plurality of PEs is coupled to the first and the second ring bus, respectively, wherein each of the plurality of PEs includes at least one CORDIC core for performing CORDIC iterations on the plurality of input digital stream signals to produce inversed matrix signals.
    Type: Application
    Filed: March 1, 2013
    Publication date: September 4, 2014
    Applicant: Futurewei Technologies, Inc.
    Inventors: Yiqun Ge, Qifan Zhang, Peter Man Kin Sinn
  • Patent number: 8824603
    Abstract: A method and a system is provided for Coordinate Rotation Digital Computer (CORDIC) based matrix inversion of input digital signal streams from multiple antennas using an bi-directional ring-bus architecture. The bi-directional ring bus includes a first ring bus having signals flow in a clockwise direction, and a second ring bus having signals flow in a counter-clockwise direction. An I/O controller is coupled to the first and the second ring bus, respectively. A plurality of processing elements (PEs), where each of the plurality of PEs is coupled to the first and the second ring bus, respectively, wherein each of the plurality of PEs includes at least one CORDIC core for performing CORDIC iterations on the plurality of input digital stream signals to produce inversed matrix signals.
    Type: Grant
    Filed: March 1, 2013
    Date of Patent: September 2, 2014
    Assignee: Futurewei Technologies, Inc.
    Inventors: Yiqun Ge, Qifan Zhang, Peter Man Kin Sinn