Patents by Inventor Kar Lik Wong

Kar Lik Wong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8879636
    Abstract: Methods and apparatus for adaptive encoding of data such as video data. In one exemplary embodiment, a real-time video encoder is disclosed that changes video encoding processes to produce the best quality encoded video while maintaining a target encoding frame rate, according to one or more operating constraints.
    Type: Grant
    Filed: May 27, 2008
    Date of Patent: November 4, 2014
    Assignee: Synopsys, Inc.
    Inventors: Carl Norman Graham, Seow Chuan Lim, Aris Aristodemou, John R. M. Mason, Tim Hall, Yazid Nemouchi, Kar-Lik Wong
  • Patent number: 8212823
    Abstract: A data path for a SIMD-based microprocessor is used to perform different simultaneous filter sub-operations in parallel data lanes of the SIMD-based microprocessor. Filter operations for sub-pixel interpolation are performed simultaneously on separate lanes of the SIMD processor's data path. Using a dedicated internal data path, precision higher than the native precision of the SIMD unit may be achieved. Through the data path according to this invention, a single instruction may be used to generate the value of two adjacent sub-pixels located diagonally with respect to integer pixel positions.
    Type: Grant
    Filed: September 28, 2006
    Date of Patent: July 3, 2012
    Assignee: Synopsys, Inc.
    Inventors: Carl Norman Graham, Kar-Lik Wong, Simon Jones, Aris Aristodemou
  • Patent number: 8006069
    Abstract: Inter-processor communication systems and methods that define within the instruction set of the microprocessor a command for directing the microprocessor to relinquish control over at least one of the microprocessor's internal registers. The microprocessor may then signal a communication interface that collects data from external sources. The communication interface takes control over the internal register released by the microprocessor and inputs the collected external data directly into the internal register of the microprocessor. Once data is placed into the internal register, control of that register may be returned to the microprocessor.
    Type: Grant
    Filed: October 5, 2007
    Date of Patent: August 23, 2011
    Assignee: Synopsys, Inc.
    Inventors: Simon Jones, Carl Norman Graham, Kar-Lik Wong
  • Patent number: 7971042
    Abstract: Systems and methods for recording instruction sequences in a microprocessor having a dynamically decoupleable extended instruction pipeline. A record instruction including a record start address is sent to the extended pipeline. The extended pipeline then records the subsequent instruction sequence at the specified address until an end record instruction is encountered; the end record instruction is recorded as the last instruction in the sequence. The main pipeline may later call the sequence by sending the extended pipeline a run instruction containing the start address of the desired sequence. The run instruction causes the extended pipeline to execute the recorded sequence autonomously until it reaches the end record instruction, which causes the extended pipeline to cease autonomous execution and return to executing instructions supplied by the main pipeline.
    Type: Grant
    Filed: September 28, 2006
    Date of Patent: June 28, 2011
    Assignee: Synopsys, Inc.
    Inventors: Carl Norman Graham, Simon Jones, Seow Chuan Lim, Yazid Nemouchi, Kar-Lik Wong, Aris Aristodemou
  • Patent number: 7747088
    Abstract: Two pairs of deblock instructions for performing deblock filtering on a horizontal row of pixels according to the H.264 (MPEG-4 Part 10) and VC1 video codec algorithms. The first instruction of each pair has three 128-bit operands comprising the 16-bit components of a horizontal line of 8 pixels crossing a vertical block edge between pixels 4 and 5 in a YUV image, a series of filter threshold parameters, and a 128-bit destination operand for storing the output of the first instruction. The second instruction of each pair accepts the same 16-bit components as its first input, the output of the first instruction as its second input and a destination operand for storing an output of the second instruction as its third input. The instruction pairs are intended for use with the H.264 or VC1 video codecs respectively.
    Type: Grant
    Filed: September 28, 2006
    Date of Patent: June 29, 2010
    Assignee: ARC International (UK) Limited
    Inventors: Carl Norman Graham, Kar-Lik Wong, Simon Jones, Aris Aristodemou, Yazid Nemouchi
  • Publication number: 20080291995
    Abstract: Methods and apparatus for adaptive encoding of data such as video data. In one exemplary embodiment, a real-time video encoder is disclosed that changes video encoding processes to produce the best quality encoded video whilst maintaining a target encoding frame rate, according to one or more operating constraints.
    Type: Application
    Filed: May 27, 2008
    Publication date: November 27, 2008
    Inventors: Carl Norman Graham, Seow Chuan Lim, Aris Aristodemou, John R.M. Mason, Tim Hall, Yazid Nemouchi, Kar-Lik Wong
  • Publication number: 20080086626
    Abstract: Inter-processor communication systems and methods that define within the instruction set of the microprocessor a command for directing the microprocessor to relinquish control over at least one of the microprocessor's internal registers. The microprocessor may then signal a communication interface that collects data from external sources. The communication interface takes control over the internal register released by the microprocessor and inputs the collected external data directly into the internal register of the microprocessor. Once data is placed into the internal register, control of that register may be returned to the microprocessor.
    Type: Application
    Filed: October 5, 2007
    Publication date: April 10, 2008
    Inventors: Simon Jones, Carl Norman Graham, Kar-Lik Wong
  • Publication number: 20070250689
    Abstract: Methods and apparatus adapted for enhancing the throughput of a digital processor (e.g., microprocessor, CISC device, or RISC device) through use of a direct memory access (DMA) mechanism. In one embodiment, the processor comprises a “soft” RISC-based processor core that is both user-extensible and user-configurable. The core comprises a functional process or unit (DMA assist) that is coupled to the processor's extension logic and which facilitates throughput by, among other things, ensuring that the CPU and processor extension logic can operate on data in parallel in an efficient manner. In one variant, a parallel datapath (including a buffer) is used in conjunction with the aforementioned DMA assist so as to permit the processor extension logic to efficiently operate in parallel with the CPU.
    Type: Application
    Filed: March 22, 2007
    Publication date: October 25, 2007
    Inventors: Aris Aristodemou, Amnon Cohen, Kar-Lik Wong, Ryan Lim, Simon Jones
  • Publication number: 20070074004
    Abstract: Systems and methods for selectively decoupling a parallel extended processor pipeline. A main processor pipeline and a parallel extended pipeline are coupled via an instruction queue. The main pipeline can instruct the parallel pipeline either to execute instructions directly or to begin fetching and executing its own instructions autonomously. During autonomous operation of the parallel pipeline, instructions from the main pipeline accumulate in the instruction queue. The parallel pipeline can return to main-pipeline-controlled execution through a single instruction. A lightweight mechanism, in the form of a condition code visible to the main processor, allows an intelligent decision to be made at run time, based on the queue status, about whether further instructions should be issued to the parallel extended pipeline, so as to maximize overall performance.
    Type: Application
    Filed: September 28, 2006
    Publication date: March 29, 2007
    Inventors: Kar-Lik Wong, Carl Graham, Seow Lim, Simon Jones, Yazid Nemouchi, Aris Aristodemou
  • Publication number: 20070073925
    Abstract: Systems and methods for synchronizing multiple processing engines of a microprocessor. In a microprocessor engine employing processor extension logic, DMA engines are used to permit the processor extension logic to move data into and out of local memory independently of the main instruction pipeline. Synchronization between the extended instruction pipeline and the DMA engines is performed to maximize simultaneous operation of these elements. The DMA engines include a data-in engine and a data-out engine, each adapted to buffer at least one instruction in a queue. If a DMA engine's queue is full and a new instruction attempts to enter it, that DMA engine causes the extended pipeline to pause execution until the current DMA operation is complete. This prevents data overwrites while maximizing simultaneous operation.
    Type: Application
    Filed: September 28, 2006
    Publication date: March 29, 2007
    Inventors: Seow Lim, Carl Graham, Kar-Lik Wong, Simon Jones, Aris Aristodemou
  • Publication number: 20070070080
    Abstract: A data path for a SIMD-based microprocessor is used to perform different simultaneous filter sub-operations in parallel data lanes of the SIMD-based microprocessor. Filter operations for sub-pixel interpolation are performed simultaneously on separate lanes of the SIMD processor's data path. Using a dedicated internal data path, precision higher than the native precision of the SIMD unit may be achieved. Through the data path according to this invention, a single instruction may be used to generate the value of two adjacent sub-pixels located diagonally with respect to integer pixel positions.
    Type: Application
    Filed: September 28, 2006
    Publication date: March 29, 2007
    Inventors: Carl Graham, Kar-Lik Wong, Simon Jones, Aris Aristodemou
  • Publication number: 20070071106
    Abstract: Two pairs of deblock instructions for performing deblock filtering on a horizontal row of pixels according to the H.264 (MPEG-4 Part 10) and VC1 video codec algorithms. The first instruction of each pair has three 128-bit operands comprising the 16-bit components of a horizontal line of 8 pixels crossing a vertical block edge between pixels 4 and 5 in a YUV image, a series of filter threshold parameters, and a 128-bit destination operand for storing the output of the first instruction. The second instruction of each pair accepts the same 16-bit components as its first input, the output of the first instruction as its second input and a destination operand for storing an output of the second instruction as its third input. The instruction pairs are intended for use with the H.264 or VC1 video codecs respectively.
    Type: Application
    Filed: September 28, 2006
    Publication date: March 29, 2007
    Inventors: Carl Graham, Kar-Lik Wong, Simon Jones, Aris Aristodemou, Yazid Nemouchi
  • Publication number: 20070074012
    Abstract: Systems and methods for recording instruction sequences in a microprocessor having a dynamically decoupleable extended instruction pipeline. A record instruction including a record start address is sent to the extended pipeline. The extended pipeline then records the subsequent instruction sequence at the specified address until an end record instruction is encountered; the end record instruction is recorded as the last instruction in the sequence. The main pipeline may later call the sequence by sending the extended pipeline a run instruction containing the start address of the desired sequence. The run instruction causes the extended pipeline to execute the recorded sequence autonomously until it reaches the end record instruction, which causes the extended pipeline to cease autonomous execution and return to executing instructions supplied by the main pipeline.
    Type: Application
    Filed: September 28, 2006
    Publication date: March 29, 2007
    Inventors: Carl Graham, Simon Jones, Seow Lim, Yazid Nemouchi, Kar-Lik Wong, Aris Aristodemou
  • Publication number: 20070074007
    Abstract: A parameterizable clip instruction for a SIMD microprocessor architecture, and a method of performing a clip operation using the same. A single instruction is provided with three input operands: a destination address, a source address and a controlling parameter. The controlling parameter includes a range type and a range specifier. The range type is a multi-bit integer in the operand that is used to index a table of range types. The range specifier plugs into the range type to define a range. The data input at the source address is clipped according to the controlling parameter. The instruction is particularly suited to video encoding/decoding applications, where the result of an interpolation or other calculation may lie outside the maximum value and must then be clipped to a saturation value, for example, the maximum pixel value. Signed and unsigned clipping ranges may be used, and the ranges are not restricted to powers of two.
    Type: Application
    Filed: September 28, 2006
    Publication date: March 29, 2007
    Inventors: Nigel Topham, Yazid Nemouchi, Simon Jones, Carl Graham, Kar-Lik Wong, Aris Aristodemou
  • Publication number: 20050289323
    Abstract: A 2N-bit right-only barrel shifter for a microprocessor, comprising upper and lower N-bit shifter portions. An N-bit input is placed in the upper portion. An X-bit right shift of the N-bit number yields the right-shifted result in the N-bit upper portion and the result of an (N-X)-bit left shift in the lower portion. The shifter comprises a log2(N)-stage multiplexer, wherein each successive stage adds 2^x additional bits, where x increments from 0 to log2(N)-1.
    Type: Application
    Filed: May 19, 2005
    Publication date: December 29, 2005
    Inventors: Kar-Lik Wong, Nigel Topham
  • Publication number: 20050278513
    Abstract: A hybrid branch prediction scheme for a multi-stage pipelined microprocessor that combines features of static and dynamic branch prediction to reduce complexity and enhance performance over conventional branch prediction techniques. Prior to microprocessor deployment, a branch prediction table is populated using static branch prediction techniques by executing instructions analogous to those to be executed during microprocessor deployment. The branch prediction table is stored, and then loaded into the BPU during deployment, for example, at the time of microprocessor power on. Dynamic branch prediction is then performed using the pre-loaded data, thereby enabling dynamic branch prediction without a “warm-up” period. After resolving each branch in the selection stage of the microprocessor instruction pipeline, the BPU is updated with the address of the next instruction that resulted from that branch to enhance performance.
    Type: Application
    Filed: May 19, 2005
    Publication date: December 15, 2005
    Inventors: Aris Aristodemou, Rich Fuhler, Kar-Lik Wong
  • Publication number: 20050278517
    Abstract: A method of performing branch prediction in a microprocessor using variable length instructions is provided. An instruction is fetched from memory based on a specified fetch address and a branch prediction is made based on the address. The prediction is selectively discarded if the look-up was based on a non-sequential fetch to an unaligned instruction address and a branch target alignment cache (BTAC) bit of the instruction is equal to zero. In order to remove the inherent latency of branch prediction, an instruction prior to a branch instruction may be fetched concurrently with a branch prediction unit look-up table entry containing prediction information for a next instruction word. Then, the branch instruction is fetched and a prediction is made on this branch instruction based on information fetched in the previous cycle. The predicted target instruction is fetched on the next clock cycle.
    Type: Application
    Filed: May 19, 2005
    Publication date: December 15, 2005
    Inventors: Kar-Lik Wong, James Hakewill, Nigel Topham, Rich Fuhler
  • Publication number: 20050278505
    Abstract: A microprocessor architecture including a predictive prefetch XY memory pipeline, parallel to the processor's pipeline, for processing compound instructions with enhanced processor performance through predictive prefetch techniques. Instruction operands are predictively prefetched from X and Y memory based on the historical use of operands in instructions that target X and Y memory. After the compound instruction is decoded in the pipeline, the prefetched operand pointer, address and data are reconciled with the operands contained in the actual instruction. If the actual data has been prefetched, it is passed to the appropriate execute unit in the execute stage of the processor pipeline. As a result, if the prediction is correct, the data to use for access can be selected and fed to the execute stage without any additional processor overhead. This prefetch mechanism avoids the need to slow down the processor's clock speed or insert stalls for each compound instruction when using XY memory.
    Type: Application
    Filed: May 19, 2005
    Publication date: December 15, 2005
    Inventors: Seow Lim, Kar-Lik Wong
  • Publication number: 20050273559
    Abstract: A microprocessor architecture including a unified cache debug unit. A debug unit on the processor chip receives data/command signals from a unit of the execute stage of the multi-stage instruction pipeline of the processor and returns information to the execute stage unit. The cache debug unit is operatively connected to both instruction and data cache units of the microprocessor. The memory subsystem of the processor may be accessed by the cache debug unit through either of the instruction or data cache units. By unifying the cache debug in a separate structure, the need for redundant debug structure in both cache units is obviated. Also, the unified cache debug unit can be powered down when not accessed by the instruction pipeline, thereby saving power.
    Type: Application
    Filed: May 19, 2005
    Publication date: December 8, 2005
    Inventors: Aris Aristodemou, Daniel Hansson, Morgyn Taylor, Kar-Lik Wong
  • Patent number: 6546409
    Abstract: A digital processor and method for performing mathematical division in which performance degradation is mitigated by avoiding left shift and append (14) on the output of an ALU using pre-shift and append (18, 22) of the feedback from the quotient and remainder storage element (R, Q).
    Type: Grant
    Filed: June 9, 1999
    Date of Patent: April 8, 2003
    Assignee: LSI Logic Corporation
    Inventor: Kar Lik Wong
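The shift identity described in publication 20050289323 can be checked in software. The following is a minimal Python sketch, not the patented circuit: it models the 2N-bit word as a Python integer and models the log2(N) multiplexer stages as conditional right shifts by 2^k, each enabled by bit k of the shift amount. That stage decomposition and the helper name `right_only_barrel_shift` are assumptions for illustration only.

```python
def right_only_barrel_shift(x: int, shift: int, n: int = 32) -> tuple[int, int]:
    """Model a 2N-bit right-only barrel shifter (hypothetical helper).

    Returns (upper, lower), where upper == x >> shift and
    lower == (x << (n - shift)) truncated to n bits.
    """
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    if not 0 <= shift < n:
        raise ValueError("shift must be in [0, n)")
    mask = (1 << n) - 1
    if not 0 <= x <= mask:
        raise ValueError("x must fit in n bits")

    word = x << n  # place the N-bit input in the upper half of a 2N-bit word
    for k in range(n.bit_length() - 1):  # log2(N) multiplexer stages
        if (shift >> k) & 1:   # stage k is enabled by bit k of the shift amount
            word >>= 1 << k    # and shifts the word right by 2**k positions
    return (word >> n) & mask, word & mask


upper, lower = right_only_barrel_shift(0x12345678, 8)
print(hex(upper), hex(lower))  # 0x123456 0x78000000
print(hex(upper | lower))      # 0x78123456 (rotate right by 8)
```

Because the bits shifted out of the upper half land in the lower half, ORing the two halves gives a rotate right, so a single right-only shifter can in principle also supply left-shift and rotate results; the abstract itself only states the left-shift relationship.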