Patents by Inventor Joseph J. Cheng

Joseph J. Cheng has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10313683
    Abstract: A context switching method for video encoders that enables higher priority video streams to interrupt lower priority video streams. A high priority frame may be received for processing while another frame is being processed. The pipeline may be signaled to perform a context stop for the current frame. The pipeline stops processing the current frame at an appropriate place, and propagates the stop through the stages of the pipeline and to a transcoder through DMA. The stopping location is recorded. The video encoder may then process the higher-priority frame. When done, a context restart is performed and the pipeline resumes processing the lower-priority frame beginning at the recorded location. The transcoder may process data for the interrupted frame while the higher-priority frame is being processed in the pipeline, and similarly the pipeline may begin processing the lower-priority frame after the context restart while the transcoder completes processing the higher-priority frame.
    Type: Grant
    Filed: August 30, 2014
    Date of Patent: June 4, 2019
    Assignee: Apple Inc.
    Inventors: Joseph J. Cheng, Timothy J. Millet, Shun Wai Go, Guy Cote
  • Patent number: 9762919
    Abstract: Methods and apparatus for caching reference data in a block processing pipeline. A cache may be implemented to which reference data corresponding to motion vectors for blocks being processed in the pipeline may be prefetched from memory. Prefetches for the motion vectors may be initiated one or more stages prior to a processing stage. Cache tags for the cache may be defined by the motion vectors. When a motion vector is received, the tags can be checked to determine if there are cache block(s) corresponding to the vector (cache hits) in the cache. Upon a cache miss, a cache block in the cache is selected according to a replacement policy, the respective tag is updated, and a prefetch (e.g., via DMA) for the respective reference data is issued.
    Type: Grant
    Filed: August 28, 2014
    Date of Patent: September 12, 2017
    Assignee: Apple Inc.
    Inventors: Guy Cote, Joseph P. Bratt, Timothy J. Millet, Shing I. Kong, Joseph J. Cheng
  • Patent number: 9641158
    Abstract: Systems, apparatuses, and methods for implementing a low power decimator. A decimator may receive a plurality of input samples from a digital microphone. The decimator may include one or more coefficient tables for storing values combining two or more filter coefficients for filtering the received samples. The decimator may utilize a concatenation of multiple samples to perform a lookup of a corresponding coefficient table. The coefficient tables may store only the necessary non-redundant values for all coefficient combinations which can be applied to the multiple samples. The result of the lookup of the coefficient table may have its sign inverted or be zeroed based on the values of the multiple samples.
    Type: Grant
    Filed: August 20, 2015
    Date of Patent: May 2, 2017
    Assignee: Apple Inc.
    Inventors: Tzach Zemer, Joseph J. Cheng
  • Publication number: 20170052782
    Abstract: An apparatus may include a counter circuit and an execution unit. The execution unit may be configured to receive and execute a first instruction. The first instruction may include a first number corresponding to a first number of instructions of a plurality of instructions, a second number corresponding to a number of times to execute a subset of the plurality of instructions, and a third number corresponding to a number of instructions in the subset. The execution unit may be further configured to initialize a first count value in the counter circuit to the second number in response to the execution of the first instruction, to execute the first number of the plurality of instructions, and to execute the subset of the plurality of instructions. The counter circuit may be configured to modify the first count value in response to determining a last instruction of the subset has been retired.
    Type: Application
    Filed: August 21, 2015
    Publication date: February 23, 2017
    Inventors: Binu K. Mathew, Julia C. Erhard, Joseph J. Cheng
  • Publication number: 20170054433
    Abstract: Systems, apparatuses, and methods for implementing a low power decimator. A decimator may receive a plurality of input samples from a digital microphone. The decimator may include one or more coefficient tables for storing values combining two or more filter coefficients for filtering the received samples. The decimator may utilize a concatenation of multiple samples to perform a lookup of a corresponding coefficient table. The coefficient tables may store only the necessary non-redundant values for all coefficient combinations which can be applied to the multiple samples. The result of the lookup of the coefficient table may have its sign inverted or be zeroed based on the values of the multiple samples.
    Type: Application
    Filed: August 20, 2015
    Publication date: February 23, 2017
    Inventors: Tzach Zemer, Joseph J. Cheng
  • Patent number: 9571846
    Abstract: Block processing pipeline methods and apparatus in which reference data are stored to a memory according to tile formats to reduce memory accesses when fetching the data from the memory. When the pipeline stores reference data from a current frame being processed to memory as a reference frame, the reference samples are stored in macroblock sequential order. Each macroblock sample set is stored as a tile. Reference data may be stored in tile formats for luma and chroma. Chroma reference data may be stored in tile formats for chroma 4:2:0, 4:2:2, and/or 4:4:4 formats. A stage of the pipeline may write luma and chroma reference data for macroblocks to memory according to one or more of the macroblock tile formats in a modified knight's order. The stage may delay writing the reference data from the macroblocks until the macroblocks have been fully processed by the pipeline.
    Type: Grant
    Filed: September 27, 2013
    Date of Patent: February 14, 2017
    Assignee: Apple Inc.
    Inventors: Timothy John Millet, Mark P. Rygh, Craig M. Okruhlica, Jim C. Chou, Guy Cote, Gaurav S. Gulati, Joseph J. Cheng, Joseph P. Bratt
  • Patent number: 9454378
    Abstract: Methods and apparatus for configuring multiple components of a subsystem are described. The configuration memory of each of a plurality of components coupled to an interconnect includes a global configuration portion. The configuration memory of one of the components may be designated as a master global configuration for all of the components. A module coupled to the interconnect may receive writes to the components from a configuration source. For each write, the module may decode the write to determine addressing information and check to see if the write is addressed to the master global configuration. If the write is addressed to the master global configuration, the module broadcasts the write to the global configuration portion of each of the components via the interconnect. If the write is not addressed to the master global configuration, the module forwards the write to the appropriate component via the interconnect.
    Type: Grant
    Filed: November 18, 2013
    Date of Patent: September 27, 2016
    Assignee: Apple Inc.
    Inventors: Guy Cote, Joseph P. Bratt, Nitin Bhargava, Hao Chen, Joseph J. Cheng
  • Patent number: 9336558
    Abstract: In the video encoders described herein, blocks of pixels from a video frame may be encoded (e.g., using CAVLC encoding) in a block processing pipeline using wavefront ordering (e.g., in knight's order). Each of the encoded blocks may be written to a particular one of multiple DMA buffers such that the encoded blocks written to each of the buffers represent consecutive blocks of the video frame in scan order. A transcode pipeline may operate in parallel with (or at least overlapping) the operation of the block processing pipeline. The transcode pipeline may read encoded blocks from the buffers in scan order and merge them into a single bit stream (in scan order). A transcoder core of the transcode pipeline may decode the encoded blocks and encode them using a different encoding process (e.g., CABAC). In some cases, the transcoder may be bypassed.
    Type: Grant
    Filed: September 27, 2013
    Date of Patent: May 10, 2016
    Assignee: Apple Inc.
    Inventors: Guy Cote, Timothy John Millet, Joseph J. Cheng, Mark P. Rygh, Jim C. Chou
  • Patent number: 9305325
    Abstract: Methods and apparatus for caching neighbor data in a block processing pipeline that processes blocks in knight's order with quadrow constraints. Stages of the pipeline may maintain two local buffers that contain data from neighbor blocks of a current block. A first buffer contains data from the last C blocks processed at the stage. A second buffer contains data from neighbor blocks on the last row of a previous quadrow. Data for blocks on the bottom row of a quadrow are stored to an external memory at the end of the pipeline. When a block on the top row of a quadrow is input to the pipeline, neighbor data from the bottom row of the previous quadrow is read from the external memory and passed down the pipeline, each stage storing the data in its second buffer and using the neighbor data in the second buffer when processing the block.
    Type: Grant
    Filed: September 25, 2013
    Date of Patent: April 5, 2016
    Assignee: Apple Inc.
    Inventors: Joseph J. Cheng, Guy Cote, Marc A. Schaub, Jim C. Chou
  • Patent number: 9292899
    Abstract: Block processing pipeline methods and apparatus in which pixel data from a reference frame is prefetched into a search window memory. The search window may include two or more overlapping regions of pixels from the reference frame corresponding to blocks from the rows in the input frame that are currently being processed in the pipeline. Thus, the pipeline may process blocks from multiple rows of an input frame using one set of pixel data from a reference frame that is stored in a shared search window memory. The search window may be advanced by one column of blocks by initiating a prefetch for a next column of reference data from a memory. The pipeline may also include a reference data cache that may be used to cache a portion of a reference frame and from which at least a portion of a prefetch for the search window may be satisfied.
    Type: Grant
    Filed: September 25, 2013
    Date of Patent: March 22, 2016
    Assignee: Apple Inc.
    Inventors: Marc A. Schaub, Joseph J. Cheng, Mark P. Rygh, Guy Cote
  • Publication number: 20160065969
    Abstract: A context switching method for video encoders that enables higher priority video streams to interrupt lower priority video streams. A high priority frame may be received for processing while another frame is being processed. The pipeline may be signaled to perform a context stop for the current frame. The pipeline stops processing the current frame at an appropriate place, and propagates the stop through the stages of the pipeline and to a transcoder through DMA. The stopping location is recorded. The video encoder may then process the higher-priority frame. When done, a context restart is performed and the pipeline resumes processing the lower-priority frame beginning at the recorded location. The transcoder may process data for the interrupted frame while the higher-priority frame is being processed in the pipeline, and similarly the pipeline may begin processing the lower-priority frame after the context restart while the transcoder completes processing the higher-priority frame.
    Type: Application
    Filed: August 30, 2014
    Publication date: March 3, 2016
    Applicant: APPLE INC.
    Inventors: Joseph J. Cheng, Timothy J. Millet, Shun Wai Go, Guy Cote
  • Publication number: 20160065973
    Abstract: Methods and apparatus for caching reference data in a block processing pipeline. A cache may be implemented to which reference data corresponding to motion vectors for blocks being processed in the pipeline may be prefetched from memory. Prefetches for the motion vectors may be initiated one or more stages prior to a processing stage. Cache tags for the cache may be defined by the motion vectors. When a motion vector is received, the tags can be checked to determine if there are cache block(s) corresponding to the vector (cache hits) in the cache. Upon a cache miss, a cache block in the cache is selected according to a replacement policy, the respective tag is updated, and a prefetch (e.g., via DMA) for the respective reference data is issued.
    Type: Application
    Filed: August 28, 2014
    Publication date: March 3, 2016
    Applicant: APPLE INC.
    Inventors: Guy Cote, Joseph P. Bratt, Timothy J. Millet, Shing I. Kong, Joseph J. Cheng
  • Patent number: 9224186
    Abstract: Memory latency tolerance methods and apparatus for maintaining an overall level of performance in block processing pipelines that prefetch reference data into a search window. In a general memory latency tolerance method, search window processing in the pipeline may be monitored. If status of search window processing changes in a way that affects pipeline throughput, then pipeline processing may be modified. The modification may be performed according to no stall methods, stall recovery methods, and/or stall prevention methods. In no stall methods, a block may be processed using the data present in the search window without waiting for the missing reference data. In stall recovery methods, the pipeline is allowed to stall, and processing is modified for subsequent blocks to speed up the pipeline and catch up in throughput. In stall prevention methods, processing is adjusted in advance of the pipeline encountering a stall condition.
    Type: Grant
    Filed: September 27, 2013
    Date of Patent: December 29, 2015
    Assignee: Apple Inc.
    Inventors: Mark P. Rygh, Guy Cote, Timothy John Millet, Joseph J. Cheng
  • Patent number: 9218639
    Abstract: A knight's order processing method for block processing pipelines in which the next block input to the pipeline is taken from the row below and one or more columns to the left in the frame. The knight's order method may provide spacing between adjacent blocks in the pipeline to facilitate feedback of data from a downstream stage to an upstream stage. The rows of blocks in the input frame may be divided into sets of rows that constrain the knight's order method to maintain locality of neighbor block data. Invalid blocks may be input to the pipeline at the left of the first set of rows and at the right of the last set of rows, and the sets of rows may be treated as if they are horizontally arranged rather than vertically arranged, to maintain continuity of the knight's order algorithm.
    Type: Grant
    Filed: September 27, 2013
    Date of Patent: December 22, 2015
    Assignee: Apple Inc.
    Inventors: Guy Cote, Mark P. Rygh, Timothy John Millet, Jim C. Chou, Joseph J. Cheng
  • Patent number: 9215472
    Abstract: A block processing pipeline that includes a software pipeline and a hardware pipeline that run in parallel. The software pipeline runs at least one block ahead of the hardware pipeline. The stages of the pipeline may each include a hardware pipeline component that performs one or more operations on a current block at the stage. At least one stage of the pipeline may also include a software pipeline component that determines a configuration for the hardware component at the stage of the pipeline for processing a next block while the hardware component is processing the current block. The software pipeline component may determine the configuration according to information related to the next block obtained from an upstream stage of the pipeline. The software pipeline component may also obtain and use information related to a block that was previously processed at the stage.
    Type: Grant
    Filed: September 27, 2013
    Date of Patent: December 15, 2015
    Assignee: Apple Inc.
    Inventors: James E. Orr, Timothy John Millet, Joseph J. Cheng, Nitin Bhargava, Guy Cote
  • Publication number: 20150091921
    Abstract: In the video encoders described herein, blocks of pixels from a video frame may be encoded (e.g., using CAVLC encoding) in a block processing pipeline using wavefront ordering (e.g., in knight's order). Each of the encoded blocks may be written to a particular one of multiple DMA buffers such that the encoded blocks written to each of the buffers represent consecutive blocks of the video frame in scan order. A transcode pipeline may operate in parallel with (or at least overlapping) the operation of the block processing pipeline. The transcode pipeline may read encoded blocks from the buffers in scan order and merge them into a single bit stream (in scan order). A transcoder core of the transcode pipeline may decode the encoded blocks and encode them using a different encoding process (e.g., CABAC). In some cases, the transcoder may be bypassed.
    Type: Application
    Filed: September 27, 2013
    Publication date: April 2, 2015
    Applicant: Apple Inc.
    Inventors: Guy Cote, Timothy John Millet, Joseph J. Cheng, Mark P. Rygh, Jim C. Chou
  • Publication number: 20150091920
    Abstract: Memory latency tolerance methods and apparatus for maintaining an overall level of performance in block processing pipelines that prefetch reference data into a search window. In a general memory latency tolerance method, search window processing in the pipeline may be monitored. If status of search window processing changes in a way that affects pipeline throughput, then pipeline processing may be modified. The modification may be performed according to no stall methods, stall recovery methods, and/or stall prevention methods. In no stall methods, a block may be processed using the data present in the search window without waiting for the missing reference data. In stall recovery methods, the pipeline is allowed to stall, and processing is modified for subsequent blocks to speed up the pipeline and catch up in throughput. In stall prevention methods, processing is adjusted in advance of the pipeline encountering a stall condition.
    Type: Application
    Filed: September 27, 2013
    Publication date: April 2, 2015
    Applicant: Apple Inc.
    Inventors: Mark P. Rygh, Guy Cote, Timothy John Millet, Joseph J. Cheng
  • Publication number: 20150092854
    Abstract: A block processing pipeline that includes a software pipeline and a hardware pipeline that run in parallel. The software pipeline runs at least one block ahead of the hardware pipeline. The stages of the pipeline may each include a hardware pipeline component that performs one or more operations on a current block at the stage. At least one stage of the pipeline may also include a software pipeline component that determines a configuration for the hardware component at the stage of the pipeline for processing a next block while the hardware component is processing the current block. The software pipeline component may determine the configuration according to information related to the next block obtained from an upstream stage of the pipeline. The software pipeline component may also obtain and use information related to a block that was previously processed at the stage.
    Type: Application
    Filed: September 27, 2013
    Publication date: April 2, 2015
    Applicant: Apple Inc.
    Inventors: James E. Orr, Timothy John Millet, Joseph J. Cheng, Nitin Bhargava, Guy Cote
  • Publication number: 20150092843
    Abstract: Block processing pipeline methods and apparatus in which reference data are stored to a memory according to tile formats to reduce memory accesses when fetching the data from the memory. When the pipeline stores reference data from a current frame being processed to memory as a reference frame, the reference samples are stored in macroblock sequential order. Each macroblock sample set is stored as a tile. Reference data may be stored in tile formats for luma and chroma. Chroma reference data may be stored in tile formats for chroma 4:2:0, 4:2:2, and/or 4:4:4 formats. A stage of the pipeline may write luma and chroma reference data for macroblocks to memory according to one or more of the macroblock tile formats in a modified knight's order. The stage may delay writing the reference data from the macroblocks until the macroblocks have been fully processed by the pipeline.
    Type: Application
    Filed: September 27, 2013
    Publication date: April 2, 2015
    Applicant: Apple Inc.
    Inventors: Timothy John Millet, Mark P. Rygh, Craig M. Okruhlica, Jim C. Chou, Guy Cote, Gaurav S. Gulati, Joseph J. Cheng, Joseph P. Bratt
  • Publication number: 20150095630
    Abstract: Methods and apparatus for configuring multiple components of a subsystem are described. The configuration memory of each of a plurality of components coupled to an interconnect includes a global configuration portion. The configuration memory of one of the components may be designated as a master global configuration for all of the components. A module coupled to the interconnect may receive writes to the components from a configuration source. For each write, the module may decode the write to determine addressing information and check to see if the write is addressed to the master global configuration. If the write is addressed to the master global configuration, the module broadcasts the write to the global configuration portion of each of the components via the interconnect. If the write is not addressed to the master global configuration, the module forwards the write to the appropriate component via the interconnect.
    Type: Application
    Filed: November 18, 2013
    Publication date: April 2, 2015
    Applicant: Apple Inc.
    Inventors: Guy Cote, Joseph P. Bratt, Nitin Bhargava, Hao Chen, Joseph J. Cheng