Patents by Inventor James S. Blomgren

James S. Blomgren has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9811875
    Abstract: Techniques are disclosed relating to a cache configured to store state information for texture mapping. In one embodiment, a texture state cache includes a plurality of entries configured to store state information relating to one or more stored textures. In this embodiment, the texture state cache also includes texture processing circuitry configured to retrieve state information for one of the stored textures from one of the entries in the texture state cache and determine pixel attributes based on the texture and the retrieved state information. The state information may include texture state information and sampler state information, in some embodiments. The texture state cache may allow for reduced rending times and power consumption, in some embodiments.
    Type: Grant
    Filed: September 10, 2014
    Date of Patent: November 7, 2017
    Assignee: Apple Inc.
    Inventors: Benjiman L. Goodman, Adam T. Moerschell, James S. Blomgren
  • Patent number: 9652233
    Abstract: Instructions may require one or more operands to be executed, which may be provided from a register file. In the context of a GPU, however, a register file may be a relatively large structure, and reading from the register file may be energy and/or time intensive An operand cache may be used to store a subset of operands, and may use less power and have quicker access times than the register file. Hint values may be used in some embodiments to suggest that a particular operand should be stored in the operand cache (so that is available for current or future use). In one embodiment, a hint value indicates that an operand should be cached whenever possible. Hint values may be determined by software, such as a compiler, in some embodiments. One or more criteria may be used to determine hint values, such as how soon in the future or how frequently an operand will be used again.
    Type: Grant
    Filed: August 20, 2013
    Date of Patent: May 16, 2017
    Assignee: Apple Inc.
    Inventors: Terence M. Potter, Timothy A. Olson, James S. Blomgren, Andrew M. Havlir, Michael Geary
  • Patent number: 9632785
    Abstract: Techniques are disclosed relating to specification of instruction operands. In some embodiments, this may involve assigning operands to source inputs. In one embodiment, an instruction includes one or more mapping values, each of which corresponds to a source of the instruction and each of which specifies a location value. In this embodiment, the instruction includes one or more location values that are each usable to identify an operand for the instruction. In this embodiment, a method may include accessing operands using the location values and assigning accessed operands to sources using the mapping values. In one embodiment, the sources may correspond to inputs of an execution block. In one embodiment, a destination mapping value in the instruction may specify a location value that indicates a destination for storing an instruction result.
    Type: Grant
    Filed: August 10, 2016
    Date of Patent: April 25, 2017
    Assignee: Apple Inc.
    Inventors: James S. Blomgren, Terence M. Potter
  • Patent number: 9600288
    Abstract: A system and method for efficiently accessing operands in a datapath. An apparatus includes a data operand register file and an execution pipeline with multiple stages. In addition, the apparatus includes a result bypass cache configured to store data results conveyed by at least the final stage of the execution pipeline stage. Control logic is included which is configured to determine whether source operands for an instruction entering the pipeline are available in the last stage of the pipeline or in the result bypass cache. If the source operands are available in the last stage of the pipeline or the result bypass cache, they may be obtained from one of those locations rather than reading from the register file. If the source operands are not available from the last stage or the result bypass cache, then they may be obtained from the data operand register file.
    Type: Grant
    Filed: May 7, 2012
    Date of Patent: March 21, 2017
    Assignee: Apple Inc.
    Inventors: Terence M. Potter, Timothy A. Olson, James S. Blomgren, Robert A. Drebin, Douglas C. Youngwith, Jon A. Loschke
  • Patent number: 9594395
    Abstract: Techniques are disclosed relating to clock routing techniques in processors with both pipelined and non-pipelined circuitry. In some embodiments, an apparatus includes execution units that are non-pipelined and configured to perform instructions without receiving a clock signal. In these embodiments, one or more clock lines routed throughout the apparatus do not extend into the one or more execution units in each pipeline, reducing the length of the clock lines. In some embodiments, the apparatus includes multiple such pipelines arranged in an array, with the execution units located on an outer portion of the array and clocked control circuitry located on an inner portion of the array. In some embodiments, clock lines do not extend into the outer portion of the array. In some embodiments, the array includes one or more rows of execution units. These arrangements may further reduce the length of clock lines.
    Type: Grant
    Filed: January 21, 2014
    Date of Patent: March 14, 2017
    Assignee: Apple Inc.
    Inventors: Andrew M. Havlir, James S. Blomgren, Terence M. Potter
  • Publication number: 20160350113
    Abstract: Techniques are disclosed relating to specification of instruction operands. In some embodiments, this may involve assigning operands to source inputs. In one embodiment, an instruction includes one or more mapping values, each of which corresponds to a source of the instruction and each of which specifies a location value. In this embodiment, the instruction includes one or more location values that are each usable to identify an operand for the instruction. In this embodiment, a method may include accessing operands using the location values and assigning accessed operands to sources using the mapping values. In one embodiment, the sources may correspond to inputs of an execution block. In one embodiment, a destination mapping value in the instruction may specify a location value that indicates a destination for storing an instruction result.
    Type: Application
    Filed: August 10, 2016
    Publication date: December 1, 2016
    Inventors: James S. Blomgren, Terence M. Potter
  • Patent number: 9508112
    Abstract: Techniques are disclosed relating to a multithreaded execution pipeline. In some embodiments, an apparatus is configured to assign a number of threads to an execution pipeline that is an integer multiple of a minimum number of cycles that an execution unit is configured to use to generate an execution result from a given set of input operands. In one embodiment, the apparatus is configured to require strict ordering of the threads. In one embodiment, the apparatus is configured so that the same thread access (e.g., reads and writes) a register file in a given cycle. In one embodiment, the apparatus is configured so that the same thread does not write back an operand and a result to an operand cache in a given cycle.
    Type: Grant
    Filed: July 31, 2013
    Date of Patent: November 29, 2016
    Assignee: Apple Inc.
    Inventors: Andrew M. Havlir, James S. Blomgren, Terence M. Potter
  • Patent number: 9459869
    Abstract: Instructions may require one or more operands to be executed, which may be provided from a register file. In the context of a GPU, however, a register file may be a relatively large structure, and reading from the register file may be energy and/or time intensive An operand cache may store a subset of operands, and may use less power and have quicker access times than the register file. In some embodiments, intelligent operand prefetching may speed execution by reducing memory bank conflicts (e.g., conflicts within a register file containing multiple memory banks). An unused operand slot for another instruction (e.g., an instruction that does not require a maximum number of source operands allowed by an instruction set architecture) may be used to prefetch an operand for another instruction in one embodiment. Prefetched operands may be stored in an operand cache, and prefetching may occur based on software-provided information.
    Type: Grant
    Filed: August 20, 2013
    Date of Patent: October 4, 2016
    Assignee: Apple Inc.
    Inventors: Timothy A. Olson, Terence M. Potter, James S. Blomgren, Andrew M. Havlir
  • Patent number: 9442730
    Abstract: Techniques are disclosed relating to specification of instruction operands. In some embodiments, this may involve assigning operands to source inputs. In one embodiment, an instruction includes one or more mapping values, each of which corresponds to a source of the instruction and each of which specifies a location value. In this embodiment, the instruction includes one or more location values that are each usable to identify an operand for the instruction. In this embodiment, a method may include accessing operands using the location values and assigning accessed operands to sources using the mapping values. In one embodiment, the sources may correspond to inputs of an execution block. In one embodiment, a destination mapping value in the instruction may specify a location value that indicates a destination for storing an instruction result.
    Type: Grant
    Filed: July 31, 2013
    Date of Patent: September 13, 2016
    Assignee: Apple Inc.
    Inventors: James S. Blomgren, Terence M. Potter
  • Patent number: 9417843
    Abstract: Techniques are disclosed relating to performing extended multiplies without a carry flag. In one embodiment, an apparatus includes a multiply unit configured to perform multiplications of operands having a particular width. In this embodiment, the apparatus also includes multiple storage elements configured to store operands for the multiply unit. In this embodiment, each of the storage elements is configured to provide a portion of a stored operand that is less than an entirety of the stored operand in response to a control signal from the apparatus. In one embodiment, the apparatus is configured to perform a multiplication of given first and second operands having a width greater than the particular width by performing a sequence of multiply operations using the multiply unit, using portions of the stored operands and without using a carry flag between any of the sequence of multiply operations.
    Type: Grant
    Filed: August 20, 2013
    Date of Patent: August 16, 2016
    Assignee: Apple Inc.
    Inventors: James S. Blomgren, Terence M. Potter
  • Patent number: 9378146
    Abstract: Instructions may require one or more operands to be executed, which may be provided from a register file. In the context of a GPU, however, a register file may be a relatively large structure, and reading from a register file may be energy and/or time intensive An operand cache may be used to store a subset of operands, and may use less power and have quicker access times than the register file. Selectors (e.g., multiplexers) may be used to read operands from the operand cache. Power savings may be achieved in some embodiments by activating only a subset of the selectors, which may be done by activators (e.g. flip-flops). Operands may also be concurrently provided to two or more locations via forwarding, which may be accomplished via a source selection unit in some embodiments. Operand forwarding may also reduce power and/or speed execution in one or more embodiments.
    Type: Grant
    Filed: August 20, 2013
    Date of Patent: June 28, 2016
    Assignee: Apple Inc.
    Inventors: James S. Blomgren, Terence M. Potter, Timothy A. Olson, Andrew M. Havlir
  • Patent number: 9292285
    Abstract: Techniques are disclosed relating to floating-point operations in computer processors. In one embodiment, an apparatus includes a floating-point unit and circuitry configured to receive an initial value X for a floating-point operation. In this embodiment, X is between 0 and 1.0 inclusive. In this embodiment, the circuitry is configured to generate first and second floating-point values based on an exponent of X that sum to 1. In this embodiment, the floating-point unit is configured to perform an operation using the first and second floating-point values. The apparatus may reduce drift, in this embodiment, when a floating-point representation of X does not guarantee that the sum of X and (1?X) is 1. The apparatus may be configured to perform blending and/or interpolation operations using the first and second floating-point values.
    Type: Grant
    Filed: August 19, 2013
    Date of Patent: March 22, 2016
    Assignee: Apple Inc.
    Inventor: James S. Blomgren
  • Publication number: 20160071232
    Abstract: Techniques are disclosed relating to a cache configured to store state information for texture mapping. In one embodiment, a texture state cache includes a plurality of entries configured to store state information relating to one or more stored textures. In this embodiment, the texture state cache also includes texture processing circuitry configured to retrieve state information for one of the stored textures from one of the entries in the texture state cache and determine pixel attributes based on the texture and the retrieved state information. The state information may include texture state information and sampler state information, in some embodiments. The texture state cache may allow for reduced rending times and power consumption, in some embodiments.
    Type: Application
    Filed: September 10, 2014
    Publication date: March 10, 2016
    Inventors: Benjiman L. Goodman, Adam T. Moerschell, James S. Blomgren
  • Patent number: 9264066
    Abstract: Techniques are disclosed relating to type conversion using a floating-point unit. In one embodiment, to convert a floating-point value to a normalized integer format, a floating-point unit is configured to perform an operation to generate a result having a significant portion and an exponent portion, where the operation includes multiplying the floating-point value by a constant. In one embodiment, the apparatus is further configured to add a value to the exponent portion of the result, and set a rounding mode to round to nearest. The constant may be a greatest value less than one that can be represented using the particular number of unsigned bits. The value added to the initial exponent may be equal to the number of unsigned bits of the normalized integer format. The apparatus may perform this conversion in response to a pack instruction.
    Type: Grant
    Filed: July 30, 2013
    Date of Patent: February 16, 2016
    Assignee: Apple Inc.
    Inventors: James S. Blomgren, Terence M. Potter
  • Publication number: 20150205324
    Abstract: Techniques are disclosed relating to clock routing techniques in processors with both pipelined and non-pipelined circuitry. In some embodiments, an apparatus includes execution units that are non-pipelined and configured to perform instructions without receiving a clock signal. In these embodiments, one or more clock lines routed throughout the apparatus do not extend into the one or more execution units in each pipeline, reducing the length of the clock lines. In some embodiments, the apparatus includes multiple such pipelines arranged in an array, with the execution units located on an outer portion of the array and clocked control circuitry located on an inner portion of the array. In some embodiments, clock lines do not extend into the outer portion of the array. In some embodiments, the array includes one or more rows of execution units. These arrangements may further reduce the length of clock lines.
    Type: Application
    Filed: January 21, 2014
    Publication date: July 23, 2015
    Applicant: Apple Inc.
    Inventors: Andrew M. Havlir, James S. Blomgren, Terence M. Potter
  • Publication number: 20150058571
    Abstract: Instructions may require one or more operands to be executed, which may be provided from a register file. In the context of a GPU, however, a register file may be a relatively large structure, and reading from the register file may be energy and/or time intensive An operand cache may be used to store a subset of operands, and may use less power and have quicker access times than the register file. Hint values may be used in some embodiments to suggest that a particular operand should be stored in the operand cache (so that is available for current or future use). In one embodiment, a hint value indicates that an operand should be cached whenever possible. Hint values may be determined by software, such as a compiler, in some embodiments. One or more criteria may be used to determine hint values, such as how soon in the future or how frequently an operand will be used again.
    Type: Application
    Filed: August 20, 2013
    Publication date: February 26, 2015
    Applicant: Apple Inc.
    Inventors: Terence M. Potter, Timothy A. Olson, James S. Blomgren, Andrew M. Havlir, Michael Geary
  • Publication number: 20150058573
    Abstract: Instructions may require one or more operands to be executed, which may be provided from a register file. In the context of a GPU, however, a register file may be a relatively large structure, and reading from the register file may be energy and/or time intensive An operand cache may be used to store a subset of operands, and may use less power and have quicker access times than the register file. Selectors (e.g., multiplexers) may be used to read operands from the operand cache. Power savings may be achieved in some embodiments by activating only a subset of the selectors, which may be done by activators (e.g. flip-flops). Operands may also be concurrently provided to two or more locations via forwarding, which may be accomplished via a source selection unit in some embodiments. Operand forwarding may also reduce power and/or speed execution in one or more embodiments.
    Type: Application
    Filed: August 20, 2013
    Publication date: February 26, 2015
    Applicant: Apple Inc.
    Inventors: James S. Blomgren, Terence M. Potter, Timothy A. Olson, Andrew M. Havlir
  • Publication number: 20150058572
    Abstract: Instructions may require one or more operands to be executed, which may be provided from a register file. In the context of a GPU, however, a register file may be a relatively large structure, and reading from the register file may be energy and/or time intensive An operand cache may store a subset of operands, and may use less power and have quicker access times than the register file. In some embodiments, intelligent operand prefetching may speed execution by reducing memory bank conflicts (e.g., conflicts within a register file containing multiple memory banks). An unused operand slot for another instruction (e.g., an instruction that does not require a maximum number of source operands allowed by an instruction set architecture) may be used to prefetch an operand for another instruction in one embodiment. Prefetched operands may be stored in an operand cache, and prefetching may occur based on software-provided information.
    Type: Application
    Filed: August 20, 2013
    Publication date: February 26, 2015
    Applicant: Apple Inc.
    Inventors: Timothy A. Olson, Terence M. Potter, James S. Blomgren, Andrew M. Havlir
  • Publication number: 20150058389
    Abstract: Techniques are disclosed relating to performing extended multiplies without a carry flag. In one embodiment, an apparatus includes a multiply unit configured to perform multiplications of operands having a particular width. In this embodiment, the apparatus also includes multiple storage elements configured to store operands for the multiply unit. In this embodiment, each of the storage elements is configured to provide a portion of a stored operand that is less than an entirety of the stored operand in response to a control signal from the apparatus. In one embodiment, the apparatus is configured to perform a multiplication of given first and second operands having a width greater than the particular width by performing a sequence of multiply operations using the multiply unit, using portions of the stored operands and without using a carry flag between any of the sequence of multiply operations.
    Type: Application
    Filed: August 20, 2013
    Publication date: February 26, 2015
    Applicant: Apple Inc.
    Inventors: James S. Blomgren, Terence M. Potter
  • Publication number: 20150052335
    Abstract: Techniques are disclosed relating to floating-point operations in computer processors. In one embodiment, an apparatus includes a floating-point unit and circuitry configured to receive an initial value X for a floating-point operation. In this embodiment, X is between 0 and 1.0 inclusive. In this embodiment, the circuitry is configured to generate first and second floating-point values based on an exponent of X that sum to 1. In this embodiment, the floating-point unit is configured to perform an operation using the first and second floating-point values. The apparatus may reduce drift, in this embodiment, when a floating-point representation of X does not guarantee that the sum of X and (1?X) is 1. The apparatus may be configured to perform blending and/or interpolation operations using the first and second floating-point values.
    Type: Application
    Filed: August 19, 2013
    Publication date: February 19, 2015
    Applicant: Apple Inc.
    Inventor: James S. Blomgren