Patents by Inventor Scott A. Mahlke

Scott A. Mahlke has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20160103691
    Abstract: A data processing system comprising multiple GPUs 2, 4, 6, 8 includes instruction queue circuitry 28 storing data specifying program instructions for threads awaiting issue for execution. Instruction characterisation circuitry 30 determines one or more characteristics of the program instructions awaiting issue within the instruction queue circuitry 28 and supplies these characteristics to operating parameter control circuitry 20. The operating parameter control circuitry 20 alters one or more operating parameters of the system in response to the one or more characteristics of the program instructions awaiting issue.
    Type: Application
    Filed: October 9, 2014
    Publication date: April 14, 2016
    Inventors: Ankit Sethia, Scott Mahlke
  • Publication number: 20160004534
    Abstract: A data processing apparatus 2 includes first execution circuitry 4, such as out-of-order processing circuitry, and second execution circuitry 6, such as in-order processing circuitry. Switching control circuitry 24 controls which of the first execution circuitry 4 and the second execution circuitry 6 is active at a given time. Latency indicating signals, indicative of the latency associated with a candidate switching operation to be performed, are supplied to the switching control circuitry 24 and used to control the switching operation. The control of the switching operation may be to accelerate the switching operation, prevent the switching operation, perform early architectural state data transfer or other possibilities.
    Type: Application
    Filed: July 3, 2014
    Publication date: January 7, 2016
    Inventors: Shruti Padmanabha, Andrew Lukefahr, Reetuparna Das, Scott Mahlke
  • Publication number: 20150154021
    Abstract: An apparatus 2 for processing data includes first execution circuitry 4, such as an out-of-order processor, and second execution circuitry 6, such as an in-order processor. The first execution circuitry 4 is of higher performance but uses more energy than the second execution circuitry 6. Control circuitry 24 switches between the first execution circuitry 4 being active and the second execution circuitry 6 being active. The control circuitry includes prediction circuitry which is configured to predict a predicted identity of a next sequence of program instructions to be executed in dependence upon a most recently executed sequence of program instructions and then in dependence upon this predicted identity to predict a predicted execution target corresponding to whether the next sequence of program instructions should be executed by the first execution circuitry or the second execution circuitry.
    Type: Application
    Filed: November 29, 2013
    Publication date: June 4, 2015
    Applicant: THE REGENTS OF THE UNIVERSITY OF MICHIGAN
    Inventors: Shruti Padmanabha, Andrew Lukefahr, Reetuparna Das, Scott Mahlke
  • Publication number: 20150121048
    Abstract: A processor core includes a front end, and first and second back ends, the front end including a fetch engine configured to retrieve the sequence of data processing instructions for both the first back end and the second back end from a memory, and the first and second back ends are each configured to execute the sequence of program instructions. The core operates in a first mode in which the first back end is active and receives the sequence of data processing instructions from the fetch engine and the second back end is inactive, and a second mode in which the first back end is inactive and the second back end is active and receives the sequence of data processing instructions from the fetch engine, where the cycles-per-instruction rate is lower and energy consumption is higher for the first mode than the second mode.
    Type: Application
    Filed: November 29, 2013
    Publication date: April 30, 2015
    Applicant: THE REGENTS OF THE UNIVERSITY OF MICHIGAN
    Inventors: Andrew Lukefahr, Reetuparna Das, Shruti Padmanabha, Scott Mahlke
  • Patent number: 8813073
    Abstract: An apparatus and method capable of reducing idle resources in a multicore device and improving the use of available resources in the multicore device are provided. The apparatus includes a static scheduling unit configured to generate one or more task groups, and to allocate the task groups to virtual cores by dividing or combining the tasks included in the task groups based on the execution time estimates of the task groups. The apparatus also includes a dynamic scheduling unit configured to map the virtual cores to physical cores.
    Type: Grant
    Filed: May 26, 2011
    Date of Patent: August 19, 2014
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Ki-Seok Kwon, Suk-Jin Kim, Scott Mahlke, Yong-Jun Park
  • Patent number: 8505002
    Abstract: A data processing system is provided having a processor and analysing circuitry for identifying a SIMD instruction associated with a first SIMD instruction set and replacing it by a functionally-equivalent scalar representation and marking that functionally-equivalent scalar representation. The marked functionally-equivalent scalar representation is dynamically translated using translation circuitry upon execution of the program to generate one or more corresponding translated instructions corresponding to an instruction set architecture different from the first SIMD architecture corresponding to the identified SIMD instruction.
    Type: Grant
    Filed: September 27, 2007
    Date of Patent: August 6, 2013
    Assignees: ARM Limited, The Regents of the University of Michigan
    Inventors: Sami Yehia, Krisztian Flautner, Nathan Clark, Amir Hormati, Scott Mahlke
  • Patent number: 8219885
    Abstract: A data processing system includes a register file having a plurality of registers storing respective register data values and an associated register value cache having a plurality of storage locations storing corresponding cache data values. There are fewer cache data values than registers. When a register is to be read, both the register data value and, if present, a cache data value from a corresponding storage location within the register value cache are read and compared by a comparator. This generates a match signal which indicates, if the data values do not match, that one of the data values is in error. The match signal stalls the processing, and a CRC code initially stored with the cache data value is compared with a CRC code recalculated from the read cache data value to determine whether or not the cache data value has changed since it was stored. If the cache data value has not changed, then it is correct and is output instead of the register data value.
    Type: Grant
    Filed: August 15, 2006
    Date of Patent: July 10, 2012
    Assignees: ARM Limited, The Regents of the University of Michigan
    Inventors: Daryl Wayne Bradley, Jason Andrew Blome, Scott Mahlke
  • Publication number: 20120166762
    Abstract: Provided are a computing apparatus and method based on SIMD architecture capable of supporting various SIMD widths without wasting resources. The computing apparatus includes a plurality of configurable execution cores (CECs) that have a plurality of execution modes, and a controller for detecting a loop region from a program, determining a Single Instruction Multiple Data (SIMD) width for the detected loop region, and determining an execution mode of the processor according to the determined SIMD width.
    Type: Application
    Filed: July 8, 2011
    Publication date: June 28, 2012
    Inventors: Jae Un Park, Suk-Jin Kim, Scott Mahlke, Yong-Jun Park
  • Publication number: 20120159507
    Abstract: An apparatus and method capable of reducing idle resources in a multicore device and improving the use of available resources in the multicore device are provided. The apparatus includes a static scheduling unit configured to generate one or more task groups, and to allocate the task groups to virtual cores by dividing or combining the tasks included in the task groups based on the execution time estimates of the task groups. The apparatus also includes a dynamic scheduling unit configured to map the virtual cores to physical cores.
    Type: Application
    Filed: May 26, 2011
    Publication date: June 21, 2012
    Inventors: Ki-Seok Kwon, Suk-Jin Kim, Scott Mahlke, Yong-Jun Park
  • Patent number: 7685404
    Abstract: An apparatus is provided for processing data under control of a program having program instructions and subgraph suggestion information identifying respective sequences of program instructions corresponding to computational subgraphs identified within the program. A memory stores a program formed of separate program instructions. Processing logic executes respective separate program instructions from said program. Accelerator logic, in response to reaching an execution point within the program associated with a subgraph suggestion, executes a sequence of program instructions corresponding to the subgraph suggestion as an accelerated operation instead of executing the sequence of program instructions as respective separate program instructions with the processing logic.
    Type: Grant
    Filed: June 5, 2007
    Date of Patent: March 23, 2010
    Assignees: ARM Limited, University of Michigan
    Inventors: Stuart David Biles, Krisztian Flautner, Scott Mahlke, Nathan Clark
  • Publication number: 20090292977
    Abstract: A data processing system includes a register file (2) having a plurality of registers storing respective register data values and an associated register value cache (12) having a plurality of storage locations (14) storing corresponding cache data values. There are fewer cache data values than registers. When a register is to be read, both the register data value and, if present, a cache data value from a corresponding storage location (14) within the register value cache (12) are read and compared by a comparator (18). This generates a match signal which indicates, if the data values do not match, that one of the data values is in error. The match signal stalls the processing, and a CRC code initially stored with the cache data value is compared with a CRC code recalculated from the read cache data value to determine whether or not the cache data value has changed since it was stored. If the cache data value has not changed, then it is correct and is output instead of the register data value.
    Type: Application
    Filed: August 15, 2006
    Publication date: November 26, 2009
    Inventors: Daryl Wayne Bradley, Jason Andrew Blome, Scott Mahlke
  • Publication number: 20090119490
    Abstract: An instruction scheduling method and a processor using the instruction scheduling method are provided. The instruction scheduling method includes selecting a first instruction that has a highest priority from a plurality of instructions, allocating the selected first instruction and a first time slot to one of the functional units, and allocating a second instruction and a second time slot to one of the functional units, wherein the second instruction is dependent on the first instruction.
    Type: Application
    Filed: March 20, 2008
    Publication date: May 7, 2009
    Inventors: Taewook Oh, Hong-Seok Kim, Scott Mahlke, Hyun Chul Park
  • Publication number: 20080141012
    Abstract: A data processing system is provided having a processor and analysing circuitry for identifying a SIMD instruction associated with a first SIMD instruction set and replacing it by a functionally-equivalent scalar representation and marking that functionally-equivalent scalar representation. The marked functionally-equivalent scalar representation is dynamically translated using translation circuitry upon execution of the program to generate one or more corresponding translated instructions corresponding to an instruction set architecture different from the first SIMD architecture corresponding to the identified SIMD instruction.
    Type: Application
    Filed: September 27, 2007
    Publication date: June 12, 2008
    Applicants: ARM LIMITED, The Regents of the University of Michigan
    Inventors: Sami Yehia, Krisztian Flautner, Nathan Clark, Amir Hormati, Scott Mahlke
  • Patent number: 7350055
    Abstract: An accelerator 120 is tightly coupled to the normal execution unit 110. The operand store, which could be a register file 130, a stack-based operand store or other operand store, is shared by the execution unit and the accelerator unit. Operands may also be accessed as immediate values within the instructions themselves. The sequences of individual program instructions corresponding to computational subgraphs remain within a program but can be recognized by the accelerator as suitable for acceleration and, when encountered, are executed by the accelerator instead of by the normal execution unit. Within such a tightly coupled arrangement, problems can arise due to a lack of register resources within the system. The present technique provides that at least some intermediate operand values which are generated within the accelerator, but are determined not to be referenced outside of the computational subgraph concerned, are not written to the operand store.
    Type: Grant
    Filed: January 31, 2005
    Date of Patent: March 25, 2008
    Assignee: Arm Limited
    Inventors: Stuart D. Biles, Krisztian Flautner, Scott Mahlke, Nathan Clark
  • Patent number: 7343482
    Abstract: There is provided an apparatus for processing data under control of a program having program instructions and subgraph suggestion information identifying respective sequences of program instructions corresponding to computational subgraphs identified within said program, said apparatus comprising: a memory operable to store a program formed of separate program instructions; processing logic operable to execute respective separate program instructions from said program; and accelerator logic operable in response to reaching an execution point within said program associated with a subgraph suggestion to execute a sequence of program instructions corresponding to said subgraph suggestion as an accelerated operation instead of executing said sequence of program instructions as respective separate program instructions with said processing logic.
    Type: Grant
    Filed: January 31, 2005
    Date of Patent: March 11, 2008
    Assignees: ARM Limited, University of Michigan
    Inventors: Stuart David Biles, Krisztian Flautner, Scott Mahlke, Nathan Clark
  • Publication number: 20080036487
    Abstract: An integrated circuit is provided with latency detecting circuitry for detecting signal generation latency within one or more functional circuits and in response thereto to generate a wearout response. The wearout response can take a variety of different forms such as reducing the operating frequency, increasing the operating voltage, operating task allocation within a multiprocessor system, manufacturing test binning and other wearout responses.
    Type: Application
    Filed: July 27, 2007
    Publication date: February 14, 2008
    Applicants: ARM LIMITED, UNIVERSITY OF MICHIGAN
    Inventors: Daryl Wayne Bradley, Jason Andrew Blome, Scott Mahlke
  • Patent number: 7318143
    Abstract: An information processor for executing a program comprising a plurality of separate program instructions is provided. The processor comprises processing logic operable to individually execute said separate program instructions of said program, an operand store operable to store operand values and an accelerator having a plurality of functional units. The accelerator executes a combined operation corresponding to a computational sub-graph of the separate program instructions by configuring individual ones of said plurality of functional units to perform particular processing operations associated with the combined operation. The accelerator executes the combined operation in dependence upon operand mapping data providing a mapping between operands of the combined operation and storage locations within said operand store and in dependence upon separately specified configuration data providing a mapping between the plurality of functional units and the particular processing operations.
    Type: Grant
    Filed: January 28, 2005
    Date of Patent: January 8, 2008
    Assignees: ARM Limited, University of Michigan
    Inventors: Stuart D. Biles, Krisztian Flautner, Scott Mahlke, Nathan Clark
  • Publication number: 20070239969
    Abstract: There is provided an apparatus for processing data under control of a program having program instructions and subgraph suggestion information identifying respective sequences of program instructions corresponding to computational subgraphs identified within said program, said apparatus comprising: a memory operable to store a program formed of separate program instructions; processing logic operable to execute respective separate program instructions from said program; and accelerator logic operable in response to reaching an execution point within said program associated with a subgraph suggestion to execute a sequence of program instructions corresponding to said subgraph suggestion as an accelerated operation instead of executing said sequence of program instructions as respective separate program instructions with said processing logic.
    Type: Application
    Filed: June 5, 2007
    Publication date: October 11, 2007
    Applicants: ARM Limited, University of Michigan
    Inventors: Stuart Biles, Krisztian Flautner, Scott Mahlke, Nathan Clark
  • Publication number: 20060095720
    Abstract: There is provided an information processor for executing a program comprising a plurality of separate program instructions: processing logic operable to individually execute said separate program instructions of said program; an operand store operable to store operand values; and an accelerator having an array comprising a plurality of functional units, said accelerator being operable to execute a combined operation corresponding to a computational subgraph of said separate program instructions by configuring individual ones of said plurality of functional units to perform particular processing operations associated with one or more processing stages of said combined operation; wherein said accelerator executes said combined operation in dependence upon operand mapping data providing a mapping between operands of said combined operation and storage locations within said operand store and in dependence upon separately specified configuration data providing a mapping between said plurality of functional units a
    Type: Application
    Filed: January 28, 2005
    Publication date: May 4, 2006
    Applicant: ARM LIMITED
    Inventors: Stuart Biles, Krisztian Flautner, Scott Mahlke, Nathan Clark
  • Publication number: 20060095722
    Abstract: There is provided an apparatus for processing data under control of a program having program instructions and subgraph suggestion information identifying respective sequences of program instructions corresponding to computational subgraphs identified within said program, said apparatus comprising: a memory operable to store a program formed of separate program instructions; processing logic operable to execute respective separate program instructions from said program; and accelerator logic operable in response to reaching an execution point within said program associated with a subgraph suggestion to execute a sequence of program instructions corresponding to said subgraph suggestion as an accelerated operation instead of executing said sequence of program instructions as respective separate program instructions with said processing logic.
    Type: Application
    Filed: January 31, 2005
    Publication date: May 4, 2006
    Applicants: ARM LIMITED, University of Michigan
    Inventors: Stuart Biles, Krisztian Flautner, Scott Mahlke, Nathan Clark
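
Illustrative sketch (not taken from any of the patents above): the register value cache read path described in the abstract of Patent 8219885 can be modelled in a few lines of Python. The class name, the cache fill policy, the choice of CRC-32, and the fallback to the register value when the cached copy fails its CRC check are all assumptions made for illustration; the abstract specifies only the compare, the CRC recheck, and the output of the cached value when it is unchanged.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def crc_of(value: int) -> int:
    """CRC-32 of a 32-bit value (the abstract does not name a specific CRC)."""
    return zlib.crc32((value & 0xFFFFFFFF).to_bytes(4, "little"))


@dataclass
class RegisterFileWithValueCache:
    """Hypothetical model: a register file plus a smaller value cache."""
    registers: List[int]
    cache_capacity: int = 4
    # Maps register index -> (cached value, CRC stored when it was cached).
    cache: Dict[int, Tuple[int, int]] = field(default_factory=dict)

    def write(self, index: int, value: int) -> None:
        self.registers[index] = value
        # Assumed fill policy: cache the value only while there is room.
        if index in self.cache or len(self.cache) < self.cache_capacity:
            self.cache[index] = (value, crc_of(value))

    def read(self, index: int) -> int:
        reg_value = self.registers[index]
        entry = self.cache.get(index)
        if entry is None:
            return reg_value  # no cached copy to compare against
        cached_value, stored_crc = entry
        if cached_value == reg_value:
            return reg_value  # values match, no error signalled
        # Mismatch: one of the two copies is in error. Recompute the CRC
        # of the cached copy to see whether it changed since it was stored.
        if crc_of(cached_value) == stored_crc:
            return cached_value  # cached copy unchanged, so it is correct
        # Assumption: otherwise fall back to the register value.
        return reg_value


rf = RegisterFileWithValueCache(registers=[0] * 8)
rf.write(3, 0xDEADBEEF)
rf.registers[3] ^= 0x10  # simulate a bit flip in the register file
recovered = rf.read(3)   # the cached copy passes its CRC check and is used
```

In this model a corrupted register is masked by the cached copy, while a corrupted cache entry fails its CRC recheck and is ignored.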