Patents by Inventor Scott A. Mahlke
Scott A. Mahlke has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20160103691
Abstract: A data processing system comprising multiple GPUs 2, 4, 6, 8 includes instruction queue circuitry 28 storing data specifying program instructions for threads awaiting issue for execution. Instruction characterisation circuitry 30 determines one or more characteristics of the program instructions awaiting issue within the instruction queue circuitry 28 and supplies these to operating parameter control circuitry 20. The operating parameter control circuitry 20 alters one or more operating parameters of the system in response to the one or more characteristics of the program instructions awaiting issue.
Type: Application
Filed: October 9, 2014
Publication date: April 14, 2016
Inventors: Ankit SETHIA, Scott MAHLKE
-
Publication number: 20160004534
Abstract: A data processing apparatus 2 includes a first execution mechanism 4, such as out-of-order processing circuitry, and a second execution mechanism 6, such as in-order processing circuitry. Switching control circuitry 24 controls which of the first execution circuitry 4 and the second execution circuitry 6 is active at a given time. Latency indicating signals indicative of the latency associated with a candidate switching operation to be performed are supplied to the switching control circuitry 24 and used to control the switching operation. The control of the switching operation may be to accelerate the switching operation, prevent the switching operation, perform early architectural state data transfer or other possibilities.
Type: Application
Filed: July 3, 2014
Publication date: January 7, 2016
Inventors: Shruti PADMANABHA, Andrew LUKEFAHR, Reetuparna DAS, Scott MAHLKE
-
Publication number: 20150154021
Abstract: An apparatus 2 for processing data includes first execution circuitry 4, such as an out-of-order processor, and second execution circuitry 6, such as an in-order processor. The first execution circuitry 4 is of higher performance but uses more energy than the second execution circuitry 6. Control circuitry 24 switches between the first execution circuitry 4 being active and the second execution circuitry 6 being active. The control circuitry includes prediction circuitry which is configured to predict a predicted identity of a next sequence of program instructions to be executed in dependence upon a most recently executed sequence of program instructions and then in dependence upon this predicted identity to predict a predicted execution target corresponding to whether the next sequence of program instructions should be executed by the first execution circuitry or the second execution circuitry.
Type: Application
Filed: November 29, 2013
Publication date: June 4, 2015
Applicant: THE REGENTS OF THE UNIVERSITY OF MICHIGAN
Inventors: Shruti Padmanabha, Andrew Lukefahr, Reetuparna Das, Scott Mahlke
-
Publication number: 20150121048
Abstract: A processor core includes a front end, and first and second back ends, the front end including a fetch engine configured to retrieve a sequence of data processing instructions for both the first back end and the second back end from a memory, and the first and second back ends are each configured to execute the sequence of data processing instructions. The core operates in a first mode in which the first back end is active and receives the sequence of data processing instructions from the fetch engine and the second back end is inactive, and a second mode in which the first back end is inactive and the second back end is active and receives the sequence of data processing instructions from the fetch engine, where the cycles-per-instruction rate is lower and energy consumption is higher for the first mode than the second mode.
Type: Application
Filed: November 29, 2013
Publication date: April 30, 2015
Applicant: THE REGENTS OF THE UNIVERSITY OF MICHIGAN
Inventors: Andrew LUKEFAHR, Reetuparna DAS, Shruti PADMANABHA, Scott MAHLKE
-
Patent number: 8813073
Abstract: An apparatus and method capable of reducing idle resources in a multicore device and improving the use of available resources in the multicore device are provided. The apparatus includes a static scheduling unit configured to generate one or more task groups, and to allocate the task groups to virtual cores by dividing or combining the tasks included in the task groups based on the execution time estimates of the task groups. The apparatus also includes a dynamic scheduling unit configured to map the virtual cores to physical cores.
Type: Grant
Filed: May 26, 2011
Date of Patent: August 19, 2014
Assignee: Samsung Electronics Co., Ltd.
Inventors: Ki-Seok Kwon, Suk-Jin Kim, Scott Mahlke, Yong-Jun Park
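Illustrative sketch (not taken from the patent): the two-stage idea in this abstract can be approximated in a few lines of Python. The code below assumes a greedy longest-task-first packing for the static stage and a simple heaviest-first assignment for the dynamic stage; the names `Task`, `static_schedule`, and `dynamic_map` are hypothetical, and the patent's actual dividing and combining rules are not given in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    est_time: float  # estimated execution time, arbitrary units (assumption)


def static_schedule(tasks, num_virtual_cores):
    """Combine tasks into one group per virtual core, balancing estimated time.

    Greedy heuristic (an assumption, not the patented method): take tasks from
    longest to shortest and add each to the least-loaded virtual core.
    """
    cores = [[] for _ in range(num_virtual_cores)]
    loads = [0.0] * num_virtual_cores
    for task in sorted(tasks, key=lambda t: t.est_time, reverse=True):
        i = loads.index(min(loads))  # least-loaded virtual core so far
        cores[i].append(task)
        loads[i] += task.est_time
    return cores, loads


def dynamic_map(loads, idle_physical_cores):
    """Map virtual cores to the physical cores idle at run time, heaviest first."""
    order = sorted(range(len(loads)), key=lambda i: loads[i], reverse=True)
    return dict(zip(order, idle_physical_cores))
```

Calling `static_schedule` with four tasks and two virtual cores returns the per-core task groups and their summed estimates; `dynamic_map` then pairs each virtual core index with a physical core identifier supplied by the caller.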
-
Patent number: 8505002
Abstract: A data processing system is provided having a processor and analysing circuitry for identifying a SIMD instruction associated with a first SIMD instruction set and replacing it by a functionally-equivalent scalar representation and marking that functionally-equivalent scalar representation. The marked functionally-equivalent scalar representation is dynamically translated using translation circuitry upon execution of the program to generate one or more corresponding translated instructions corresponding to an instruction set architecture different from the first SIMD architecture corresponding to the identified SIMD instruction.
Type: Grant
Filed: September 27, 2007
Date of Patent: August 6, 2013
Assignees: ARM Limited, The Regents of the University of Michigan
Inventors: Sami Yehia, Krisztian Flautner, Nathan Clark, Amir Hormati, Scott Mahlke
-
Patent number: 8219885
Abstract: A data processing system includes a register file having a plurality of registers storing respective register data values and an associated register value cache having a plurality of storage locations storing corresponding cache data values. There are fewer cache data values than registers. When a register is to be read, both the register data value and, if present, a cache data value from a corresponding storage location within the register value cache are read and compared by a comparator. This generates a match signal which indicates, if the data values do not match, that one of the data values is in error. The match signal stalls the processing, and a CRC code initially stored with the cache data value and a CRC code recalculated based upon the read cache data value are compared to determine whether or not the cache data value has changed since it was stored. If the cache data value has not changed, then it is correct and is output instead of the register data value.
Type: Grant
Filed: August 15, 2006
Date of Patent: July 10, 2012
Assignees: ARM Limited, The Regents of the University of Michigan
Inventors: Daryl Wayne Bradley, Jason Andrew Blome, Scott Mahlke
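Illustrative sketch (not taken from the patent): the read path described in this abstract can be modelled in software roughly as follows. The class name, the direct-mapped slot choice (`reg % cache_slots`), the 64-bit value width, and the use of `zlib.crc32` are all assumptions made for the example; the abstract also does not say what happens when the cached value itself has changed, so falling back to the register value in that case is a further assumption.

```python
import zlib


def crc(value: int) -> int:
    """CRC-32 over a 64-bit value (width and polynomial are assumptions)."""
    return zlib.crc32((value & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "little"))


class RegisterFileWithValueCache:
    def __init__(self, num_regs=16, cache_slots=4):
        self.regs = [0] * num_regs
        self.cache_slots = cache_slots  # fewer cache entries than registers
        self.cache = {}  # slot -> (register index, cached value, stored CRC)

    def write(self, reg, value):
        self.regs[reg] = value
        # Keep a second copy plus a CRC computed at store time.
        self.cache[reg % self.cache_slots] = (reg, value, crc(value))

    def read(self, reg):
        value = self.regs[reg]
        entry = self.cache.get(reg % self.cache_slots)
        if entry is None or entry[0] != reg:
            return value  # no cached copy for this register; nothing to compare
        _, cached, stored_crc = entry
        if cached == value:
            return value  # copies match
        # Mismatch: one copy is in error. Recompute the CRC of the cached copy.
        if crc(cached) == stored_crc:
            return cached  # cached copy unchanged since it was stored; trust it
        return value  # assumption: otherwise fall back to the register value
```

In hardware the mismatch would stall the pipeline while the CRC comparison completes; the sketch performs that comparison inline.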
-
Publication number: 20120166762
Abstract: Provided are a computing apparatus and method based on SIMD architecture capable of supporting various SIMD widths without wasting resources. The computing apparatus includes a plurality of configurable execution cores (CECs) that have a plurality of execution modes, and a controller for detecting a loop region from a program, determining a Single Instruction Multiple Data (SIMD) width for the detected loop region, and determining an execution mode of the processor according to the determined SIMD width.
Type: Application
Filed: July 8, 2011
Publication date: June 28, 2012
Inventors: Jae Un Park, Suk-Jin Kim, Scott Mahlke, Yong-Jun Park
-
Publication number: 20120159507
Abstract: An apparatus and method capable of reducing idle resources in a multicore device and improving the use of available resources in the multicore device are provided. The apparatus includes a static scheduling unit configured to generate one or more task groups, and to allocate the task groups to virtual cores by dividing or combining the tasks included in the task groups based on the execution time estimates of the task groups. The apparatus also includes a dynamic scheduling unit configured to map the virtual cores to physical cores.
Type: Application
Filed: May 26, 2011
Publication date: June 21, 2012
Inventors: Ki-Seok Kwon, Suk-Jin Kim, Scott Mahlke, Yong-Jun Park
-
Patent number: 7685404
Abstract: An apparatus is provided for processing data under control of a program having program instructions and subgraph suggestion information identifying respective sequences of program instructions corresponding to computational subgraphs identified within the program. A memory stores a program formed of separate program instructions. Processing logic executes respective separate program instructions from said program. Accelerator logic, in response to reaching an execution point within the program associated with a subgraph suggestion, executes a sequence of program instructions corresponding to the subgraph suggestion as an accelerated operation instead of executing the sequence of program instructions as respective separate program instructions with the processing logic.
Type: Grant
Filed: June 5, 2007
Date of Patent: March 23, 2010
Assignees: ARM Limited, University of Michigan
Inventors: Stuart David Biles, Krisztian Flautner, Scott Mahlke, Nathan Clark
-
Publication number: 20090292977
Abstract: A data processing system includes a register file (2) having a plurality of registers storing respective register data values and an associated register value cache (12) having a plurality of storage locations (14) storing corresponding cache data values. There are fewer cache data values than registers. When a register is to be read, both the register data value and, if present, a cache data value from a corresponding storage location (14) within the register value cache (12) are read and compared by a comparator (18). This generates a match signal which indicates, if the data values do not match, that one of the data values is in error. The match signal stalls the processing, and a CRC code initially stored with the cache data value and a CRC code recalculated based upon the read cache data value are compared to determine whether or not the cache data value has changed since it was stored. If the cache data value has not changed, then it is correct and is output instead of the register data value.
Type: Application
Filed: August 15, 2006
Publication date: November 26, 2009
Inventors: Daryl Wayne Bradley, Jason Andrew Blome, Scott Mahlke
-
Publication number: 20090119490
Abstract: An instruction scheduling method and a processor using an instruction scheduling method are provided. The instruction scheduling method includes selecting a first instruction that has a highest priority from a plurality of instructions, allocating the selected first instruction and a first time slot to one of the functional units, and allocating a second instruction and a second time slot to one of the functional units, wherein the second instruction is dependent on the first instruction.
Type: Application
Filed: March 20, 2008
Publication date: May 7, 2009
Inventors: Taewook Oh, Hong-Seok Kim, Scott Mahlke, Hyun Chul Park
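Illustrative sketch (not taken from the application): a generic priority-driven list scheduler shows the general shape of the steps named in this abstract. It is a stand-in technique, not the claimed method. The function name `list_schedule`, the unit latency of every instruction, and the assumption of an acyclic dependence graph are all choices made for the example.

```python
def list_schedule(priority, deps, num_units):
    """Assign each instruction a (functional unit, time slot) pair.

    priority: dict mapping instruction name -> number (higher is scheduled first)
    deps:     dict mapping instruction name -> set of instructions it depends on
    Assumes unit latency and an acyclic dependence graph.
    """
    placement = {}  # instruction -> (unit index, time slot)
    busy = {}       # time slot -> number of functional units already taken
    remaining = set(priority)
    while remaining:
        # An instruction is ready once everything it depends on is placed.
        ready = [i for i in remaining if deps.get(i, set()) <= set(placement)]
        instr = max(ready, key=lambda i: priority[i])
        # A dependent instruction must start after its producers' slots.
        slot = max((placement[d][1] + 1 for d in deps.get(instr, ())), default=0)
        while busy.get(slot, 0) >= num_units:
            slot += 1  # all units taken in this slot; try the next one
        unit = busy.get(slot, 0)
        busy[slot] = unit + 1
        placement[instr] = (unit, slot)
        remaining.remove(instr)
    return placement
```

With a single functional unit, the highest-priority instruction takes the first slot and an instruction that depends on it is placed in a later slot; with more units, independent instructions can share a slot.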
-
Publication number: 20080141012
Abstract: A data processing system is provided having a processor and analysing circuitry for identifying a SIMD instruction associated with a first SIMD instruction set and replacing it by a functionally-equivalent scalar representation and marking that functionally-equivalent scalar representation. The marked functionally-equivalent scalar representation is dynamically translated using translation circuitry upon execution of the program to generate one or more corresponding translated instructions corresponding to an instruction set architecture different from the first SIMD architecture corresponding to the identified SIMD instruction.
Type: Application
Filed: September 27, 2007
Publication date: June 12, 2008
Applicants: ARM LIMITED, The Regents of the University of Michigan
Inventors: Sami Yehia, Krisztian Flautner, Nathan Clark, Amir Hormati, Scott Mahlke
-
Patent number: 7350055
Abstract: An accelerator 120 is tightly coupled to the normal execution unit 110. The operand store, which could be a register file 130, a stack-based operand store or other operand store, is shared by the execution unit and the accelerator unit. Operands may also be accessed as immediate values within the instructions themselves. The sequences of individual program instructions corresponding to computational subgraphs remain within a program but can be recognized by the accelerator as suitable for acceleration and, when encountered, are executed by the accelerator instead of by the normal execution unit. Within such a tightly coupled arrangement, problems can arise due to a lack of register resources within the system. The present technique provides that at least some intermediate operand values which are generated within the accelerator, but are determined not to be referenced outside of the computational subgraph concerned, are not written to the operand store.
Type: Grant
Filed: January 31, 2005
Date of Patent: March 25, 2008
Assignee: Arm Limited
Inventors: Stuart D. Biles, Krisztian Flautner, Scott Mahlke, Nathan Clark
-
Patent number: 7343482
Abstract: There is provided an apparatus for processing data under control of a program having program instructions and subgraph suggestion information identifying respective sequences of program instructions corresponding to computational subgraphs identified within said program, said apparatus comprising: a memory operable to store a program formed of separate program instructions; processing logic operable to execute respective separate program instructions from said program; and accelerator logic operable in response to reaching an execution point within said program associated with a subgraph suggestion to execute a sequence of program instructions corresponding to said subgraph suggestion as an accelerated operation instead of executing said sequence of program instructions as respective separate program instructions with said processing logic.
Type: Grant
Filed: January 31, 2005
Date of Patent: March 11, 2008
Assignees: ARM Limited, University of Michigan
Inventors: Stuart David Biles, Krisztian Flautner, Scott Mahlke, Nathan Clark
-
Publication number: 20080036487
Abstract: An integrated circuit is provided with latency detecting circuitry for detecting signal generation latency within one or more functional circuits and, in response thereto, generating a wearout response. The wearout response can take a variety of different forms such as reducing the operating frequency, increasing the operating voltage, operating task allocation within a multiprocessor system, manufacturing test binning and other wearout responses.
Type: Application
Filed: July 27, 2007
Publication date: February 14, 2008
Applicants: ARM LIMITED, UNIVERSITY OF MICHIGAN
Inventors: Daryl Wayne Bradley, Jason Andrew Blome, Scott Mahlke
-
Patent number: 7318143
Abstract: An information processor for executing a program comprising a plurality of separate program instructions is provided. The processor comprises processing logic operable to individually execute said separate program instructions of said program, an operand store operable to store operand values and an accelerator having a plurality of functional units. The accelerator executes a combined operation corresponding to a computational sub-graph of the separate program instructions by configuring individual ones of said plurality of functional units to perform particular processing operations associated with the combined operation. The accelerator executes the combined operation in dependence upon operand mapping data providing a mapping between operands of the combined operation and storage locations within said operand store and in dependence upon separately specified configuration data providing a mapping between the plurality of functional units and the particular processing operations.
Type: Grant
Filed: January 28, 2005
Date of Patent: January 8, 2008
Assignees: ARM Limited, University of Michigan
Inventors: Stuart D. Biles, Krisztian Flautner, Scott Mahlke, Nathan Clark
-
Publication number: 20070239969
Abstract: There is provided an apparatus for processing data under control of a program having program instructions and subgraph suggestion information identifying respective sequences of program instructions corresponding to computational subgraphs identified within said program, said apparatus comprising: a memory operable to store a program formed of separate program instructions; processing logic operable to execute respective separate program instructions from said program; and accelerator logic operable in response to reaching an execution point within said program associated with a subgraph suggestion to execute a sequence of program instructions corresponding to said subgraph suggestion as an accelerated operation instead of executing said sequence of program instructions as respective separate program instructions with said processing logic.
Type: Application
Filed: June 5, 2007
Publication date: October 11, 2007
Applicants: ARM Limited, University of Michigan
Inventors: Stuart Biles, Krisztian Flautner, Scott Mahlke, Nathan Clark
-
Publication number: 20060095720
Abstract: There is provided an information processor for executing a program comprising a plurality of separate program instructions: processing logic operable to individually execute said separate program instructions of said program; an operand store operable to store operand values; and an accelerator having an array comprising a plurality of functional units, said accelerator being operable to execute a combined operation corresponding to a computational subgraph of said separate program instructions by configuring individual ones of said plurality of functional units to perform particular processing operations associated with one or more processing stages of said combined operation; wherein said accelerator executes said combined operation in dependence upon operand mapping data providing a mapping between operands of said combined operation and storage locations within said operand store and in dependence upon separately specified configuration data providing a mapping between said plurality of functional units a
Type: Application
Filed: January 28, 2005
Publication date: May 4, 2006
Applicant: ARM LIMITED
Inventors: Stuart Biles, Krisztian Flautner, Scott Mahlke, Nathan Clark
-
Publication number: 20060095722
Abstract: There is provided an apparatus for processing data under control of a program having program instructions and subgraph suggestion information identifying respective sequences of program instructions corresponding to computational subgraphs identified within said program, said apparatus comprising: a memory operable to store a program formed of separate program instructions; processing logic operable to execute respective separate program instructions from said program; and accelerator logic operable in response to reaching an execution point within said program associated with a subgraph suggestion to execute a sequence of program instructions corresponding to said subgraph suggestion as an accelerated operation instead of executing said sequence of program instructions as respective separate program instructions with said processing logic.
Type: Application
Filed: January 31, 2005
Publication date: May 4, 2006
Applicants: ARM LIMITED, University of Michigan
Inventors: Stuart Biles, Krisztian Flautner, Scott Mahlke, Nathan Clark