Abstract: A circuit arrangement and method for a data processing system for executing a plurality of tasks with a central processing unit having a processing capacity allocated to the processing unit; the circuit arrangement being configured to allocate the processing unit to the specific tasks in a time-staggered manner for processing, so that the tasks are processed in an order to be selected and tasks not having a current processing request are skipped over in the order during the processing; the circuit arrangement including a prioritization order control unit to determine the order in which the tasks are executed; and in response to each selection of a task for processing, the order of the tasks being redetermined and the selection being controlled so that for a number N of tasks, a maximum of N time units elapse until an active task is once more allocated processing capacity by the processing unit.
Abstract: Embodiments of the invention broadly contemplate systems, methods, apparatuses and program products providing a power management technique for an HPC cluster with performance improvements for parallel applications. According to various embodiments of the invention, power usage of an HPC cluster is reduced by boosting the performance of one or more select nodes within the cluster so that the one or more nodes take less time to complete. Embodiments of the invention accomplish this by selectively identifying the appropriate node(s) (or core(s) within the appropriate node(s)) in the cluster and increasing the computing capacity of the selected node(s) (or core(s) within the appropriate node(s)).
Type:
Grant
Filed:
November 30, 2009
Date of Patent:
March 3, 2015
Assignee:
Intenational Business Machines Corporation
Inventors:
Pradipta K. Banerjee, Anbazhagan Mani, Rajan Ravindran, Vaidyanathan Srinivasan
Abstract: A code section of a computer program to be executed by a computing device includes memory barrier instructions. Where the code section satisfies a threshold, the code section is modified, by enclosing the code section within a transaction that employs hardware transactional memory of the computing device, and removing the memory barrier instructions from the code section. Execution of the code section as has been enclosed within the transaction can be monitored to yield monitoring results. Where the monitoring results satisfy an abort threshold corresponding to excessive aborting of the execution of the code section as has been enclosed within the transaction, the code section is split into code sub-sections, and each code sub-section enclosed within a separate transaction that employs the hardware transactional memory. Splitting the code section sections and enclosing each code sub-section within a separate transaction can decrease occurrence of the code section aborting during execution.
Type:
Grant
Filed:
December 15, 2011
Date of Patent:
March 3, 2015
Assignee:
International Business Machines Corporation
Inventors:
Toshihiko Koju, Takuya Nakaike, Ali Ijaz Sheikh, Harold Wade Cain, III, Maged M. Michael
Abstract: Example methods and apparatus to manage object locks are disclosed. A disclosed example method includes receiving an object lock request from a processor, the lock request associated with object lock code to lock an object, and generating object lock-bypass code based on a type of the processor, the object lock-bypass code to execute in a managed runtime in response to receiving the object lock request. The example method also includes identifying a type of instruction set architecture (ISA) associated with the processor, invoking a checkpoint instruction for the processor based on the identified ISA, suspending the object lock code from executing and executing target code when the object is uncontended, and allowing the object lock code to execute when the object is contended.
Type:
Grant
Filed:
December 23, 2009
Date of Patent:
March 3, 2015
Assignee:
Intel Corporation
Inventors:
Suresh Srinivas, Stephen H. Dohrmann, Mingqiu Sun, Uma Srinivasan, Ravi Rajwar, Konrad K. Lai
Abstract: A system and method for reducing pipeline latency. In one embodiment, a processing system includes a processing pipeline. The processing pipeline includes a plurality of processing stages. Each stage is configured to further processing provided by a previous stage. A first of the stages is configured to perform a first function in a pipeline cycle. A second of the stages is disposed downstream of the first of the stages, and is configured to perform, in a pipeline cycle, a second function that is different from the first function. The first of the stages is further configured to selectably perform the first function and the second function in a pipeline cycle, and bypass the second of the stages.
Type:
Application
Filed:
August 23, 2013
Publication date:
February 26, 2015
Applicant:
TEXAS INSTRUMENTS DEUTSCHLAND GMBH
Inventors:
Christian Wiencke, Shrey Sudhir Bhatia, Jeroen Vilegen
Abstract: A method and system of verifying proper execution of a secure mode entry sequence. At least some of the exemplary embodiments may be a method comprising delivering an instruction from a memory to a processor across an instruction bus (the instruction at least partially configures the processor for secure mode of operation different that privilege modes of the processor), verifying delivery of the instruction across the instruction bus, and checking for proper execution of the instruction using a trace port of the processor.
Type:
Grant
Filed:
October 8, 2004
Date of Patent:
February 24, 2015
Assignee:
Texas Instruments Incorporated
Inventors:
Gregory Remy Philippe Conti, Jerome Laurent Azema
Abstract: A medium, method, and apparatus are disclosed for eliding superfluous function invocations in a vector-processing environment. A compiler receives program code comprising a width-contingent invocation of a function. The compiler creates a width-specific executable version of the program code by determining a vector width of a target computer system and omitting the function from the width-specific executable if the vector width meets one or more criteria. For example, the compiler may omit the function call if the vector width is greater than a minimum size.
Type:
Grant
Filed:
September 29, 2011
Date of Patent:
February 24, 2015
Assignee:
Advanced Micro Devices, Inc.
Inventors:
Benedict R. Gaster, Lee W. Howes, Mark D. Hummel
Abstract: A system comprising a processor adapted to activate multiple security levels for the system and a monitoring device coupled to the processor and employing security rules pertaining to the multiple security levels. The monitoring device restricts usage of the system if the processor activates the security levels in a sequence contrary to the security rules.
Abstract: A multithreaded processor comprises a plurality of hardware thread units, an instruction decoder coupled to the thread units for decoding instructions received therefrom, and a plurality of execution units for executing the decoded instructions. The multithreaded processor is configured for controlling an instruction issuance sequence for threads associated with respective ones of the hardware thread units. On a given processor clock cycle, only a designated one of the threads is permitted to issue one or more instructions, but the designated thread that is permitted to issue instructions varies over a plurality of clock cycles in accordance with the instruction issuance sequence. The instructions are pipelined in a manner which permits at least a given one of the threads to support multiple concurrent instruction pipelines.
Type:
Grant
Filed:
October 15, 2009
Date of Patent:
February 17, 2015
Assignee:
QUALCOMM Incorporated
Inventors:
Erdem Hokenek, Mayan Moudgill, Michael J. Schulte, C. John Glossner
Abstract: A scheduler in a process of a computer system schedules tasks of a task group for concurrent execution by multiple execution contexts. The scheduler provides a mechanism that allows the task group to be cancelled by an arbitrary execution context or an asynchronous error state. When a task group is cancelled, the scheduler sets a cancel indicator in each execution context that is executing tasks corresponding to the cancelled task group and performs a cancellation process on each of the execution contexts where a cancel indicator is set. The scheduler also creates local aliases to allow task groups to be used without synchronization by execution contexts that are not directly bound to the task groups.
Type:
Grant
Filed:
June 10, 2009
Date of Patent:
February 17, 2015
Assignee:
Microsoft Corporation
Inventors:
William R. Messmer, David Callahan, Paul F. Ringseth, Niklas Gustafsson
Abstract: A method for executing blocks of instructions using a microprocessor architecture having a register view, source view, instruction view, and a plurality of register templates.
Abstract: A device compiler and linker is configured to group instructions into different strands for execution by different threads based on the dependence of those instructions on other, long-latency instructions. A thread may execute a strand that includes long-latency instructions, and then hardware resources previously allocated for the execution of that thread may be de-allocated from the thread and re-allocated to another thread. The other thread may then execute another strand while the long-latency instructions are in flight. With this approach, the other thread is not required to wait for the long-latency instructions to complete before acquiring hardware resources and initiating execution of the other strand, thereby eliminating at least a portion of the time that the other thread would otherwise spend waiting.
Type:
Application
Filed:
August 7, 2013
Publication date:
February 12, 2015
Applicant:
NVIDIA CORPORATION
Inventors:
Mojtaba Mehrara, Michael Garland, Gregory Diamos
Abstract: The various aspects provide for a device and methods for intelligent multicore control of a plurality of processor cores of a multicore integrated circuit. The aspects may identify and activate an optimal set of processor cores to achieve the lowest level power consumption for a given workload or the highest performance for a given power budget. The optimal set of processor cores may be the number of active processor cores or a designation of specific active processor cores. When a temperature reading of the processor cores is below a threshold, a set of processor cores may be selected to provide the lowest power consumption for the given workload. When the temperature reading of the processor cores is above the threshold, a set processor cores may be selected to provide the best performance for a given power budget.
Type:
Application
Filed:
November 8, 2013
Publication date:
February 12, 2015
Applicant:
QUALCOMM Incorporated
Inventors:
Hee-Jun Park, Steven S. Thomson, Ronald Frank Alton, Edoardo Regini, Satish Goverdhan, Pieter-Louis Dam Backer
Abstract: Methods, systems, and mediums are described for scheduling data parallel tasks onto multiple thread execution units of processing system. Embodiments of a lock-free queue structure and methods of operation are described to implement a method for scheduling fine-grained data-parallel tasks for execution in a computing system. The work of one of a plurality of worker threads is wait-free with respect to the other worker threads. Each node of the queue holds a reference to a task that may be concurrently performed by multiple thread execution units, but each on a different subset of data. Various embodiments relate to software-based scheduling of data-parallel tasks on a multi-threaded computing platform that does not perform such scheduling in hardware. Other embodiments are also described and claimed.
Type:
Grant
Filed:
December 17, 2010
Date of Patent:
February 10, 2015
Assignee:
Intel Corporation
Inventors:
Mohan Rajagopalan, Ali-Reza Adl-Tabatabai, Yang Ni, Adam Welc, Richard L. Hudson
Abstract: A system includes determination of first coordinates in a repository coordinate system associated with a seed component corresponding to a target build result of a first code building system, the seed component comprising a projection method between the repository coordinate system and a variant coordinate system of the first code building system, determination of second coordinates in the variant coordinate system, the second coordinates associated with an execution environment of the target build result, determination of third coordinates in the repository coordinate system based on the first coordinates, the second coordinates and the projection method, and association of the target build result with the third coordinates.
Abstract: Provided is a parallel distributed processing method executed by a computer system comprising a parallel-distributed-processing control server, a plurality of extraction processing servers and a plurality of aggregation processing servers. The managed data includes at least a first and a second data items, the plurality of data items each including a value. The method includes a step of extracting data from one of the plurality of chunks according to a value in the second data item, to thereby group the data, a step of merging groups having the same value in the second data item based on an order of a value in the first data item of data contained in a group among groups, and a step of processing data in a group obtained through the merging by focusing on the order of the value in the first data item.
Abstract: A method for compressing instruction is provided, which includes the following steps. Analyze a program code to be executed by a processor to find one or more instruction groups in the program code according to a preset condition. Each of the instruction groups includes one or more instructions in sequential order. Sort the one or more instruction groups according to a cost function of each of the one or more instruction groups. Put the first X of the sorted one or more instruction groups into an instruction table. X is a value determined according to the cost function. Replace each of the one or more instruction groups in the program code that are put into the instruction table with a corresponding execution-on-instruction-table (EIT) instruction. The EIT instruction has a parameter referring to the corresponding instruction group in the instruction table.
Type:
Application
Filed:
August 1, 2013
Publication date:
February 5, 2015
Applicant:
ANDES TECHNOLOGY CORPORATION
Inventors:
Wei-Hao Chiao, Hong-Men Su, Haw-Luen Tsai
Abstract: A computer processor includes a first instruction set and a second instruction set. The computer processor further includes a translator. The translator translates the first instruction set into the second instruction set. The computer processor is configured to execute operations using only the second complete instruction set.
Abstract: Described embodiments classify packets received by a network processor. A processing module of the network processor generates tasks corresponding to each received packet. A packet classification processor determines, independent of a flow identifier of the received task, control data corresponding to each task. A multi-thread instruction engine processes threads of instructions corresponding to received tasks, each task corresponding to a packet flow of the network processor and maintains a thread status table and a sequence counter for each flow. Active threads are tracked by the thread status table, and each status entry includes a sequence value and a flow value identifying the flow. Each sequence counter generates a sequence value for each thread by incrementing the sequence counter each time processing of a thread for the associated flow is started, and decrementing the sequence counter each time a thread for the associated flow is completed.
Type:
Grant
Filed:
November 28, 2012
Date of Patent:
February 3, 2015
Assignee:
LSI Corporation
Inventors:
Deepak Mital, James Clee, Jerry Pirog, Te Khac Ma, Steven J. Pollock
Abstract: Embodiments of systems, apparatuses, and methods for performing privilege agnostic segment base register read or write instruction are described. An exemplary method may include fetching the privilege agnostic segment base register write instruction, wherein the privilege agnostic write instruction includes a 64-bit data source operand, decoding the fetched privilege agnostic segment base register write instruction, and executing the decoded privilege agnostic segment base register write instruction to write the 64-bit data of the source operand into the segment base register identified by the opcode of the privilege agnostic segment base register write instruction.
Type:
Grant
Filed:
December 22, 2010
Date of Patent:
January 20, 2015
Assignee:
Intel Corporation
Inventors:
Baiju V. Patel, Gilbert Neiger, Martin G. Dixon, James S. Coke, James B. Crossland
Abstract: A method of increasing processing diversity on a computer system includes: loading a plurality of instruction streams, each of the plurality of instruction streams being equivalent; executing, in a context, a first stream of the plurality of instruction streams; stopping execution of the first stream at a first location of the first stream; and executing, in the context, a second stream of the plurality of instruction streams at a second location of the second stream, the second location corresponding to the first location of the first stream.
Abstract: The invention allows a processor to maintain a fixed instruction width regardless of the width of the constants needed. The constant extension solves the problem of having variable length opcodes to accommodate longer constants. The invention allows the architecture to have a fixed width, regardless of the width of the constants specified, which simplify instruction decoding. Constant widths can be variable and extend beyond the fixed processor instruction width.
Type:
Application
Filed:
July 9, 2014
Publication date:
January 15, 2015
Inventors:
Timothy David Anderson, Duc Quang Bui, Joseph Raymond Michael Zbiciak
Abstract: An embodiment may include circuitry to execute, at least in part, a first list of instructions and/or to concurrently process, at least in part, first and second buffers. The execution of the first list of instructions may result, at least in part, from invocation of a first function call. The first list of instructions may include at least one portion of a second list of instructions interleaved, at least in part, with at least one other portion of a third list of instructions. The portions may be concurrently carried out, at least in part, by one or more sets of execution units of the circuitry. The second and third lists of instructions may implement, at least in part, respective algorithms that are amenable to being invoked by separate respective function calls. The concurrent processing may involve, at least in part, complementary algorithms.
Type:
Grant
Filed:
December 8, 2010
Date of Patent:
January 6, 2015
Assignee:
Intel Corporation
Inventors:
James D. Guilford, Wajdi K. Feghali, Vinodh Gopal, Gilbert M. Wolrich, Erdinc Ozturk, Martin G. Dixon, Deniz Karakoyunlu, Kahraman D. Akdemir
Abstract: An automated processor design tool uses a description of customized processor instruction set extensions in a standardized language to develop a configurable definition of a target instruction set, a Hardware Description Language description of circuitry necessary to implement the instruction set, and development tools such as a compiler, assembler, debugger and simulator which can be used to develop applications for the processor and to verify it. Implementation of the processor circuitry can be optimized for various criteria such as area, power consumption, speed and the like. Once a processor configuration is developed, it can be tested and inputs to the system modified to iteratively optimize the processor implementation. By providing a constrained domain of extensions and optimizations, the process can be automated to a high degree, thereby facilitating fast and reliable development.
Type:
Grant
Filed:
June 9, 2008
Date of Patent:
December 30, 2014
Assignee:
Cadence Design Systems, Inc.
Inventors:
Earl A. Killian, Ricardo E. Gonzalez, Ashish B. Dixit, Monica Lam, Walter D. Lichtenstein, Christopher Rowen, John C. Ruttenberg, Robert P. Wilson, Albert Ren-Rui Wang, Dror Eliezer Maydan
Abstract: A method of one aspect may include storing an event count of an event counter that counts events that occur during execution within a logic device. The method may further include restoring the event counter to the stored event count after the event counter has counted additional events. Other methods are also disclosed. Apparatus, systems, and machine-readable medium having software are also disclosed.
Type:
Grant
Filed:
December 26, 2009
Date of Patent:
December 30, 2014
Assignee:
Intel Corporation
Inventors:
Laura A. Knauth, Ravi Rajwar, Konrad K. Lai, Martin G. Dixon, Peggy Irelan
Abstract: Techniques described herein generally include methods for the management of hardware accelerator images in a processor chip that includes one or more programmable logic circuits. Hardware accelerator images may be optimized by swapping out which hardware accelerator images are implemented in the one or more programmable logic circuits. The hardware accelerator images may be chosen from a library of accelerator programs downloaded to a device associated with the processor chip. Furthermore, the specific hardware accelerator images that are implemented in the one or more programmable logic circuits at a particular time may be selected based on which combination of accelerator images best enhances performance and power usage of the processor chip.
Abstract: There are provided methods and computer program products for implementing instruction set architectures with non-contiguous register file specifiers. A method for processing instruction code includes processing an instruction of an instruction set using a non-contiguous register specifier of a non-contiguous register specification. The instruction includes the non-contiguous register specifier.
Type:
Grant
Filed:
March 21, 2012
Date of Patent:
December 23, 2014
Assignee:
International Business Machines Corporation
Inventors:
Michael Karl Gschwind, Robert K. Montoye, Brett Olsson, John-David Wellman
Abstract: A computer employs a set of General Purpose Registers (GPRs). Each GPR comprises a plurality of portions. Programs such as an Operating System and Applications operating in a Large GPR mode, access the full GPR, however programs such as Applications operating in Small GPR mode, only have access to a portion at a time. Instruction Opcodes, in Small GPR mode, may determine which portion is accessed.
Type:
Grant
Filed:
June 22, 2010
Date of Patent:
December 16, 2014
Assignee:
International Business Machines Corporation
Inventors:
Dan F. Greiner, Marcel Mitran, Timothy J. Slegel
Abstract: In one embodiment, the present invention includes a method for directly communicating between an accelerator and an instruction sequencer coupled thereto, where the accelerator is a heterogeneous resource with respect to the instruction sequencer. An interface may be used to provide the communication between these resources. Via such a communication mechanism a user-level application may directly communicate with the accelerator without operating system support. Further, the instruction sequencer and the accelerator may perform operations in parallel. Other embodiments are described and claimed.
Type:
Grant
Filed:
December 29, 2005
Date of Patent:
December 16, 2014
Assignee:
Intel Corporation
Inventors:
Hong Wang, John Shen, Hong Jiang, Richard Hankins, Per Hammarlund, Dion Rodgers, Gautham Chinya, Baiju Patel, Shiv Kaushik, Bryant Bigbee, Gad Sheaffer, Yoav Talgam, Yuval Yosef, James P. Held
Abstract: A system, method, and computer program product for generating flow-control signals for a processing pipeline is disclosed. The method includes the steps of generating, by a first pipeline stage, a delayed ready signal based on a downstream ready signal received from a second pipeline stage and a throttle disable signal. A downstream valid signal is generated by the first pipeline stage based on an upstream valid signal and the delayed ready signal. An upstream ready signal is generated by the first pipeline stage based on the delayed ready signal and the downstream valid signal.
Type:
Application
Filed:
June 10, 2013
Publication date:
December 11, 2014
Inventors:
Philip Payman Shirvani, Peter Benjamin Sommers, Eric T. Anderson
Abstract: Some implementations provide techniques and arrangements that include a physical register file to store a speculative result of executing a operation and to store an architectural result after the operation is retired and a rename alias table to store a speculative result pointer to the speculative result stored in the physical register file, an architectural result pointer to the architectural result stored in the physical register file, and a result selection field to indicate whether to select the speculative result pointer or the architectural result pointer.
Abstract: A method, system, and computer program product synchronize a group of workitems executing an instruction stream on a processor. The processor is yielded by a first workitem responsive to a synchronization instruction in the instruction stream. A first one of a plurality of program counters is updated to point to a next instruction following the synchronization instruction in the instruction stream to be executed by the first workitem. A second workitem is run on the processor after the yielding.
Type:
Application
Filed:
June 7, 2013
Publication date:
December 11, 2014
Inventors:
Lee W. HOWES, Benedict R. GASTER, Michael C. HOUSTON
Abstract: Techniques and mechanisms for programming an accelerator device to enable performance of a data processing algorithm. In an embodiment, an accelerator of a computer platform is programmed based on programming information received from a host processor of the computer platform. In another embodiment, programming of the accelerator is to enable data driven execution of an instruction by a data stream processing engine of the accelerator.
Abstract: A data processing apparatus has at least one processing pipeline having first, second and third pipeline stages. The first pipeline stage detects whether a stream of instructions to be processed includes a predetermined instruction sequence comprising first and second instructions for performing first and second operand generation operations, where the second operand generation operation is dependent on an outcome of the first. In response to detecting this instruction sequence, the first pipeline stage generates a modified stream of instructions in which at least the second instruction is replaced with a third instruction for performing a combined operand generation operation having the same effect as the first and second operand generation operations. As the third instruction can be scheduled independently of the first instruction, processing performance of the pipeline can be improved.
Type:
Application
Filed:
May 9, 2014
Publication date:
December 11, 2014
Applicant:
ARM LIMITED
Inventors:
Ian Michael CAULFIELD, Max BATLEY, Peter Richard GREENHALGH
Abstract: In some implementations, a processor is provided having a buffer to store one or more instructions, a decoder configured to decode the one or more instructions and generate one or more decoded instructions, a processor register file to store one or more operands, and a plurality of execution units. Each execution unit includes a plurality of execution stages and a plurality of registers. The plurality of execution stages is configured to execute one or more decoded instructions using the one or more operands. The plurality of registers is positioned between the plurality of execution stages to latch data between the plurality of execution stages.
Abstract: A circuit configuration for a data processing system and a corresponding method for executing multiple tasks by way of a central processing unit having a processing capacity assigned to the processing unit, the circuit configuration being configured to distribute the processing capacity of the processing unit uniformly among the respective tasks, and to process the respective tasks in time-offset fashion until they are respectively executed.
Abstract: A processor that dynamically remaps logical cores to physical cores is disclosed. In one embodiment, the processor includes a plurality of physical cores, and is configured to store a mapping of logical cores to the plurality of physical cores. The processor further includes an assignment unit configured to remap the logical cores to the plurality of physical cores subsequent to a boot process of the processor. In some embodiments, the assignment unit is configured to remap the logical cores in response to receiving an indication that one or more of the plurality of physical cores have entered an idle state. The processor may be configured to load a first of the plurality of physical cores with an execution state of a second of the plurality of physical cores upon the first physical core exiting an idle state.
Type:
Grant
Filed:
April 14, 2011
Date of Patent:
December 9, 2014
Assignee:
Advanced Micro Devices, Inc.
Inventors:
Michael J. Osborn, Sebastien J. Nussbaum
Abstract: A processing system and method includes a predecoder configured to identify instructions that are combinable. Instruction storage is configured to merge instructions that are combinable by replacing the combinable instructions with a wide data internal instruction for execution. An instruction execution unit is configured to execute the internal instruction on a wide datapath.
Type:
Grant
Filed:
May 2, 2006
Date of Patent:
December 2, 2014
Assignee:
International Business Machines Corporation
Abstract: Embodiments relate to simplifying large and complex networks and graphs using global connectivity information based on calculated node centralities. An aspect includes calculating node centralities of a graph until a designated number of central nodes are detected. A percentage of the central nodes are then selected as pivot nodes. The neighboring nodes to each of the pivot nodes are then collapsed until the graph shrinks to a predefined threshold of total nodes. Responsive to the number of total nodes reaching the predefined threshold, the simplified graph is outputted.
Type:
Application
Filed:
September 12, 2013
Publication date:
November 27, 2014
Applicant:
International Business Machines Corporation
Abstract: The invention provides a processor comprising an execution unit for executing multiple threads, each thread comprising a sequence of instructions and each thread being designated to handle activity from at least one specified source. The processor also comprises a thread scheduler for scheduling a plurality of threads to be executed by the execution unit, said scheduling being based on the respective activity handled by the threads; and a plurality of sets of registers connected to the execution unit. Each set of registers is arranged to store information representing a respective one of the plurality of threads, at least a part of the information being accessible by the execution unit for use in executing the respective thread when scheduled.
Abstract: Memory sharing in a software pipeline on a network on chip (‘NOC’), the NOC including integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers, with each IP block adapted to a router through a memory communications controller and a network interface controller, where each memory communications controller controlling communications between an IP block and memory, and each network interface controller controlling inter-IP block communications through routers, including segmenting a computer software application into stages of a software pipeline, the software pipeline comprising one or more paths of execution; allocating memory to be shared among at least two stages including creating a smart pointer, the smart pointer including data elements for determining when the shared memory can be deallocated; determining, in dependence upon the data elements for determining when the shared memory can be deallocated, that the shared memory can be deallocated; and d
Type:
Grant
Filed:
April 23, 2012
Date of Patent:
November 25, 2014
Assignee:
International Business Machines Corporation
Inventors:
Eric O. Mejdrich, Paul E. Schardt, Robert A. Shearer
Abstract: Hardware compilation and/or translation with fault detection and roll back functionality are disclosed. Compilation and/or translation logic receives programs encoded in one language, and encodes the programs into a second language including instructions to support processor features not encoded into the original language encoding of the programs. In one embodiment, an execution unit executes instructions of the second language including an operation-check instruction to perform a first operation and record the first operation result for a comparison, and an operation-test instruction to perform a second operation and a fault detection operation by comparing the second operation result to the recorded first operation result.
Type:
Grant
Filed:
December 30, 2011
Date of Patent:
November 18, 2014
Assignee:
Intel Corporation
Inventors:
Nicholas Cheng Hwa Chee, Tryggve Fossum, William C. Hasenplaugh
Abstract: In one embodiment, a processor may be configured to write ECC granular stores into the data cache, while non-ECC granular stores may be merged with cache data in a memory request buffer. In one embodiment, a processor may be configured to detect that a victim block writeback hits one or more stores in a memory request buffer (or vice versa) and may convert the victim block writeback to a fill. In one embodiment, a processor may speculatively issue stores that are subsequent to a load from a load/store queue, but prevent the update for the stores in response to a snoop hit on the load.
Abstract: A data processing system comprises at least one processing unit, a virtualization layer, and a remote update programming idiom accelerator. The remote update programming idiom accelerator is configured to receive a complex remote update programming idiom from a remote node. Responsive to a determination that the sequence of instructions in the complex remote update programming idiom is longer than a dedicated processor threshold, the remote update programming idiom accelerator is configured to request a processing unit from the virtualization layer in the data processing system, and receive an allocation of a processing unit from the virtualization layer. The allocated processing unit is configured to read the data from the storage location local to the data processing system, execute the sequence of instructions to perform the update operation on the data to form result data, and write the result data to the storage location local to the data processing system.
Type:
Grant
Filed:
April 16, 2009
Date of Patent:
November 11, 2014
Assignee:
International Business Machines Corporation
Inventors:
Ravi K. Arimilli, Satya P. Sharma, Randal C. Swanberg
Abstract: A method, system and program are provided for dynamically assigning priority values to instruction threads in a computer system based on one or more predetermined thread performance tests, and using the assigned instruction priorities to determine how resources are used in the system. By storing the assigning priority values for each thread as a tag in the thread's instructions, tagged instructions from different threads that are dispatched through the system are allocated system resources based on the tagged priority values assigned to the respective instruction threads. Priority values for individual threads may be updated with control software which tests thread performance and uses the test results to apply predetermined adjustment policies.
Type:
Grant
Filed:
November 28, 2007
Date of Patent:
November 11, 2014
Assignee:
International Business Machines Corporation
Inventors:
Louis B. Capps, Jr., Robert H. Bell, Jr.
Abstract: An untrusted application is received at a data center including one or more processing modules and providing a native processing environment. The untrusted application includes a data parallel pipeline. Secured processing environments are used to execute the untrusted application.
Type:
Grant
Filed:
December 2, 2010
Date of Patent:
November 11, 2014
Assignee:
Google Inc.
Inventors:
Craig D. Chambers, Ashish Raniwala, Frances J. Perry, Robert R. Henry, Jordan Tigani
Abstract: A system, for use with a compiler architecture framework, includes performing a statically speculative compilation process to extract and use speculative static information, encoding the speculative static information in an instruction set architecture of a processor, and executing a compiled computer program using the speculative static information, wherein executing supports static speculation driven mechanisms and controls.
Abstract: An out-of-order execution microprocessor executes an architectural segment register-loading instruction that instructs the microprocessor to load a new value into an architectural segment register of the microprocessor. A comparator compares the new value specified by the architectural segment register-loading instruction with a current contents of the architectural segment register. A control unit causes to be re-executed using the new value all instructions in the microprocessor that used the current architectural segment register contents as a source operand and that are newer in program order than the architectural segment register-loading instruction whenever the comparator indicates the new value does not equal the current contents.
Type:
Grant
Filed:
February 11, 2009
Date of Patent:
November 4, 2014
Assignee:
VIA Technologies, Inc.
Inventors:
Rodney E. Hooker, Gerard M. Col, Terry Parks
Abstract: A processor includes a first and second execution unit each of which is arranged to execute multiply instructions of a first type upon fixed point operands and to execute multiply instructions of a second type upon floating point operands. A register file of the processor stores operands in registers that are each addressable by instructions for performing the first and second types of operations. An instruction decode unit is responsive to the at least one multiply instruction of the first type and the at least one multiply instruction of the second type to at the same time enable a first data path between the first set of registers and the first execution unit and to enable a second data path between a second set of registers and the second execution unit.
Type:
Grant
Filed:
September 15, 2011
Date of Patent:
November 4, 2014
Assignee:
Texas Instruments Incorporated
Inventors:
Timothy D Anderson, Duc Quang Bui, Eric Biscondi, Shriram D Moharil, Mujibur Rahman, Soujanya Narnur, Peter Richard Dent, Ashish Rai Shrivastava
Abstract: A wake-and-go mechanism is provided for a data processing system. The wake-and-go mechanism recognizes a programming idiom that indicates that a thread is spinning on a lock. The wake-and-go mechanism updates a wake-and-go array with a target address associated with the lock and sets a lock bit in the wake-and-go array. The thread then goes to sleep until the lock frees. The wake-and-go array may be a content addressable memory (CAM). When a transaction appears on the symmetric multiprocessing (SMP) fabric that modifies the value at a target address in the CAM, the CAM returns a list of storage addresses at which the target address is stored. The wake-and-go mechanism associates these storage addresses with the threads waiting for an event at the target addresses, and may wake the thread that is spinning on the lock.
Type:
Grant
Filed:
February 1, 2008
Date of Patent:
November 4, 2014
Assignee:
International Business Machines Corporation
Inventors:
Ravi K. Arimilli, Satya P. Sharma, Randal C. Swanberg