Processing Control Patents (Class 712/220)
  • Patent number: 8880853
    Abstract: A wake-and-go mechanism is provided for a data processing system. The wake-and-go mechanism recognizes a programming idiom that indicates that a thread is spinning on a lock. The wake-and-go mechanism updates a wake-and-go array with a target address associated with the lock and sets a lock bit in the wake-and-go array. The thread then goes to sleep until the lock frees. The wake-and-go array may be a content addressable memory (CAM). When a transaction appears on the symmetric multiprocessing (SMP) fabric that modifies the value at a target address in the CAM, the CAM returns a list of storage addresses at which the target address is stored. The wake-and-go mechanism associates these storage addresses with the threads waiting for an event at the target addresses, and may wake the thread that is spinning on the lock.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: November 4, 2014
    Assignee: International Business Machines Corporation
    Inventors: Ravi K. Arimilli, Satya P. Sharma, Randal C. Swanberg
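    As a rough illustration of the wake-and-go array described in the abstract above, the C sketch below models it as a small table mapping a watched lock address to a parked thread; a snooped store to that address plays the role of the CAM hit that wakes the thread. The names wag_arm and wag_snoop_store, the table size, and the linear scan standing in for a real CAM are all invented for the example, not taken from the patent.

        /* Minimal software analogue of a wake-and-go array. */
        #include <stdint.h>
        #include <stdio.h>

        #define WAG_ENTRIES 8

        struct wag_entry {
            uintptr_t target;    /* address of the lock word being watched */
            int       thread_id; /* thread parked until that address changes */
            int       valid;
        };

        static struct wag_entry wag[WAG_ENTRIES];

        /* Called when the "spinning on a lock" idiom is recognized for a thread. */
        static void wag_arm(int thread_id, uintptr_t lock_addr)
        {
            for (int i = 0; i < WAG_ENTRIES; i++) {
                if (!wag[i].valid) {
                    wag[i] = (struct wag_entry){ lock_addr, thread_id, 1 };
                    printf("thread %d parked on %#lx\n", thread_id,
                           (unsigned long)lock_addr);
                    return;
                }
            }
        }

        /* Called when a transaction on the fabric modifies lock_addr:
         * behaves like the CAM lookup, waking every matching thread. */
        static void wag_snoop_store(uintptr_t lock_addr)
        {
            for (int i = 0; i < WAG_ENTRIES; i++) {
                if (wag[i].valid && wag[i].target == lock_addr) {
                    printf("waking thread %d\n", wag[i].thread_id);
                    wag[i].valid = 0;
                }
            }
        }

        int main(void)
        {
            int lock = 1;                      /* stand-in lock word */
            wag_arm(7, (uintptr_t)&lock);      /* thread 7 sleeps instead of spinning */
            lock = 0;                          /* lock is released ...                */
            wag_snoop_store((uintptr_t)&lock); /* ... and the snoop wakes thread 7    */
            return 0;
        }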
  • Publication number: 20140325189
    Abstract: A measurement sampling facility takes snapshots of the central processing unit (CPU) on which it is executing at specified sampling intervals to collect data relating to tasks executing on the CPU. The collected data is stored in a buffer, and at selected times, an interrupt is provided to remove data from the buffer to enable reuse thereof. The interrupt is not taken after each sample, but in sufficient time to remove the data and minimize data loss.
    Type: Application
    Filed: July 7, 2014
    Publication date: October 30, 2014
    Inventors: Jane H. Bartik, Lisa C. Heller, Damian L. Osisek, Donald W. Schmidt, Patrick M. West, JR., Phil C. Yeh
  • Publication number: 20140320391
    Abstract: A series of methods is presented to improve the operation and user experience of mobile handheld devices such as mobile phones. The methods include techniques allowing useful operation at low battery levels, touch input from non-conventional models, stored procedures for executing series of actions, application management, and navigational communication through vibratory motions, among others.
    Type: Application
    Filed: December 27, 2013
    Publication date: October 30, 2014
    Inventor: GAURAV BAZAZ
  • Patent number: 8874881
    Abstract: Methods and apparatus are provided for optimizing a processor core. Common processor subcircuitry is used to perform calculations for various types of instructions, including branch and non-branch instructions. Increasing the commonality of calculations across different instruction types allows branch instructions to jump to byte-aligned memory addresses even if supported instructions are multi-byte or word aligned.
    Type: Grant
    Filed: June 17, 2011
    Date of Patent: October 28, 2014
    Assignee: Altera Corporation
    Inventor: James Loran Ball
  • Publication number: 20140317387
    Abstract: A method for executing dual dispatch of blocks and half blocks. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks, wherein each of the instruction blocks comprises two half blocks; scheduling the instructions of the instruction block to execute in accordance with a scheduler; and performing a dual dispatch of the two half blocks for execution on an execution unit.
    Type: Application
    Filed: March 14, 2014
    Publication date: October 23, 2014
    Applicant: Soft Machines, Inc.
    Inventor: Mohammad ABDALLAH
  • Patent number: 8868886
    Abstract: A performance monitoring technique provides task-switch immune operation without requiring storage and retrieval of the performance monitor state when a task switch occurs. When a hypervisor signals that a task is being resumed, it provides an indication, which starts a delay timer. The delay timer is resettable in case a predetermined time period has not elapsed when the next task switch occurs. After the delay timer expires, analysis of the performance monitor measurements is resumed, which prevents an initial state or a state remaining from a previous task from corrupting the performance monitoring results. The performance monitor may be or include an execution trace unit that collects taken branches in a current trace and may use branch prediction success to determine whether to collect a predicted and taken branch instruction in a current trace or to start a new segment when the branch resolves in a non-predicted direction.
    Type: Grant
    Filed: April 4, 2011
    Date of Patent: October 21, 2014
    Assignee: International Business Machines Corporation
    Inventors: Giles R. Frazier, David S. Levitan, Brian R. Mestan
  • Patent number: 8856498
    Abstract: A prefetch request circuit is provided in a processor device. The processor device has hierarchized storage areas and, when executing the respective instruction flows obtained by multi-flow expansion of one instruction at decode time, can prefetch the data at addresses to be used between the appropriate storage areas. The prefetch request circuit includes a latch unit that, when a state is specified in which the respective instruction flows access the storage area with the maximum specifiable data transfer volume, holds that state for the duration of the multi-flow expansion; and a prefetch request signal output unit that outputs a prefetch request signal each time an instruction flow is executed, based on the output signal of the latch unit and a signal indicating the execution timing of the respective instruction flows.
    Type: Grant
    Filed: August 29, 2011
    Date of Patent: October 7, 2014
    Assignee: Fujitsu Limited
    Inventors: Atsushi Fusejima, Norihito Gomyo
  • Patent number: 8850165
    Abstract: In a multi-threaded processor, thread priority variables are set up in memory. The actual assignment of thread priority is based on the expiration of a thread precedence counter. To further augment the effectiveness of the thread precedence counters, starting counters are associated with each thread and serve as a multiplier for the value to be used in the thread precedence counter. The values in the starting counters are manipulated so as to prevent one thread from getting undue priority to the resources of the multi-threaded processor.
    Type: Grant
    Filed: June 7, 2011
    Date of Patent: September 30, 2014
    Assignee: Intel Corporation
    Inventors: David W. Burns, James D. Allen, Michael D. Upton, Darrell D. Boggs, David J. Sager
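    The interplay between the starting counters and the thread precedence counter in the abstract above can be pictured with the short C sketch below: when a thread's precedence counter expires, priority passes to the other thread, whose counter is reloaded from its starting value, so the relative sizes of the starting values bound how long either thread can monopolize the core. The two-thread setup and the concrete values are assumptions made only for illustration.

        /* Illustrative model of thread precedence with starting counters. */
        #include <stdio.h>

        struct thread_prio {
            int starting;    /* reload value: larger means longer turns        */
            int precedence;  /* counts down while the thread holds priority    */
        };

        int main(void)
        {
            struct thread_prio t[2] = { { .starting = 3 }, { .starting = 2 } };
            int holder = 0;
            t[holder].precedence = t[holder].starting;

            for (int cycle = 0; cycle < 10; cycle++) {
                printf("cycle %d: thread %d has priority\n", cycle, holder);
                if (--t[holder].precedence == 0) {
                    holder ^= 1;                               /* hand over priority   */
                    t[holder].precedence = t[holder].starting; /* reload from starter  */
                }
            }
            return 0;
        }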
  • Patent number: 8850436
    Abstract: One embodiment of the present invention sets forth a technique for performing a method for synchronizing divergent executing threads. The method includes receiving a plurality of instructions that includes at least one set-synchronization instruction and at least one instruction that includes a synchronization command, and determining an active mask that indicates which threads in a plurality of threads are active and which threads in the plurality of threads are disabled. For each instruction included in the plurality of instructions, the instruction is transmitted to each of the active threads included in the plurality of threads. If the instruction is a set-synchronization instruction, then a synchronization token, the active mask, and the synchronization point are each pushed onto a stack.
    Type: Grant
    Filed: September 28, 2010
    Date of Patent: September 30, 2014
    Assignee: NVIDIA Corporation
    Inventors: Brian Fahs, Ming Y. Siu, Robert Steven Glanville
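    A minimal software model of the synchronization stack described above is sketched below: a set-synchronization instruction pushes a token, the current active mask, and the reconvergence point, and reaching the synchronization point pops the entry to restore the mask. The token encoding, stack depth, and eight-thread mask width are invented for the example.

        /* Sketch of a divergence/synchronization stack. */
        #include <stdint.h>
        #include <stdio.h>

        enum token { TOKEN_SYNC };

        struct stack_entry {
            enum token token;
            uint32_t   active_mask;   /* one bit per thread in the group        */
            uint32_t   sync_pc;       /* program counter of the sync point      */
        };

        static struct stack_entry stack[16];
        static int top = 0;

        static void push_sync(uint32_t active_mask, uint32_t sync_pc)
        {
            stack[top++] = (struct stack_entry){ TOKEN_SYNC, active_mask, sync_pc };
        }

        static uint32_t pop_sync(void)   /* reached the sync point: reconverge */
        {
            return stack[--top].active_mask;
        }

        int main(void)
        {
            uint32_t active = 0xFFu;        /* 8 threads active                */
            push_sync(active, 0x40);        /* set-sync before a divergent if  */
            active = 0x0Fu;                 /* only the "taken" half runs      */
            printf("diverged mask %#x\n", (unsigned)active);
            active = pop_sync();            /* at PC 0x40 all threads rejoin   */
            printf("reconverged mask %#x\n", (unsigned)active);
            return 0;
        }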
  • Patent number: 8850446
    Abstract: A system, computer program and a method for preventing starvations of tasks in a multiple-processing entity system, the method includes: examining, during each scheduling iteration, an eligibility of each task data structure out of a group of data structures to be moved from a sorted tasks queue to a ready for execution task; updating a value, during each scheduling iteration, of a queue starvation watermark value of each task data structure that is not eligible to move to a running tasks queue, until a queue starvation watermark value of a certain task data structure out of the group reaches a queue starvation watermark threshold; and generating a task starvation indication if during an additional number of scheduling iterations, the certain task data structure is still prevented from being moved to a running tasks queue, and the additional number is responsive to a task starvation watermark.
    Type: Grant
    Filed: June 19, 2008
    Date of Patent: September 30, 2014
    Assignee: Freescale Semiconductor, Inc.
    Inventors: Hillel Avni, Dov Levenglick, Avishay Moskowiz
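    The two-stage starvation check in the abstract above can be sketched as follows: a per-task watermark counter is incremented on every scheduling iteration in which the task remains ineligible, and once it reaches a threshold an additional number of iterations must also elapse before the starvation indication is raised. The threshold and grace values are arbitrary example numbers, not taken from the patent.

        /* Sketch of the queue-starvation watermark plus grace period. */
        #include <stdbool.h>
        #include <stdio.h>

        #define QUEUE_WATERMARK_THRESHOLD 4
        #define TASK_STARVATION_GRACE     2

        struct task {
            int  queue_watermark;   /* iterations spent not eligible           */
            int  over_threshold;    /* iterations since the threshold was hit  */
            bool starved;
        };

        static void scheduling_iteration(struct task *t, bool eligible)
        {
            if (eligible) { t->queue_watermark = 0; t->over_threshold = 0; return; }
            if (t->queue_watermark < QUEUE_WATERMARK_THRESHOLD) {
                t->queue_watermark++;
            } else if (++t->over_threshold >= TASK_STARVATION_GRACE) {
                t->starved = true;   /* generate the task starvation indication */
            }
        }

        int main(void)
        {
            struct task t = { 0 };
            for (int i = 0; i < 8; i++) {
                scheduling_iteration(&t, false);   /* never eligible in this run */
                printf("iter %d: watermark=%d starved=%d\n",
                       i, t.queue_watermark, (int)t.starved);
            }
            return 0;
        }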
  • Publication number: 20140281418
    Abstract: An apparatus includes packed data registers and an execution unit. An instruction is to indicate a first source packed data that is to include first packed data elements, a second source packed data that is to include second packed data elements, and a destination storage location. The execution unit, in response to the instruction, is to store a packed data result that is to include packed result data elements in the destination storage location. Each of the result data elements is to correspond to a different one of the data elements of the second source packed data. Each of the result data elements is to include a multiple-bit comparison mask that is to include a different comparison mask bit for each different corresponding data element of the first source packed data compared with the corresponding data element of the second source packed data.
    Type: Application
    Filed: March 14, 2013
    Publication date: September 18, 2014
    Inventor: Shihjong J. Kuo
  • Publication number: 20140281417
    Abstract: A system for passing data, the system including multiple data producers passing processed data, wherein the processed data include discrete data units that are each consecutively numbered, each of the data producers calculating insertion indices for ones of the data units passing therethrough; a circular buffer receiving the data units from the producers, the data units placed in slots that correspond to the respective insertion indices; and a consumer of the data units that receives the data units from the circular buffer in an order that preserves sequential numbering of the data units, wherein the multiple data producers follow a protocol so that a first one of the data producers, upon failing to place a first data unit in the circular buffer, does not lock other data producers from placing other data units in the circular buffer.
    Type: Application
    Filed: March 14, 2013
    Publication date: September 18, 2014
    Applicant: GENBAND US LLC
    Inventor: Matthew Lorne Peters
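    A single-threaded sketch of the indexing scheme described above: each data unit carries a consecutive sequence number, its insertion index is the sequence number modulo the buffer size, and the consumer drains units strictly in sequence order, so a producer that cannot place its unit simply fails locally without blocking the others. All synchronization details of the real lock-free protocol are omitted; the names and buffer size are illustrative.

        /* Sequence-numbered circular buffer, shown single-threaded. */
        #include <stdbool.h>
        #include <stdio.h>

        #define SLOTS 4

        struct slot { long seq; int payload; bool full; };
        static struct slot ring[SLOTS];

        static bool produce(long seq, int payload)     /* insertion index = seq % SLOTS */
        {
            struct slot *s = &ring[seq % SLOTS];
            if (s->full) return false;                 /* slot busy: this producer retries
                                                          later without locking the others */
            *s = (struct slot){ seq, payload, true };
            return true;
        }

        static bool consume(long expected_seq, int *payload)
        {
            struct slot *s = &ring[expected_seq % SLOTS];
            if (!s->full || s->seq != expected_seq) return false;  /* preserve ordering */
            *payload = s->payload;
            s->full = false;
            return true;
        }

        int main(void)
        {
            produce(1, 10);                 /* a second producer finished unit 1 first */
            produce(0, 99);                 /* the first producer delivers unit 0 late */
            int v;
            for (long seq = 0; seq < 2; seq++)
                if (consume(seq, &v)) printf("consumed unit %ld = %d\n", seq, v);
            return 0;
        }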
  • Patent number: 8838940
    Abstract: Indicating usage in a system is disclosed. Indicating includes obtaining active thread information related to a number of hardware threads in a processor core, combining the active thread information with information related to a decreasing ability of the processor core to increase throughput by utilizing additional hardware threads, and indicating the usage in the system based at least in part on both the active thread information and the ability of the processor core to increase throughput by utilizing additional hardware threads.
    Type: Grant
    Filed: June 7, 2006
    Date of Patent: September 16, 2014
    Assignee: Azul Systems, Inc.
    Inventors: Gil Tene, Michael A. Wolf, Cliff N. Click, Jr.
  • Patent number: 8838941
    Abstract: Methods for instruction execution and synchronization in a multi-thread processor are provided, wherein in the multi-thread processor, multiple threads are running and each of the threads can simultaneously execute a same instruction sequence. A source code or an object code is received and then compiled to generate the instruction sequence. Instructions for all of function calls within the instruction sequence are sorted according to a calling order. Each thread is provided a counter value pointing to one of the instructions in the instruction sequence. A main counter value is determined according to the counter values of the threads such that all of the threads simultaneously execute an instruction of the instruction sequence that the main counter value points to.
    Type: Grant
    Filed: March 8, 2011
    Date of Patent: September 16, 2014
    Assignee: Via Technologies, Inc.
    Inventor: Yangang Zhang
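    The main-counter selection described in the abstract above is sketched below under one assumed policy: take the minimum of the per-thread counter values so that the most lagging thread makes progress, then have every thread execute the instruction that the main counter points to. The patent only states that the main counter is determined from the thread counters; the minimum rule and the four-thread setup are assumptions for illustration.

        /* Sketch of main-counter selection across threads. */
        #include <stdio.h>

        #define NTHREADS 4

        static int select_main_counter(const int counter[NTHREADS])
        {
            int main_counter = counter[0];
            for (int i = 1; i < NTHREADS; i++)
                if (counter[i] < main_counter)
                    main_counter = counter[i];   /* assumed policy: minimum wins */
            return main_counter;
        }

        int main(void)
        {
            int counter[NTHREADS] = { 12, 12, 9, 12 };   /* thread 2 is behind */
            int pc = select_main_counter(counter);
            printf("all threads execute instruction %d this step\n", pc);
            return 0;
        }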
  • Publication number: 20140258688
    Abstract: Methods and systems are provided for generating a benchmark representative of a reference process. One method involves obtaining execution information for a subset of the plurality of instructions of the reference process from a pipeline of a processing module during execution of those instructions by the processing module, determining performance characteristics quantifying the execution behavior of the reference process based on the execution information, and generating the benchmark process that mimics the quantified execution behavior of the reference process based on the performance characteristics.
    Type: Application
    Filed: March 7, 2013
    Publication date: September 11, 2014
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Mauricio Breternitz, Anton Chernoff, Keith A. Lowery
  • Patent number: 8832325
    Abstract: Migrating data from a source storage device to a target storage device includes creating new paths to the target storage device, setting the target storage device to a state where I/O operations are initially accepted, where accepted I/O operations are rejected some time after acceptance, setting the source storage device to a state where at least some I/O operations are rejected, transferring metadata corresponding to the source storage device to the target storage device, where state information is transferred from the source storage device to the target storage device and setting the target storage device to a state where I/O operations are accepted and performed. Migrating data from a source storage device to a target storage device may also include creating new volumes on the target storage device and transferring data from the source storage device to the target storage device.
    Type: Grant
    Filed: June 28, 2012
    Date of Patent: September 9, 2014
    Assignee: EMC Corporation
    Inventors: Subin George, Michael J. Scharland, Arieh Don
  • Patent number: 8826299
    Abstract: According to embodiments of the invention, methods and apparatus are provided for tracking the status or state of a message spawned or sent from one processing element to another processing element in a multiple core processing element network. According to embodiments of the invention, a message status tracker may be incorporated within a multiple core processing element network. As a message is spawned or sent from an originating processing element to a receiving processing element, a counter within the message status tracker may be incremented. If the receiving processing element spawns further messages in response to the received message, the counter may be further incremented. When a receiving processing element finishes a process in response to a received message, the receiving processing element may decrement the counter. When the counter is decremented to an original value (e.g., zero) the original message may be considered complete.
    Type: Grant
    Filed: August 13, 2007
    Date of Patent: September 2, 2014
    Assignee: International Business Machines Corporation
    Inventors: Jon K. Kriegel, Mark Gary Kupferschmidt, Paul Emery Schardt
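    The counter protocol in the abstract above behaves like a distributed completion count, as the C sketch below illustrates: every spawned message (including messages spawned in response to other messages) increments the counter, every finished message decrements it, and a return to the original value marks the original message as complete. The function names and the zero starting value are example choices.

        /* Sketch of the message status tracker's counter. */
        #include <stdbool.h>
        #include <stdio.h>

        static int outstanding = 0;                 /* original value is zero */

        static void spawn_message(const char *from, const char *to)
        {
            outstanding++;
            printf("%s -> %s (outstanding=%d)\n", from, to, outstanding);
        }

        static bool finish_message(const char *at)
        {
            outstanding--;
            printf("%s done (outstanding=%d)\n", at, outstanding);
            return outstanding == 0;                /* back at the original value? */
        }

        int main(void)
        {
            spawn_message("core0", "core1");        /* original message            */
            spawn_message("core1", "core2");        /* spawned in response         */
            finish_message("core2");
            if (finish_message("core1"))
                printf("original message is complete\n");
            return 0;
        }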
  • Patent number: 8825989
    Abstract: A technique to perform three-source instructions. At least one embodiment of the invention relates to converting a three-source instruction into at least two instructions identifying no more than two source values.
    Type: Grant
    Filed: October 25, 2013
    Date of Patent: September 2, 2014
    Assignee: Intel Corporation
    Inventors: Avinash Sodani, Stephan Jourdan, Alexandre Farcy, Per Hammarlund
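    As a concrete, if simplified, example of the conversion described above, the sketch below splits a three-source multiply-add into a two-source multiply followed by a two-source add through a temporary register. Multiply-add is used purely for illustration; the patent does not name a specific three-source instruction.

        #include <stdio.h>

        int main(void)
        {
            int a = 3, b = 4, c = 5;

            /* conceptual three-source form:  d = madd a, b, c          */
            /* converted two-source sequence, using a temporary value:  */
            int t = a * b;      /* mul t, a, b  : two sources */
            int d = t + c;      /* add d, t, c  : two sources */

            printf("d = %d\n", d);   /* 17 */
            return 0;
        }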
  • Publication number: 20140244979
    Abstract: Techniques for estimating time remaining for an operation are described. Example operations include file operations, such as file move operations, file copy operations, and so on. A wide variety of different operations may be considered in accordance with the claimed embodiments, further examples of which are discussed below. In at least some embodiments, estimating a time remaining for an operation can be based on a state of the operation. A state of an operation, for example, can be based on events related to the operation itself, such as the operation being initiated, paused, resumed, and so on. A state of an operation can also be based on events related to other operations.
    Type: Application
    Filed: February 22, 2013
    Publication date: August 28, 2014
    Applicant: Microsoft Corporation
    Inventors: Francisco Alvarez Cavazos, Jordi Mola
  • Publication number: 20140244981
    Abstract: A processor includes a programmable logic circuit provided with a plurality of processing units. The programmable logic circuit is capable of reconfiguring a first logic circuit corresponding to first circuit configuration information according to a first process and a second logic circuit corresponding to second circuit configuration information according to a second process. Each of the first and second logic circuits includes an information holding unit. A first control circuit stores the second circuit configuration information in the information holding unit of the first logic circuit and generates an execution control signal for executing the first process. A second control circuit obtains the second circuit configuration information from the information holding unit of the first logic circuit in response to completion of the first process and controls the programmable logic circuit so as to reconfigure the second logic circuit corresponding to the second circuit configuration information.
    Type: Application
    Filed: February 24, 2014
    Publication date: August 28, 2014
    Applicant: FUJITSU SEMICONDUCTOR LIMITED
    Inventor: Kazuhiko OKADA
  • Publication number: 20140244983
    Abstract: An apparatus includes a first processor having a first instruction set and a second processor having a second instruction set that is different than the first instruction set. The apparatus also includes a memory storing at least a portion of an operating system. The operating system is concurrently executable on the first processor and the second processor.
    Type: Application
    Filed: February 26, 2013
    Publication date: August 28, 2014
    Applicant: Qualcomm Incorporated
    Inventors: Michael R. McDonald, Erich J. Plondke, Pavel Potoplyak, Lucian Codrescu, Richard Kuo, Bryan C. Bayerdorffer
  • Publication number: 20140244980
    Abstract: Method, system, and programs for dynamic control of a processing system having a plurality of tiers. Queue lengths of a plurality of nodes in one of the plurality of tiers are received. A control objective is received from a higher tier. One or more requests from the higher tier are processed by the plurality of nodes in the tier. A control model of the tier is computed based on the received queue lengths. One or more parameters of the control model are adjusted based on the received control objective. At least one control action is determined based on the control model and the control objective.
    Type: Application
    Filed: February 25, 2013
    Publication date: August 28, 2014
    Inventor: Masood Mortazavi
  • Patent number: 8819397
    Abstract: Methods and apparatuses are provided for increased efficiency in a processor via control word prediction. The apparatus comprises an operational unit capable of determining whether an instruction will change a first control word to a second control word for processing dependent instructions. Execution units process the dependent instructions using a predicted control word and compare the second control word to the predicted control word. A scheduling unit causes the execution units to reprocess the dependent instructions when the predicted control word does not match the second control word. The method comprises determining that an instruction will change a first control word to a second control word and processing the dependent instructions using a predicted control word. The second control word is compared to the predicted control word and the dependent instructions are reprocessed using the second control word when the predicted control word does not match the second control word.
    Type: Grant
    Filed: March 1, 2011
    Date of Patent: August 26, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Michael D. Estlick, Jay Fleischman, Debjit Das Sarma, Emil Talpes, Krishnan V. Ramani, Chun Liu
  • Patent number: 8819399
    Abstract: Some embodiments provide a system that executes a native code module. During operation, the system obtains the native code module. Next, the system loads the native code module into a secure runtime environment. Finally, the system safely executes the native code module in the secure runtime environment by using a set of software fault isolation (SFI) mechanisms that use predicated store instructions and predicated control flow instructions, wherein each predicated instruction from the predicated store instructions and the predicated control flow instructions is executed if a mask condition associated with the predicated instruction is met.
    Type: Grant
    Filed: November 20, 2009
    Date of Patent: August 26, 2014
    Assignee: Google Inc.
    Inventors: Robert Muth, Karl Schmipf, David C. Sehr, Clifford L. Biffle
  • Publication number: 20140223151
    Abstract: A method for executing kernels in a hybrid system includes running a program on a host computer and identifying in an instruction stream of the program a first instruction including a function of a target classification. The method includes generating a first kernel including the function and transmitting the first kernel to a client system to execute the first kernel based on identifying the first instruction as being of the target classification. The method also includes determining whether to store results of executing the first kernel in a read-only buffer of the client system based on determining whether a subsequent instruction of the target classification relies upon results of the first instruction.
    Type: Application
    Filed: February 4, 2013
    Publication date: August 7, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: D. Gary Chapman, Rajaram B. Krishnamurthy, Deborah A. Odell, Benjamin P. Segal
  • Publication number: 20140223150
    Abstract: An information processing apparatus includes a first preservation unit configured to preserve execution request information for information processing; an execution unit configured to execute one or more types of the information processing; an execution control unit configured to have the execution unit that is capable of executing one of the types of the information processing execute the information processing of the execution request information preserved by the first preservation unit; and a second preservation unit configured to preserve a stop command for the execution unit. If the execution unit is not executing the information processing, the execution control unit checks whether the second preservation unit holds the stop command and, if so, has the execution unit execute a stop procedure.
    Type: Application
    Filed: January 24, 2014
    Publication date: August 7, 2014
    Applicant: RICOH COMPANY, LTD.
    Inventors: Tadashi Honda, Tetsuharu Kohkaki, Kenta Yamano, Tomoya Amikura, Masateru Kumagai, Yuuichiroh Hayashi
  • Publication number: 20140223145
    Abstract: A processor may be built with cores that only execute some partial set of the instructions needed to be fully backwards compliant. Thus, in some embodiments power consumption may be reduced by providing partial cores that only execute certain instructions and not others. The instructions not supported may be handled in other, more energy-efficient ways, so that the overall processor, including the partial core, may be fully backwards compliant.
    Type: Application
    Filed: December 30, 2011
    Publication date: August 7, 2014
    Applicant: Intel Corporation
    Inventors: Srihari Makineni, Steven R. King, Zhen Fang, Alexander Redkin, Ravishankar Iyer, Pavel S. Smirnov, Dmitry Gusev, Dmitri Pavlov, May Wu
  • Publication number: 20140223146
    Abstract: Reading a value into a register, checking to see if the value is a NULL, and then jumping out of a loop if the value is a NULL is a common task that processors perform. To speed performance of such a task, a novel “blank bit” is added to the flag register of a processor. When a first instruction (arithmetic, logic or load) is executed, the instruction operands are checked to see if any is a NULL character value. Information on the result of the check is stored in the blank bit. Execution of a second instruction uses the information stored in the blank bit to determine whether or not a second operation (for example, a jump) will be performed. By using the first and second instructions in a loop, the number of instructions executed to check for NULLs at the end of strings and arrays is reduced.
    Type: Application
    Filed: April 8, 2014
    Publication date: August 7, 2014
    Applicant: IXYS CH GmbH
    Inventor: Gyle D. Yearsley
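    A software rendering of the blank-bit idea in the abstract above: the load itself records whether the value just read was the NUL character, and the loop branch tests that recorded flag instead of issuing a separate compare instruction. The variable blank_bit stands in for the hardware flag-register bit; everything else is ordinary C written to mirror the two-instruction loop.

        #include <stdbool.h>
        #include <stdio.h>

        static bool blank_bit;                /* stands in for the flag-register bit */

        static char load_with_blank_check(const char *p)
        {
            char v = *p;                      /* first instruction: the load ...      */
            blank_bit = (v == '\0');          /* ... also records "operand was NUL"   */
            return v;
        }

        int main(void)
        {
            const char *s = "abc";
            int len = 0;
            for (const char *p = s; ; p++) {
                char c = load_with_blank_check(p);
                if (blank_bit)                /* second instruction: jump on blank bit */
                    break;
                (void)c;
                len++;
            }
            printf("length = %d\n", len);
            return 0;
        }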
  • Patent number: 8797868
    Abstract: A network device of a communication network is configured to implement coordinated scheduling and processor rate control. In one aspect, packets are received in the network device and scheduled for processing from one or more queues of that device. An operating rate of a processor of the network device is controlled based at least in part on an optimal operating rate of the processor that is determined using a non-zero base power of the processor. For example, the operating rate of the processor may be controlled such that the processor either operates at or above the optimal operating rate, or is substantially turned off. The optimal operating rate of the processor may be selected so as to fall on a tangent line of a power-rate curve of the processor that also passes through an origin point of a coordinate system of the power-rate curve.
    Type: Grant
    Filed: March 14, 2012
    Date of Patent: August 5, 2014
    Assignee: Alcatel Lucent
    Inventors: Daniel Matthew Andrews, Yihao Zhang
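    The tangent-through-origin condition mentioned above can be worked through under an assumed power-rate model P(r) = P0 + k*r^3 with non-zero base power P0: setting P'(r) equal to P(r)/r gives r* = cbrt(P0 / (2k)), the rate below which it is more energy-efficient to turn the processor off and batch work. The cubic form and the numeric values in the sketch are assumptions for illustration; the patent does not fix a particular curve.

        /* Worked example of the optimal operating rate for an assumed
         * power-rate model P(r) = P0 + k*r^3 with non-zero base power. */
        #include <math.h>
        #include <stdio.h>

        int main(void)
        {
            double P0 = 2.0;                     /* base power (watts), assumed          */
            double k  = 0.5;                     /* dynamic power coefficient, assumed   */

            double r_opt = cbrt(P0 / (2.0 * k)); /* rate where the origin tangent touches */
            double p_opt = P0 + k * pow(r_opt, 3.0);

            printf("optimal rate r* = %.3f, power at r* = %.3f W, energy/unit = %.3f\n",
                   r_opt, p_opt, p_opt / r_opt);
            return 0;
        }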
  • Publication number: 20140215192
    Abstract: A compiler tool-chain may automatically compile an application to execute on a limited local memory (LLM) multi-core processor by including automated heap management transparently to the application. Management of the heap in the LLM for the application may include identifying access attempts to a program variable, transferring the program variable to the LLM, when not already present in the LLM, and returning a local address for the program variable to the application. The application then accesses the program variable using the local address transparently without knowledge about data in the LLM. Thus, the application may execute on a LLM multi-core processor as if the LLM multi-core processor has an unlimited heap space.
    Type: Application
    Filed: January 28, 2014
    Publication date: July 31, 2014
    Applicant: ARIZONA BOARD OF REGENTS ON BEHALF OF ARIZONA STATE UNIVERSITY
    Inventors: Ke Bai, Aviral Shrivastava
  • Publication number: 20140215190
    Abstract: Techniques are disclosed relating to completion of load and store instructions in a weakly-ordered memory model. In one embodiment, a processor includes a load queue and a store queue and is configured to associate queue information with a load instruction in an instruction stream. In this embodiment, the queue information indicates a location of the load instruction in the load queue and one or more locations in the store queue that are associated with one or more store instructions that are older than the load instruction. The processor may determine, using the queue information, that the load instruction does not conflict with a store instruction in the store queue that is older than the load instruction. The processor may remove the load instruction from the load queue while the store instruction remains in the store queue. The queue information may include a wrap value for the load queue.
    Type: Application
    Filed: January 25, 2013
    Publication date: July 31, 2014
    Applicant: APPLE INC.
    Inventors: John H. Mylius, Rajat Goel, Pradeep Kanapathipillai, Hari S. Kannan
  • Publication number: 20140215191
    Abstract: Techniques are disclosed relating to ordering of load instructions in a weakly-ordered memory model. In one embodiment, a processor includes a cache with multiple cache lines and a store queue configured to maintain status information associated with a store instruction that targets a location in one of the cache lines. In this embodiment, the processor is configured to set an indicator in the status information in response to migration of the targeted cache line. The indicator may be usable to sequence performance of load instructions that are younger than the store instruction. For example, the processor may be configured to wait, based on the indicator, to perform a younger load instruction that targets the same location as the store instruction until the store instruction is removed from the store queue. This may prevent forwarding of the value of the store instruction to the younger load and preserve load-load ordering.
    Type: Application
    Filed: January 25, 2013
    Publication date: July 31, 2014
    Applicant: APPLE INC.
    Inventors: Pradeep Kanapathipillai, Hari Kannan, Po-Yung Chang, Ming-Ta Hsu, Rajat Goel
  • Patent number: 8793469
    Abstract: A computer, circuit, and computer-readable medium are disclosed. In one embodiment, the processor includes an instruction decoder unit that can decode a macro instruction into at least one micro-operation with a set of data fields. The resulting micro-operation has at least one data field that is in a compressed form. The instruction decoder unit has storage that can store the micro-operation with the compressed-form data field. The instruction decoder unit also has extraction logic that is capable of extracting the compressed-form data field into an uncompressed-form data field. After extraction, the instruction decoder unit also can send the micro-operation with the extracted uncompressed-form data field to an execution unit. The computer also includes an execution unit capable of executing the sent micro-operation.
    Type: Grant
    Filed: December 17, 2010
    Date of Patent: July 29, 2014
    Assignee: Intel Corporation
    Inventors: Kameswar Subramaniam, Anthony Wojciechowski, Jonathan D. Combs
  • Patent number: 8793689
    Abstract: A redundant multithreading processor is presented. In one embodiment, the processor performs execution of a thread and its duplicate thread in parallel and determines, when in a redundant multithreading mode, whether or not to synchronize an operation of the thread and an operation of the duplicate thread.
    Type: Grant
    Filed: June 9, 2010
    Date of Patent: July 29, 2014
    Assignee: Intel Corporation
    Inventors: Glenn J. Hinton, Steven E. Raasch, Avinash Sodani, Sebastien Hily, John G. Holm, Ronak Singhal, Deborah T. Marr
  • Publication number: 20140208076
    Abstract: A character class (CCL) memory containing simple CCLs represented by encoding contained symbols or minimum and maximum symbols of a range, complex CCLs represented by bit-masks indicating contained symbols, and equivalence class (EC) maps represented as tables of EC values for each symbol value. Determining a next DFA transition by comparing multiple CCLs with a single input symbol, and selecting a transition according to the first matching CCL, or selecting a transition corresponding to a vector of CCL match result bits. Comparing CCLs from one DFA instruction to determine a transition and, if no CCLs match, comparing CCLs from a second DFA instruction to determine the transition. Matching a linear sequence of two or more DFA states using a sequence of multiple CCLs encoded in a single DFA instruction.
    Type: Application
    Filed: January 23, 2013
    Publication date: July 24, 2014
    Applicant: LSI CORPORATION
    Inventor: Michael Ruehle
  • Patent number: 8788795
    Abstract: A wake-and-go mechanism may be a programming idiom accelerator. As a processor fetches instructions, the programming idiom accelerator may look ahead to determine whether a programming idiom is coming up in the instruction stream. If the programming idiom accelerator recognizes a programming idiom, the programming idiom accelerator may perform an action to accelerate execution of the programming idiom. In the case of a wake-and-go programming idiom, the programming idiom accelerator may record an entry in a wake-and-go array, for example.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: July 22, 2014
    Assignee: International Business Machines Corporation
    Inventors: Ravi K. Arimilli, Satya P. Sharma, Randal C. Swanberg
  • Patent number: 8789065
    Abstract: Systems and methods provide an extensible, multi-stage, realtime application program processing load adaptive, manycore data processing architecture shared dynamically among instances of parallelized and pipelined application software programs, according to processing load variations of said programs and their tasks and instances, as well as contractual policies. The invented techniques provide, at the same time, both application software development productivity, through presenting for software a simple, virtual static view of the actually dynamically allocated and assigned processing hardware resources, together with high program runtime performance, through scalable pipelined and parallelized program execution with minimized overhead, as well as high resource efficiency, through adaptively optimized processing resource allocation.
    Type: Grant
    Filed: November 23, 2012
    Date of Patent: July 22, 2014
    Assignee: Throughputer, Inc.
    Inventor: Mark Henrik Sandstrom
  • Publication number: 20140201498
    Abstract: Instructions and logic provide vector scatter-op and/or gather-op functionality. In some embodiments, responsive to an instruction specifying: a gather and a second operation, a destination register, an operand register, and a memory address; execution units read values in a mask register, wherein fields in the mask register correspond to offset indices in the indices register for data elements in memory. A first mask value indicates the element has not been gathered from memory and a second value indicates that the element does not need to be, or has already been, gathered. For each field having the first value, the data element is gathered from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. When all mask register fields have the second value, the second operation is performed using corresponding data in the destination and operand registers to generate results.
    Type: Application
    Filed: September 26, 2011
    Publication date: July 17, 2014
    Applicant: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Kshitij A. Doshi, Charles R. Yount, Suleyman Sair
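    A scalar sketch of the gather-op mask protocol described above: elements whose mask field still holds the "not yet gathered" value are loaded from memory at their offsets and their mask fields flipped, and only once every field holds the "done" value does the second operation (addition is used here as an example) run on the gathered data and the operand register. Element count, mask encoding, and the choice of addition are illustrative.

        /* Scalar model of a masked gather followed by a second operation. */
        #include <stdio.h>

        #define VLEN 4
        #define NOT_GATHERED 1
        #define GATHERED     0

        int main(void)
        {
            int memory[8]     = { 10, 11, 12, 13, 14, 15, 16, 17 };
            int indices[VLEN] = { 6, 2, 4, 0 };        /* offsets into memory        */
            int mask[VLEN]    = { NOT_GATHERED, GATHERED, NOT_GATHERED, NOT_GATHERED };
            int dest[VLEN]    = { 0, 99, 0, 0 };        /* element 1 already gathered */
            int operand[VLEN] = { 1, 1, 1, 1 };

            int remaining;
            do {                                        /* may be re-run after a fault */
                remaining = 0;
                for (int i = 0; i < VLEN; i++) {
                    if (mask[i] == NOT_GATHERED) {
                        dest[i] = memory[indices[i]];
                        mask[i] = GATHERED;             /* flip the mask field         */
                    }
                    remaining += mask[i];
                }
            } while (remaining != 0);

            for (int i = 0; i < VLEN; i++)              /* second operation: add       */
                dest[i] += operand[i];

            for (int i = 0; i < VLEN; i++)
                printf("dest[%d] = %d\n", i, dest[i]);
            return 0;
        }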
  • Patent number: 8782378
    Abstract: A data processing apparatus and method are provided. The data processing apparatus is configured to perform data processing operations in response to data processing instructions including a multiple operation instruction, in response to which multiple data processing operations are performed. The data processing apparatus comprises two or more data processing units configured to perform the data processing operations and an instruction arbitration unit configured to perform sub-division of a multiple operation instruction into a plurality of sub-instructions and to perform allocation of the plurality of sub-instructions amongst the two or more data processing units, wherein each sub-instruction is arranged to cause one of the two or more data processing units to perform at least one data processing operation of the multiple data processing operations.
    Type: Grant
    Filed: September 14, 2010
    Date of Patent: July 15, 2014
    Assignee: ARM Limited
    Inventors: Nicolas Chaussade, Rémi Teyssier
  • Patent number: 8773455
    Abstract: A display controller may include an RGB Interface module and a display port module, which may both use a target-master interface, in which the data receiving module pops pixels from the data sourcing module, and generates the HSync, VSync, and VBI timing signals. A dither module may be instantiated between the RGB interface module and display port module to perform dithering. The dither module may use a source-master interface, in which data signals and data valid signals are issued by the data sourcing module. In order to avoid having to use a large storage capacity FIFO with the dither module, a control unit may issue interface signals to the RGB Interface module and display port module, and clock-gate the dither module, to allow the data signals and data valid signals to properly interface with the RGB interface module and display port module, and provide data flow from the RGB interface module to the dither module to the display port module.
    Type: Grant
    Filed: August 11, 2011
    Date of Patent: July 8, 2014
    Assignee: Apple Inc.
    Inventors: Brijesh Tripathi, Nitin Bhargava
  • Patent number: 8775778
    Abstract: A set of helper thread binaries is created from a set of main thread binaries. The helper thread monitors software or hardware ports for incoming data events. When the helper thread detects an incoming event, the helper thread asynchronously executes instructions that calculate incoming data needed by the main thread.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: July 8, 2014
    Assignee: International Business Machines Corporation
    Inventors: Ravi K. Arimilli, Juan C. Rubio, Balaram Sinharoy
  • Publication number: 20140189318
    Abstract: This document discusses, among other things, systems and methods to access n consecutive entries of a register file in a single operation using a register file entry index consisting of B bits, wherein B is less than the binary logarithm of a depth of the register file, which corresponds to the number of entries in the register file, and to automatically select, for a set of register arguments for the n consecutive entries, between a register port for each argument requiring a register port or one or more shared register ports for the set of register arguments according to description of an instruction set architecture associated with the register file.
    Type: Application
    Filed: December 31, 2012
    Publication date: July 3, 2014
    Applicant: Tensilica Inc.
    Inventor: Fei Sun
  • Publication number: 20140189308
    Abstract: Instructions and logic provide SIMD address conflict detection functionality. Some embodiments include processors with a register with a variable plurality of data fields, each of the data fields to store an offset for a data element in a memory. A destination register has corresponding data fields, each of these data fields to store a variable second plurality of bits to store a conflict mask having a mask bit for each offset. Responsive to decoding a vector conflict instruction, execution units compare the offset in each data field with every less significant data field to determine if they hold a matching offset, and in corresponding conflict masks in the destination register, set any mask bits corresponding to a less significant data field with a matching offset. Vector address conflict detection can be used with variable sized elements and to generate conflict masks to resolve dependencies in gather-modify-scatter SIMD operations.
    Type: Application
    Filed: December 29, 2012
    Publication date: July 3, 2014
    Inventors: Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Brett L. Toll, Mark J. Charney, Milind B. Girkar
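    The per-lane comparison described above reduces to the scalar loop sketched below: each element's offset is compared against every less significant element, and matching positions are recorded as a bit mask in the corresponding destination lane. The eight-lane width and 32-bit offsets are example choices.

        /* Scalar model of vector address conflict detection. */
        #include <stdint.h>
        #include <stdio.h>

        #define VLEN 8

        int main(void)
        {
            uint32_t offsets[VLEN] = { 5, 2, 5, 7, 2, 2, 9, 5 };
            uint32_t conflict[VLEN];

            for (int i = 0; i < VLEN; i++) {
                conflict[i] = 0;
                for (int j = 0; j < i; j++)             /* less significant lanes  */
                    if (offsets[j] == offsets[i])
                        conflict[i] |= 1u << j;         /* mark the matching lane  */
            }

            for (int i = 0; i < VLEN; i++)
                printf("lane %d: offset %u conflict mask 0x%02x\n",
                       i, (unsigned)offsets[i], (unsigned)conflict[i]);
            return 0;
        }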
  • Publication number: 20140189317
    Abstract: An apparatus and method are described for executing both latency-optimized execution logic and throughput-optimized execution logic on a processing device. For example, a processor according to one embodiment comprises: latency-optimized execution logic to execute a first type of program code; throughput-optimized execution logic to execute a second type of program code, wherein the first type of program code and the second type of program code are designed for the same instruction set architecture; logic to identify the first type of program code and the second type of program code within a process and to distribute the first type of program code for execution on the latency-optimized execution logic and the second type of program code for execution on the throughput-optimized execution logic.
    Type: Application
    Filed: December 28, 2012
    Publication date: July 3, 2014
    Inventors: Oren Ben-Kiki, Yuval Yosef, Ilan Pardo, Dror Markovich
  • Publication number: 20140189316
    Abstract: In one embodiment, in an execution pipeline having a plurality of execution subunits, a method of using a bypass network to directly forward data from a producing execution subunit to a consuming execution subunit is provided. The method includes producing output data with the producing execution subunit, consuming input data with the consuming execution subunit, for one or more intervening operations whose input is the output data from the producing execution subunit and whose output is the input data to the consuming execution subunit, evaluating those one or more intervening operations to determine whether their execution would compose an identity function, and if the one or more intervening operations would compose such an identity function, controlling the bypass network to forward the producing execution subunit's output data directly to the consuming execution subunit.
    Type: Application
    Filed: December 27, 2012
    Publication date: July 3, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Gokul Govindu, Parag Gupta, Scott Pitkethly, Guillermo J. Rozas
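    A toy version of the bypass test described above: if the intervening operations between the producing and consuming subunits compose the identity on the forwarded value, the consumer can read the producer's output directly. Only two operation kinds (copy and negate) are modeled, purely to make the identity check concrete; the real evaluation in the patent is over actual pipeline operations.

        #include <stdio.h>

        enum op { OP_COPY, OP_NEG };

        /* Copies change nothing and negations cancel in pairs, so the chain is
         * the identity exactly when the number of negations is even. */
        static int composes_identity(const enum op *ops, int n)
        {
            int negations = 0;
            for (int i = 0; i < n; i++)
                if (ops[i] == OP_NEG)
                    negations++;
            return (negations % 2) == 0;
        }

        int main(void)
        {
            int produced = 42;                               /* producing subunit's output */
            enum op chain[] = { OP_NEG, OP_COPY, OP_NEG };   /* intervening operations     */

            if (composes_identity(chain, 3))
                printf("bypass: consumer reads %d directly\n", produced);
            else
                printf("no bypass: intervening result must be used\n");
            return 0;
        }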
  • Patent number: 8768642
    Abstract: The present invention systems and methods facilitate configuration of functional components included in a remotely located integrated circuit die. In one exemplary implementation, a die functional component reconfiguration request process is engaged in wherein a system requests a reconfiguration code from a remote centralized resource. A reconfiguration code production process is executed in which a request for a reconfiguration code and a permission indicator are received, validity of permission indicator is analyzed, and a reconfiguration code is provided if the permission indicator is valid. A die functional component configuration process is performed on the die when an appropriate reconfiguration code is received by the die. The functional component configuration process includes directing alteration of a functional component configuration. Workflow is diverted from disabled functional components to enabled functional components.
    Type: Grant
    Filed: December 18, 2003
    Date of Patent: July 1, 2014
    Assignee: Nvidia Corporation
    Inventors: Michael B. Diamond, John S. Montrym, James M. Van Dyke, Michael B. Nagy, Sean J. Treichler
  • Patent number: 8769539
    Abstract: A method and apparatus are provided to control the order of execution of load and store operations. Also provided is a computer readable storage device encoded with data for adapting a manufacturing facility to create the apparatus. One embodiment of the method includes determining whether a first group, comprising at least one or more instructions, is to be selected from a scheduling queue of a processor for execution using either a first execution mode or a second execution mode. The method also includes, responsive to determining that the first group is to be selected for execution using the second execution mode, preventing selection of the first group until a second group, comprising at least one or more instructions, that entered the scheduling queue prior to the first group is selected for execution.
    Type: Grant
    Filed: November 16, 2010
    Date of Patent: July 1, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Daniel Hopper, Suzanne Plummer, Christopher D. Bryant
  • Publication number: 20140181479
    Abstract: Described herein are mechanisms for creating, executing, and terminating mini-threads. A processor executes instructions with a primary thread in a first execution mode, and executes an instruction to create a secondary mini-thread that is associated with a first subset of registers, associating the primary thread with a second subset of the registers during a second execution mode. During the second execution mode, the primary thread operates as a primary mini-thread.
    Type: Application
    Filed: December 20, 2012
    Publication date: June 26, 2014
    Inventor: Ruchira Sasanka
  • Publication number: 20140181480
    Abstract: An apparatus, computer readable medium, and method of performing nested speculative regions are presented. The method includes responding to entering a speculative region by storing link information to an abort handler and responding to a commit command by removing link information from the abort handler. The method may include storing link information to the abort handler associated with the speculative region. When the speculative region is nested, the method may include storing link information to an abort handler associated with a previous speculative region. Removing link information may include removing link information from the abort handler associated with the corresponding speculative region. The method may include restoring link information to the abort handler associated with a previous speculative region. Responding to an abort command may include running the abort handler associated with the aborted speculative region.
    Type: Application
    Filed: December 21, 2012
    Publication date: June 26, 2014
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Stephan Diestelhorst, Martin Pohlack, Michael Hohmuth, David Christie, Luke Yen
  • Patent number: 8763002
    Abstract: A system for task allocation of a multi-core processor is provided. The system includes a task allocator and a plurality of sub-processing systems. Each of the sub-processing systems comprises a state register, a processor core, and a buffer; the state register is configured to recognize the state of the sub-processing system and transmit state information of the sub-processing system to the task allocator. The state information comprises: a first state bit configured to indicate whether the sub-processing system is in an idle state; and a second state bit configured to indicate a specific state of the sub-processing system. The task allocator is configured to allocate tasks to the sub-processing systems according to a priority determined by the state information sent by the state registers of the sub-processing systems.
    Type: Grant
    Filed: May 3, 2011
    Date of Patent: June 24, 2014
    Assignee: Huawei Technologies Co., Ltd.
    Inventor: Zhongming Hou