Processing Control For Data Transfer Patents (Class 712/225)
-
Publication number: 20120198213
Abstract: A packet handler for a packet processing system includes a plurality of parallel action machines, each of the plurality of parallel action machines being configured to perform a respective packet processing function; and a plurality of action machine input registers, wherein each of the plurality of parallel action machines is associated with one or more of the plurality of action machine input registers, and wherein an action machine of the plurality of parallel action machines is automatically triggered to perform its respective packet processing function in the event that data sufficient to perform the action machine's respective packet processing function is written into the action machine's one or more respective action machine input registers.
Type: Application
Filed: January 31, 2011
Publication date: August 2, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Francois Abel, Jean Calvignac, Christoph Hagleitner, Fabrice Verplanken
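The register-triggered behavior described in this abstract can be illustrated with a small software model. This is only a sketch: the class names, the register names, the two example functions, and the choice to clear a machine's inputs after it fires are all assumptions made for illustration, not details taken from the application.

```python
from typing import Callable, Dict, List, Optional


class ActionMachine:
    """Hypothetical model of one action machine and its input registers."""

    def __init__(self, name: str, inputs: List[str], func: Callable[..., int]):
        self.name = name
        self.inputs = inputs
        self.func = func


class PacketHandler:
    def __init__(self, machines: List[ActionMachine]):
        self.machines = machines
        # One slot per input register; None means "nothing written yet".
        self.registers: Dict[str, Optional[int]] = {
            reg: None for m in machines for reg in m.inputs
        }
        self.results: Dict[str, int] = {}

    def write(self, reg: str, value: int) -> None:
        """Write a register, then fire any machine whose inputs are now complete."""
        self.registers[reg] = value
        for m in self.machines:
            if reg in m.inputs and all(self.registers[r] is not None for r in m.inputs):
                self.results[m.name] = m.func(*(self.registers[r] for r in m.inputs))
                for r in m.inputs:  # assumption: inputs are consumed so the machine re-arms
                    self.registers[r] = None


handler = PacketHandler([
    ActionMachine("checksum", ["cs_a", "cs_b"], lambda a, b: (a + b) & 0xFFFF),
    ActionMachine("ttl_dec", ["ttl"], lambda t: t - 1),
])
handler.write("cs_a", 0xFFFF)  # checksum still lacks cs_b, so nothing fires
handler.write("ttl", 64)       # ttl_dec has all of its inputs and fires
handler.write("cs_b", 2)       # checksum now fires
```

No explicit "start" command appears anywhere: writing the last missing register is what triggers the work.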
-
Patent number: 8234483
Abstract: A computing and communication chip architecture is provided wherein the interfaces of processor access to the memory chips are implemented as a high-speed packet switched serial interface as part of each chip. In one embodiment, the interface is accomplished through a gigabit Ethernet interface provided by a protocol processor integrated as part of the chip. The protocol processor encapsulates the memory address and control information, such as Read, Write, and the number of successive bytes, as an Ethernet packet for communication among the processor and memory chips that are located on the same motherboard, or even on different circuit cards. In one embodiment, the communication overhead of the Ethernet protocol is further reduced by using an enhanced Ethernet protocol with shortened data frames within a constrained neighborhood, and/or by utilizing a bit stream switch where direct connection paths can be established between elements that comprise the computing or communication architecture.
Type: Grant
Filed: October 25, 2010
Date of Patent: July 31, 2012
Assignee: Psimast, Inc
Inventor: Viswa Nath Sharma
-
Patent number: 8234653
Abstract: A data processing architecture includes multiple processors connected in series between a load balancer and reorder logic. The load balancer is configured to receive data and distribute the data across the processors. Appropriate ones of the processors are configured to process the data. The reorder logic is configured to receive the data processed by the processors, reorder the data, and output the reordered data.
Type: Grant
Filed: May 30, 2008
Date of Patent: July 31, 2012
Assignee: Juniper Networks, Inc.
Inventors: John C Carney, Michael E Lipman
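One common way to realize a distribute-then-reorder arrangement like the one in this abstract is to tag each item with a sequence number on the way in and release results strictly in sequence on the way out. The sketch below assumes round-robin distribution and sequence tags; neither is stated in the abstract, and the function names are hypothetical.

```python
def distribute(items, n_processors):
    """Tag each item with a sequence number and assign it round-robin (assumed policy)."""
    queues = [[] for _ in range(n_processors)]
    for seq, item in enumerate(items):
        queues[seq % n_processors].append((seq, item))
    return queues


def reorder(completed):
    """Release results in sequence order, buffering any that arrive early."""
    pending = {}
    next_seq = 0
    out = []
    for seq, result in completed:
        pending[seq] = result
        while next_seq in pending:
            out.append(pending.pop(next_seq))
            next_seq += 1
    return out


queues = distribute(["a", "b", "c", "d"], 2)
# Simulate processor 1 finishing all of its work before processor 0.
completed = [(seq, item.upper()) for q in reversed(queues) for seq, item in q]
ordered = reorder(completed)
```

Here `completed` arrives as sequence 1, 3, 0, 2, and `reorder` holds the early results until sequence 0 shows up.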
-
Patent number: 8230410
Abstract: An enhanced mechanism for parallel execution of computer programs utilizes a bidding model to allocate additional registers and execution units for stretches of code identified as opportunities for microparallelization. A microparallel processor architecture apparatus permits software (e.g. compiler) to implement short-term parallel execution of stretches of code identified as such before execution. In one embodiment, an additional paired unit, if available, is allocated for execution of an identified stretch of code. Each additional paired unit includes an execution unit and a half set of registers. This apparatus is available for compilers or assembler language coders to use and allows software to unlock parallel execution capabilities that are present in existing computer programs but heretofore were executed sequentially for lack of a suitable apparatus.
Type: Grant
Filed: October 26, 2009
Date of Patent: July 24, 2012
Assignee: International Business Machines Corporation
Inventor: Larry W. Loen
-
Patent number: 8230179
Abstract: Administering non-cacheable memory load instructions in a computing environment where cacheable data is produced and consumed in a coherent manner without harming performance of a producer, the environment including a hierarchy of computer memory that includes one or more caches backed by main memory, the caches controlled by a cache controller, at least one of the caches configured as a write-back cache. Embodiments of the present invention include receiving, by the cache controller, a non-cacheable memory load instruction for data stored at a memory address, the data treated by the producer as cacheable; determining by the cache controller from a cache directory whether the data is cached; if the data is cached, returning the data in the memory address from the write-back cache without affecting the write-back cache's state; and if the data is not cached, returning the data from main memory without affecting the write-back cache's state.
Type: Grant
Filed: May 15, 2008
Date of Patent: July 24, 2012
Assignee: International Business Machines Corporation
Inventors: Jon K. Kriegel, Jamie R. Kuesel
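The two cases in this abstract (data cached, data not cached) can be modeled in a few lines. This is a simplified sketch: the `WriteBackCache` class, its dictionary-based directory, and the dirty flag are assumptions chosen for illustration, not the patented controller design.

```python
class WriteBackCache:
    """Toy write-back cache: address -> (value, dirty). Names are hypothetical."""

    def __init__(self, main_memory):
        self.main_memory = main_memory
        self.lines = {}

    def store(self, addr, value):
        """Cacheable store by the producer: updates the cache only (write-back)."""
        self.lines[addr] = (value, True)

    def noncacheable_load(self, addr):
        """Return current data without allocating, evicting, or cleaning any line."""
        if addr in self.lines:           # directory lookup: data is cached
            return self.lines[addr][0]   # serve from the cache, state untouched
        return self.main_memory[addr]    # not cached: read memory, do not allocate


memory = {0x10: 1, 0x20: 2}
cache = WriteBackCache(memory)
cache.store(0x10, 99)                    # dirty line; main memory still holds 1
before = dict(cache.lines)
hit = cache.noncacheable_load(0x10)      # sees the producer's latest value
miss = cache.noncacheable_load(0x20)     # served from main memory
```

The consumer observes the producer's newest value on a hit even though main memory is stale, and neither load changes which lines the cache holds.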
-
Publication number: 20120185679
Abstract: Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing by the parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
Type: Application
Filed: January 17, 2011
Publication date: July 19, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Charles J. Archer, Michael A. Blocksome, Bob R. Cernohous, Joseph D. Ratterman, Brian E. Smith
-
Patent number: 8225012
Abstract: A method may include distributing ranges of addresses in a memory among a first set of functions in a first pipeline. The first set of the functions in the first pipeline may operate on data using the ranges of addresses. Different ranges of addresses in the memory may be redistributed among a second set of functions in a second pipeline without waiting for the first set of functions to be flushed of data.
Type: Grant
Filed: September 3, 2009
Date of Patent: July 17, 2012
Assignee: Intel Corporation
Inventor: Thomas A. Piazza
-
Patent number: 8219788
Abstract: A virtual core management system including a first physical core having a first utilization constraint, a second physical core having a second utilization constraint, and a virtual core including a collection of logical states associated with execution of a program. The virtual core management system further includes a utilization indicator configured to measure a utilization of the first physical core with respect to the first utilization constraint and measure a utilization of the second physical core with respect to the second utilization constraint, and a virtual core management component configured to map the virtual core to one of the first physical core and the second physical core based on at least one of the utilization of the first physical core and the utilization of the second physical core.
Type: Grant
Filed: October 31, 2007
Date of Patent: July 10, 2012
Assignee: Oracle America, Inc.
Inventors: Yu Qing Cheng, Peter N. Glaskowsky, Carlos Puchol, Seungyoon Peter Song
-
Patent number: 8219787
Abstract: In one embodiment, a processor comprises a retire unit and a load/store unit coupled thereto. The retire unit is configured to retire a first store memory operation responsive to the first store memory operation having been processed at least to a pipeline stage at which exceptions are reported for the first store memory operation. The load/store unit comprises a queue having a first entry assigned to the first store memory operation. The load/store unit is configured to retain the first store memory operation in the first entry subsequent to retirement of the first store memory operation if the first store memory operation is not complete. The queue may have multiple entries, and more than one store may be retained in the queue after being retired by the retire unit.
Type: Grant
Filed: May 9, 2011
Date of Patent: July 10, 2012
Assignee: Apple Inc.
Inventors: Wei-Han Lien, Po-Yung Chang
-
Patent number: 8214626
Abstract: Method, apparatus, and program means for shuffling data. The method of one embodiment comprises receiving a first operand having a set of L data elements and a second operand having a set of L control elements. For each control element, data from a first operand data element designated by the individual control element is shuffled to an associated resultant data element position if its flush to zero field is not set and a zero is placed into the associated resultant data element position if its flush to zero field is set.
Type: Grant
Filed: March 31, 2009
Date of Patent: July 3, 2012
Assignee: Intel Corporation
Inventors: William W. Macy, Jr., Eric L. Debes, Patrice L. Roussel, Huy V. Nguyen
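The per-element rule in this abstract maps directly onto a short function. In the sketch below each control element is represented as a hypothetical `(source_index, flush_to_zero)` pair; the actual bit-level encoding of the control elements is not given in the abstract and is not modeled here.

```python
def shuffle(data, controls):
    """Shuffle L data elements under L control elements.

    controls[i] is an assumed (source_index, flush_to_zero) pair that decides
    what lands in result position i.
    """
    if len(data) != len(controls):
        raise ValueError("data and controls must both have L elements")
    # Flush-to-zero set: write 0. Otherwise: copy the designated source element.
    return [0 if flush_to_zero else data[source_index]
            for source_index, flush_to_zero in controls]


result = shuffle(
    [10, 20, 30, 40],
    [(3, False), (0, False), (1, True), (2, False)],
)
```

Position 2 is zeroed because its flush-to-zero field is set, even though it names source element 1.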
-
Patent number: 8209523
Abstract: A data moving processor includes a code memory coupled to a code fetch circuit and a decode circuit coupled to the code fetch circuit. An address stack is coupled to the decode circuit and configured to store address data. A general purpose stack is coupled to the decode circuit and configured to store other data. The data moving processor uses data from the general purpose stack to perform calculations. The data moving processor uses address data from the address stack to identify source and destination memory locations. The address data may be used to drive an address line of a memory during a read or write operation. The address stack and general purpose stack are separately controlled using bytecode.
Type: Grant
Filed: January 22, 2009
Date of Patent: June 26, 2012
Assignee: Intel Mobile Communications GmbH
Inventors: Ulf Nordqvist, Jinan Lin, Xiaoning Nie, Stefan Maier, Siegmar Koeppe
-
Patent number: 8200949
Abstract: A multi-threaded processor system, method, and computer program product capable of utilizing a register file cache are provided for simultaneously processing a plurality of threads. A processor capable of simultaneously processing a plurality of threads is provided. The processor includes a register file and a register file cache in communication with the register file.
Type: Grant
Filed: December 9, 2008
Date of Patent: June 12, 2012
Assignee: NVIDIA Corporation
Inventors: David Tarjan, Kevin Skadron
-
Patent number: 8200950
Abstract: A pipeline operation processor comprises a pipeline processing unit and an instruction insertion controller which inserts an instruction when access to an operation memory is requested, and corrects control information by reference to control information of stages. When a control program is in execution, on receiving an access request instruction requesting access to the operation memory, the instruction insertion controller inserts an NOP instruction from the instruction decoding unit in place of the access request instruction. The access request instruction is executed while the pipeline processing unit executes no operation, and subsequently, the pipeline processing is continued.
Type: Grant
Filed: June 4, 2009
Date of Patent: June 12, 2012
Assignee: Kabushiki Kaisha Toshiba
Inventor: Motohiko Okabe
-
Patent number: 8200941
Abstract: A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicating that first portion of bits in a subsequent portion of the destination register.
Type: Grant
Filed: April 15, 2011
Date of Patent: June 12, 2012
Assignee: Intel Corporation
Inventor: Patrice Roussel
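The load-and-duplicate operation can be sketched with integer bit manipulation. The 64-bit portion and 128-bit destination used as defaults below are assumptions for illustration; the abstract does not fix either width, and the function name is hypothetical.

```python
def load_and_duplicate(source: int, part_bits: int = 64, dest_bits: int = 128) -> int:
    """Copy the low part_bits of source into every part_bits-wide slot of the destination."""
    part = source & ((1 << part_bits) - 1)   # first portion of the source
    dest = 0
    for shift in range(0, dest_bits, part_bits):
        dest |= part << shift                # first slot, then each subsequent slot
    return dest


small = load_and_duplicate(0xAABB1234, part_bits=16, dest_bits=32)
wide = load_and_duplicate(1)
```

With a 16-bit portion and a 32-bit destination, the upper source bits `0xAABB` are dropped and `0x1234` appears in both halves of the result.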
-
Patent number: 8195924
Abstract: A method and system for early instruction text based operand store compare avoidance in a processor are provided. The system includes a processor pipeline for processing instruction text in an instruction stream, where the instruction text includes operand address information. The system also includes delay logic to monitor the instruction stream. The delay logic performs a method that includes detecting a load instruction following a store instruction in the instruction stream and comparing the operand address information of the store instruction with that of the load instruction. The method also includes delaying the load instruction in the processor pipeline in response to detecting a common field value between the operand address information of the store instruction and the load instruction.
Type: Grant
Filed: March 17, 2011
Date of Patent: June 5, 2012
Assignee: International Business Machines Corporation
Inventors: Khary J. Alexander, Fadi Y. Basuba, Bruce C. Giamei, David S. Hutton, Chung-Lung K. Shum
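The idea of comparing instruction-text fields, rather than computed addresses, can be shown with a small scan over an instruction stream. The sketch assumes the operand address information consists of a base register number and a displacement, that both must match, and that only stores within a fixed window are considered; all three choices, and the names used, are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    op: str    # "store", "load", or any other mnemonic
    base: int  # base register number taken from the instruction text (assumed field)
    disp: int  # displacement taken from the instruction text (assumed field)


def loads_to_delay(stream, window=4):
    """Return indices of loads whose address fields match a recent store's fields."""
    delayed = []
    recent_stores = []  # (index, base, disp) of stores still inside the window
    for i, ins in enumerate(stream):
        recent_stores = [s for s in recent_stores if i - s[0] <= window]
        if ins.op == "store":
            recent_stores.append((i, ins.base, ins.disp))
        elif ins.op == "load":
            if any(b == ins.base and d == ins.disp for _, b, d in recent_stores):
                delayed.append(i)  # likely store-to-load dependency: hold the load back
    return delayed


stream = [
    Instr("store", 5, 0x40),
    Instr("load", 5, 0x40),   # same base and displacement as the store
    Instr("load", 6, 0x40),   # different base register
    Instr("add", 0, 0),
]
delayed = loads_to_delay(stream)
```

Because only instruction text is compared, the check can run before any address is generated, at the cost of occasionally delaying a load that did not actually conflict.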
-
Patent number: 8190867
Abstract: A processor comprising a register file, and a decoder to decode an instruction to specify a first source register having a first packed signed 16-bit integers, and to specify a second source register having a second packed signed 16-bit integers. A functional unit to generate a result to be stored in a specified destination. The result including a third packed 8-bit integers including an integer for each integer in the first packed integers, and an integer for each integer in the second packed integers. The integers corresponding to the first packed integers next to one another in the result. The integers corresponding to the second packed integers next to one another. A highest order integer of the result corresponding to a highest order integer of the first packed integers. A lowest order integer of the result corresponding to a lowest order integer of the second packed integers.
Type: Grant
Filed: May 16, 2011
Date of Patent: May 29, 2012
Assignee: Intel Corporation
Inventors: Alexander Peleg, Yaakov Yaari, Millind Mittal, Larry M. Mennemeier, Benny Eitan
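The element ordering described in this abstract can be sketched with Python lists, where index 0 stands for the lowest-order element. The abstract does not say how each 16-bit value is narrowed to 8 bits; signed saturation is assumed here purely for illustration, and the function names are hypothetical.

```python
def saturate_to_int8(x: int) -> int:
    """Clamp to the signed 8-bit range (narrowing rule is an assumption)."""
    return max(-128, min(127, x))


def pack(first, second):
    """Pack two lists of signed 16-bit values into one list of 8-bit values.

    Index 0 is the lowest-order element. Elements from `second` fill the low
    half and elements from `first` fill the high half, so the highest-order
    result comes from `first` and the lowest-order result comes from `second`.
    """
    return [saturate_to_int8(x) for x in second] + [saturate_to_int8(x) for x in first]


packed = pack([1, 300, -300, 4], [5, 6, 7, 8])
```

Each source's elements stay adjacent in the result, and the out-of-range values 300 and -300 clamp to 127 and -128 under the assumed saturation rule.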
-
Publication number: 20120124335
Abstract: Details of a highly cost effective and efficient implementation of a manifold array (ManArray) architecture and instruction syntax for use therewith are described herein. Various aspects of this approach include the regularity of the syntax, the relative ease with which the instruction set can be represented in database form, the ready ability with which tools can be created, the ready generation of self-checking codes and parameterized test cases. Parameterizations can be fairly easily mapped and system maintenance is significantly simplified.
Type: Application
Filed: January 5, 2012
Publication date: May 17, 2012
Applicant: ALTERA CORPORATION
Inventors: Gerald G. Pechanek, David Carl Strube, Edwin Franklin Barry, Charles W. Kurak, JR., Carl Donald Busboom, Dale Edward Schneider, Nikos P. Pitsianis, Grayson Morris, Edward A. Wolff, Patrick R. Marchand, Ricardo E. Rodriguez, Marco C. Jacobs
-
Patent number: 8181003
Abstract: Improved instruction set and core design, control and communication for programmable microprocessors is disclosed, involving the strategy for replacing centralized program sequencing in present-day and prior art processors with a novel distributed program sequencing wherein each functional unit has its own instruction fetch and decode block, and each functional unit has its own local memory for program storage; and wherein computational hardware execution units and memory units are flexibly pipelined as programmable embedded processors with reconfigurable pipeline stages of different order in response to varying application instruction sequences that establish different configurations and switching interconnections of the hardware units.
Type: Grant
Filed: May 29, 2008
Date of Patent: May 15, 2012
Assignee: Axis Semiconductor, Inc.
Inventors: Xiaolin Wang, Qian Wu, Benjamin Marshall, Fugui Wang, Gregory Pitarys, Ke Ning
-
Publication number: 20120117420
Abstract: A method of implementing a mask load or mask store instruction by a processor is provided. The method may include receiving the mask load or mask store instruction, a location of a memory operand and a location of corresponding mask bits associated with the memory operand, breaking the received memory operand into a plurality of sub-operands and executing the mask load or mask store instruction on each of the plurality of sub-operands using a fastpath operation or using microcode, wherein the respective mask load or mask store instruction loads or stores each of the plurality of sub-operands based upon the corresponding mask bits.
Type: Application
Filed: November 5, 2010
Publication date: May 10, 2012
Applicant: ADVANCED MICRO DEVICES, INC.
Inventors: Kelvin GOVEAS, Edward MCLELLAN, Steven BEIGELMACHER, David KROESCHE, Michael CLARK
-
Publication number: 20120110309
Abstract: Methods, systems, and computer readable media for improved transfer of processing data outputs to memory are disclosed. According to an embodiment, a method for transferring outputs of a plurality of threads concurrently executing in one or more processing units to a memory includes: forming, based upon one or more of the outputs, a combined memory export instruction comprising one or more data elements and one or more control elements; and sending the combined memory export instruction to the memory. The combined memory export instruction can be sent to memory in a single clock cycle. Another method includes: forming, based upon outputs from two or more of the threads, a memory export instruction comprising two or more data elements; embedding at least one address representative of the two or more of the outputs in a second memory instruction; and sending the memory export instruction and the second memory instruction to the memory.
Type: Application
Filed: October 29, 2010
Publication date: May 3, 2012
Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventors: Laurent Lefebvre, Michael Mantor, Robert Hankinson
-
Patent number: 8171267
Abstract: A method and apparatus for migrating a task in a multi-processor system. The method includes examining whether a second process has been allocated to a second processor, the second process having a same instruction to execute as a first process and having different data to process in response to the instruction from the first process, the instruction being to execute the task; selecting a method of migrating the first process or a method of migrating a thread included in the first process based on the examining; and migrating the task from a first processor to the second processor using the selected method. Therefore, cost and power required for task migration can be minimized. Consequently, power consumption can be maintained in a low-power environment, such as an embedded system, which, in turn, optimizes the performance of the multi-processor system and prevents physical damage to the circuit of the multi-processor system.
Type: Grant
Filed: June 30, 2008
Date of Patent: May 1, 2012
Assignee: Samsung Electronics Co., Ltd.
Inventor: Seung-won Lee
-
Patent number: 8171266
Abstract: A method for look-ahead load pre-fetching that reduces the effects of instruction stalls caused by high latency instructions. Look-ahead load pre-fetching is accomplished by searching an instruction stream for load memory instructions while the instruction stream is stalled waiting for completion of a previous instruction in the instruction stream. A pre-fetch operation is issued for each load memory instruction found. The pre-fetch operations cause data for the corresponding load memory instructions to be copied to a cache, thereby avoiding long latencies in the subsequent execution of the load memory instructions.
Type: Grant
Filed: August 2, 2001
Date of Patent: May 1, 2012
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Alan H. Karp, Rajiv Gupta
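A minimal software model of the look-ahead scan is sketched below. It assumes each instruction is an `(opcode, address)` pair with a known load address, that the scan is limited to a fixed depth, and that loads already in the cache are skipped; these simplifications and the function name are assumptions, not details from the patent.

```python
def lookahead_prefetch(stream, stalled_at, cache, memory, depth=8):
    """While stream[stalled_at] is stalled, prefetch data for the loads that follow it."""
    issued = []
    for op, addr in stream[stalled_at + 1 : stalled_at + 1 + depth]:
        if op == "load" and addr not in cache:  # assumption: skip lines already cached
            cache[addr] = memory[addr]          # prefetch copies the data into the cache
            issued.append(addr)
    return issued


memory = {0x100: 7, 0x200: 8, 0x300: 9}
cache = {0x200: 8}
stream = [
    ("div", None),    # high-latency instruction that stalls the stream
    ("load", 0x100),
    ("add", None),
    ("load", 0x200),  # already cached, so no prefetch is issued
    ("load", 0x300),
]
issued = lookahead_prefetch(stream, 0, cache, memory)
```

By the time the stalled `div` completes, both uncached load addresses are already present in `cache`.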
-
Patent number: 8171259
Abstract: A dynamic reconfigurable circuit includes multiple clusters each including a group of reconfigurable processing elements. The dynamic reconfigurable circuit is capable of dynamically changing a configuration of the clusters according to a context including a description of processing of the processing elements and of connection between the processing elements. A first cluster among the clusters includes a signal generating circuit that when an instruction to change the context is received, generates a report signal indicative of the instruction to change the context; a signal adding circuit that adds the report signal generated by the signal generating circuit to output data that is to be transmitted from the first cluster to a second cluster; and a data clearing circuit that, when output data to which a report signal generated by the second cluster is added is received, performs a clearing process of clearing the output data received.
Type: Grant
Filed: February 27, 2009
Date of Patent: May 1, 2012
Assignee: Fujitsu Semiconductor Limited
Inventors: Takashi Hanai, Shinichi Sutou
-
Patent number: 8171263
Abstract: A parallel data processing apparatus using a SIMD array of processing elements is disclosed. The apparatus makes use of a register in order to control issuance of instructions to the processing elements in the array.
Type: Grant
Filed: June 29, 2007
Date of Patent: May 1, 2012
Assignee: Rambus Inc.
Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
-
Publication number: 20120096245
Abstract: A computing device includes a receiving unit that receives control information indicating an instruction to be executed on a process that is distributed or an instruction contained in the process that is distributed, from a control information creating device that transmits the control information to each computing device on a network. The computing device further includes a processor configured to suspend execution of an instruction when the instruction to be executed on the process occurs or the instruction contained in the process that is distributed is executed, and execute the suspended instruction when the suspended instruction is associated with the instruction indicated by the control information that is received by the receiving unit.
Type: Application
Filed: December 29, 2011
Publication date: April 19, 2012
Applicant: Fujitsu Limited
Inventor: Yuta HIGUCHI
-
Patent number: 8161272
Abstract: The memory unit is compatible with a plurality of operation modes. The plurality of operation modes include the normal mode allowing access and the standby mode consuming a lower power than the normal mode. The branch detection section detects a branch instruction from an instruction fetched from the memory unit by the CPU. The mode control section changes an operation mode of the memory unit according to a detection result by the branch detection section.
Type: Grant
Filed: December 23, 2008
Date of Patent: April 17, 2012
Assignee: Renesas Electronics Corporation
Inventor: Kiminari Yamazoe
-
Patent number: 8161273
Abstract: Embodiments of the present invention provide a system that allocates registers in a processor. The system starts by commencing a transaction, wherein commencing the transaction involves preserving a pre-transactional state of registers in a first register file. The system then allocates one or more registers for temporary use during the transaction. Upon finishing using each allocated register during the transaction, the system executes an instruction that restores the allocated register to the pre-transactional state.
Type: Grant
Filed: February 26, 2008
Date of Patent: April 17, 2012
Assignee: Oracle America, Inc.
Inventor: Paul Caprioli
-
Patent number: 8161271
Abstract: Embodiments of the invention provide logic within the store data path between a processor and a memory array. The logic may be configured to misalign vector data as it is stored to memory. By misaligning vector data as it is stored to memory, memory bandwidth may be maximized while processing bandwidth required to store vector data misaligned is minimized. Furthermore, embodiments of the invention provide logic within the load data path which allows vector data which is stored misaligned to be aligned as it is loaded into a vector register. By aligning misaligned vector data as it is loaded into a vector register, memory bandwidth may be maximized while processing bandwidth required to align misaligned vector data may be minimized.
Type: Grant
Filed: July 11, 2007
Date of Patent: April 17, 2012
Assignee: International Business Machines Corporation
Inventors: David Arnold Luick, Eric Oliver Mejdrich, Adam James Muff
-
Patent number: 8161270
Abstract: A programmable processor configured to perform one or more packet modifications through execution of one or more commands. A pipelined processor core comprises a first stage configured to selectively shift and mask data in each of a plurality of categories in response to one or more decoded commands, and combine the selectively shifted and masked data in each of the categories. The pipelined processor core further comprises a second stage configured to selectively perform one or more operations on the combined data from the first stage and other data responsive to the one or more decoded commands. In one implementation, the processor is implemented as an application specific integrated circuit (ASIC).
Type: Grant
Filed: March 30, 2004
Date of Patent: April 17, 2012
Assignee: Extreme Networks, Inc.
Inventors: David K. Parker, Erik R. Swenson, Christopher J. Young
-
Patent number: 8161480
Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
Type: Grant
Filed: May 29, 2007
Date of Patent: April 17, 2012
Assignee: International Business Machines Corporation
Inventors: Charles J. Archer, Gabor Dozsa, Joseph D. Ratterman, Brian E. Smith
-
Patent number: 8156314
Abstract: A system and method are described that manage incremental state updates in such a way that multiple threads within a processor can each operate, in effect, on their own set of state data. The system and method are applicable to any processor in which multiple threads require access to sets of state information which differ from one another by a relatively small number of state changes.
Type: Grant
Filed: October 25, 2007
Date of Patent: April 10, 2012
Assignee: Advanced Micro Devices, Inc.
Inventors: Mark M. Leather, Brian D. Emberling
-
Patent number: 8156261
Abstract: A variety of advantageous mechanisms for improved data transfer control within a data processing system are described. A DMA controller is described which is implemented as a multiprocessing transfer engine supporting multiple transfer controllers which may work independently or in cooperation to carry out data transfers, with each transfer controller acting as an autonomous processor, fetching and dispatching DMA instructions to multiple execution units. In particular, mechanisms for initiating and controlling the sequence of data transfers are provided, as are processes for autonomously fetching DMA instructions which are decoded sequentially but executed in parallel.
Type: Grant
Filed: March 1, 2011
Date of Patent: April 10, 2012
Assignee: Altera Corporation
Inventors: Edwin Franklin Barry, Edward A. Wolff
-
Patent number: 8151091
Abstract: A data processing system and method are disclosed. The system comprises an instruction-fetch stage where an instruction is fetched and a specific instruction is input into a decode stage; a decode stage where said specific instruction indicates that contents of a register in a register file are used as an index, and then, the register file pointed to by said index is accessed based on said index; an execution stage where an access result of said decode stage is received, and computations are implemented according to the access result of the decode stage.
Type: Grant
Filed: May 21, 2009
Date of Patent: April 3, 2012
Assignee: International Business Machines Corporation
Inventors: Xiao Tao Chang, Qiang Liui
-
Patent number: 8131982
Abstract: A method for branch prediction, the method comprising, receiving a load instruction including a first data location in a first memory area, retrieving data including a branch address and a target address from the first data location, and saving the data in a branch prediction memory, or receiving an unload instruction including the first data location in the first memory area, retrieving data including a branch address and a target address from the branch prediction memory, and saving the data in the first data location.
Type: Grant
Filed: June 13, 2008
Date of Patent: March 6, 2012
Assignee: International Business Machines Corporation
Inventors: Philip G. Emma, Allan M. Hartstein, Keith N. Langston, Brian R. Prasky, Thomas R. Puzak, Charles F. Webb
-
Patent number: 8112612
Abstract: A processing system comprising processors and dynamically configurable communication elements coupled together in an interspersed arrangement. The processors each comprise at least one arithmetic logic unit, an instruction processing unit, and a plurality of processor ports. The dynamically configurable communication elements each comprise a plurality of communication ports, a first memory, and a routing engine. For each of the processors, the plurality of processor ports is configured for coupling to a first subset of the plurality of dynamically configurable communication elements. For each of the dynamically configurable communication elements, the plurality of communication ports comprises a first subset of communication ports configured for coupling to a subset of the plurality of processors and a second subset of communication ports configured for coupling to a second subset of the plurality of dynamically configurable communication elements.
Type: Grant
Filed: May 17, 2010
Date of Patent: February 7, 2012
Assignee: Coherent Logix, Incorporated
Inventors: Michael B. Doerr, William H. Hallidy, David A. Gibson, Craig M. Chase
-
Publication number: 20120030452
Abstract: The present disclosure includes methods, devices, modules, and systems for modifying commands. One device embodiment includes a memory controller including a channel, wherein the channel includes a command queue configured to hold commands, and circuitry configured to modify at least a number of commands in the queue and execute the modified commands.
Type: Application
Filed: October 11, 2011
Publication date: February 2, 2012
Applicant: MICRON TECHNOLOGY, INC.
Inventor: Mehdi Asnaashari
-
Patent number: 8108661
Abstract: Provided are a data processing apparatus and a method of controlling the data processing apparatus. The data processing apparatus may select a single stream processor from a plurality of stream processors based on stream processor status information, and input data into the selected stream processor. The stream processor status information may include first status information of a processor core and second status information of at least one internal memory.
Type: Grant
Filed: April 2, 2009
Date of Patent: January 31, 2012
Assignee: Samsung Electronics Co., Ltd.
Inventors: Won Jong Lee, Chan Min Park, Shi Hwa Lee
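Selecting a stream processor from two kinds of status information can be sketched as follows. The abstract does not say what the status fields contain or how they are weighed; the sketch assumes a busy flag for the core, a free-slot count for the internal memory, and a "most free memory among idle cores" policy. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StreamProcessorStatus:
    core_busy: bool         # first status information: processor core (assumed encoding)
    free_memory_slots: int  # second status information: internal memory (assumed encoding)


def select_processor(statuses):
    """Return the index of an idle processor with the most free memory, or None."""
    candidates = [
        (s.free_memory_slots, -i)  # -i breaks ties in favor of the lowest index
        for i, s in enumerate(statuses)
        if not s.core_busy and s.free_memory_slots > 0
    ]
    if not candidates:
        return None
    return -max(candidates)[1]


statuses = [
    StreamProcessorStatus(core_busy=True, free_memory_slots=4),
    StreamProcessorStatus(core_busy=False, free_memory_slots=2),
    StreamProcessorStatus(core_busy=False, free_memory_slots=3),
    StreamProcessorStatus(core_busy=False, free_memory_slots=3),
]
chosen = select_processor(statuses)
```

Processor 0 has the most free memory but is busy, so the choice falls to processor 2, the first of the two idle processors with three free slots.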
-
Patent number: 8108838Abstract: A method for adaptive runtime reconfiguration of a co-processor instruction set, in a computer system with at least a main processor communicatively connected to at least one reconfigurable co-processor, includes the steps of configuring the co-processor to implement an instruction set comprising one or more co-processor instructions, issuing a co-processor instruction to the co-processor, and determining whether the instruction is implemented in the co-processor. For an instruction not implemented in the co-processor instruction set, raising a stall signal to delay the main processor, determining whether there is enough space in the co-processor for the non-implemented instruction, and if there is enough space for said instruction, reconfiguring the instruction set of the co-processor by adding the non-implemented instruction to the co-processor instruction set. The stall signal is cleared and the instruction is executed.Type: GrantFiled: May 15, 2008Date of Patent: January 31, 2012Assignee: International Business Machines CorporationInventors: Sameh W. Asaad, Richard Gerard Hofmann
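The issue/stall/reconfigure sequence in the abstract above can be sketched roughly as follows. This is a toy model, not the patented implementation: the `CoProcessor` class, its `capacity` measured in instruction slots, and the choice to raise an error when no space is left are all assumptions made for illustration (the abstract does not say what happens in that case).

```python
class CoProcessor:
    """Toy model of a reconfigurable co-processor (all names hypothetical)."""

    def __init__(self, capacity, initial):
        self.capacity = capacity         # assumed: space counted in instruction slots
        self.implemented = set(initial)  # currently configured instruction set
        self.stall = False               # stall signal seen by the main processor

    def issue(self, instr):
        if instr not in self.implemented:
            self.stall = True            # delay the main processor
            if len(self.implemented) >= self.capacity:
                # Not covered by the abstract; raising here is an assumption.
                raise RuntimeError("no space for " + instr)
            self.implemented.add(instr)  # reconfigure: extend the instruction set
            self.stall = False           # clear the stall signal
        return "executed " + instr
```

For example, `CoProcessor(2, ["add"]).issue("mul")` adds `mul` to the configured set before executing it, and the stall flag is clear again afterwards.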
-
Patent number: 8108660Abstract: Each of the processors has a barrier write register and a barrier read register. Each barrier write register is wired to each barrier read register by a dedicated wiring block. For example, a 1-bit barrier write register of a processor is connected, via the wiring block, to a first bit of each 8-bit barrier read register contained in the processors, and a 1-bit barrier write register of another processor is connected, via a wiring block, to a second bit of each 8-bit barrier read register contained in the processors. For example, a processor writes information to its own barrier write register, thereby notifying the other processors of synchronization stand-by, and reads its own barrier read register, thereby recognizing whether the other processors are in synchronization stand-by or not. Therefore, no special dedicated instruction is required for barrier synchronization processing, and the processing can be performed at high speed.Type: GrantFiled: January 22, 2009Date of Patent: January 31, 2012Assignees: Renesas Electronics Corporation, Waseda UniversityInventors: Hironori Kasahara, Keiji Kimura, Masayuki Ito, Tatsuya Kamei, Toshihiro Hattori
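The register wiring described above can be modeled in a few lines. This is a minimal software sketch under stated assumptions: the `BarrierWiring` class and its method names are hypothetical, and the dedicated wiring block is represented simply by deriving every read register from the list of write registers.

```python
class BarrierWiring:
    """Toy model: bit i of every barrier read register mirrors write register i."""

    def __init__(self, n):
        self.n = n
        self.write_regs = [0] * n  # one 1-bit barrier write register per processor

    def write(self, cpu, bit):
        # A processor only ever writes its own 1-bit register.
        self.write_regs[cpu] = bit & 1

    def read(self, cpu):
        # Every processor sees the same wired value, so cpu does not change the result.
        value = 0
        for i, bit in enumerate(self.write_regs):
            value |= bit << i
        return value

    def all_waiting(self, cpu):
        # The barrier is reached once every bit of the read register is set.
        return self.read(cpu) == (1 << self.n) - 1
```

With eight processors, each one sets its own bit on arrival and polls `all_waiting` with ordinary register reads, which is why no dedicated barrier instruction is needed in this model.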
-
Patent number: 8108659Abstract: Thread synchronization techniques are used to control access to a memory resource (e.g., a counter) that is shared among multiple threads. Each thread has a unique identifier and threads are assigned to instances of the shared resource so that at least one instance is shared by two or more threads. Each thread assigned to a particular instance of the shared resource has a unique ordering index. A thread is allowed to access its assigned instance of the resource at a point in the program code determined by its ordering index. The threads are advantageously synchronized (explicitly or implicitly) so that no more than one thread attempts to access the same instance of the resource at a given time.Type: GrantFiled: September 19, 2007Date of Patent: January 31, 2012Assignee: NVIDIA CorporationInventor: Scott M. Le Grand
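One way to picture the ordering-index scheme above is a sequential simulation in which each outer step stands in for a synchronization point. The mapping of thread IDs to counter instances (`tid % num_instances`) and to ordering indices (`tid // num_instances`) is an assumption chosen for illustration; the abstract does not prescribe a particular mapping.

```python
def run_ordered(num_threads, num_instances):
    """Simulate threads incrementing shared counters in ordering-index order."""
    counters = [0] * num_instances
    schedule = []                                  # (step, thread_id, instance)
    steps = -(-num_threads // num_instances)       # ceiling division
    for step in range(steps):                      # each step models a sync point
        for tid in range(num_threads):
            instance = tid % num_instances         # assumed thread-to-instance mapping
            order = tid // num_instances           # unique among threads of an instance
            if order == step:                      # only this thread's turn at this step
                counters[instance] += 1
                schedule.append((step, tid, instance))
    return counters, schedule
```

Because the ordering index is unique within each instance, no two threads touch the same counter instance in the same step.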
-
Patent number: 8108658Abstract: A data processing circuit comprises a register file (14) having read ports and write ports. A plurality of functional units (21a-c) is coupled to receive operand data from a same combination of read ports. Each functional unit is coupled to a respective one of the write ports for writing a respective result. An instruction issue slot has outputs (11) for supplying register selection information to said combination of read ports and to the respective ones of the write ports. The output of the issue slot also supplies an operation code. The functional units (21a-c) in the plurality are arranged to respond to at least one value of the operation code by each executing a respective operation using the same operands from said same combination and each functional unit producing a respective result at a respective one of the write ports.Type: GrantFiled: September 21, 2005Date of Patent: January 31, 2012Assignee: Koninklijke Philips Electronics N.V.Inventor: Antonius Adrianus Maria Van Wel
-
Publication number: 20120023313Abstract: An electronic circuit (4000) includes a bias value generator circuit (3900) operable to supply a varying bias value in a programmable range, and an instruction circuit (3625, 4010) responsive to a first instruction to program the range of said bias value generator circuit (3900) and further responsive to a second instruction having an operand to repeatedly issue said second instruction with said operand varied in an operand value range determined as a function of the varying bias value.Type: ApplicationFiled: September 28, 2011Publication date: January 26, 2012Applicant: TEXAS INSTRUMENTS INCORPORATEDInventors: Kenichi TASHIRO, Hiroyuki MIZUNO, Yuji UMEMOTO
-
Patent number: 8103859Abstract: According to an aspect of the embodiment, when data on a cache RAM is rewritten in a storage processing of one thread, a determination unit searches a fetch port which holds a request of another thread, checks whether a request exists whose processing is completed, whose instruction is a load type instruction, and whose target address corresponds to a target address in a storage processing. When the corresponding request is detected, the determination unit sets a re-execution request flag to all the entries of the fetch port from the next entry of the entry which holds the oldest request to the entry which holds the detected request. When the processing of the oldest request is executed, a re-execution request unit transfers a re-execution request of an instruction to an instruction control unit for the request held in the entry in which the re-execution request flag is set.Type: GrantFiled: December 17, 2009Date of Patent: January 24, 2012Assignee: Fujitsu LimitedInventor: Naohiro Kiyota
-
Patent number: 8103853Abstract: A chip having an intelligent fabric may include a soft application processor, a reconfigurable hardware intelligent processor, a partitioned memory storage, and an interface to an external reconfigurable communication processor. The reconfigurable hardware intelligent processor may be configured to implement a distributed reconfigurable processor, and to provide cognitive control for at least one of allocation, reallocation, and performance monitoring.Type: GrantFiled: March 5, 2008Date of Patent: January 24, 2012Assignee: The Boeing CompanyInventors: Tirumale K. Ramesh, John L. Meier
-
Patent number: 8098655Abstract: A system includes a queue that stores P data units, each data unit including multiple bytes. The system further includes a control unit that shifts, byte by byte, Q data units from the queue during a first system clock cycle, where Q<P, and sends, during the first system clock cycle, the Q data units to a processing device configured to process a maximum of Q data units per system clock cycle.Type: GrantFiled: July 21, 2009Date of Patent: January 17, 2012Assignee: Juniper Networks, Inc.Inventor: Brian Gaudet
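The rate matching described above, a queue of P data units feeding a device that accepts at most Q per clock cycle, can be sketched as a generator that emits one batch per simulated cycle. The `drain` function is hypothetical, and the sketch does not model the byte-by-byte shifting itself, only the per-cycle limit.

```python
from collections import deque


def drain(queue, q):
    """Yield one batch of at most q data units per simulated clock cycle."""
    while queue:
        # Take up to q units from the head of the queue for this cycle.
        batch = [queue.popleft() for _ in range(min(q, len(queue)))]
        yield batch
```

Each yielded batch corresponds to what the control unit would hand to the processing device in one cycle; a queue holding five units with `q = 2` drains in three cycles.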
-
Publication number: 20120011349Abstract: Disclosed are methods and systems for dynamically determining data-transfer paths. The data-transfer paths are determined in response to an instruction that facilitates data transfer among execution lanes in an integrated-circuit processing device operable to execute operations in parallel.Type: ApplicationFiled: September 20, 2011Publication date: January 12, 2012Applicant: Calos Fund Limited Liability CompanyInventors: Brucek Khailany, William James Dally, Ujval J. Kapasi, Jim Jian Lin, Raghunath Rao, DeForest Tovey, Mark Rygh, Jung-Ho Ahn
-
Patent number: 8095743Abstract: Access to a memory area by a first processor that executes a first processor program and a second processor that executes a second processor program is granted to one of the first processor and the second processor at a time. Access to the memory area by the first processor and the second processor is cyclically and uniquely allocated (e.g., t?[(ad mod m)=o]) between the first and the second processor by the first and second processor programs.Type: GrantFiled: March 29, 2010Date of Patent: January 10, 2012Assignee: Trident Microsystems (Far East) Ltd.Inventors: Matthias Vierthaler, Carsten Noeske
-
Publication number: 20120005530Abstract: Transactional memory implementations may be extended to include special transaction communicator objects through which concurrent transactions can communicate. Changes by a first transaction to a communicator may be visible to concurrent transactions before the first transaction commits. Although isolation of transactions may be compromised by such communication, the effects of this compromise may be limited by tracking dependencies among transactions, and preventing any transaction from committing unless every transaction whose changes it has observed also commits. For example, mutually dependent or cyclically dependent transactions may commit or abort together. Transactions that do not communicate with each other may remain isolated. The system may provide a communicator-isolating transaction that ensures isolation even for accesses to communicators, which may be implemented using nesting transactions. True (e.g., read-after-write) dependencies, ordering (e.g.Type: ApplicationFiled: June 30, 2010Publication date: January 5, 2012Inventors: Virendra J. Marathe, Victor M. Luchangco
-
Patent number: 8090913Abstract: A system has a first plurality of cores in a first coherency group. Each core transfers data in packets. The cores are directly coupled serially to form a serial path. The data packets are transferred along the serial path. The serial path is coupled at one end to a packet switch. The packet switch is coupled to a memory. The first plurality of cores and the packet switch are on an integrated circuit. The memory may or may not be on the integrated circuit. In another aspect a second plurality of cores in a second coherency group is coupled to the packet switch. The cores of the first and second pluralities may be reconfigured to form or become part of coherency groups different from the first and second coherency groups.Type: GrantFiled: December 20, 2010Date of Patent: January 3, 2012Assignee: Freescale Semiconductor, Inc.Inventors: Perry H. Pelley, III, George P. Hoekstra, Lucio F. Pessoa
-
Patent number: 8090933Abstract: The present invention relates to a method for the unification of PER branch and PER store operations within the same dataflow. The method comprises determining a PER range, the PER range comprising a storage area defined by a designated storage starting area and a designated storage ending area, wherein the storage starting area is designated by a value of the contents of a first control register and the storage ending area is designated by a value of the contents of a second control register. The method also comprises retrieving register field content values that are stored at a plurality of registers, wherein the retrieved content values comprise a length field content value, and setting the length field content value to zero for a PER branch instruction, thereby enabling a PER branch instruction to be performed similarly to a PER storage instruction.Type: GrantFiled: February 12, 2008Date of Patent: January 3, 2012Assignee: International Business Machines CorporationInventors: Fadi Y. Busaba, Bruce C. Giamei