Abstract: A computer processor includes execution logic (having a number of functional units) configured to perform operations that access operand data values stored in a plurality of operand storage elements. Such operand data values include a predefined None operand data value indicative of a missing operand value. The operations include a RETIRE operation specifying a number of operand data values that are intended to be retired in a predefined machine cycle. During execution of the RETIRE operation, zero or more None operand data values are selectively retired in the predefined machine cycle based on the number of operand data values specified by the RETIRE operation and the number of operand data values to be retired as a result of execution of other operations by the execution logic in the predefined machine cycle. Other aspects and software tools are also described and claimed.
Type:
Grant
Filed:
July 13, 2015
Date of Patent:
November 14, 2017
Assignee:
Mill Computing, Inc.
Inventors:
Roger Rawson Godard, Arthur David Kahlich, David Arthur Yost
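A minimal sketch of the retirement rule this abstract describes, under the assumption that the None values simply make up the shortfall between what the RETIRE operation requests and what other operations produce in the same cycle (function and parameter names are illustrative, not from the patent):

```python
NONE = object()  # sentinel standing in for the predefined None operand value

def nones_to_retire(requested, retired_by_other_ops):
    """How many None operand values to retire in a cycle: the RETIRE
    operation asks for `requested` retirements, and the gap not covered
    by other operations' results is filled with None values."""
    return max(0, requested - retired_by_other_ops)
```

For example, a RETIRE of four values in a cycle where other operations retire three would add one None value.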
Abstract: An apparatus and method for propagating conditionally evaluated values are disclosed. For example, a method according to one embodiment comprises: reading each value contained in an input mask register, each value being a true value or a false value and having a bit position associated therewith; for each true value read from the input mask register, generating a first result containing the bit position of the true value; for each false value read from the input mask register following the first true value, adding the vector length of the input mask register to a bit position of the last true value read from the input mask register to generate a second result; and storing each of the first results and second results in bit positions of an output register corresponding to the bit positions read from the input mask register.
Type:
Grant
Filed:
December 23, 2011
Date of Patent:
October 24, 2017
Assignee:
INTEL CORPORATION
Inventors:
Jayashankar Bharadwaj, Nalini Vasudevan, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi
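The propagation rule in this abstract can be sketched in scalar form as follows. This is an assumption-laden model, not Intel's implementation: leading false bits (before any true bit) are left as zero, since the abstract does not specify them.

```python
def propagate_mask_positions(mask):
    """Sketch of the described propagation: each true bit yields its own
    bit position; each false bit after the first true bit yields the
    vector length plus the position of the last true bit read."""
    vlen = len(mask)
    out = [0] * vlen
    last_true = None
    for pos, bit in enumerate(mask):
        if bit:
            last_true = pos
            out[pos] = pos                 # first result: the true bit's position
        elif last_true is not None:
            out[pos] = vlen + last_true    # second result: vlen + last true position
        # false bits before any true bit: left as 0 (unspecified in the abstract)
    return out
```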
Abstract: A microprocessor circuit may include a software programmable microprocessor core and a data memory accessible via a data memory bus. The data memory may include sets of configuration data structured according to respective predetermined data structure specifications for configurable math hardware accelerators, and sets of input data for configurable math hardware accelerators, each configured to apply a predetermined signal processing function to the set of input data according to received configuration data. A configuration controller is coupled to the data memory via the data memory bus and to the configurable math hardware accelerators. The configuration controller may fetch the configuration data for each math hardware accelerator from the data memory and translate the configuration data.
Abstract: Fine-grained enablement at sub-function granularity. An instruction encapsulates different sub-functions of a function, in which the sub-functions use different sets of registers of a composite register file, and therefore, different sets of functional units. At least one operand of the instruction specifies which set of registers, and therefore, which set of functional units, is to be used in performing the sub-function. The instruction can perform various functions (e.g., move, load, etc.) and a sub-function of the function specifies the type of function (e.g., move-floating point; move-vector; etc.).
Type:
Grant
Filed:
November 20, 2012
Date of Patent:
August 8, 2017
Assignee:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
Michael K. Gschwind, Brett Olsson, Valentina Salapura
Abstract: Fine-grained enablement at sub-function granularity. An instruction encapsulates different sub-functions of a function, in which the sub-functions use different sets of registers of a composite register file, and therefore, different sets of functional units. At least one operand of the instruction specifies which set of registers, and therefore, which set of functional units, is to be used in performing the sub-function. The instruction can perform various functions (e.g., move, load, etc.) and a sub-function of the function specifies the type of function (e.g., move-floating point; move-vector; etc.).
Type:
Grant
Filed:
September 16, 2011
Date of Patent:
August 8, 2017
Assignee:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
Michael K. Gschwind, Brett Olsson, Valentina Salapura
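The operand-driven selection described in these two related abstracts can be modeled as a small decode table. The mapping below is purely illustrative (the patent does not publish operand encodings):

```python
# Assumed mapping: the operand value selects the register set and,
# therefore, the set of functional units used for the sub-function.
REGISTER_SETS = {
    0: ("floating-point registers", "floating-point units"),  # e.g. move-floating point
    1: ("vector registers", "vector units"),                  # e.g. move-vector
}

def decode_sub_function(function, operand):
    """Resolve a function (e.g. 'move') plus a sub-function operand into
    the register set and functional-unit set to use."""
    registers, units = REGISTER_SETS[operand]
    return function, registers, units
```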
Abstract: A method and an apparatus that partition a total number of threads to concurrently execute executable codes compiled from a single source for target processing units in response to an API (Application Programming Interface) request from an application running in a host processing unit are described. The total number of threads is based on a multi-dimensional value for a global thread number specified in the API. The target processing units include GPUs (Graphics Processing Unit) and CPUs (Central Processing Unit). Thread group sizes for the target processing units are determined to partition the total number of threads according to either a dimension for a data parallel task associated with the executable codes or a dimension for a multi-dimensional value for a local thread group number. The executable codes are loaded to be executed in thread groups with the determined thread group sizes concurrently in the target processing units.
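The partitioning step can be sketched for the simple case where the thread group size evenly divides the multi-dimensional global thread count in every dimension (an assumption; the patent also covers deriving the group size itself):

```python
def partition_threads(global_size, group_size):
    """Split a multi-dimensional global thread count into thread groups,
    returning the number of groups per dimension."""
    return [g // l for g, l in zip(global_size, group_size)]
```

For a 64x32 global thread count with 8x4 groups, this yields an 8x8 grid of thread groups to distribute across the GPUs and CPUs.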
Abstract: A reconfigurable circuit includes a plurality of processing elements and an input/output data interface unit, and the reconfigurable circuit is configured to control connections of the plurality of processing elements for each context. The input/output data interface unit is configured to hold operation input data which is input to the plurality of processing elements and operation output data which is output from the plurality of processing elements. The input/output data interface unit includes a plurality of ports, and a plurality of registers. The registers are configured to be connected to the plurality of ports, and to include m (m being an integer of 2 or more) number of banks in a depth direction.
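A software model of the banked register arrangement described above, with m banks in the depth direction behind each port (class and method names are assumptions for illustration):

```python
class BankedRegisterFile:
    """Registers connected to `ports` ports, each with `m` banks of
    `depth` entries in the depth direction."""
    def __init__(self, ports, m, depth):
        self.banks = [[[0] * depth for _ in range(m)] for _ in range(ports)]

    def write(self, port, bank, addr, value):
        self.banks[port][bank][addr] = value

    def read(self, port, bank, addr):
        return self.banks[port][bank][addr]
```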
Abstract: A processor and method are disclosed. In one embodiment the processor includes a prefetch buffer that stores macro instructions. The processor also includes a clock circuit that can provide a clock signal for at least some of the functional units within the processor. The processor additionally includes macro instruction decode logic that can determine a class of each macro instruction. The processor also includes a clock management unit that can cause the clock signal to remain in a steady state entering at least one of the units in the processor that do not operate on the current macro instruction being decoded. Finally, the processor also includes at least one instruction decoder unit that can decode a macro instruction into one or more opcodes.
Type:
Grant
Filed:
September 24, 2010
Date of Patent:
July 18, 2017
Assignee:
Intel Corporation
Inventors:
Venkateswara R. Madduri, Jonathan Y. Tong, Hoichi Cheong
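The gating decision this abstract describes can be sketched as a set difference: units outside the current macro-instruction class keep a steady (gated) clock. The class-to-units mapping is an assumption for illustration:

```python
def gated_units(instr_class, units_by_class, all_units):
    """Return the units whose clock stays in a steady state because they
    do not operate on the current macro instruction's class."""
    active = units_by_class.get(instr_class, set())
    return set(all_units) - active
```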
Abstract: A method for compressing instructions is provided, which includes the following steps. Analyze a program code to be executed by a processor to find one or more instruction groups in the program code according to a preset condition. Each of the instruction groups includes one or more instructions in sequential order. Sort the one or more instruction groups according to a cost function of each instruction group. Put the first X of the sorted instruction groups into an instruction table, where X is a value determined according to the cost function. Replace each instruction group in the program code that is put into the instruction table with a corresponding execution-on-instruction-table (EIT) instruction. The EIT instruction has a parameter referring to the corresponding instruction group in the instruction table.
Type:
Grant
Filed:
August 1, 2013
Date of Patent:
June 6, 2017
Assignee:
ANDES TECHNOLOGY CORPORATION
Inventors:
Wei-Hao Chiao, Hong-Men Su, Haw-Luen Tsai
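The steps above can be sketched end to end. Both the "preset condition" (here: adjacent instruction pairs) and the cost function (here: occurrences times group length) are assumptions standing in for the patent's unspecified choices:

```python
from collections import Counter

def compress(program, table_size):
    """Find repeated instruction pairs, put the costliest `table_size`
    of them into an instruction table, and replace their occurrences in
    the program with ('EIT', index) references into that table."""
    # Preset condition (assumed): candidate groups are adjacent pairs.
    groups = Counter(tuple(program[i:i + 2]) for i in range(len(program) - 1))
    # Cost function (assumed): occurrences * group length; keep the first X.
    table = [g for g, n in sorted(groups.items(),
                                  key=lambda kv: -kv[1] * len(kv[0]))][:table_size]
    out, i = [], 0
    while i < len(program):
        g = tuple(program[i:i + 2])
        if g in table:
            out.append(("EIT", table.index(g)))  # parameter refers to the table entry
            i += 2
        else:
            out.append(program[i])
            i += 1
    return out, table
```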
Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
Type:
Grant
Filed:
March 15, 2013
Date of Patent:
June 6, 2017
Assignee:
Intel Corporation
Inventors:
Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
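The one-source form of the in-lane shuffle can be sketched with a simplified control encoding: one selector per destination slot, each choosing an element from the same lane of the source (the 4-element lane width and per-slot control layout are assumptions):

```python
def inlane_shuffle(src, control, lane_width=4):
    """control[i] selects which element of destination slot i's own lane
    is copied there; selection never crosses a lane boundary."""
    out = []
    for i, sel in enumerate(control):
        lane_base = (i // lane_width) * lane_width
        out.append(src[lane_base + sel])
    return out
```

Keeping the selection within each lane is what distinguishes this from a full cross-lane shuffle, which is costlier in hardware.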
Abstract: A method and a system for job scheduling in application servers. A common metadata of a job is deployed, the job being a deployable software component. An additional metadata of the job is further deployed. A scheduler task based on the additional metadata of the job is created, wherein the task is associated with a starting condition. The scheduler task is started at an occurrence of the starting condition, and, responsive to this, execution of an instance of the job is invoked asynchronously.
Abstract: A method and circuit arrangement for selectively predicating instructions in an instruction stream based upon a predication filter criteria defined by a predication filter, which describes types or patterns of instructions that should be predicated. Predication logic compares a respective instruction of an instruction stream to predication filter criteria to determine whether the respective instruction matches the predication filter criteria, and the respective instruction is selectively predicated based on whether the respective instruction matches the predication filter criteria.
Type:
Grant
Filed:
December 19, 2011
Date of Patent:
April 25, 2017
Assignee:
International Business Machines Corporation
Inventors:
Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
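The matching step in this abstract can be sketched as a comparison of an instruction's fields against a list of filter criteria; the field/criteria shapes below are assumptions for illustration:

```python
def should_predicate(instruction, filter_criteria):
    """Return True if the instruction matches any predication filter
    criterion (each criterion being a set of required field values)."""
    return any(all(instruction.get(field) == value
                   for field, value in criteria.items())
               for criteria in filter_criteria)
```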
Abstract: A method and circuit arrangement selectively repurpose bits from a primary opcode portion of an instruction for use in decoding one or more operands for the instruction. Decode logic of a processor, for example, may be placed in a predetermined mode that decodes a primary opcode for an instruction that is different from that specified in the primary opcode portion of the instruction, and then utilize one or more bits in the primary opcode portion to decode one or more operands for the instruction. By doing so, additional space is freed up in the instruction to support a larger register file and/or additional instruction types, e.g., as specified by a secondary or extended opcode.
Type:
Grant
Filed:
December 20, 2011
Date of Patent:
April 25, 2017
Assignee:
International Business Machines Corporation
Inventors:
Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
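The bit repurposing this abstract describes can be sketched with an assumed PowerPC-like layout (a 6-bit primary opcode field and a 5-bit register field; the field positions and the 2-bit extension are illustrative, not from the patent):

```python
EXTENDED_PRIMARY_OPCODE = "ext-op"  # stands in for the predetermined decoded opcode

def decode(word, extended_mode):
    """In the predetermined mode, the primary opcode is taken as fixed and
    two of its bits are repurposed as high bits of a wider register index."""
    primary = (word >> 26) & 0x3F   # assumed 6-bit primary opcode field
    reg = (word >> 21) & 0x1F       # assumed 5-bit register field
    if extended_mode:
        # two primary-opcode bits extend the register index to 7 bits
        return EXTENDED_PRIMARY_OPCODE, reg | ((primary & 0x3) << 5)
    return primary, reg
```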
Abstract: An adaptive computing engine (ACE) IC includes a plurality of heterogeneous computational elements coupled to an interconnection network. The plurality of heterogeneous computational elements include corresponding computational elements having fixed and differing architectures, such as fixed architectures for different functions such as memory, addition, multiplication, complex multiplication, subtraction, configuration, reconfiguration, control, input, output, and field programmability. In response to configuration information, the interconnection network is operative to configure and reconfigure the plurality of heterogeneous computational elements for a plurality of different functional modes, including linear algorithmic operations, non-linear algorithmic operations, finite state machine operations, controller operations, memory operations, and bit-level manipulations.
Type:
Grant
Filed:
March 13, 2013
Date of Patent:
March 14, 2017
Assignee:
Altera Corporation
Inventors:
Paul L. Master, Stephen J. Smith, John Watson
Abstract: A message is received by a mobile phone via a messaging service provided by a mobile network operator, wherein the messaging service is supported by the mobile phone. It is determined whether the message is associated with a distributed transaction. The message is forwarded to a resource manager resident on the mobile phone if the message is associated with the distributed transaction. The resource manager performs an action upon receiving the message based on contents of the message, wherein the action is associated with the distributed transaction.
Abstract: Embodiments of the invention provide for executing a batch process on a repository of information. According to one embodiment, executing a batch process can comprise presenting one or more aspects of records of the repository and receiving a selection of a criteria for at least one aspect of the records. Records matching the selected criteria can be identified and a summary of the information can be presented. The batch process can comprise one of a plurality of batch processes. In such a case, a selection of the batch process can be received and parameters of the batch process can be populated with the selected criteria. The batch process can then be executed with the parameters. For example, executing the batch process can comprise generating a report based on the parameters and the records of the repository.
Abstract: A method and apparatus for scheduling processing jobs is described. In one embodiment, a scheduler receives a request to process one or more computation jobs. The scheduler generates a size metric corresponding to a size of an executable image of each computation job and a corresponding data set associated with each computation job. The scheduler adjusts a priority of each computation job based on a system configuration setting and schedules the process of each computation job according to the priority of each computation job. In another embodiment, the scheduler distributes the plurality of computation jobs on one or more processors of a computing system, where the system configuration setting prioritizes a computation job with a smaller size metric than a computation job with a larger size metric.
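The smaller-jobs-first setting described above can be sketched as a sort on the size metric; summing the executable-image size and data-set size into one metric is an assumption:

```python
def schedule(jobs):
    """jobs: list of (name, image_size, data_size) tuples. A smaller
    size metric (image + data) gets higher priority, per the described
    system configuration setting."""
    return sorted(jobs, key=lambda job: job[1] + job[2])
```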
Abstract: An internal processor of a memory device configured to selectively execute instructions in parallel. One such internal processor includes a plurality of arithmetic logic units (ALUs), each connected to conditional masking logic, and each configured to process conditional instructions. A condition instruction may be received by a sequencer of the memory device. Once the condition instruction is received, the sequencer may enable the conditional masking logic of the ALUs. The sequencer may toggle a signal to the conditional masking logic such that the masking logic masks certain instructions if a condition of the condition instruction has been met, and masks other instructions if the condition has not been met. In one embodiment, each ALU in the internal processor may selectively perform instructions in parallel.
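The masking behaviour can be sketched lane by lane: where a lane's condition bit is set, the "then" instructions apply and the "else" instructions are masked, and vice versa (names are illustrative):

```python
def execute_conditional(lanes, condition, then_op, else_op):
    """Each ALU lane applies then_op where its condition bit is set and
    else_op otherwise, so both instruction streams issue but the masking
    logic keeps only one result per lane."""
    return [then_op(x) if c else else_op(x) for x, c in zip(lanes, condition)]
```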
Abstract: A split level history buffer in a central processing unit is provided. A history buffer is split into a first portion and a second portion. An instruction fetch unit fetches and tags instructions with unique tags. A register file stores tagged instructions. An execution unit generates results for tagged instructions. A first instruction is fetched, tagged, and stored in an entry of the register file. A second instruction is fetched and tagged, and then evicts the first instruction from the register file, such that the second instruction is stored in the entry of the register file. Subsequently, the first instruction is stored in an entry in the first portion of the history buffer. After a result for the first instruction is generated, the first instruction is moved from the first portion of the history buffer to the second portion of the history buffer.
Type:
Grant
Filed:
April 6, 2016
Date of Patent:
December 20, 2016
Assignee:
International Business Machines Corporation
Inventors:
Hung Q. Le, Dung Q. Nguyen, David R. Terry
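The two-portion flow this abstract describes can be sketched as a pair of queues: an evicted instruction waits in the first portion until its result is generated, then moves to the second (class and method names are assumptions):

```python
class SplitHistoryBuffer:
    """First portion holds evicted instructions awaiting results; the
    second holds those whose results have been generated."""
    def __init__(self):
        self.pending = []    # first portion
        self.completed = []  # second portion

    def evict(self, tag):
        """An instruction evicted from the register file enters portion one."""
        self.pending.append(tag)

    def result_ready(self, tag):
        """Once its result is generated, the instruction moves to portion two."""
        self.pending.remove(tag)
        self.completed.append(tag)
```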
Abstract: A virtual machine configuration system comprising a virtualizer that, in a virtualization environment in which a plurality of physical resources connected to one another through a network circuit are arranged on a computer system partitioned into a plurality of partitions, dynamically changes the physical resource configuration and the virtual machine configuration while simultaneously controlling the configuration of the physical resources of each partition and the configuration of the virtual resources allotted to the virtual machines, without affecting an application service operating on the virtual machines.