Abstract: A processor includes a vector register including data fields to store values of vector elements of data, a decoder to decode a single instruction multiple data (SIMD) instruction specifying a source operand and a mask to identify a masked portion of the data fields. An execution unit is to read a plurality of values from unmasked data fields of the plurality of data fields of the vector register; compare, within the vector register, each of the plurality of values from the unmasked data fields for equality with all other values of the plurality of values; and responsive to a detection of an inequality of any two values of the plurality of values, set a mask field, corresponding to a detected unequal value, to a masked state with a flip of a bit value of the mask field, to signal the detection of the inequality.
Type:
Grant
Filed:
May 3, 2017
Date of Patent:
June 11, 2019
Assignee:
Intel Corporation
Inventors:
Elmoustapha Ould-Ahmed-Vall, Charles R. Yount, Suleyman Sair, Kshitij A. Doshi
Abstract: An arithmetic processor of an embodiment comprises program counter, a program memory, registers, and a decoder. Also the arithmetic processor comprises an arithmetic unit that carries out an operation using the operand and operator acquired from the registers based on a decode result by the decoder, a data memory that stores constant data and an address in association with the data, and a load unit that comprises a load data address storing unit that stores a load data address indicating an address where the constant data is stored; and an increment unit that updates the load data address stored in the load data address storing unit. The load unit loads, from the data memory, constant data corresponding to an address specified by an operand of a load instruction from the decoder, and stores the constant data in a specific one of the registers.
Abstract: An improved technique involves processing a workflow in stages, and processing all requests in a queue for a given stage before moving onto the next stage. Along these lines, each request received by a storage processor is assigned to a core and placed in a first queue for that core. Within that core, a single system thread executes first instructions for a task, e.g., checking the storage cache for the requested data from a request, and then transfers the request to a second queue. Rather than perform additional tasks to completely satisfy the request, however, the thread executes the first instructions for a prespecified number of requests in the first queue. Only when the thread has executed instructions for the prespecified number of requests, the thread begins execution of second instructions for requests in the second queue, and work on the next task begins.
Type:
Grant
Filed:
March 31, 2014
Date of Patent:
March 19, 2019
Assignee:
EMC IP Holding Company LLC
Inventors:
Daniel Cummins, David W. Harvey, Steve Morley
Abstract: A conditional instruction end facility is provided that allows completion of an instruction to be delayed. In executing the machine instruction, an operand is obtained, and a determination is made as to whether the operand has a predetermined relationship with respect to a value. Based on determining that the operand does not have the predetermined relationship with respect to the value, the obtaining and the determining are repeated. Based on determining that the operand has the predetermined relationship with respect to the value, execution of the instruction is completed.
Type:
Grant
Filed:
January 30, 2017
Date of Patent:
March 19, 2019
Assignee:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
Dan F. Greiner, Christian Jacobi, Marcel Mitran, Donald W. Schmidt, Timothy J. Slegel
Abstract: A microprocessor implemented method for maintaining a guest return address stack in an out-of-order microprocessor pipeline is disclosed. The method comprises mapping a plurality of instructions in a guest address space into a corresponding plurality of instructions in a native address space. For each function call instruction in the native address space fetched during execution, the method also comprises performing the following: (a) pushing a current entry into a guest return address stack (GRAS) responsive to a function call, wherein the GRAS is maintained at the fetch stage of the pipeline, and wherein the current entry comprises information regarding both a guest target return address and a corresponding native target return address associated with the function call; (b) popping the current entry from the GRAS in response to processing a return instruction; and (c) fetching instructions from the native target return address in the current entry after the popping from the GRAS.
Abstract: Provided is a method for predicting a target address using a set of Indirect Target TAgged GEometric (ITTAGE) tables and a target address pattern table. A branch instruction that is to be executed may be identified. A first tag for the branch instruction may be determined. The first tag may be a unique identifier that corresponds to the branch instruction. Using the tag, the branch instruction may be determined to be in a target address pattern table, and an index may be generated. A predicted target address for the branch instruction may be determined using the generated index and the largest ITTAGE table. Instructions associated with the predicted target address may be fetched.
Type:
Grant
Filed:
December 18, 2017
Date of Patent:
February 19, 2019
Assignee:
International Business Machines Corporation
Inventors:
Satish Kumar Sadasivam, Puneeth A. H. Bhat, Shruti Saxena
Abstract: Embodiments of the present invention relate to fast and conditional data modification and generation in a software-defined network (SDN) processing engine. Modification of multiple inputs and generation of multiple outputs can be performed in parallel. A size of each input or output data can be large, such as in hundreds of bytes. The processing engine includes a control path and a data path. The control path generates instructions for modifying inputs and generating new outputs. The data path executes all instructions produced by the control path. The processing engine is typically programmable such that conditions and rules for data modification and generation can be reconfigured depending on network features and protocols supported by the processing engine. The SDN processing engine allows for processing multiple large-size data flows and is efficient in manipulating such data. The SDN processing engine achieves full throughput with multiple back-to-back input and output data flows.
Type:
Grant
Filed:
December 30, 2013
Date of Patent:
January 30, 2018
Assignee:
CAVIUM, INC.
Inventors:
Anh T. Tran, Gerald Schmidt, Tsahi Daniel, Mohan Balan