Abstract: A computer system, processor, and method for processing information are disclosed that include at least one computer processor; a main register file associated with the at least one processor, the main register file having a plurality of entries for storing data, one or more write ports to write data to the main register file entries, and one or more read ports to read data from the main register file entries; one or more execution units including a dense math execution unit; and at least one accumulator register file having a plurality of entries for storing data. In an aspect, the results of the dense math execution unit are written to the accumulator register file, preferably to the same accumulator register file entry multiple times, and the data from the accumulator register file is then written to the main register file.
Type: Grant
Filed: August 29, 2019
Date of Patent: September 28, 2021
Assignee: International Business Machines Corporation
Inventors: Brian W. Thompto, Maarten J. Boersma, Andreas Wagner, Jose E. Moreira, Hung Q. Le, Silvia Melitta Mueller, Dung Q. Nguyen
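The data flow described in the abstract above (results written repeatedly to one accumulator entry, then copied out to the main register file) might be modeled roughly as follows. This is only an illustrative sketch: the register widths, the entry counts, the choice of an outer product as the dense math operation, and all function names are assumptions made for the example, not details taken from the patent.

```python
# Hypothetical sizes, chosen only for illustration.
WIDTH = 2          # elements per main register file entry
MAIN_ENTRIES = 8   # entries in the main register file
ACC_ENTRIES = 2    # entries in the accumulator register file

# Main register file: each entry holds WIDTH values.
main_rf = [[0.0] * WIDTH for _ in range(MAIN_ENTRIES)]
# Accumulator register file: each entry holds a WIDTH x WIDTH tile.
acc_rf = [[[0.0] * WIDTH for _ in range(WIDTH)] for _ in range(ACC_ENTRIES)]


def dense_math_outer_accumulate(acc_idx, src_a, src_b):
    """Assumed dense math op: add the outer product of two main
    register entries into one accumulator entry, in place."""
    a, b = main_rf[src_a], main_rf[src_b]
    tile = acc_rf[acc_idx]
    for i in range(WIDTH):
        for j in range(WIDTH):
            tile[i][j] += a[i] * b[j]


def move_accumulator_to_main(acc_idx, dest_start):
    """Copy each row of an accumulator entry into consecutive
    main register file entries starting at dest_start."""
    for i, row in enumerate(acc_rf[acc_idx]):
        main_rf[dest_start + i] = list(row)
```

In this model the same accumulator entry can be updated many times without touching the main register file, and the main register file is written only once, when the accumulated tile is moved out.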
Abstract: A vector data transfer instruction is provided for triggering a data transfer between storage locations corresponding to a contiguous block of addresses and multiple data elements of at least one vector register. The instruction specifies a start address of the contiguous block using a base register and an immediate offset value specified as a multiple of the size of the contiguous block of addresses. This is useful for loop unrolling, which can help to improve performance of vectorised code by combining multiple iterations of a loop into a single iteration of an unrolled loop, to reduce the loop control overhead.
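The addressing scheme in the abstract above can be illustrated with a small software model. The sketch below assumes that memory is a Python list, that the block size is measured in elements rather than bytes, and that the loop is unrolled by a factor of two; the names `vector_load` and `sum_unrolled` are hypothetical.

```python
def vector_load(memory, base, imm, num_elements):
    """Model of the load: the start address is the base plus the
    immediate offset scaled by the block size (num_elements here)."""
    start = base + imm * num_elements
    return memory[start:start + num_elements]


def sum_unrolled(memory, num_elements):
    """Sum all elements using a loop unrolled by two. Both loads in
    the loop body share one base; only the immediate differs."""
    total = 0
    base = 0
    step = 2 * num_elements  # two blocks consumed per unrolled iteration
    while base + step <= len(memory):
        v0 = vector_load(memory, base, 0, num_elements)  # first block
        v1 = vector_load(memory, base, 1, num_elements)  # second block
        total += sum(v0) + sum(v1)
        base += step  # a single base update covers both original iterations
    # Elements that do not fill a whole unrolled iteration.
    total += sum(memory[base:])
    return total
```

Because the immediate is scaled by the block size, the unrolled body needs only one base-register update per pass instead of one per load, which is the loop control saving the abstract refers to.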
Abstract: A method includes incrementing a counter with transmission of process data from a first processor to a second processor; periodically decrementing the counter if the counter is greater than a predetermined floor threshold value, wherein the period is a predetermined time interval; and stalling the first processor if the counter is above a configurable load threshold value, so as to reschedule the transmission of the process data from the first processor to the second processor.
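The counter logic in the abstract above might be sketched as follows. The class and method names are hypothetical, and the periodic timer is modeled by an explicit `tick()` call that stands in for each elapsed time interval.

```python
class TransmissionThrottle:
    """Minimal model of the counter-based stall decision."""

    def __init__(self, floor_threshold, load_threshold):
        self.floor_threshold = floor_threshold  # predetermined floor
        self.load_threshold = load_threshold    # configurable load limit
        self.counter = 0

    def on_transmit(self):
        """Called when the first processor sends process data."""
        self.counter += 1

    def tick(self):
        """Called once per predetermined time interval: decrement
        only while the counter is above the floor threshold."""
        if self.counter > self.floor_threshold:
            self.counter -= 1

    def should_stall(self):
        """The first processor stalls while the counter exceeds the
        load threshold, deferring further transmissions."""
        return self.counter > self.load_threshold
```

For example, with a floor of 0 and a load threshold of 2, three transmissions in quick succession raise the counter to 3 and trigger a stall; one elapsed interval lowers it to 2 and transmission may resume.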
Abstract: Systems and methods for a workload optimized server for intelligent algorithm trading platforms. In an illustrative, non-limiting embodiment, an Information Handling System (IHS) may include a plurality of Central Processing Units (CPUs) and a control circuit coupled to the plurality of CPUs, the control circuit having a memory configured to store program instructions that, upon execution by the control logic, cause the IHS to: set a first number of enabled cores in a first CPU to operate with a first all-core turbo frequency, and set a second number of enabled cores in a second CPU to operate with a second all-core turbo frequency, where the first number of enabled cores is different from the second number of enabled cores, and where at least one of the first or second all-core turbo frequencies is selected to cause the IHS to operate with reduced execution jitter.