Patents Examined by Eric Coleman
-
Patent number: 10838725
Abstract: Techniques are disclosed relating to fetching items from a compute command stream that includes compute kernels. In some embodiments, stream fetch circuitry sequentially pre-fetches items from the stream and stores them in a buffer. In some embodiments, fetch parse circuitry iterates through items in the buffer using a fetch parse pointer to detect indirect-data-access items and/or redirect items in the buffer. The fetch parse circuitry may send detected indirect data accesses to indirect-fetch circuitry, which may buffer requests. In some embodiments, execute parse circuitry iterates through items in the buffer using an execute parse pointer (e.g., which may trail the fetch parse pointer) and outputs both item data from the buffer and indirect-fetch results from indirect-fetch circuitry for execution. In various embodiments, the disclosed techniques may reduce fetch latency for compute kernels.
Type: Grant
Filed: September 26, 2018
Date of Patent: November 17, 2020
Assignee: Apple Inc.
Inventors: Andrew M. Havlir, Jeffrey T. Brady
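As a rough illustration only, the Python sketch below models the two-pointer idea in software: a fetch parse pointer runs a few items ahead and issues indirect reads early, while a trailing execute parse pointer emits item data together with the already-fetched indirect results. The `Item` type, its two kinds, the `memory` dictionary and the `lead` distance are all hypothetical; redirect items are omitted, and this does not describe the patented circuitry.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    kind: str        # hypothetical kinds: "data" or "indirect"
    payload: object  # inline data, or an address for "indirect" items

def run_stream(items, memory, lead=2):
    """Toy two-pointer model of the fetch parse / execute parse scheme."""
    buffer = list(items)   # stands in for the pre-fetched item buffer
    results = deque()      # indirect-fetch results, kept in request order
    out = []
    fetch_ptr = 0
    for exec_ptr in range(len(buffer)):
        # Fetch parse: run up to `lead` items ahead of the execute pointer,
        # issuing indirect fetches before execution needs their data.
        while fetch_ptr < len(buffer) and fetch_ptr <= exec_ptr + lead:
            if buffer[fetch_ptr].kind == "indirect":
                results.append(memory[buffer[fetch_ptr].payload])
            fetch_ptr += 1
        # Execute parse: trails the fetch pointer and emits item data,
        # pairing each indirect item with its already-fetched result.
        item = buffer[exec_ptr]
        if item.kind == "indirect":
            out.append(("indirect", results.popleft()))
        else:
            out.append((item.kind, item.payload))
    return out
```

With `lead=2`, the read for an indirect item two positions ahead is issued while the current item is still being emitted, which is the latency-hiding effect the abstract describes.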
-
Patent number: 10824434
Abstract: Examples described herein relate to dynamically structured single instruction, multiple data (SIMD) instructions, and systems and circuits implementing such dynamically structured SIMD instructions. An example is a method for processing data. A first SIMD structure is determined by a processor. A characteristic of the first SIMD structure is altered by the processor to obtain a second SIMD structure. An indication of the second SIMD structure is communicated from the processor to a numerical engine. Data is packed by the numerical engine into an SIMD instruction according to the second SIMD structure. The SIMD instruction is transmitted from the numerical engine.
Type: Grant
Filed: November 29, 2018
Date of Patent: November 3, 2020
Assignee: XILINX, INC.
Inventors: Sean Settle, Ehsan Ghasemi, Ashish Sirasao, Ralph D. Wittig
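A minimal sketch of the packing step, assuming a SIMD "structure" is just a lane count plus an element width and that packing places lane 0 in the least significant bits. `SimdStructure` and `pack` are hypothetical names; the abstract does not say which characteristics a real structure carries.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SimdStructure:
    lanes: int          # hypothetical characteristic: number of lanes
    element_bits: int   # hypothetical characteristic: width of each lane

def pack(values, structure):
    """Pack unsigned integers into one word, lane 0 in the least significant bits."""
    if len(values) > structure.lanes:
        raise ValueError("more values than lanes")
    mask = (1 << structure.element_bits) - 1
    word = 0
    for lane, value in enumerate(values):
        if value < 0 or value > mask:
            raise ValueError(f"{value} does not fit in {structure.element_bits} bits")
        word |= value << (lane * structure.element_bits)
    return word

# "Processor" side: start from one structure and alter a single characteristic.
first = SimdStructure(lanes=4, element_bits=16)
second = replace(first, element_bits=8)
# "Numerical engine" side: pack data according to the structure it was sent.
word = pack([1, 2, 3, 4], second)
```

Here `word` packs the four 8-bit lanes as `0x04030201`; under `first`, the same values would occupy 16-bit lanes instead.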
-
Patent number: 10824426
Abstract: Embodiments of the present invention are directed to a computer-implemented method for generating and verifying hardware instruction traces including memory data contents. The method includes initiating an in-memory trace (IMT) data capture for a processor, the IMT data being an instruction trace collected while instructions flow through an execution pipeline of the processor. The method further includes capturing contents of architected registers of the processor by: storing the contents of the architected registers to a predetermined memory location, and causing a load-store unit (LSU) to read contents of the predetermined memory location.
Type: Grant
Filed: April 12, 2019
Date of Patent: November 3, 2020
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jane H. Bartik, Christian Jacobi, David Lee, Jang-Soo Lee, Anthony Saporito, Christian Zoellin
-
Patent number: 10817299
Abstract: A data processing apparatus is provided that includes a plurality of control flow execution circuits to simultaneously execute a first control flow instruction having a first type and a second control flow instruction having a second type from a plurality of instructions. A control flow prediction update circuit updates at most one of: a prediction of the first control flow instruction based on a result of the first control flow instruction, and a prediction of the second control flow instruction based on a result of the second control flow instruction.
Type: Grant
Filed: September 7, 2018
Date of Patent: October 27, 2020
Assignee: Arm Limited
Inventors: Yasuo Ishii, Chris Abernathy
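The "at most one update" rule can be sketched as follows, assuming a table of 2-bit saturating counters keyed by instruction address. The abstract does not say which of the two results wins, so the priority given to the first instruction here is purely an assumption, as is the function name `update_predictions`.

```python
def update_predictions(table, first, second):
    """first/second: (pc, taken) outcomes of two control flow instructions that
    executed in the same cycle, or None. Applies at most one update and returns
    the pc that was updated (or None). Giving priority to `first` is an assumption."""
    chosen = first if first is not None else second
    if chosen is None:
        return None
    pc, taken = chosen
    counter = table.get(pc, 2)  # 2-bit saturating counter; 2 = weakly taken
    table[pc] = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return pc
```

When both instructions resolve together, only one counter changes and the other outcome is simply not recorded in this model. A real design might instead defer or arbitrate differently.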
-
Patent number: 10810156
Abstract: An apparatus includes multiple parallel computing cores, where each computing core is configured to perform one or more processing operations and generate input data. The apparatus also includes multiple sets of parallel coprocessors, where each computing core is associated with a different one of the sets of parallel coprocessors. The coprocessors in each set of parallel coprocessors are configured to process the input data and generate output data. Each of the computing cores is configured to generate additional input data based on the output data generated by the associated set of parallel coprocessors.
Type: Grant
Filed: September 21, 2018
Date of Patent: October 20, 2020
Assignee: Goldman Sachs & Co. LLC
Inventors: Paul Burchard, Ulrich Drepper
-
Patent number: 10802830
Abstract: A computer data processing system includes a plurality of logical registers, each including multiple storage sections. A processor writes data to a storage section based on a dispatched first instruction, and sets a valid bit corresponding to the storage section that receives the data. In response to each subsequent instruction, the processor sets an evictor valid bit indicating a subsequent instruction has written new data to a storage section written by the first instruction, and updates the valid bit to indicate the storage section containing the new written data. A register combination unit generates a combined evictor tag to identify a most recent subsequent instruction. The processor determines the most recent subsequent instruction based on the combined evictor tag in response to a flush event, and unsets all the evictor tag valid bits set by the most recent subsequent instruction along with all previous subsequent instructions.
Type: Grant
Filed: March 5, 2019
Date of Patent: October 13, 2020
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jonathan Hsieh, Gregory William Alexander, Tu-An Nguyen
-
Patent number: 10795712
Abstract: A method for processing virtualization of computers that are part of a group into virtual computers is provided. The method includes obtaining relationship data from the computers, where the relationship data identifies parameters used to communicate within the group. Then, the method analyzes utilization parameters for each of the computers of the group. A visual model for proposed virtualization of the group of computers is then generated. The visual model identifies hosting machines designated to define a virtual computer for each of the computers, where the visual model provides a graphical illustration of the group of computers once converted to virtual computers. The method enables adjustment of the proposed virtualization of the group of computers. Then, an execution sequence of virtualization operations to be carried out is generated, if execution of the proposed virtualization is triggered, and the execution sequence is saved to storage and accessed upon execution.
Type: Grant
Filed: May 6, 2018
Date of Patent: October 6, 2020
Assignee: VMware, Inc.
Inventor: Abhinav Katiyar
-
Patent number: 10782972
Abstract: An apparatus comprises processing circuitry (4) and an instruction decoder (6) which supports vector instructions for which multiple lanes of processing are performed on respective data elements of a vector value. In response to a vector predication instruction, the instruction decoder (6) controls the processing circuitry (4) to set control information based on the outcome of a number of element comparison operations each for determining whether a corresponding element passes or fails a test condition. The control information controls processing of a predetermined number of subsequent vector instructions after the vector predication instruction. The predetermined number is hard-wired or identified by the vector predication instruction. For one of the subsequent vector instructions, an operation for a given portion of a given lane of vector processing is masked based on the outcome indicated by the control information for a corresponding data element.
Type: Grant
Filed: March 17, 2017
Date of Patent: September 22, 2020
Assignee: ARM Limited
Inventor: Thomas Christopher Grocutt
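A simplified software model of the predication window, assuming whole lanes are masked (the abstract also allows masking a portion of a lane) and that `count` plays the role of the predetermined number of subsequent instructions. The function name and the list-of-callables representation are hypothetical.

```python
def run_predicated(vector, condition, count, instructions):
    """condition: per-element test evaluated by the predication instruction.
    count: how many following instructions the control information governs.
    instructions: element-wise operations executed in order."""
    # Predication instruction: one comparison per element sets the control info.
    control = [condition(x) for x in vector]
    out = list(vector)
    for i, op in enumerate(instructions):
        if i < count:
            # Inside the predicated window, lanes that failed the test keep their old value.
            out = [op(x) if passed else x for x, passed in zip(out, control)]
        else:
            # After the window, the instruction runs on every lane.
            out = [op(x) for x in out]
    return out
```

For example, with the test `x > 0` and a window of one instruction, a multiply by 10 touches only the positive lanes, and a following add of 1 applies to all lanes.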
-
Patent number: 10782980
Abstract: Examples of the present disclosure provide apparatuses and methods related to generating and executing a control flow. An example apparatus can include a first device configured to generate control flow instructions, and a second device including an array of memory cells, an execution unit to execute the control flow instructions, and a controller configured to control an execution of the control flow instructions on data stored in the array.
Type: Grant
Filed: August 24, 2018
Date of Patent: September 22, 2020
Assignee: Micron Technology, Inc.
Inventors: Kyle B. Wheeler, Richard C. Murphy, Troy A. Manning, Dean A. Klein
-
Patent number: 10776207
Abstract: A method, computer program product, and a computer system are disclosed for processing information using hardware instructions in a processor of a computer system by performing a hardware reduction instruction using an input to calculate at least one range reduction factor of the input; performing a hardware restoration instruction using the input to calculate at least one range restoration factor of the input; and performing a final fused multiply add (FMA) type of hardware instruction or a multiply (FM) hardware instruction by combining an approximation based on a value reduced by the at least one range reduction factor with the at least one range restoration factor.
Type: Grant
Filed: September 6, 2018
Date of Patent: September 15, 2020
Assignee: International Business Machines Corporation
Inventors: Robert F. Enenkel, Christopher Anand, Lucas Dutton, Adele Olejarz
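The abstract does not name a target function, so the sketch below assumes exp(x) purely for illustration. It follows the standard pattern x = k*ln2 + r: the reduction factor is k, the restoration factor is 2**k, a polynomial approximates exp(r) on the small reduced range, and a final multiply combines the two. The helper names are hypothetical, and ordinary Python floating point stands in for the hardware instructions.

```python
import math

LN2 = math.log(2.0)

def reduction_factor(x):
    """Range reduction: choose k so that x = k*ln2 + r with |r| about ln2/2 or less."""
    return round(x / LN2)

def restoration_factor(k):
    """Range restoration: exp(k*ln2) = 2**k, built exactly by ldexp."""
    return math.ldexp(1.0, k)

def approx_exp_reduced(r, terms=12):
    """Taylor polynomial for exp(r); accurate enough on the small reduced range."""
    total, term = 1.0, 1.0
    for n in range(1, terms):
        term *= r / n
        total += term
    return total

def exp_via_reduction(x):
    k = reduction_factor(x)
    r = x - k * LN2                   # value reduced by the reduction factor
    # Final step: a plain multiply stands in for the FM/FMA-type instruction.
    return approx_exp_reduced(r) * restoration_factor(k)
```

Keeping |r| small is what allows a short polynomial to reach near full double precision. The restoration factor is a power of two, so applying it is exact.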
-
Patent number: 10776126
Abstract: An apparatus includes a scheduler circuit and a processing circuit. The scheduler circuit may be configured to (i) parse a directed acyclic graph into one or more operators and (ii) schedule the one or more operators in one or more data paths. The processing circuit generally comprises one or more hardware engines configured as the one or more data paths. The one or more hardware engines are generally configured to generate one or more output vectors in response to zero or more input vectors using the operators. At least one of the one or more hardware engines may support input vector dimensions ranging from zero to at least four dimensions. At least one of the one or more hardware engines is implemented solely in hardware.
Type: Grant
Filed: April 29, 2019
Date of Patent: September 15, 2020
Assignee: Ambarella International LP
Inventors: Leslie D. Kohn, Robert C. Kunz
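One way to picture "parse a DAG into operators and schedule them on data paths" is greedy list scheduling over a topological order, sketched below with the standard-library `graphlib` module (Python 3.9+). The operator names, the one-operator-per-path-per-step model and the alphabetical tie-breaking are assumptions for illustration, not the scheduler the patent claims.

```python
from graphlib import TopologicalSorter

def schedule(dag, num_paths):
    """dag maps each operator to the set of operators whose outputs it consumes.
    Returns one dict per step, mapping data-path index -> operator."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()                     # raises CycleError if the graph is not acyclic
    waiting, steps = [], []
    while sorter.is_active():
        waiting.extend(sorted(sorter.get_ready()))
        batch, waiting = waiting[:num_paths], waiting[num_paths:]
        steps.append(dict(enumerate(batch)))
        sorter.done(*batch)              # outputs now available to consumers
    return steps

dag = {"conv": set(), "pool": {"conv"}, "scale": set(), "add": {"pool", "scale"}}
steps = schedule(dag, num_paths=2)
```

With two data paths, `conv` and `scale` run together in the first step, followed by `pool` and then `add`. With one path, the ready operators simply queue up.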
-
Patent number: 10768989
Abstract: Methods and apparatus to provide virtualized vector processing are described. In one embodiment, one or more operations corresponding to a virtual vector request are distributed to one or more processor cores for execution.
Type: Grant
Filed: January 16, 2018
Date of Patent: September 8, 2020
Assignee: Intel Corporation
Inventors: Anthony Nguyen, Engin Ipek, Victor Lee, Daehyun Kim, Mikhail Smelyanskiy
-
Patent number: 10768937
Abstract: Overhead associated with verifying function return addresses to protect against security exploits is reduced by taking advantage of branch prediction mechanisms for predicting return addresses. More specifically, returning from a function includes popping a return address from a data stack. Well-known security exploits overwrite the return address on the data stack to hijack control flow. In some processors, a separate data structure referred to as a control stack is used to verify the data stack. When a return instruction is executed, the processor issues an exception if the return addresses on the control stack and the data stack are not identical. This overhead can be avoided by taking advantage of the return address stack, which is a data structure used by the branch predictor to predict return addresses. In most situations, if this prediction is correct, the above check does not need to occur, thus reducing the associated overhead.
Type: Grant
Filed: July 26, 2018
Date of Patent: September 8, 2020
Assignee: Advanced Micro Devices, Inc.
Inventors: Marius Evers, David A. Kaplan, Debjit Das Sarma
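A toy model of the check elision, assuming the three stacks can be represented as Python lists and that an empty return address stack simply forces the full comparison. The class and counter names are hypothetical. The point is only that a correct return-address prediction lets the slow data-stack versus control-stack comparison be skipped.

```python
class ReturnChecker:
    def __init__(self):
        self.data_stack = []     # architectural stack (an attacker may overwrite it)
        self.control_stack = []  # separate copy used to verify the data stack
        self.ras = []            # branch predictor's return address stack
        self.full_checks = 0     # how often the slow comparison actually ran

    def call(self, return_addr):
        self.data_stack.append(return_addr)
        self.control_stack.append(return_addr)
        self.ras.append(return_addr)

    def ret(self):
        actual = self.data_stack.pop()
        expected = self.control_stack.pop()  # popped either way to stay in sync
        predicted = self.ras.pop() if self.ras else None
        if predicted == actual:
            return actual  # prediction correct: skip the control-stack comparison
        self.full_checks += 1
        if actual != expected:
            raise RuntimeError("return address mismatch")  # models the exception
        return actual
```

A normal call/return pair never reaches the full check. An overwritten return address causes a misprediction, which falls back to the comparison and raises.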
-
Patent number: 10761847
Abstract: An apparatus in a configurable logic unit may include a configurable logic unit (CLU) configured to receive first and second operands and to perform an operand operation and generate an operation value. The apparatus may also include: a random value generator for generating a random value; an adder coupled to the CLU and the random value generator and configured to generate a sum of the operation value and the random value; and a shift register coupled to the adder and configured to shift the sum by a number of bits to generate shifted data at an output. The random value generator may be a linear feedback shift register. The output may be coupled to an additional CLU so that the shifted data may be used for subsequent operand operations. The apparatus may be implemented in a digital signal processor slice in a configurable logic block.
Type: Grant
Filed: August 17, 2018
Date of Patent: September 1, 2020
Assignee: Micron Technology, Inc.
Inventor: David Hulton
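Adding a random value below 2**shift before dropping `shift` bits is a known way to get stochastic rounding: the result rounds up with probability approximately equal to the discarded fraction. The sketch below assumes that reading, uses multiplication as the CLU's operand operation, and picks a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11 as the random value generator. All function names are hypothetical.

```python
def lfsr16(state):
    """One step of a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11 (maximal length)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def multiply_round(a, b, shift, state):
    """Multiply, add a random value below 2**shift, then drop `shift` low bits.
    Requires 0 < shift <= 16. Returns (rounded result, new LFSR state)."""
    product = a * b                          # the CLU's operand operation
    state = lfsr16(state)                    # random value generator
    rand = state & ((1 << shift) - 1)        # random value in [0, 2**shift)
    return (product + rand) >> shift, state  # adder followed by the shift
```

For 7 * 7 = 49 shifted right by 4 bits (49/16 = 3.0625), each call returns 3 or 4, and over many calls the average approaches 3.0625 instead of the truncated value 3.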
-
Patent number: 10761855
Abstract: A method performed in a processor, includes: receiving, in the processor, a branch instruction in the processing; determining, by the processor, an address of an instruction after the branch instruction as a candidate for speculative execution, the address including an object identification and an offset; and determining, by the processor, whether or not to perform speculative execution of the instruction after the branch instruction based on the object identification of the address.
Type: Grant
Filed: July 6, 2018
Date of Patent: September 1, 2020
Assignee: Micron Technology, Inc.
Inventor: Steven Jeffrey Wallach
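The abstract leaves the decision rule open. The sketch below assumes one possible rule: speculate only when the candidate address refers to the same object as the branch. It also assumes a hypothetical address layout with the object ID in the high bits and a 40-bit offset below it. Neither assumption comes from the source.

```python
OFFSET_BITS = 40  # hypothetical address layout: [object ID | 40-bit offset]

def split_address(addr):
    """Return (object ID, offset) for a hypothetical object-based address."""
    return addr >> OFFSET_BITS, addr & ((1 << OFFSET_BITS) - 1)

def may_speculate(branch_addr, candidate_addr):
    """Assumed policy: speculate only if the candidate stays inside the branch's object."""
    branch_obj, _ = split_address(branch_addr)
    candidate_obj, _ = split_address(candidate_addr)
    return branch_obj == candidate_obj
```

Under this policy, a candidate at a different offset in the same object is eligible for speculation, while a candidate that crosses into another object is not.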
-
Patent number: 10754654
Abstract: An apparatus that includes a program controller to fetch and issue instructions is described. The apparatus includes an execution lane having at least one execution unit to execute the instructions. The execution lane is part of an execution lane array that is coupled to a two dimensional shift register array structure, wherein execution lanes of the execution lane array are located at respective array locations and are coupled to dedicated registers at same respective array locations in the two-dimensional shift register array.
Type: Grant
Filed: March 28, 2019
Date of Patent: August 25, 2020
Assignee: Google LLC
Inventors: Albert Meixner, Jason Rupert Redgrave, Ofer Shacham, Daniel Frederic Finchelstein, Qiuling Zhu
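To show why lanes paired with a shiftable register plane are useful, the sketch below computes a 3-tap horizontal sum. Each lane reads its neighbours' values by shifting the plane instead of indexing memory. Nested Python lists stand in for the register array, reads past the edge return 0 (an assumed halo policy), and the function names are hypothetical.

```python
def shift(grid, dx, dy):
    """Shift the whole register plane: lane (y, x) receives the value held by
    lane (y + dy, x + dx); reads outside the array return 0 (assumed halo)."""
    h, w = len(grid), len(grid[0])
    return [[grid[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
             for x in range(w)] for y in range(h)]

def stencil_3tap(image):
    """Each lane sums its own value with its left and right neighbours."""
    acc = [row[:] for row in image]           # each lane's own register value
    for dx in (-1, 1):
        shifted = shift(image, dx, 0)         # neighbour values arrive by shifting
        acc = [[a + s for a, s in zip(acc_row, sh_row)]
               for acc_row, sh_row in zip(acc, shifted)]
    return acc
```

Every lane executes the same instruction on its own register, so the whole plane is processed in lockstep. Only the shift moves data between lanes.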
-
Patent number: 10747691
Abstract: Examples provide a memory device, a dual inline memory module, a storage device, an apparatus for storing, a method for storing, a computer program, a machine readable storage, and a machine readable medium. A memory device is configured to store data and comprises one or more interfaces configured to receive and to provide data. The memory device further comprises a memory module configured to store the data, and a memory logic component configured to control the one or more interfaces and the memory module. The memory logic component is further configured to receive information on a specific memory region with one or more model identifications, to receive information on an instruction to perform an acceleration function for one or more certain model identifications, and to perform the acceleration function on data in a specific memory region with the one or more certain model identifications.
Type: Grant
Filed: April 10, 2018
Date of Patent: August 18, 2020
Assignee: Intel Corporation
Inventors: Mark Schmisseur, Thomas Willhalm, Francesc Guim Bernat, Karthik Kumar
-
Patent number: 10740281
Abstract: A method is described that entails operating enabled cores of a multi-core processor such that both cores support respective software routines with a same instruction set, a first core being higher performance and consuming more power than a second core under a same set of applied supply voltage and operating frequency.
Type: Grant
Filed: August 14, 2018
Date of Patent: August 11, 2020
Assignee: INTEL CORPORATION
Inventors: Varghese George, Sanjeev S. Jahagirdar, Deborah T. Marr
-
Patent number: 10725954
Abstract: Embodiments of the present invention are directed to a microcontroller device having a microprocessor, programmable memory components, and programmable analog and digital blocks. The programmable analog and digital blocks are configurable based on programming information stored in the memory components. Programmable interconnect logic, also programmable from the memory components, is used to couple the programmable analog and digital blocks as needed. The advanced microcontroller design also includes programmable input/output blocks for coupling selected signals to external pins. The memory components also include user programs that the embedded microprocessor executes. These programs may include instructions for programming the digital and analog blocks “on-the-fly,” e.g., dynamically. In one implementation, there are a plurality of programmable digital blocks and a plurality of programmable analog blocks.
Type: Grant
Filed: June 1, 2018
Date of Patent: July 28, 2020
Assignee: Monterey Research, LLC
Inventors: Warren S. Snyder, Monte Mar
-
Patent number: 10719329
Abstract: An apparatus and method are provided for using predicted result values. The apparatus has a processing unit that comprises processing circuitry for executing a sequence of instructions, and value prediction circuitry for identifying a predicted result value for at least one instruction. A result producing structure is provided that is responsive to a request issued from the processing unit when the processing circuitry is executing a first instruction, to produce a result value for the first instruction and return that result value to the processing unit. While waiting for the result value from the result producing structure, the processing circuitry can be arranged to speculatively execute at least one dependent instruction using a predicted result value for the first instruction as obtained from the value prediction circuitry.
Type: Grant
Filed: June 28, 2018
Date of Patent: July 21, 2020
Assignee: Arm Limited
Inventors: Vladimir Vasekin, David Michael Bull, Chiloda Ashan Senarath Pathirane, Alexei Fedorov
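A sequential sketch of the value-prediction flow, assuming a simple last-value predictor keyed by instruction address. The abstract does not specify how predictions are made. In this model, "while waiting" is only notional: the dependent operation runs on the predicted value first, and is re-executed only if the real result differs. The class and function names are hypothetical.

```python
class LastValuePredictor:
    """Hypothetical predictor: guesses that an instruction repeats its last result."""
    def __init__(self):
        self.table = {}

def execute_with_prediction(predictor, pc, produce_result, dependent_op):
    """Returns (dependent result, whether the speculative result was kept)."""
    hit = pc in predictor.table
    if hit:
        predicted = predictor.table[pc]
        speculative = dependent_op(predicted)  # runs before the real value arrives
    actual = produce_result()                  # result producing structure responds
    predictor.table[pc] = actual               # train on the real value
    if hit and predicted == actual:
        return speculative, True               # speculation confirmed
    return dependent_op(actual), False         # no prediction or mispredict: re-execute
```

The first execution has no prediction and runs normally. A repeat with the same result reuses the speculative work, and a changed result forces re-execution.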