Patents Examined by Daniel H. Pan
-
Patent number: 10235170Abstract: An instruction generates a value for use in processing within a computing environment. The instruction obtains a sign control associated with the instruction, and shifts an input value of the instruction in a specified direction by a selected amount to provide a result. The result is placed in a first designated location in a register, and the sign, which is based on the sign control, is placed in a second designated location of the register. The result and the sign provide a signed value to be used in processing within the computing environment.Type: GrantFiled: September 30, 2016Date of Patent: March 19, 2019Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Jonathan D. Bradbury, Reid T. Copeland, Silvia Melitta Mueller
-
Patent number: 10228947Abstract: Methods, systems, and apparatus, including an apparatus for processing an instruction for accessing a N-dimensional tensor, the apparatus including multiple tensor index elements and multiple dimension multiplier elements, where each of the dimension multiplier elements has a corresponding tensor index element. The apparatus includes one or more processors configured to obtain an instruction to access a particular element of a N-dimensional tensor, where the N-dimensional tensor has multiple elements arranged across each of the N dimensions, and where N is an integer that is equal to or greater than one; determine, using one or more tensor index elements of the multiple tensor index elements and one or more dimension multiplier elements of the multiple dimension multiplier elements, an address of the particular element; and output data indicating the determined address for accessing the particular element of the N-dimensional tensor.Type: GrantFiled: December 15, 2017Date of Patent: March 12, 2019Assignee: Google LLCInventors: Dong Hyuk Woo, Andrew Everett Phelps
-
Patent number: 10228956Abstract: In one implementation, a processing device is provided that includes a memory to store instructions and a processor core to execute the instructions. The processor core is to receive a sequence of instructions reordered by a binary translator for execution. A first load of the sequence of instructions is identified. The first load references a memory location that stores a data item to be loaded. An occurrence of a second load is detected. The second load to access the memory location subsequent to an execution of the first load instruction. A protection field in the first load is enabled based on the detected occurrence of the second load. The enabled protection field indicates that the first load is to be checked for an aliasing associated with the memory location with respect to a subsequent store instruction. The second load is eliminated based on the enabled of the protection field.Type: GrantFiled: September 30, 2016Date of Patent: March 12, 2019Assignee: Intel CorporationInventors: Vineeth Mekkat, Mark J. Dechene, Zhongying Zhang, Jason Agron, Sebastian Winkel
-
Patent number: 10229266Abstract: Corruption of call stacks is detected by using guard words placed in the call stacks. A store guard word instruction is used to store a guard word on a stack frame of a caller routine, and a verify guard word instruction issued by one or more callee routines is used to verify the guard word is an expected value. If the guard word is an unexpected value, corruption is indicated.Type: GrantFiled: February 17, 2017Date of Patent: March 12, 2019Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventor: Michael K. Gschwind
-
Patent number: 10223110Abstract: There is a need to provide a central processing unit capable of improving the resistance to power analysis attack without changing programs, lowering clock frequencies, and greatly redesigning a central processing unit of the related art. In a central processing unit, an arithmetic unit is capable of performing arithmetic operation using data irrelevant to data stored in a register group. A control unit allows the arithmetic unit to perform arithmetic processing corresponding to an incorporated instruction. At this time, the control unit allows the arithmetic unit to perform arithmetic processing using the irrelevant data during a first one-clock cycle.Type: GrantFiled: August 29, 2013Date of Patent: March 5, 2019Assignee: Renesas Electronics CorporationInventor: Minoru Saeki
-
Patent number: 10223121Abstract: A processor includes a decoder, a data return buffer, and an execution unit. The decoder is to decode an instruction for a non-posted load into a decoded instruction for loading data from memory mapped input/output. The execution unit is for executing the decoded instruction. The execution is to start a timer, determine whether the timer exceeds a timeout threshold, allocate an entry in the data return buffer for the load, and determine whether an event arrived. The timer is to measure an amount of time taken to return the non-posted load instruction. The determination whether an event arrived is made in response to at least one of the allocation of the entry for the load, or a determination that the timer exceeds the timeout threshold.Type: GrantFiled: December 22, 2016Date of Patent: March 5, 2019Assignee: Intel CorporationInventors: Ido Ouziel, Raanan Sade, Jacob Doweck
-
Patent number: 10216246Abstract: In an embodiment, a processor includes processing cores, and a central control unit to: concurrently execute an outer control loop and an inner control loop, wherein the outer control loop is to monitor the processor as a whole, and wherein the inner control loop is to monitor a first processing core included in the processor; determine, based on the outer control loop, a first control action for the first processing core included in the processor; determine, based on the inner control loop, a second control action for the first processing core included in the processor; based on a comparison of the first control action and the second control action, select one of the first control action and the second control action as a selected control action; and apply the selected control action to the first processing core. Other embodiments are described and claimed.Type: GrantFiled: September 30, 2016Date of Patent: February 26, 2019Assignee: Intel CorporationInventors: Doron Rajwan, Efraim Rotem, Eliezer Weissmann, Avinash N. Ananthakrishnan, Dorit Shapira
-
Patent number: 10216635Abstract: Techniques relate to handling outstanding cache miss prefetches. A processor pipeline recognizes that a prefetch canceling instruction is being executed. In response to recognizing that the prefetch canceling instruction is being executed, all outstanding prefetches are evaluated according to a criterion as set forth by the prefetch canceling instruction in order to select qualified prefetches. In response to evaluating, a cache subsystem is communicated with to cause canceling of the qualified prefetches that fit the criterion. In response to successful canceling of the qualified prefetches, a local cache is prevented from being updated from the qualified prefetches.Type: GrantFiled: December 22, 2016Date of Patent: February 26, 2019Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Michael Karl Gschwind, Maged M. Michael, Valentina Salapura, Eric M. Schwarz, Chung-Lung K. Shum
-
Patent number: 10216516Abstract: A processing device includes a store instruction identification unit to identify a pair of store instructions among a plurality of instructions in an instruction queue. The pair of store instructions include a first store instruction and a second store instruction. The first data of the first store instruction corresponds to a first memory region adjacent to a second memory region, and a second data of the second store instruction corresponds to the second memory region. The processing device to include a store instruction fusion unit to fuse the first store instruction with the second store instruction resulting in a fused store instruction.Type: GrantFiled: September 30, 2016Date of Patent: February 26, 2019Assignee: Intel CorporationInventors: Sebastian Winkel, Jamison D. Collins, Tyler Sondag
-
Patent number: 10203958Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A steam head register stores data elements next to be supplied to functional units for use as operands. Stream metadata is stored in response to a stream store instruction. Stored stream metadata is restored to the stream engine in response to a stream restore instruction. An interrupt changes an open stream to a frozen state discarding stored stream data. A return from interrupt changes a frozen stream to an active state.Type: GrantFiled: June 28, 2017Date of Patent: February 12, 2019Assignee: TEXAS INSTRUMENTS INCORPORATEDInventors: Joseph Zbiciak, Timothy D. Anderson
-
Patent number: 10203957Abstract: A register alias table for a processor including an alias queue, load and store comparators, and dependency logic. Each entry of the alias queue stores instruction pointers of a pair of colliding load and store instructions that caused a memory violation and a valid value. The store comparator compares the instruction pointer of a subsequent store instruction with those stored in the alias queue, and if a match occurs, indicates that a store index of the subsequent store instruction is valid. The load comparator determines whether the instruction pointer of a subsequent load instruction matches an instruction pointer stored in the alias queue. If so, dependency logic provides a store index, if valid, as dependency information for the subsequent load instruction.Type: GrantFiled: September 30, 2016Date of Patent: February 12, 2019Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.Inventor: Xiaolong Fei
-
Patent number: 10198263Abstract: Apparatus and methods are disclosed for nullifying one or more registers identified in a target field of a nullification instruction. In some examples of the disclosed technology, an apparatus can include memory and one or more block-based processor cores configured to fetch and execute a plurality of instruction blocks. One of the cores can include a control unit configured, based at least in part on receiving a nullification instruction, to obtain a register identification of at least one of a plurality of registers, based on a target field of the nullification instruction. A write to the at least one register associated with the register identification is nullified. The nullification instruction is in a first instruction block of the plurality of instruction blocks. Based on the nullified write to the at least one register, a subsequent instruction is executed from a second, different instruction block.Type: GrantFiled: March 3, 2016Date of Patent: February 5, 2019Assignee: Microsoft Technology Licensing, LLCInventors: Douglas C. Burger, Aaron L. Smith
-
Patent number: 10191749Abstract: Single Instruction, Multiple Data (SIMD) technologies are described. A processing device can include a processor core and a memory. The processor core can receive, from a software application, a request to perform an operation on a first set of variables that includes a first input value and a register value and perform the operation on a second set of variables that includes a second input value and the first register value. The processor core can vectorize the operation on the first set of variables and the second set of variables. The processor core can perform the operation on the first set of variables and the second set of variables in parallel to obtain a first operation value and a second operation value. The processor core can perform a horizontal add operation on the first operation value and the second operation value and write the result to memory.Type: GrantFiled: December 24, 2015Date of Patent: January 29, 2019Assignee: Intel CorporationInventors: Jun Jin, Elmoustapha Ould-Ahmed-Vall
-
Patent number: 10185570Abstract: Embodiments relate to multithreaded branch prediction. An aspect includes a system for dynamically evaluating how to share entries of a multithreaded branch prediction structure. The system includes a first-level branch target buffer coupled to a processor circuit. The processor circuit is configured to perform a method. The method includes receiving a search request to locate branch prediction information associated with the search request, and searching for an entry corresponding to the search request in the first-level branch prediction structure. The entry is not allowed based on a thread state of the entry indicating that the entry has caused a problem on a thread associated with the thread state.Type: GrantFiled: November 9, 2017Date of Patent: January 22, 2019Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: James J. Bonanno, Daniel Lipetz, Brian R. Prasky, Anthony Saporito
-
Patent number: 10185562Abstract: Single Instruction, Multiple Data (SIMD) technologies are described. A processing device can include a processor core and a memory. The processor core can generate a first bitmap comprising a plurality of bits, where the plurality of bits includes a first bit that represents a first memory location. The processor core can determine that the value of the first bit is equal to the value of a second bit in the first bitmap. The processor core can determine the location of the second bit in relation to the first bit in the first bitmap. The processor core can generate a second bitmap including a third bit indicating that the first bit is the last bit in the first bitmap with the same value as the second bit.Type: GrantFiled: December 24, 2015Date of Patent: January 22, 2019Assignee: Intel CorporationInventors: Jun Jin, Elmoustapha Ould-Ahmed-Vall
-
Patent number: 10180839Abstract: An apparatus includes a processor and a loop cache coupled to the processor. The loop cache provides to the processor instructions corresponding to a loop in the instructions. The loop cache includes a persistence counter.Type: GrantFiled: March 4, 2016Date of Patent: January 15, 2019Assignee: Silicon Laboratories Inc.Inventors: Mark W. Johnson, Paul Zavalney, Marius Grannæs, Oeivind A. G. Loe
-
Patent number: 10176147Abstract: Multi-processor core three-dimensional (3D) integrated circuits (ICs) (3DICs) and related methods are disclosed. In aspects disclosed herein, ICs are provided that include a central processing unit (CPU) having multiple processor cores (“cores”) to improve performance. To further improve CPU performance, the multiple cores can also be designed to communicate with each other to offload workloads and/or share resources for parallel processing, but at a communication overhead associated with passing data through interconnects which have an associated latency. To mitigate this communication overhead inefficiency, aspects disclosed herein provide the CPU with its multiple cores in a 3DIC.Type: GrantFiled: March 7, 2017Date of Patent: January 8, 2019Assignee: QUALCOMM IncorporatedInventors: Kambiz Samadi, Amin Ansari, Yang Du
-
Patent number: 10176104Abstract: An apparatus comprises processing circuitry, an instruction cache, decoding circuitry to decode program instructions fetched from the cache to generate macro-operations to be processed by the processing circuitry, and predecoding circuitry to perform a predecoding operation on a block of program instructions fetched from a data store to generate predecode information to be stored to the cache with the block of instructions. In one example the predecoding operation comprises generating information on how many macro-operations are to generated by the decoding circuitry for a group of one or more program instructions. In another example the predecoding operation comprises generating information indicating whether at least one of a given subset of program instructions within the prefetched block is a branch instruction.Type: GrantFiled: September 30, 2016Date of Patent: January 8, 2019Assignee: ARM LimitedInventors: Vasu Kudaravalli, Matthew Paul Elwood, Adam George, Muhammad Umar Farooq, Michael Filippo
-
Patent number: 10162641Abstract: This invention addresses implements a range of interesting technologies into a single block. Each DSP CPU has a streaming engine. The streaming engines include: a SE to L2 interface that can request 512 bits/cycle from L2; a loose binding between SE and L2 interface, to allow a single stream to peak at 1024 bits/cycle; one-way coherence where the SE sees all earlier writes cached in system, but not writes that occur after stream opens; full protection against single-bit data errors within its internal storage via single-bit parity with semi-automatic restart on parity error.Type: GrantFiled: February 10, 2017Date of Patent: December 25, 2018Assignee: TEXAS INSTRUMENTS INCORPORATEDInventors: Timothy D. Anderson, Joseph Zbiciak, Duc Quang Bui, Abhijeet A. Chachad, Kai Chirca, Naveen Bhoria, Matthew D. Pierson, Daniel Wu, Ramakrishnan Venkatasubramanian
-
Patent number: 10162755Abstract: A technique for operating a cache memory of a data processing system includes creating respective pollution vectors to track which of multiple concurrent threads executed by an associated processor core are currently polluted by a store operation resident in the cache memory. Dependencies in a dependency data structure of a store queue of the cache memory are set based on the pollution vectors to reduce unnecessary ordering effects. Store operations are dispatched from the store queue in accordance with the dependencies indicated by the dependency data structure.Type: GrantFiled: October 31, 2016Date of Patent: December 25, 2018Assignee: International Business Machines CorporationInventors: Guy L. Guthrie, Hugh Shen, William J. Starke, Derek E. Williams