Patents by Inventor Youfeng Wu

Youfeng Wu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Method and apparatus for recovering from bad store-to-load forwarding in an out-of-order processor

Patent number: 9996356

Abstract: Apparatus and method for detecting and recovering from incorrect memory dependence speculation in an out-of-order processor are described herein. For example, one embodiment of a method comprises: executing a first load instruction; detecting when the first load instruction experiences a bad store-to-load forwarding event during execution; tracking the occurrences of bad store-to-load forwarding event experienced by the first load instruction during execution; controlling enablement of an S-bit in the first load instruction based on the tracked occurrences; generating a plurality of load operations responsive to an enabled S-bit in first load instruction, wherein execution of the plurality of load operations produces a result equivalent to that from the execution of the first load instruction.

Type: Grant

Filed: December 26, 2015

Date of Patent: June 12, 2018

Assignee: Intel Corporation

Inventors: Vineeth Mekkat, Oleg Margulis, Jason M. Agron, Ethan Schuchman, Sebastian Winkel, Youfeng Wu, Gisle Dankel
Technologies for position-independent persistent memory pointers

Patent number: 9971703

Abstract: Technologies for persistent memory pointer access include a computing device having a persistent memory including one or more nonvolatile regions. The computing device may load a persistent memory pointer having a static region identifier, a segment identifier, and an offset from the persistent memory. The computing device may map the static region identifier to a dynamic region identifier and determine a virtual memory address of the persistent memory pointer target based on the dynamic region identifier, the segment identifier, and the offset. The computing device may load an in-storage representation of a persistent-export pointer from the persistent memory, map the in-storage representation to a runtime representation, and determine a target address of a persistent external data object based on the runtime representation. The computing device may include a compiler to generate output code including persistent memory pointer and/or persistent-export pointer accesses.

Type: Grant

Filed: August 9, 2017

Date of Patent: May 15, 2018

Assignee: Intel Corporation

Inventors: Marcelo S. Cintra, Cheng Wang, Youfeng Wu, Alexandre Xavier DuChateau
Technologies for persistent memory programming

Patent number: 9940229

Abstract: Technologies for persistent memory programming include a computing device having a persistent memory including one or more nonvolatile regions. The computing device may assign a virtual memory address of a target location in persistent memory to a persistent memory pointer using persistent pointer strategy, and may dereference the pointer using the same strategy. Persistent pointer strategies include off-holder, ID-in-value, optimistic rectification, and pessimistic rectification. The computing device may log changes to persistent memory during the execution of a data consistency section, and commit changes to the persistent memory when the last data consistency section ends. Data consistency sections may be grouped by log group identifier. Using type metadata stored in the nonvolatile region, the computing device may identify the type of a root object within the nonvolatile region and then recursively identify the type of all objects referenced by the root object. Other embodiments are described and claimed.

Type: Grant

Filed: September 25, 2014

Date of Patent: April 10, 2018

Assignee: Intel Corporation

Inventors: Xipeng Shen, Youfeng Wu, Cheng Wang, Hyunchul Park, Hongbo Rong
Instruction and Logic for Dynamic Store Elimination

Publication number: 20180074827

Abstract: A processor for redundant stores includes a front end including circuitry to decode instructions from an instruction stream, a data cache unit including circuitry to cache data for the processor, a binary translator, and a memory execution unit. The binary translator includes circuitry to identify a first region of the instruction stream including a redundant store, mark a first starting instruction of the first region with a protection designator, mark a first ending instruction of the first region with a clear designator, and store an amended instruction stream with the markings. The memory execution unit includes circuitry to track the first redundant store based on the protection designator and the clear designator to eliminate the first redundant store.

Type: Application

Filed: September 14, 2016

Publication date: March 15, 2018

Inventors: Vineeth Mekkat, Youfeng Wu, Sebastian Winkel, Oleg Margulis
Method and apparatus for approximating detection of overlaps between memory ranges

Patent number: 9910650

Abstract: A computer-implemented method for managing loop code in a compiler includes using a conflict detection procedure that detects across-iteration dependency for arrays of single memory addresses to determine whether a potential across-iteration dependency exists for arrays of memory addresses for ranges of memory accessed by the loop code.

Type: Grant

Filed: September 25, 2014

Date of Patent: March 6, 2018

Assignee: Intel Corporation

Inventors: Albert Hartono, Nalini Vasudevan, Sara S. Baghsorkhi, Cheng Wang, Youfeng Wu
SYSTEMS, APPARATUSES, AND METHODS FOR A HARDWARE AND SOFTWARE SYSTEM TO AUTOMATICALLY DECOMPOSE A PROGRAM TO MULTIPLE PARALLEL THREADS

Publication number: 20180060049

Abstract: Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. In some embodiments, the systems and apparatuses execute a method of original code decomposition and/or generated thread execution.

Type: Application

Filed: June 6, 2017

Publication date: March 1, 2018

Inventors: DAVID J. SAGER, RUCHIRA SASANKA, RON GABOR, SHLOMO RAIKIN, JOSEPH NUZMAN, LEEOR PELED, JASON A. DOMER, HO-SEOP KIM, YOUFENG WU, KOICHI YAMADA, TIN-FOOK NGAI, HOWARD H. CHEN, JAYARAM BOBBA, JEFFREY J. COOK, OMAR M. SHAIKH, SURESH SRINIVAS
METHOD AND APPARATUS FOR SPECULATIVE VECTORIZATION

Publication number: 20180018177

Abstract: An apparatus and method for speculative vectorization. For example, one embodiment of a processor comprises: a queue comprising a set of locations for storing addresses associated with vectorized memory access instructions; and execution logic to execute a first vectorized memory access instruction to access the queue and to compare a new address associated with the first vectorized memory access instruction with existing addresses stored within a specified range of locations within the queue to detect whether a conflict exists, the existing addresses having been previously stored responsive to one or more prior vectorized memory access instructions.

Type: Application

Filed: July 18, 2017

Publication date: January 18, 2018

Inventors: NALINI VASUDEVAN, CHENG WANG, YOUFENG WU, ALBERT HARTONO, SARA S. BAGHSORKHI
REGISTER RECLAMATION

Publication number: 20180004524

Abstract: In an example, there is disclosed an apparatus, including a binary translator (BT) including circuitry to: analyze a code block; determine that an architectural register mapped to a physical register in the physical register file is available for early reclamation; and insert a reclamation hint into the code block. There is also disclosed a processor to reclaim the physical register based at least in part on the reclamation hint.

Type: Application

Filed: July 2, 2016

Publication date: January 4, 2018

Applicant: Intel Corporation

Inventors: Vineeth Mekkat, Janghaeng Lee, Youfeng Wu
Instruction and Logic for Total Store Elimination

Publication number: 20170351516

Abstract: A processor includes a front end including circuitry to decode instructions from an instruction stream, a data cache unit including circuitry to cache data for the processor, and a binary translator. The binary translator includes circuitry to identify a redundant store in the instruction stream, mark the start and end of a region of the instruction stream with the redundant store, remove the redundant store, and store an amended instruction stream with the redundant store removed.

Type: Application

Filed: June 7, 2016

Publication date: December 7, 2017

Inventors: Vineeth Mekkat, Oleg Margulis, Ching-Tsun Chou, Youfeng Wu
Flexible acceleration of code execution

Patent number: 9836316

Abstract: Technologies for performing flexible code acceleration on a computing device includes initializing an accelerator virtual device on the computing device. The computing device allocates memory-mapped input and output (I/O) for the accelerator virtual device and also allocates an accelerator virtual device context for a code to be accelerated. The computing device accesses a bytecode of the code to be accelerated and determines whether the bytecode is an operating system-dependent bytecode. If not, the computing device performs hardware acceleration of the bytecode via the memory-mapped I/O using an internal binary translation module. However, if the bytecode is operating system-dependent, the computing device performs software acceleration of the bytecode.

Type: Grant

Filed: September 28, 2012

Date of Patent: December 5, 2017

Assignee: Intel Corporation

Inventors: Cheng Wang, Youfeng Wu
Methods and apparatus to manage concurrent predicate expressions

Patent number: 9830196

Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to manage concurrent predicate expressions. An example method discloses inserting a first condition hook into a first thread, the first condition hook associated with a first condition, inserting a second condition hook into a second thread, the second condition hook associated with a second condition, preventing the second thread from executing until the first condition is satisfied, and identifying a concurrency violation when the second condition is satisfied.

Type: Grant

Filed: August 24, 2015

Date of Patent: November 28, 2017

Assignee: INTEL CORPORATION

Inventors: Justin E. Gottschlich, Cristiano Ligieri Pereira, Gilles Pokam, Youfeng Wu
TECHNOLOGIES FOR POSITION-INDEPENDENT PERSISTENT MEMORY POINTERS

Publication number: 20170337137

Abstract: Technologies for persistent memory pointer access include a computing device having a persistent memory including one or more nonvolatile regions. The computing device may load a persistent memory pointer having a static region identifier, a segment identifier, and an offset from the persistent memory. The computing device may map the static region identifier to a dynamic region identifier and determine a virtual memory address of the persistent memory pointer target based on the dynamic region identifier, the segment identifier, and the offset. The computing device may load an in-storage representation of a persistent-export pointer from the persistent memory, map the in-storage representation to a runtime representation, and determine a target address of a persistent external data object based on the runtime representation. The computing device may include a compiler to generate output code including persistent memory pointer and/or persistent-export pointer accesses.

Type: Application

Filed: August 9, 2017

Publication date: November 23, 2017

Inventors: Marcelo S. Cintra, Cheng Wang, Youfeng Wu, Alexandre Xavier DuChateau
Apparatus, method, and system for providing a decision mechanism for conditional commits in an atomic region

Patent number: 9817644

Abstract: An apparatus and method is described herein for conditionally committing and/or speculative checkpointing transactions, which potentially results in dynamic resizing of transactions. During dynamic optimization of binary code, transactions are inserted to provide memory ordering safeguards, which enables a dynamic optimizer to more aggressively optimize code. And the conditional commit enables efficient execution of the dynamic optimization code, while attempting to prevent transactions from running out of hardware resources. While the speculative checkpoints enable quick and efficient recovery upon abort of a transaction. Processor hardware is adapted to support dynamic resizing of the transactions, such as including decoders that recognize a conditional commit instruction, a speculative checkpoint instruction, or both. And processor hardware is further adapted to perform operations to support conditional commit or speculative checkpointing in response to decoding such instructions.

Type: Grant

Filed: September 28, 2015

Date of Patent: November 14, 2017

Assignee: Intel Corporation

Inventors: Mauricio Breternitz, Jr., Youfeng Wu, Cheng Wang, Edson Borin, Shiliang Hu, Craig B. Zilles
Technologies for position-independent persistent memory pointers

Patent number: 9767037

Abstract: Technologies for persistent memory pointer access include a computing device having a persistent memory including one or more nonvolatile regions. The computing device may load a persistent memory pointer having a static region identifier, a segment identifier, and an offset from the persistent memory. The computing device may map the static region identifier to a dynamic region identifier and determine a virtual memory address of the persistent memory pointer target based on the dynamic region identifier, the segment identifier, and the offset. The computing device may load an in-storage representation of a persistent-export pointer from the persistent memory, map the in-storage representation to a runtime representation, and determine a target address of a persistent external data object based on the runtime representation. The computing device may include a compiler to generate output code including persistent memory pointer and/or persistent-export pointer accesses.

Type: Grant

Filed: June 26, 2015

Date of Patent: September 19, 2017

Assignee: Intel Corporation

Inventors: Marcelo S. Cintra, Cheng Wang, Youfeng Wu, Alexandre Xavier DuChateau
Automatic loop vectorization using hardware transactional memory

Patent number: 9720667

Abstract: Technologies for automatic loop vectorization include a computing device with an optimizing compiler. During an optimization pass, the compiler identifies a loop and generates a transactional code segment including a vectorized implementation of the loop body including one or more vector memory read instructions capable of generating an exception. The compiler also generates a non-transactional fallback code segment including a scalar implementation of the loop body that is executed in response to an exception generated within the transactional code segment. The compiler may detect whether the loop contains a memory read dependent on a condition that may be updated in a previous iteration or whether the loop contains a potential data dependence between two iterations. The compiler may generate a dynamic check for an actual data dependence and an explicit transactional abort instruction to be executed when an actual data dependence exists. Other embodiments are described and claimed.

Type: Grant

Filed: March 21, 2014

Date of Patent: August 1, 2017

Assignee: Intel Corporation

Inventors: Sara S. Baghsorkhi, Albert Hartono, Youfeng Wu, Nalini Vasudevan, Cheng Wang
Instruction and logic to monitor loop trip count and remove loop optimizations

Patent number: 9715388

Abstract: Logic and instruction to monitor loop trip count are disclosed. Loop trip count information of a loop may be stored in a dedicated hardware buffer. Average loop trip count of the loop may be calculated based on the stored loop trip count information. Based on the average trip count, loop optimizations may be removed from the loop. The stored loop trip count information may include an identifier identifying the loop, a total loop trip count of the loop, and an exit count of the loop.

Type: Grant

Filed: March 30, 2012

Date of Patent: July 25, 2017

Assignee: Intel Corporation

Inventors: Jaewoong Chung, Hyunchul Park, Hongbo Rong, Cheng Wang, Youfeng Wu
Energy/performance with optimal communication in dynamic parallelization of single threaded programs

Patent number: 9715376

Abstract: A method and apparatus for optimizing parallelized single threaded programs is herein described. Code regions, such as dependency chains, are replicated utilizing any known method, such as dynamic code replication. A flow network associated with a replicated code region is built and a minimum cut algorithm is applied to determine duplicated nodes, which may include a single instruction or a group of instructions, to be removed. The dependency of removed nodes is fulfilled with inserted communication to ensure proper data consistency of the original single-threaded program. As a result, both performance and power consumption is optimized for parallel code sections through removal of expensive workload nodes and replacement with communication between other replicated code regions to be executed in parallel.

Type: Grant

Filed: December 29, 2008

Date of Patent: July 25, 2017

Assignee: Intel Corporation

Inventors: Cheng Wang, Youfeng Wu
Overlapping atomic regions in a processor

Patent number: 9710280

Abstract: In one embodiment, the present invention includes a processor having a core to execute instructions. This core can include various structures and logic that enable instructions of different atomic regions to be executed in an overlapping manner. To this end, the core can include a register file having registers to store data for use in execution of the instructions, and multiple shadow register files each to store a register checkpoint on initiation of a given atomic region. In this way, overlapping execution of atomic regions identified by a programmer or compiler can occur. Other embodiments are described and claimed.

Type: Grant

Filed: December 30, 2011

Date of Patent: July 18, 2017

Assignee: Intel Corporation

Inventors: Jaewoong Chung, Cheng Wang, Youfeng Wu
Methods and apparatuses for efficient load processing using buffers

Patent number: 9710391

Abstract: Various embodiments of the invention concern methods and apparatuses for power and time efficient load handling. A compiler may identify producer loads, consumer reuse loads, consumer forwarded loads, and producer/consumer hybrid loads. Based on this identification, performance of the load may be efficiently directed to a load value buffer, store buffer, data cache, or elsewhere. Consequently, accesses to cache are reduced, through direct loading from load value buffers and store buffers, thereby efficiently processing the loads.

Type: Grant

Filed: May 3, 2013

Date of Patent: July 18, 2017

Assignee: Intel Corporation

Inventors: Wei Liu, Youfeng Wu, Christopher Wilkerson, Herbert Hum
Method and apparatus for speculative vectorization

Patent number: 9710279

Abstract: An apparatus and method for speculative vectorization. For example, one embodiment of a processor comprises: a queue comprising a set of locations for storing addresses associated with vectorized memory access instructions; and execution logic to execute a first vectorized memory access instruction to access the queue and to compare a new address associated with the first vectorized memory access instruction with existing addresses stored within a specified range of locations within the queue to detect whether a conflict exists, the existing addresses having been previously stored responsive to one or more prior vectorized memory access instructions.

Type: Grant

Filed: September 26, 2014

Date of Patent: July 18, 2017

Assignee: Intel Corporation

Inventors: Nalini Vasudevan, Cheng Wang, Youfeng Wu, Albert Hartono, Sara S. Baghsorkhi

prev 1 2 3 4 5 6 … next