Patents by Inventor Yipeng Wang
Yipeng Wang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20200081835
Abstract: An apparatus and method for prioritizing transactional memory regions. For example, one embodiment of a processor comprises: a plurality of cores to execute threads comprising sequences of instructions, at least some of the instructions specifying a transactional memory region; a cache of each core to store a plurality of cache lines; transactional memory circuitry of each core to manage execution of the transactional memory (TM) regions based on priorities associated with each of the TM regions; and wherein the transactional memory circuitry, upon detecting a conflict between a first TM region having a first priority value and a second TM region having a second priority value, is to determine which of the first TM region or the second TM region is permitted to continue executing and which is to be aborted based, at least in part, on the first and second priority values.
Type: Application
Filed: September 10, 2018
Publication date: March 12, 2020
Inventors: REN WANG, RAANAN SADE, YIPENG WANG, Tsung-Yuan TAI, SAMEH GOBRIEL
-
Publication number: 20200042479
Abstract: Apparatus and methods implementing a hardware queue management device for reducing inter-core data transfer overhead by offloading request management and data coherency tasks from the CPU cores. The apparatus include multi-core processors, a shared L3 or last-level cache (“LLC”), and a hardware queue management device to receive, store, and process inter-core data transfer requests. The hardware queue management device further comprises a resource management system to control the rate in which the cores may submit requests to reduce core stalls and dropped requests. Additionally, software instructions are introduced to optimize communication between the cores and the queue management device.
Type: Application
Filed: October 14, 2019
Publication date: February 6, 2020
Applicant: Intel Corporation
Inventors: Ren Wang, Yipeng Wang, Andrew Herdrich, Jr-Shian Tsai, Tsung-Yuan C. Tai, Niall D. McDonnell, Hugh Wilkinson, Bradley A. Burres, Bruce Richardson, Namakkal N. Venkatesan, Debra Bernstein, Edwin Verplanke, Stephen R. Van Doren, An Yan, Andrew Cunningham, David Sonnier, Gage Eads, James T. Clee, Jamison D. Whitesell, Jerry Pirog, Jonathan Kenny, Joseph R. Hasting, Narender Vangati, Stephen Miller, Te K. Ma, William Burroughs
-
Patent number: 10530375
Abstract: A frequency divider circuit (200) includes a frequency sub-divider (201) to provide a frequency divided clock, a delay circuit (250) configured to delay the frequency divided clock by N+0.5 cycles of the input clock to generate a delayed clock, and an output circuit (202) configured to generate an output clock based on the frequency divided clock and the delayed clock, where the output clock has a frequency that is equal to 1/(N+0.5) times a frequency of the input clock, and N is an integer greater than one.
Type: Grant
Filed: September 5, 2018
Date of Patent: January 7, 2020
Assignee: XILINX, INC.
Inventors: Yipeng Wang, Kee Hian Tan, Stanley Y. Chen, Yohan Frans
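The output-frequency relation stated in this abstract can be written out as below. The worked numbers (N = 2 and a 1 GHz input clock) are illustrative assumptions, not values taken from the patent.

```latex
\[
  f_{\text{out}} = \frac{f_{\text{in}}}{N + 0.5},
  \qquad N \in \mathbb{Z},\; N > 1
\]
% Assumed example: N = 2, f_in = 1 GHz
\[
  f_{\text{out}} = \frac{1\ \text{GHz}}{2.5} = 400\ \text{MHz}
\]
```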
-
Patent number: 10445271
Abstract: Apparatus and methods implementing a hardware queue management device for reducing inter-core data transfer overhead by offloading request management and data coherency tasks from the CPU cores. The apparatus include multi-core processors, a shared L3 or last-level cache (“LLC”), and a hardware queue management device to receive, store, and process inter-core data transfer requests. The hardware queue management device further comprises a resource management system to control the rate in which the cores may submit requests to reduce core stalls and dropped requests. Additionally, software instructions are introduced to optimize communication between the cores and the queue management device.
Type: Grant
Filed: January 4, 2016
Date of Patent: October 15, 2019
Assignee: Intel Corporation
Inventors: Ren Wang, Namakkal N. Venkatesan, Debra Bernstein, Edwin Verplanke, Stephen R. Van Doren, An Yan, Andrew Cunningham, David Sonnier, Gage Eads, James T. Clee, Jamison D. Whitesell, Yipeng Wang, Jerry Pirog, Jonathan Kenny, Joseph R. Hasting, Narender Vangati, Stephen Miller, Te K. Ma, William Burroughs, Andrew J. Herdrich, Jr-Shian Tsai, Tsung-Yuan C. Tai, Niall D. McDonnell, Hugh Wilkinson, Bradley A. Burres, Bruce Richardson
-
Patent number: 10445118
Abstract: Methods, apparatus, systems, and articles of manufacture to facilitate field-programmable gate array support during runtime execution of computer readable instructions are disclosed herein. An example apparatus includes a compiler to, prior to runtime, compile a block of code written as high level source code into a first hardware bitstream kernel and a second hardware bitstream kernel; a kernel selector to select the first hardware bitstream kernel based on an attribute to be dispatched during runtime; a dispatcher to dispatch the first hardware bitstream kernel to a field programmable gate array (FPGA) during runtime; and the kernel selector to, when an FPGA attribute does not satisfy a threshold during runtime, adjust the selection of the first hardware bitstream kernel to the second hardware bitstream kernel to be dispatched during runtime.
Type: Grant
Filed: September 22, 2017
Date of Patent: October 15, 2019
Assignee: INTEL CORPORATION
Inventors: Xiangyang Guo, Simonjit Dutta, Han Lee, Yipeng Wang
-
Publication number: 20190102346
Abstract: A central processing unit can offload table lookup or tree traversal to an offload engine. The offload engine can provide hardware accelerated operations such as instruction queueing, bit masking, hashing functions, data comparisons, a results queue, and a progress tracking. The offload engine can be associated with a last level cache. In the case of a hash table lookup, the offload engine can apply a hashing function to a key to generate a signature, apply a comparator to compare signatures against the generated signature, retrieve a key associated with the signature, and apply the comparator to compare the key against the retrieved key. Accordingly, a data pointer associated with the key can be provided in the result queue. Acceleration of operations in tree traversal and tuple search can also occur.
Type: Application
Filed: November 30, 2018
Publication date: April 4, 2019
Inventors: Ren WANG, Andrew J. HERDRICH, Tsung-Yuan C. TAI, Yipeng WANG, Raghu KONDAPALLI, Alexander BACHMUTSKY, Yifan YUAN
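The hash table lookup sequence in this abstract (hash the key into a signature, compare signatures, then compare full keys) can be sketched in software as follows. This is only an illustration of the two-stage comparison, not the offload engine itself; the class name `SignatureTable`, the 16-bit signature width, the bucket count, and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib


def _digest(key: bytes) -> bytes:
    # Assumption: SHA-256 stands in for whatever hashing function the engine applies.
    return hashlib.sha256(key).digest()


def signature(key: bytes) -> int:
    """Hypothetical 16-bit signature taken from the first two hash bytes."""
    return int.from_bytes(_digest(key)[:2], "big")


class SignatureTable:
    """Toy hash table storing (signature, key, data pointer) per bucket."""

    def __init__(self, num_buckets: int = 8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key: bytes) -> list:
        # Bucket index uses different hash bytes than the signature.
        index = int.from_bytes(_digest(key)[2:6], "big") % len(self.buckets)
        return self.buckets[index]

    def insert(self, key: bytes, data_pointer) -> None:
        self._bucket(key).append((signature(key), key, data_pointer))

    def lookup(self, key: bytes):
        sig = signature(key)
        for stored_sig, stored_key, data_pointer in self._bucket(key):
            if stored_sig != sig:      # stage 1: cheap signature comparison
                continue
            if stored_key == key:      # stage 2: full key comparison
                return data_pointer
        return None
```

For example, after `table = SignatureTable()` and `table.insert(b"flow-a", "ptr-a")`, calling `table.lookup(b"flow-a")` returns `"ptr-a"`, while a key that was never inserted returns `None`. The signature check filters most non-matching entries before the more expensive key comparison runs.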
-
Publication number: 20190095229
Abstract: Methods, apparatus, systems, and articles of manufacture to facilitate field-programmable gate array support during runtime execution of computer readable instructions are disclosed herein. An example apparatus includes a compiler to, prior to runtime, compile a block of code written as high level source code into a first hardware bitstream kernel and a second hardware bitstream kernel; a kernel selector to select the first hardware bitstream kernel based on an attribute to be dispatched during runtime; a dispatcher to dispatch the first hardware bitstream kernel to a field programmable gate array (FPGA) during runtime; and the kernel selector to, when an FPGA attribute does not satisfy a threshold during runtime, adjust the selection of the first hardware bitstream kernel to the second hardware bitstream kernel to be dispatched during runtime.
Type: Application
Filed: September 22, 2017
Publication date: March 28, 2019
Inventors: XIANGYANG GUO, SIMONJIT DUTTA, HAN LEE, YIPENG WANG
-
Patent number: 10216668
Abstract: Technologies for a distributed hardware queue manager include a compute device having a processor. The processor includes two or more hardware queue managers as well as two or more processor cores. Each processor core can enqueue or dequeue data from the hardware queue manager. Each hardware queue manager can be configured to contain several queue data structures. In some embodiments, the queues are addressed by the processor cores using virtual queue addresses, which are translated into physical queue addresses for accessing the corresponding hardware queue manager. The virtual queues can be moved from one physical queue in one hardware queue manager to a different physical queue in a different physical queue manager without changing the virtual address of the virtual queue.
Type: Grant
Filed: March 31, 2016
Date of Patent: February 26, 2019
Assignee: Intel Corporation
Inventors: Ren Wang, Yipeng Wang, Jr-Shian Tsai, Andrew Herdrich, Tsung-Yuan Tai, Niall McDonnell, Stephen Van Doren, David Sonnier, Debra Bernstein, Hugh Wilkinson, Narender Vangati, Stephen Miller, Gage Eads, Andrew Cunningham, Jonathan Kenny, Bruce Richardson, William Burroughs, Joseph Hasting, An Yan, James Clee, Te Ma, Jerry Pirog, Jamison Whitesell
-
Publication number: 20190052719
Abstract: Technologies for flow rule aware exact match cache compression include multiple computing devices in communication over a network. A computing device reads a network packet from a network port and extracts one or more key fields from the packet to generate a lookup key. The key fields are identified by a key field specification of an exact match flow cache. The computing device may dynamically configure the key field specification based on an active flow rule set. The computing device may compress the key field specification to match a union of non-wildcard fields of the active flow rule set. The computing device may expand the key field specification in response to insertion of a new flow rule. The computing device looks up the lookup key in the exact match flow cache and, if a match is found, applies the corresponding action. Other embodiments are described and claimed.
Type: Application
Filed: January 4, 2018
Publication date: February 14, 2019
Inventors: Yipeng Wang, Ren Wang, Antonio Fischetti, Sameh Gobriel, Tsung-Yuan C. Tai
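The idea of compressing the key field specification to the union of non-wildcard fields can be sketched as below. This is a minimal illustration under stated assumptions: the field names in `ALL_FIELDS`, the representation of a flow rule as a dictionary in which an absent field means wildcard, and the helper names `key_field_spec` and `make_key` are all hypothetical and do not come from the application.

```python
# Assumption: a fixed, ordered list of header fields a rule could match on.
ALL_FIELDS = ("src_ip", "dst_ip", "src_port", "dst_port", "proto")


def key_field_spec(rules):
    """Fields that at least one active rule matches on (absent field = wildcard)."""
    return tuple(f for f in ALL_FIELDS if any(f in rule for rule in rules))


def make_key(packet, spec):
    """Build the exact match lookup key from only the fields in the spec."""
    return tuple(packet[f] for f in spec)


# Two hypothetical active flow rules; every field they omit is wildcarded.
rules = [
    {"dst_ip": "10.0.0.1", "proto": 6},
    {"dst_ip": "10.0.0.2", "dst_port": 80},
]
spec = key_field_spec(rules)  # compressed to the union of non-wildcard fields

packet = {"src_ip": "192.168.1.5", "dst_ip": "10.0.0.1",
          "src_port": 40000, "dst_port": 443, "proto": 6}
key = make_key(packet, spec)

# Inserting a rule that matches on a new field expands the specification.
expanded_spec = key_field_spec(rules + [{"src_ip": "192.168.1.5"}])
```

With these rules the specification covers only `dst_ip`, `dst_port`, and `proto`, so packets differing only in wildcarded fields share one cache key. A real cache would presumably also have to invalidate or rebuild existing entries when the specification changes; that step is omitted here.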
-
Publication number: 20190042602
Abstract: Techniques and apparatus for dynamic data access mode processes are described. In one embodiment, for example, an apparatus may include a processor, at least one memory coupled to the processor, the at least one memory comprising an indication of a database and instructions, the instructions, when executed by the processor, to cause the processor to determine a database utilization value for a database, perform a comparison of the database utilization value to at least one utilization threshold, and set an active data access mode to one of a low-utilization data access mode or a high-utilization data access mode based on the comparison. Other embodiments are described.
Type: Application
Filed: August 20, 2018
Publication date: February 7, 2019
Inventors: Ren Wang, Bruce Richardson, Tsung-Yuan Tai, Yipeng Wang, Pablo De Lara Guarch
-
Publication number: 20190042471
Abstract: Technologies for least recently used (LRU) cache replacement include a computing device with a processor with vector instruction support. The computing device retrieves a bucket of an associative cache from memory that includes multiple entries arranged from front to back. The bucket may be a 256-bit array including eight 32-bit entries. For lookups, a matching entry is located at a position in the bucket. The computing device executes a vector permutation processor instruction that moves the matching entry to the front of the bucket while preserving the order of other entries of the bucket. For insertion, an inserted entry is written at the back of the bucket. The computing device executes a vector permutation processor instruction that moves the inserted entry to the front of the bucket while preserving the order of other entries. The permuted bucket is stored to the memory. Other embodiments are described and claimed.
Type: Application
Filed: August 9, 2018
Publication date: February 7, 2019
Inventors: Ren Wang, Yipeng Wang, Tsung-Yuan Tai, Cristian Florin Dumitrescu, Xiangyang Guo
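The move-to-front permutation described in this abstract can be modeled in plain Python as below. This sketch uses list indexing in place of a vector permutation instruction, so it shows only the ordering logic; the function names are hypothetical, and the choice to overwrite the back (least recently used) entry on insertion is an assumption about what "written at the back" means for a full bucket.

```python
def move_to_front(bucket, pos):
    """Return a new bucket with bucket[pos] first; other entries keep their order."""
    # The index list plays the role of a permute control operand.
    perm = [pos] + [i for i in range(len(bucket)) if i != pos]
    return [bucket[i] for i in perm]


def lookup(bucket, entry):
    """On a hit, promote the matching entry to the front of the bucket."""
    if entry not in bucket:
        return bucket, False
    return move_to_front(bucket, bucket.index(entry)), True


def insert(bucket, entry):
    """Write the entry at the back (assumed to evict the LRU entry), then promote it."""
    replaced = bucket[:-1] + [entry]
    return move_to_front(replaced, len(replaced) - 1)


# Eight-entry bucket ordered from most recently used (front) to least (back).
bucket = [10, 20, 30, 40, 50, 60, 70, 80]
after_hit, hit = lookup(bucket, 40)   # [40, 10, 20, 30, 50, 60, 70, 80], True
after_insert = insert(after_hit, 99)  # [99, 40, 10, 20, 30, 50, 60, 70]
```

Because every access moves the touched entry to the front and shifts the rest back by at most one position, the back of the bucket always holds the least recently used entry, which is why no separate age counters are needed.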
-
Publication number: 20190044869
Abstract: Technologies for classifying network flows using adaptive virtual routing include a network appliance with one or more processors. The network appliance is configured to identify a set of candidate classification algorithms from a plurality of classification algorithm designs to perform a flow classification operation and deploy each of the candidate classification algorithms to a processor. Additionally the network appliance is configured to monitor a performance level of each of the deployed candidate classification algorithms and identify a candidate classification algorithm of the deployed candidate classification algorithms with the highest performance level. The network appliance is further configured to deploy the identified candidate classification algorithm with the highest performance level on each of the one or more processors that are configured to perform the flow classification operation. Other embodiments are described herein.
Type: Application
Filed: August 17, 2018
Publication date: February 7, 2019
Inventors: Yipeng Wang, Ren Wang, Janet Tseng, Jr-Shian Tsai, Tsung-Yuan Tai
-
Publication number: 20190004709
Abstract: Examples may include techniques to control an insertion ratio or rate for a cache. Examples include comparing cache miss ratios for different time intervals or windows for a cache to determine whether to adjust a cache insertion ratio that is based on a ratio of cache misses to cache insertions.
Type: Application
Filed: June 30, 2017
Publication date: January 3, 2019
Inventors: Yipeng WANG, Ren WANG, Sameh GOBRIEL, Tsung-Yuan Charlie TAI
-
Publication number: 20180205653
Abstract: Apparatus, methods, and systems for tuple space search-based flow classification using cuckoo hash tables and unmasked packet headers are described herein. A device can communicate with one or more hardware switches. The device can include memory to store hash table entries of a hash table. The device can include processing circuitry to perform a hash lookup in the hash table. The lookup can be based on an unmasked key included in a packet header corresponding to a received data packet. The processing circuitry can retrieve an index pointing to a sub-table, the sub-table including a set of rules for handling the data packet. Other embodiments are also described.
Type: Application
Filed: June 29, 2017
Publication date: July 19, 2018
Inventors: Ren Wang, Tsung-Yuan C. Tai, Yipeng Wang, Sameh Gobriel
-
Patent number: 9846627
Abstract: Systems and methods for modeling memory access behavior and memory traffic timing behavior are disclosed. According to an aspect, a method includes receiving data indicative of memory access behavior resulting from instructions executed on a processor. The method also includes determining a statistical profile of the memory access behavior, the profile including tuple statistics of memory access behavior. Further, the method includes generating a clone of the executed instructions based on the statistical profile for use in simulating the memory access behavior.
Type: Grant
Filed: February 15, 2016
Date of Patent: December 19, 2017
Assignee: North Carolina State University
Inventors: Yan Solihin, Yipeng Wang, Amro Awad
-
Publication number: 20170286114
Abstract: A processor of an aspect includes a decode unit to decode memory access instructions of a first type and to output corresponding memory access operations, and to decode memory access instructions of a second type and to output corresponding memory access operations. The processor also includes a load store queue coupled with the decode unit. The load store queue includes a load buffer that is to have a plurality of load buffer entries, and a store buffer that is to have a plurality of store buffer entries. The load store queue also includes a buffer entry allocation controller coupled with the load buffer and coupled with the store buffer. The buffer entry allocation controller is to allocate load and store buffer entries based at least in part on whether memory access operations correspond to memory access instructions of the first type or of the second type. Other processors, methods, and systems, are also disclosed.
Type: Application
Filed: April 2, 2016
Publication date: October 5, 2017
Applicant: Intel Corporation
Inventors: Andrew J. Herdrich, Yipeng Wang, Ren Wang, Tsung-Yuan Charles Tai, Jr-Shian Tsai
-
Publication number: 20170286337
Abstract: Technologies for a distributed hardware queue manager include a compute device having a processor. The processor includes two or more hardware queue managers as well as two or more processor cores. Each processor core can enqueue or dequeue data from the hardware queue manager. Each hardware queue manager can be configured to contain several queue data structures. In some embodiments, the queues are addressed by the processor cores using virtual queue addresses, which are translated into physical queue addresses for accessing the corresponding hardware queue manager. The virtual queues can be moved from one physical queue in one hardware queue manager to a different physical queue in a different physical queue manager without changing the virtual address of the virtual queue.
Type: Application
Filed: March 31, 2016
Publication date: October 5, 2017
Inventors: Ren Wang, Yipeng Wang, Jr-Shian Tsai, Andrew Herdrich, Tsung-Yuan Tai, Niall McDonnell, Stephen Van Doren, David Sonnier, Debra Bernstein, Hugh Wilkinson, Narender Vangati, Stephen Miller, Gage Eads, Andrew Cunningham, Jonathan Kenny, Bruce Richardson, William Burroughs, Joseph Hasting, An Yan, James Clee, Te Ma, Jerry Pirog, Jamison Whitesell
-
Publication number: 20170192921
Abstract: Apparatus and methods implementing a hardware queue management device for reducing inter-core data transfer overhead by offloading request management and data coherency tasks from the CPU cores. The apparatus include multi-core processors, a shared L3 or last-level cache (“LLC”), and a hardware queue management device to receive, store, and process inter-core data transfer requests. The hardware queue management device further comprises a resource management system to control the rate in which the cores may submit requests to reduce core stalls and dropped requests. Additionally, software instructions are introduced to optimize communication between the cores and the queue management device.
Type: Application
Filed: January 4, 2016
Publication date: July 6, 2017
Inventors: Ren Wang, Yipeng Wang, Andrew J. Herdrich, Jr-Shian Tsai, Tsung-Yuan C. Tai, Niall D. McDonnell, Hugh Wilkinson, Bradley A. Burres, Bruce Richardson, Namakkal N. Venkatesan, Debra Bernstein, Edwin Verplanke, Stephen R. Van Doren, An Yan, Andrew Cunningham, David Sonnier, Gage Eads, James T. Clee, Jamison D. Whitesell, Jerry Pirog, Jonathan Kenny, Joseph R. Hasting, Narender Vangati, Stephen Miller, Te K. Ma, William Burroughs
-
Publication number: 20160239212
Abstract: Systems and methods for modeling memory access behavior and memory traffic timing behavior are disclosed. According to an aspect, a method includes receiving data indicative of memory access behavior resulting from instructions executed on a processor. The method also includes determining a statistical profile of the memory access behavior, the profile including tuple statistics of memory access behavior. Further, the method includes generating a clone of the executed instructions based on the statistical profile for use in simulating the memory access behavior.
Type: Application
Filed: February 15, 2016
Publication date: August 18, 2016
Inventors: Yan Solihin, Yipeng Wang, Amro Awad
-
Patent number: 8709753
Abstract: This invention is metabolically engineered bacterial strains that provide increased intracellular NADPH availability for the purpose of increasing the yield and productivity of NADPH-dependent compounds. In the invention, native NAD-dependent GAPDH is replaced with NADP-dependent GAPDH plus overexpressed NADK. Uses for the bacteria are also provided.
Type: Grant
Filed: November 19, 2012
Date of Patent: April 29, 2014
Assignee: William Marsh Rice University
Inventors: Ka-Yiu San, George N. Bennett, Yipeng Wang