Patents by Inventor Tsung-Yuan Tai

Tsung-Yuan Tai has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230114263
    Abstract: Systems, apparatuses and methods may provide for technology that uses centralized hardware to detect a local allocation request associated with a local thread, detect a remote allocation request associated with a remote thread, wherein the remote allocation request bypasses a remote operating system, and process the local allocation request and the remote allocation request with respect to a central heap, wherein the central heap is shared by the local thread and the remote thread. The local allocation request and the remote allocation request may include one or more of a first request to allocate a memory block of a specified size, a second request to allocate multiple memory blocks of a same size, a third request to resize a previously allocated memory block, or a fourth request to deallocate the previously allocated memory block.
    Type: Application
    Filed: December 13, 2022
    Publication date: April 13, 2023
    Inventors: Ren Wang, Poonam Shidlyali, Tsung-Yuan Tai
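    As a rough illustration of the four request kinds named in this entry's abstract, here is a minimal C sketch (all names hypothetical) of a single dispatch path over a shared heap. The standard C heap stands in for the patent's hardware-managed central heap; local and remote threads would both funnel requests through this one entry point.

    ```c
    #include <stdlib.h>

    /* Hypothetical request kinds mirroring the four cases in the abstract. */
    enum alloc_req_kind {
        ALLOC_BLOCK,    /* allocate a memory block of a specified size      */
        ALLOC_ARRAY,    /* allocate multiple memory blocks of a same size   */
        RESIZE_BLOCK,   /* resize a previously allocated memory block       */
        FREE_BLOCK      /* deallocate the previously allocated memory block */
    };

    struct alloc_req {
        enum alloc_req_kind kind;
        void  *ptr;     /* existing block for RESIZE_BLOCK / FREE_BLOCK */
        size_t count;   /* number of blocks for ALLOC_ARRAY             */
        size_t size;    /* block size in bytes                          */
    };

    /* Process one request against the shared heap, whichever thread
     * (local or remote) submitted it. */
    static void *central_heap_process(struct alloc_req *req)
    {
        switch (req->kind) {
        case ALLOC_BLOCK:  return malloc(req->size);
        case ALLOC_ARRAY:  return calloc(req->count, req->size);
        case RESIZE_BLOCK: return realloc(req->ptr, req->size);
        case FREE_BLOCK:   free(req->ptr); return NULL;
        }
        return NULL;
    }
    ```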
  • Patent number: 11601531
    Abstract: One embodiment provides a network system. The network system includes an application layer to execute one or more networking applications to generate or receive data packets having flow identification (ID) information; and a packet processing layer having profiling circuitry to generate a sketch table indicative of packet flow count data. The sketch table has a plurality of buckets, each bucket including a first section with a plurality of data fields, each data field of the first section to store flow ID and packet count data, and each bucket also having a second section with a plurality of data fields, each data field of the second section to store packet count data.
    Type: Grant
    Filed: December 3, 2019
    Date of Patent: March 7, 2023
    Assignee: Intel Corporation
    Inventors: Ren Wang, Yipeng Wang, Tsung-Yuan Tai
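    The two-section bucket layout in this abstract can be pictured with a small C sketch. The field counts, hash choices, and update policy below are assumptions for illustration; the patent only specifies exact flow-ID-plus-count fields alongside count-only fields.

    ```c
    #include <stdint.h>

    #define NUM_BUCKETS  1024
    #define NUM_TRACKED  4   /* first section: flow ID + count fields */
    #define NUM_COUNTERS 4   /* second section: count-only fields     */

    struct bucket {
        struct { uint32_t flow_id; uint32_t count; } tracked[NUM_TRACKED];
        uint32_t counters[NUM_COUNTERS];   /* shared, no flow ID attached */
    };

    static struct bucket sketch[NUM_BUCKETS];

    /* Count one packet: update an exact per-flow field if the flow already
     * occupies (or can claim) one; otherwise fall back to a count-only
     * field picked by a second (here: trivial) hash. */
    static void sketch_count(uint32_t flow_id)
    {
        struct bucket *b = &sketch[flow_id % NUM_BUCKETS];
        for (int i = 0; i < NUM_TRACKED; i++) {
            if (b->tracked[i].flow_id == flow_id || b->tracked[i].count == 0) {
                b->tracked[i].flow_id = flow_id;
                b->tracked[i].count++;
                return;
            }
        }
        b->counters[(flow_id >> 16) % NUM_COUNTERS]++;
    }
    ```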
  • Patent number: 11500825
    Abstract: Techniques and apparatus for dynamic data access mode processes are described. In one embodiment, for example, an apparatus may include a processor, at least one memory coupled to the processor, the at least one memory comprising an indication of a database and instructions, the instructions, when executed by the processor, to cause the processor to determine a database utilization value for a database, perform a comparison of the database utilization value to at least one utilization threshold, and set an active data access mode to one of a low-utilization data access mode or a high-utilization data access mode based on the comparison. Other embodiments are described.
    Type: Grant
    Filed: August 20, 2018
    Date of Patent: November 15, 2022
    Assignee: Intel Corporation
    Inventors: Ren Wang, Bruce Richardson, Tsung-Yuan Tai, Yipeng Wang, Pablo De Lara Guarch
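    A minimal sketch of the mode-selection step described in this entry, assuming a single utilization threshold expressed as a fraction of capacity (the patent allows for more than one threshold):

    ```c
    #include <stdio.h>

    enum access_mode { MODE_LOW_UTILIZATION, MODE_HIGH_UTILIZATION };

    /* Hypothetical threshold: fraction of capacity in use. */
    static const double UTIL_THRESHOLD = 0.75;

    /* Compare the measured utilization value against the threshold and
     * pick the active data access mode, as in the abstract. */
    static enum access_mode select_mode(double utilization)
    {
        return utilization > UTIL_THRESHOLD ? MODE_HIGH_UTILIZATION
                                            : MODE_LOW_UTILIZATION;
    }

    int main(void)
    {
        printf("util 0.50 -> %s\n",
               select_mode(0.50) == MODE_LOW_UTILIZATION ? "low" : "high");
        printf("util 0.90 -> %s\n",
               select_mode(0.90) == MODE_LOW_UTILIZATION ? "low" : "high");
        return 0;
    }
    ```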
  • Publication number: 20220326999
    Abstract: Apparatuses, methods, and systems for dynamic resource allocation based on quality-of-service prediction are disclosed. In embodiments, an apparatus includes quality-of-service prediction circuitry and a resource controller. The quality-of-service prediction circuitry is to make quality-of-service predictions using a model based at least in part on at least one performance counter measurement and at least one quality-of-service measurement. The resource controller is to allocate one or more shared resources based on the quality-of-service predictions and architectural performance counter measurements.
    Type: Application
    Filed: June 29, 2022
    Publication date: October 13, 2022
    Inventors: Drew Penney, Bin Li, Tsung-Yuan Tai, Anna Drewek-Ossowicka, Rameshkumar Illikkal, Andrew J. Herdrich, Jaroslaw Sydir
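    As a sketch only: the abstract above does not specify the model class, so the fragment below substitutes a fixed-weight linear regressor over performance counter readings, plus a toy controller that grants one more unit of a shared resource (say, a cache way) when predicted quality of service falls short of a target.

    ```c
    #include <stddef.h>

    #define NUM_COUNTERS 3

    /* Hypothetical fixed weights; the patent's model could equally be a
     * trained regressor or neural network over the same inputs. */
    static const double weights[NUM_COUNTERS] = { 0.8, -0.3, 0.1 };
    static const double bias = 100.0;

    /* Predict quality of service from performance counter readings. */
    static double predict_qos(const double counters[NUM_COUNTERS])
    {
        double qos = bias;
        for (size_t i = 0; i < NUM_COUNTERS; i++)
            qos += weights[i] * counters[i];
        return qos;
    }

    /* Toy controller: grant one more unit of a shared resource while
     * the prediction falls short of the target. */
    static int allocate_shared_resource(double predicted_qos, double target,
                                        int current_units, int max_units)
    {
        if (predicted_qos < target && current_units < max_units)
            return current_units + 1;
        return current_units;
    }
    ```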
  • Publication number: 20210406147
    Abstract: An apparatus and method for closed loop dynamic resource allocation.
    Type: Application
    Filed: June 27, 2020
    Publication date: December 30, 2021
    Inventors: Bin Li, Ren Wang, Kshitij Arun Doshi, Francesc Guim Bernat, Yipeng Wang, Ravishankar Iyer, Andrew Herdrich, Tsung-Yuan Tai, Zhu Zhou, Rasika Subramanian
  • Publication number: 20210110269
    Abstract: Neural network dense layer sparsification and matrix compression are disclosed. An example of an apparatus includes one or more processors; a memory to store data for processing, including data for processing of a deep neural network (DNN) including one or more layers, each layer including a plurality of neurons, the one or more processors to perform one or both of sparsification of one or more layers of the DNN, including selecting a subset of the plurality of neurons of a first layer of the DNN for activation based at least in part on locality sensitive hashing of inputs to the first layer; or compression of a weight or activation matrix of one or more layers of the DNN, including detection of sparsity patterns in a matrix of the first layer of the DNN based at least in part on locality sensitive hashing of patterns in the matrix.
    Type: Application
    Filed: December 21, 2020
    Publication date: April 15, 2021
    Applicant: Intel Corporation
    Inventors: Sameh Gobriel, Jesmin Jahan Tithi, Tsung-Yuan Tai
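    The locality-sensitive-hashing step in this entry can be sketched with signed random projections (SimHash); whether the patent uses this particular LSH family is an assumption. Neurons whose weight-vector signature collides with the input's signature are activated; the rest are skipped, which is the sparsification the abstract describes.

    ```c
    #include <stdint.h>
    #include <stdlib.h>

    #define DIM      8   /* toy input dimension       */
    #define SIG_BITS 4   /* hyperplanes per signature */

    static double planes[SIG_BITS][DIM];   /* random hyperplanes */

    /* Fill the hyperplanes with random values; done once at startup. */
    static void lsh_init(void)
    {
        for (int b = 0; b < SIG_BITS; b++)
            for (int d = 0; d < DIM; d++)
                planes[b][d] = (double)rand() / RAND_MAX - 0.5;
    }

    /* Signed-random-projection signature: one bit per hyperplane, set
     * when the projection of v onto that hyperplane is positive. */
    static uint32_t lsh_signature(const double v[DIM])
    {
        uint32_t sig = 0;
        for (int b = 0; b < SIG_BITS; b++) {
            double dot = 0.0;
            for (int d = 0; d < DIM; d++)
                dot += planes[b][d] * v[d];
            if (dot > 0.0)
                sig |= 1u << b;
        }
        return sig;
    }

    /* Activate only neurons whose weight signature collides with the
     * input's signature; the rest of the layer stays inactive. */
    static int select_active(const double input[DIM],
                             const double (*weights)[DIM], int n_neurons,
                             int *active_idx)
    {
        uint32_t in_sig = lsh_signature(input);
        int n_active = 0;
        for (int n = 0; n < n_neurons; n++)
            if (lsh_signature(weights[n]) == in_sig)
                active_idx[n_active++] = n;
        return n_active;
    }
    ```

    Recomputing every neuron's signature per input would defeat the purpose; in practice the weight signatures would be precomputed into hash buckets so that only the input is hashed per query.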
  • Patent number: 10789176
    Abstract: Technologies for least recently used (LRU) cache replacement include a computing device with a processor with vector instruction support. The computing device retrieves a bucket of an associative cache from memory that includes multiple entries arranged from front to back. The bucket may be a 256-bit array including eight 32-bit entries. For lookups, a matching entry is located at a position in the bucket. The computing device executes a vector permutation processor instruction that moves the matching entry to the front of the bucket while preserving the order of other entries of the bucket. For insertion, an inserted entry is written at the back of the bucket. The computing device executes a vector permutation processor instruction that moves the inserted entry to the front of the bucket while preserving the order of other entries. The permuted bucket is stored to the memory. Other embodiments are described and claimed.
    Type: Grant
    Filed: August 9, 2018
    Date of Patent: September 29, 2020
    Assignee: Intel Corporation
    Inventors: Ren Wang, Yipeng Wang, Tsung-Yuan Tai, Cristian Florin Dumitrescu, Xiangyang Guo
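    This entry maps naturally onto the AVX2 cross-lane permute. A minimal sketch of the move-to-front step over an eight-entry, 256-bit bucket using `_mm256_permutevar8x32_epi32`; building the index vector on the fly (rather than from a precomputed table keyed by position) is a simplification.

    ```c
    #include <immintrin.h>   /* AVX2; compile with -mavx2 */
    #include <stdint.h>
    #include <stdio.h>

    /* Move the entry at `pos` to the front of an eight-entry bucket with
     * one cross-lane permute, preserving the order of the other entries. */
    static __m256i lru_move_to_front(__m256i bucket, int pos)
    {
        /* Result lane 0 takes source lane `pos`; lanes 1..pos shift back
         * by one; lanes pos+1..7 are untouched. */
        uint32_t idx[8];
        idx[0] = (uint32_t)pos;
        for (int i = 1; i <= pos; i++) idx[i] = (uint32_t)(i - 1);
        for (int i = pos + 1; i < 8; i++) idx[i] = (uint32_t)i;
        return _mm256_permutevar8x32_epi32(
            bucket, _mm256_loadu_si256((const __m256i *)idx));
    }

    int main(void)
    {
        uint32_t entries[8] = { 10, 11, 12, 13, 14, 15, 16, 17 };
        __m256i bucket = _mm256_loadu_si256((const __m256i *)entries);
        bucket = lru_move_to_front(bucket, 5);    /* lookup hit on 15 */
        _mm256_storeu_si256((__m256i *)entries, bucket);
        for (int i = 0; i < 8; i++)
            printf("%u ", entries[i]);            /* 15 10 11 12 13 14 16 17 */
        printf("\n");
        return 0;
    }
    ```

    Since `pos` only takes values 0 through 7, the eight possible index vectors could be precomputed once rather than rebuilt per lookup.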
  • Patent number: 10719442
    Abstract: An apparatus and method for prioritizing transactional memory regions. For example, one embodiment of a processor comprises: a plurality of cores to execute threads comprising sequences of instructions, at least some of the instructions specifying a transactional memory region; a cache of each core to store a plurality of cache lines; transactional memory circuitry of each core to manage execution of the transactional memory (TM) regions based on priorities associated with each of the TM regions; and wherein the transactional memory circuitry, upon detecting a conflict between a first TM region having a first priority value and a second TM region having a second priority value, is to determine which of the first TM region or the second TM region is permitted to continue executing and which is to be aborted based, at least in part, on the first and second priority values.
    Type: Grant
    Filed: September 10, 2018
    Date of Patent: July 21, 2020
    Assignee: Intel Corporation
    Inventors: Ren Wang, Raanan Sade, Yipeng Wang, Tsung-Yuan Tai, Sameh Gobriel
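    A sketch of the priority-based conflict resolution described above, assuming only that a higher value means higher priority and that ties go to the first region; in the patent this policy lives in per-core transactional memory circuitry.

    ```c
    #include <stdio.h>

    struct tm_region { int id; int priority; };

    /* On a conflict, the higher-priority region continues and the other
     * aborts; ties go to the first region. */
    static const struct tm_region *
    tm_resolve_conflict(const struct tm_region *a, const struct tm_region *b,
                        const struct tm_region **aborted)
    {
        if (b->priority > a->priority) { *aborted = a; return b; }
        *aborted = b;
        return a;
    }

    int main(void)
    {
        struct tm_region r1 = { 1, 5 }, r2 = { 2, 9 };
        const struct tm_region *loser;
        const struct tm_region *winner = tm_resolve_conflict(&r1, &r2, &loser);
        printf("region %d continues, region %d aborts\n", winner->id, loser->id);
        return 0;
    }
    ```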
  • Patent number: 10623311
    Abstract: Technologies for distributed table lookup via a distributed router include an ingress computing node, an intermediate computing node, and an egress computing node. Each computing node of the distributed router includes a forwarding table to store a different set of network routing entries obtained from a routing table of the distributed router. The ingress computing node generates a hash key based on the destination address included in a received network packet. The hash key identifies the intermediate computing node of the distributed router that stores the forwarding table that includes a network routing entry corresponding to the destination address. The ingress computing node forwards the received network packet to the intermediate computing node for routing. The intermediate computing node receives the forwarded network packet, determines a destination address of the network packet, and determines the egress computing node for transmission of the network packet from the distributed router.
    Type: Grant
    Filed: September 27, 2017
    Date of Patent: April 14, 2020
    Assignee: Intel Corporation
    Inventors: Sameh Gobriel, Ren Wang, Christian Maciocco, Tsung-Yuan Tai
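    The routing idea in this entry reduces to hashing the destination address to pick the node that owns the relevant forwarding-table partition. FNV-1a below is a stand-in; the patent does not name a hash function.

    ```c
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_NODES 16   /* computing nodes in the distributed router */

    /* FNV-1a, standing in for whatever hash the router actually uses. */
    static uint32_t fnv1a(const void *data, size_t len)
    {
        const uint8_t *p = data;
        uint32_t h = 2166136261u;
        while (len--) { h ^= *p++; h *= 16777619u; }
        return h;
    }

    /* The ingress node does no route lookup itself: the hash of the
     * destination address names the intermediate node whose forwarding
     * table partition holds the matching entry. */
    static unsigned owner_node(uint32_t dest_addr)
    {
        return fnv1a(&dest_addr, sizeof dest_addr) % NUM_NODES;
    }

    int main(void)
    {
        uint32_t dest = 0x0a000001u;   /* 10.0.0.1 */
        printf("forward packet for %#x via node %u\n", dest, owner_node(dest));
        return 0;
    }
    ```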
  • Publication number: 20200106867
    Abstract: One embodiment provides a network system. The network system includes an application layer to execute one or more networking applications to generate or receive data packets having flow identification (ID) information; and a packet processing layer having profiling circuitry to generate a sketch table indicative of packet flow count data. The sketch table has a plurality of buckets, each bucket including a first section with a plurality of data fields, each data field of the first section to store flow ID and packet count data, and each bucket also having a second section with a plurality of data fields, each data field of the second section to store packet count data.
    Type: Application
    Filed: December 3, 2019
    Publication date: April 2, 2020
    Applicant: Intel Corporation
    Inventors: Ren Wang, Yipeng Wang, Tsung-Yuan Tai
  • Publication number: 20200104259
    Abstract: A snapshot prefetcher performs snapshot prefetching to improve the performance of snapshot read operations. An apparatus embodiment includes snapshot read tracking circuitry to track snapshot read requests made by a first processor core to read a plurality of cache lines, and to detect a snapshot read access stream based on the tracked snapshot read requests. Snapshot prefetch issuing circuitry of the apparatus is to issue, based on the detected snapshot read access stream, one or more snapshot prefetch requests, including a first snapshot prefetch request to prefetch data from a first cache line stored in, and owned exclusively by, a first storage location outside the first processor core. The snapshot prefetch issuing circuitry is further to store the prefetched data in a second storage location within the first processor core, wherein after the prefetch, exclusive ownership of the first cache line is to remain with the first storage location.
    Type: Application
    Filed: September 28, 2018
    Publication date: April 2, 2020
    Inventors: Ren Wang, Lawrence C. Stewart, Binh Pham, Andrew Herdrich, Venkata Krishnan, Anil Vasudevan, Joseph Nuzman, Tsung-Yuan Tai
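    A software sketch of the stream-detection half of this entry, assuming a simple constant-stride detector with a confirmation count; the actual prefetch (reading a line without taking ownership from its exclusive owner) is hardware behavior and is stubbed out here.

    ```c
    #include <stdint.h>

    #define CONFIRM_HITS   3   /* repeats of a stride before it is a stream */
    #define PREFETCH_AHEAD 4   /* lines to prefetch once confirmed          */

    struct snapshot_tracker {
        uint64_t last_addr;
        int64_t  stride;
        int      hits;
    };

    /* Stub: the real snapshot prefetch reads the line while leaving
     * exclusive ownership with the remote storage location. */
    static void issue_snapshot_prefetch(uint64_t addr) { (void)addr; }

    /* Feed each snapshot read into the tracker; once the same nonzero
     * stride repeats CONFIRM_HITS times, prefetch ahead along the stream. */
    static void track_snapshot_read(struct snapshot_tracker *t, uint64_t addr)
    {
        int64_t stride = (int64_t)(addr - t->last_addr);
        if (stride != 0 && stride == t->stride) {
            if (++t->hits >= CONFIRM_HITS)
                for (int i = 1; i <= PREFETCH_AHEAD; i++)
                    issue_snapshot_prefetch(addr + (uint64_t)(stride * i));
        } else {
            t->stride = stride;
            t->hits = 0;
        }
        t->last_addr = addr;
    }
    ```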
  • Publication number: 20200081835
    Abstract: An apparatus and method for prioritizing transactional memory regions. For example, one embodiment of a processor comprises: a plurality of cores to execute threads comprising sequences of instructions, at least some of the instructions specifying a transactional memory region; a cache of each core to store a plurality of cache lines; transactional memory circuitry of each core to manage execution of the transactional memory (TM) regions based on priorities associated with each of the TM regions; and wherein the transactional memory circuitry, upon detecting a conflict between a first TM region having a first priority value and a second TM region having a second priority value, is to determine which of the first TM region or the second TM region is permitted to continue executing and which is to be aborted based, at least in part, on the first and second priority values.
    Type: Application
    Filed: September 10, 2018
    Publication date: March 12, 2020
    Inventors: Ren Wang, Raanan Sade, Yipeng Wang, Tsung-Yuan Tai, Sameh Gobriel
  • Patent number: 10567510
    Abstract: Various embodiments are generally directed to techniques for improving the efficiency of exchanging packets between pairs of VMs within a communications server. An apparatus may include a processor component; a network interface to couple the processor component to a network; a virtual switch to analyze contents of at least one packet of a set of packets to be exchanged between endpoint devices through the network and the communications server, and to route the set of packets through one or more virtual servers of multiple virtual servers based on the contents; and a transfer component of a first virtual server of the multiple virtual servers to determine whether to route the set of packets to the virtual switch or to transfer the set of packets to a second virtual server of the multiple virtual servers in a manner that bypasses the virtual switch based on a routing rule.
    Type: Grant
    Filed: October 2, 2017
    Date of Patent: February 18, 2020
    Assignee: Intel Corporation
    Inventors: Mesut A. Ergin, Jr-Shian Tsai, Janet Tseng, Ren Wang, Jun Nakajima, Tsung-Yuan Tai
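    The bypass decision in this entry can be pictured as a rule-table lookup. The rule fields below are hypothetical; the abstract only says a routing rule governs whether a packet set skips the virtual switch.

    ```c
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical rule: traffic between two co-resident virtual servers
     * may be handed over directly, skipping the virtual switch. */
    struct routing_rule {
        uint32_t src_vm;
        uint32_t dst_vm;
        bool     allow_bypass;
    };

    static bool may_bypass(const struct routing_rule *rules, size_t n,
                           uint32_t src_vm, uint32_t dst_vm)
    {
        for (size_t i = 0; i < n; i++)
            if (rules[i].src_vm == src_vm && rules[i].dst_vm == dst_vm)
                return rules[i].allow_bypass;
        return false;   /* no matching rule: route through the virtual switch */
    }
    ```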
  • Patent number: 10356012
    Abstract: Various embodiments are generally directed to techniques for improving the efficiency of exchanging packets among multiple VMs within a communications server, and between the communications server and other devices in a communications system. An apparatus may include a virtual switch to analyze contents of at least one packet of a set of packets to be exchanged between endpoint devices through a network, and to correlate the contents to a pathway to extend through one or more of the VMs that are each configured as virtual servers of multiple virtual servers; and an interface control component to select at least one virtual network interface of each of the one or more virtual servers along the pathway to operate in a polling mode, and to select a virtual network interface of at least one virtual server of the multiple virtual servers not along the pathway to operate in a non-polling mode.
    Type: Grant
    Filed: August 20, 2015
    Date of Patent: July 16, 2019
    Assignee: Intel Corporation
    Inventors: Alexander W. Min, Tsung-Yuan Tai, Ren Wang, Mesut A. Ergin, Jr-Shian Tsai
  • Patent number: 10216668
    Abstract: Technologies for a distributed hardware queue manager include a compute device having a processor. The processor includes two or more hardware queue managers as well as two or more processor cores. Each processor core can enqueue data to or dequeue data from a hardware queue manager. Each hardware queue manager can be configured to contain several queue data structures. In some embodiments, the queues are addressed by the processor cores using virtual queue addresses, which are translated into physical queue addresses for accessing the corresponding hardware queue manager. The virtual queues can be moved from one physical queue in one hardware queue manager to a different physical queue in a different hardware queue manager without changing the virtual address of the virtual queue.
    Type: Grant
    Filed: March 31, 2016
    Date of Patent: February 26, 2019
    Assignee: Intel Corporation
    Inventors: Ren Wang, Yipeng Wang, Jr-Shian Tsai, Andrew Herdrich, Tsung-Yuan Tai, Niall McDonnell, Stephen Van Doren, David Sonnier, Debra Bernstein, Hugh Wilkinson, Narender Vangati, Stephen Miller, Gage Eads, Andrew Cunningham, Jonathan Kenny, Bruce Richardson, William Burroughs, Joseph Hasting, An Yan, James Clee, Te Ma, Jerry Pirog, Jamison Whitesell
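    The virtual-to-physical queue translation is the interesting part of this entry: the mapping table is the only state that changes when a virtual queue migrates between hardware queue managers. A minimal sketch, with table sizes and field widths chosen arbitrarily:

    ```c
    #include <stdint.h>

    #define NUM_VIRTUAL_QUEUES 256

    /* A virtual queue address resolves to a (queue manager, physical
     * queue) pair; cores only ever see the virtual address. */
    struct phys_queue { uint8_t manager; uint8_t queue; };

    static struct phys_queue vq_table[NUM_VIRTUAL_QUEUES];

    static struct phys_queue translate(uint8_t vq) { return vq_table[vq]; }

    /* Migrate a virtual queue to a different hardware queue manager by
     * rewriting only the mapping; the virtual address never changes. */
    static void migrate(uint8_t vq, uint8_t new_manager, uint8_t new_queue)
    {
        vq_table[vq].manager = new_manager;
        vq_table[vq].queue   = new_queue;
    }
    ```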
  • Publication number: 20190042304
    Abstract: Methods, apparatus, systems, and software for architectures and mechanisms to accelerate tuple-space search with integrated GPUs (Graphics Processing Units). One of the architectures employs GPU-side lookup table sorting, under which local and global hit count histograms are maintained for work groups, and sub-tables containing rules for tuple matching are re-sorted based on the relative hit rates of the different sub-tables. Under a second architecture, two levels of parallelism are implemented: packet-level parallelism and lookup-table parallelism. Under a third architecture, dynamic two-level parallel processing with pre-screening is implemented. Adaptive decision-making mechanisms are also disclosed to select which architecture is optimal in view of multiple considerations, including application preferences, offered throughput, and available GPU resources.
    Type: Application
    Filed: December 3, 2017
    Publication date: February 7, 2019
    Applicant: Intel Corporation
    Inventors: Ren Wang, Janet Tseng, Jr-Shian Tsai, Tsung-Yuan Tai
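    The lookup-table sorting in this entry reduces to reordering sub-tables by observed hit counts so the hottest one is probed first. A host-side C sketch using `qsort` (the patent maintains the histograms per work group on the GPU):

    ```c
    #include <stdint.h>
    #include <stdlib.h>

    struct sub_table {
        int      id;     /* which set of tuple-matching rules this holds */
        uint64_t hits;   /* entry from the global hit-count histogram    */
    };

    /* Descending order by hit count. */
    static int by_hits_desc(const void *a, const void *b)
    {
        uint64_t ha = ((const struct sub_table *)a)->hits;
        uint64_t hb = ((const struct sub_table *)b)->hits;
        return (ha < hb) - (ha > hb);
    }

    /* Re-sort the probe order so lookups try the sub-tables most likely
     * to match first, cutting the average probes per packet. */
    static void resort_sub_tables(struct sub_table *tables, size_t n)
    {
        qsort(tables, n, sizeof *tables, by_hits_desc);
    }
    ```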
  • Publication number: 20190042602
    Abstract: Techniques and apparatus for dynamic data access mode processes are described. In one embodiment, for example, an apparatus may include a processor, at least one memory coupled to the processor, the at least one memory comprising an indication of a database and instructions, the instructions, when executed by the processor, to cause the processor to determine a database utilization value for a database, perform a comparison of the database utilization value to at least one utilization threshold, and set an active data access mode to one of a low-utilization data access mode or a high-utilization data access mode based on the comparison. Other embodiments are described.
    Type: Application
    Filed: August 20, 2018
    Publication date: February 7, 2019
    Inventors: Ren Wang, Bruce Richardson, Tsung-Yuan Tai, Yipeng Wang, Pablo De Lara Guarch
  • Publication number: 20190044869
    Abstract: Technologies for classifying network flows using adaptive virtual routing include a network appliance with one or more processors. The network appliance is configured to identify a set of candidate classification algorithms from a plurality of classification algorithm designs to perform a flow classification operation and deploy each of the candidate classification algorithms to a processor. Additionally, the network appliance is configured to monitor a performance level of each of the deployed candidate classification algorithms and identify the deployed candidate classification algorithm with the highest performance level. The network appliance is further configured to deploy the identified candidate classification algorithm with the highest performance level on each of the one or more processors that are configured to perform the flow classification operation. Other embodiments are described herein.
    Type: Application
    Filed: August 17, 2018
    Publication date: February 7, 2019
    Inventors: Yipeng Wang, Ren Wang, Janet Tseng, Jr-Shian Tsai, Tsung-Yuan Tai
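    The selection step in this entry is an argmax over measured performance. A sketch, assuming throughput in packets per second is the performance level being compared:

    ```c
    #include <stddef.h>
    #include <stdint.h>

    struct candidate {
        int      algorithm_id;   /* index into the design library        */
        uint64_t pkts_per_sec;   /* throughput measured during the trial */
    };

    /* After trialing each candidate on its own processor, pick the one
     * with the highest measured performance level for full deployment. */
    static int best_candidate(const struct candidate *c, size_t n)
    {
        size_t best = 0;
        for (size_t i = 1; i < n; i++)
            if (c[i].pkts_per_sec > c[best].pkts_per_sec)
                best = i;
        return c[best].algorithm_id;
    }
    ```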
  • Publication number: 20190042471
    Abstract: Technologies for least recently used (LRU) cache replacement include a computing device with a processor with vector instruction support. The computing device retrieves a bucket of an associative cache from memory that includes multiple entries arranged from front to back. The bucket may be a 256-bit array including eight 32-bit entries. For lookups, a matching entry is located at a position in the bucket. The computing device executes a vector permutation processor instruction that moves the matching entry to the front of the bucket while preserving the order of other entries of the bucket. For insertion, an inserted entry is written at the back of the bucket. The computing device executes a vector permutation processor instruction that moves the inserted entry to the front of the bucket while preserving the order of other entries. The permuted bucket is stored to the memory. Other embodiments are described and claimed.
    Type: Application
    Filed: August 9, 2018
    Publication date: February 7, 2019
    Inventors: Ren Wang, Yipeng Wang, Tsung-Yuan Tai, Cristian Florin Dumitrescu, Xiangyang Guo
  • Publication number: 20180270309
    Abstract: Various embodiments are generally directed to techniques for improving the efficiency of exchanging packets between pairs of VMs within a communications server. An apparatus may include a processor component; a network interface to couple the processor component to a network; a virtual switch to analyze contents of at least one packet of a set of packets to be exchanged between endpoint devices through the network and the communications server, and to route the set of packets through one or more virtual servers of multiple virtual servers based on the contents; and a transfer component of a first virtual server of the multiple virtual servers to determine whether to route the set of packets to the virtual switch or to transfer the set of packets to a second virtual server of the multiple virtual servers in a manner that bypasses the virtual switch based on a routing rule.
    Type: Application
    Filed: October 2, 2017
    Publication date: September 20, 2018
    Applicant: INTEL CORPORATION
    Inventors: Mesut A. Ergin, Jr-Shian Tsai, Janet Tseng, Ren Wang, Jun Nakajima, Tsung-Yuan Tai