Patents by Inventor Yen-Kuang Chen

Yen-Kuang Chen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

System and method for memory bandwidth friendly sorting on multi-core architectures

Publication number: 20110066806

Abstract: In some embodiments, the invention involves utilizing a tree merge sort in a platform to minimize cache reads/writes when sorting large amounts of data. An embodiment uses blocks of pre-sorted data residing in “leaf nodes” residing in memory storage. A pre-sorted block of data from each leaf node is read from memory and stored in faster cache memory. A tree merge sort is performed on the nodes that are cache resident until a block of data migrates to a root node. Sorted blocks reaching the root node are written to memory storage in an output list until all pre-sorted data blocks have been moved to cache and merged upward to the root. The completed output list in memory storage is a list of the fully sorted data. Other embodiments are described and claimed.
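As a rough illustration of the merge-tree idea in this abstract, the sketch below pulls elements from pre-sorted "leaf" lists up through a binary tree of two-way merges until they emerge at the root. It is a minimal sketch only: the function names `merge2` and `tree_merge` are hypothetical, and the code does not model cache residency, block sizes, or multi-core scheduling, which are central to the application itself.

```python
def merge2(left, right):
    """Lazily merge two sorted iterables into one sorted stream."""
    left, right = iter(left), iter(right)
    done = object()  # sentinel marking an exhausted input
    x, y = next(left, done), next(right, done)
    while x is not done and y is not done:
        if x <= y:
            yield x
            x = next(left, done)
        else:
            yield y
            y = next(right, done)
    # Drain whichever input still has elements.
    while x is not done:
        yield x
        x = next(left, done)
    while y is not done:
        yield y
        y = next(right, done)


def tree_merge(leaves):
    """Build a binary tree of two-way merges over pre-sorted leaves.

    Returns an iterator over the root node's output.
    """
    nodes = [iter(leaf) for leaf in leaves]
    if not nodes:
        return iter(())
    while len(nodes) > 1:
        # Pair up neighbouring nodes to form the next tree level.
        paired = [merge2(nodes[i], nodes[i + 1])
                  for i in range(0, len(nodes) - 1, 2)]
        if len(nodes) % 2:
            paired.append(nodes[-1])  # odd node moves up unchanged
        nodes = paired
    return nodes[0]


# The "output list" is whatever reaches the root.
output = list(tree_merge([[1, 4, 9], [2, 3], [0, 8], [5], [6, 7]]))
```

Because every node is a generator, only one element per tree node is held at a time, which loosely mirrors the idea of keeping the working set small.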

Type: Application

Filed: May 26, 2009

Publication date: March 17, 2011

Inventors: Jatin Chhugani, Sanjeev Kumar, Anthony-Trung D. Nguyen, Yen-Kuang Chen, Victor W. Lee, William Macy

Bitstream Buffer Manipulation with a SIMD Merge Instruction

Publication number: 20110035426

Abstract: Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.
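The shift merge step can be pictured with plain integers standing in for data blocks. The sketch below is an assumption-laden illustration, not the claimed instruction: it assumes bits are consumed from the most significant end, uses a 16-bit block width for readability, and the names `shift_merge` and `extract_symbol` are hypothetical.

```python
BLOCK_BITS = 16  # assumed block width; real SIMD registers are wider


def shift_merge(first, second, used_bits, block_bits=BLOCK_BITS):
    """Drop the already-processed high bits of `first` and fill the
    vacated low bits with the leading bits of `second`."""
    mask = (1 << block_bits) - 1
    return ((first << used_bits) | (second >> (block_bits - used_bits))) & mask


def extract_symbol(merged, length, block_bits=BLOCK_BITS):
    """Read a variable length symbol from the top of the merged block."""
    return merged >> (block_bits - length)


# 12 bits of the first block were consumed, leaving the partial symbol 0xD.
merged = shift_merge(0xABCD, 0x1234, used_bits=12)
symbol = extract_symbol(merged, length=8)  # spans both original blocks
```

Here the 8-bit symbol combines the 4 leftover bits of the first block with the first 4 bits of the second.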

Type: Application

Filed: October 19, 2010

Publication date: February 10, 2011

Inventors: Yen-Kuang Chen, William W. Macy, Jr., Matthew Holliman, Eric L. Debes, Minerva M. Yeung

Using a Texture Unit for General Purpose Computing

Publication number: 20110025700

Abstract: An interpolation unit, such as may be found in a texture unit or texture sampler, may be used to perform general purpose mathematical computations such as dot products. This enables some general purpose computations and operations to be offloaded from a central processing unit to an interpolation unit. The interpolation unit may use linear interpolators in order to perform the dot product calculations.
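One way to see how a linear interpolator could contribute to a dot product is the identity a*w0 + b*w1 = (w0 + w1) * lerp(a, b, w1 / (w0 + w1)). The sketch below applies it pairwise. This mapping is an assumption chosen for illustration; the abstract does not say how the application arranges its interpolators, and `lerp` and `dot_via_lerp` are hypothetical names.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b with weight t."""
    return a * (1.0 - t) + b * t


def dot_via_lerp(values, weights):
    """Dot product of two equal, even-length sequences, built from lerps.

    Each pair (a, b) with weights (w0, w1) is computed as
    (w0 + w1) * lerp(a, b, w1 / (w0 + w1)).
    """
    total = 0.0
    for i in range(0, len(values), 2):
        a, b = values[i], values[i + 1]
        w0, w1 = weights[i], weights[i + 1]
        scale = w0 + w1
        if scale == 0:
            # The identity is undefined here; fall back to direct arithmetic.
            total += a * w0 + b * w1
        else:
            total += scale * lerp(a, b, w1 / scale)
    return total


result = dot_via_lerp([1, 2, 3, 4], [0.5, 1.5, 2, 1])
```

The result should match the ordinary sum of products up to floating-point rounding.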

Type: Application

Filed: July 30, 2009

Publication date: February 3, 2011

Inventors: Victor W. Lee, Mikhail Smelyanskiy, Yen-Kuang Chen, Jatin Chhugani, Jose Gonzalez, Changkyu Kim, Ganesh S. Dasika

Bitstream buffer manipulation with a SIMD merge instruction

Patent number: 7818356

Abstract: Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.

Type: Grant

Filed: July 1, 2003

Date of Patent: October 19, 2010

Assignee: Intel Corporation

Inventors: Yen-Kuang Chen, William W. Macy, Jr., Matthew Holliman, Eric L. Debes, Minerva M. Yeung

SHARED CACHE MEMORIES FOR MULTI-CORE PROCESSORS

Publication number: 20100153649

Abstract: Embodiments of shared cache memories for multi-core processors are presented. In one embodiment, a cache memory comprises a group of sampling cache sets and a controller to determine a number of misses that occur in the group of sampling cache sets. The controller is operable to determine a victim cache line for a cache set based at least in part on the number of misses.

Type: Application

Filed: December 15, 2008

Publication date: June 17, 2010

Inventors: Wenlong Li, Yu Chen, Changkyu Kim, Christopher J. Hughes, Yen-Kuang Chen

Method and apparatus for parallel table lookup using SIMD instructions

Patent number: 7739319

Abstract: Method, apparatus, and program means for performing a parallel table lookup using SIMD instructions. The method of one embodiment comprises loading a table having a set of L data elements. A determination of whether the table fits into a single register is made. A data lookup into the table is performed with a packed data shuffle operation if the determination indicates that the table does fit into a single register. The table is divided into a plurality of sections if the table does not fit into a single register. Each of the sections is sized to fit into a single register. A plurality of packed data shuffle operations are executed on the plurality of sections to look up data in the table.
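The two cases in this abstract (table fits in one register, or is split into register-sized sections) can be sketched with lists standing in for registers. Everything here is an assumption made for illustration: the 16-element register width, the convention that an index with bit 0x80 set yields zero, the OR-based merge of partial results, and the names `shuffle` and `table_lookup`. Table values are assumed to be small non-negative integers and indices are assumed to be valid.

```python
REG_WIDTH = 16  # assumed number of byte elements per register


def shuffle(reg, indices):
    """Emulate a packed byte shuffle: out[i] = reg[indices[i]],
    or 0 when the index has its high bit (0x80) set."""
    return [0 if idx & 0x80 else reg[idx & (REG_WIDTH - 1)] for idx in indices]


def table_lookup(table, indices):
    """Look up every index in `table` using only shuffle operations."""
    table = list(table)
    if len(table) <= REG_WIDTH:
        # Table fits in a single register: one shuffle is enough.
        reg = table + [0] * (REG_WIDTH - len(table))
        return shuffle(reg, indices)

    result = [0] * len(indices)
    for base in range(0, len(table), REG_WIDTH):
        # Each section is sized to fit one register.
        section = table[base:base + REG_WIDTH]
        section += [0] * (REG_WIDTH - len(section))
        # Indices outside this section are flagged so they produce zero.
        local = [(i - base) if base <= i < base + REG_WIDTH else 0x80
                 for i in indices]
        part = shuffle(section, local)
        # Merge partial results; only one section is non-zero per lane.
        result = [r | p for r, p in zip(result, part)]
    return result


table = [(i * 3) % 256 for i in range(40)]
looked_up = table_lookup(table, [0, 5, 17, 39, 16, 15])
```

Each index falls in exactly one section, so OR-ing the per-section results reconstructs the full lookup.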

Type: Grant

Filed: July 1, 2003

Date of Patent: June 15, 2010

Assignee: Intel Corporation

Inventors: William W. Macy, Jr., Eric L. Debes, Yen-Kuang Chen, Minerva M. Yeung

Increasing concurrency and controlling replication in a multi-core cache hierarchy

Publication number: 20100138607

Abstract: In one embodiment, the present invention includes a directory of a private cache hierarchy to maintain coherency between data stored in the cache hierarchy, where the directory is to enable concurrent cache-to-cache transfer of data to two private caches. Other embodiments are described and claimed.

Type: Application

Filed: December 3, 2008

Publication date: June 3, 2010

Inventors: Christopher J. Hughes, Changkyu Kim, Yen-Kuang Chen

Method and apparatus for computing matrix transformations

Patent number: 7725521

Abstract: A method and apparatus for performing matrix transformations including multiply-add operations and byte shuffle operations on packed data in a processor. In one embodiment, two rows of content byte elements are shuffled to generate a first and second packed data respectively including elements of a first two columns and of a second two columns. A third packed data including sums of products is generated from the first packed data and elements from two rows of a matrix by a multiply-add instruction. A fourth packed data including sums of products is generated from the second packed data and elements from two more rows of the matrix by another multiply-add instruction. Corresponding sums of products of the third and fourth packed data are then summed to generate two rows of a product matrix. Elements of the product matrix may be generated in an order that further facilitates a second matrix multiplication.

Type: Grant

Filed: October 10, 2003

Date of Patent: May 25, 2010

Assignee: Intel Corporation

Inventors: Yen-Kuang Chen, Eric Q. Li, William W. Macy, Jr., Minerva M. Yeung

Apparatus and method for reducing power consumption on simultaneous multi-threading systems

Patent number: 7653906

Abstract: Activities may be delayed from being dispatched until another activity is ready to be dispatched. Dispatching more than one activity increases overlapping in execution time of activities. By delaying the dispatch of the activities, power consumption and thermal dissipation on a multi-threading processor may be reduced.

Type: Grant

Filed: October 23, 2002

Date of Patent: January 26, 2010

Assignee: Intel Corporation

Inventors: Yen-Kuang Chen, Ishmael F. Santos

DETECTION OF STREAMING DATA IN CACHE

Publication number: 20100005241

Abstract: An apparatus to detect streaming data in memory is presented. In one embodiment, the apparatus uses reuse bits and S-bit status for cache lines, wherein an S-bit status indicates that the data in the cache line are potentially streaming data. To enhance the efficiency of a cache, different measures can be applied to make the streaming data become the next victim during a replacement.

Type: Application

Filed: April 8, 2008

Publication date: January 7, 2010

Inventors: Changkyu Kim, Christopher J. Hughes, Yen-Kuang Chen

Method and apparatus for rearranging data between multiple registers

Patent number: 7631025

Abstract: Method, apparatus, and program means for rearranging data between multiple registers. The method of one embodiment comprises shuffling a first set of packed data from a first source based on a first set of masks to produce a first set of shuffled data. The first set of masks is to include a first plurality of control entries to set designated data element positions in the first set of shuffled data to zero. A second packed data from a second source is shuffled based on a second set of masks to produce a second set of shuffled data. The second set of masks includes a second plurality of control entries to set to zero data element positions in the second set of shuffled data opposite to said designated data element positions in the first set of shuffled data. The first set of shuffled data and said second set of shuffled data are merged together to form a packed data resultant.

Type: Grant

Filed: June 30, 2003

Date of Patent: December 8, 2009

Assignee: Intel Corporation

Inventors: Eric L. Debes, William W. Macy, Jr., Patrice L. Roussel, Yen-Kuang Chen

Dynamically Re-Classifying Data In A Shared Cache

Publication number: 20090271572

Abstract: In one embodiment, the present invention includes a method for determining if a state of data is indicative of a first class of data, re-classifying the data from a second class to the first class based on the determination, and moving the data to a first portion of a shared cache associated with a first requester unit based on the re-classification. Other embodiments are described and claimed.

Type: Application

Filed: July 2, 2009

Publication date: October 29, 2009

Inventors: Christopher J. Hughes, Yen-Kuang Chen

Vector instructions to enable efficient synchronization and parallel reduction operations

Publication number: 20090249026

Abstract: In one embodiment, a processor may include a vector unit to perform operations on multiple data elements responsive to a single instruction, and a control unit coupled to the vector unit to provide the data elements to the vector unit, where the control unit is to enable an atomic vector operation to be performed on at least some of the data elements responsive to a first vector instruction to be executed under a first mask and a second vector instruction to be executed under a second mask. Other embodiments are described and claimed.

Type: Application

Filed: March 28, 2008

Publication date: October 1, 2009

Inventors: Mikhail Smelyanskiy, Sanjeev Kumar, Daehyun Kim, Jatin Chhugani, Changkyu Kim, Christopher J. Hughes, Victor W. Lee, Anthony D. Nguyen, Yen-Kuang Chen

Method and system for proximity caching in a multiple-core system

Patent number: 7584327

Abstract: Embodiments of the invention relate to a method and system for caching data in a multiple-core system with shared cache. According to the embodiments, data used by the cores may be classified as being of one of predetermined types. The classification may enable efficiencies to be realized by performing different types of handling corresponding to different data types. For example, data classified as likely to be re-used may be stored in a shared cache, in a region of the shared cache that is closest to a core using the data. By storing the data this way, access time and energy consumption may be reduced if the data is subsequently retrieved for use by the core.

Type: Grant

Filed: December 30, 2005

Date of Patent: September 1, 2009

Assignee: Intel Corporation

Inventors: Yen-Kuang Chen, Christopher J. Hughes

Data classification in shared cache of multiple-core processor

Patent number: 7571285

Abstract: In one embodiment, the present invention includes a method for determining if a state of data is indicative of a first class of data, re-classifying the data from a second class to the first class based on the determination, and moving the data to a first portion of a shared cache associated with a first requester unit based on the re-classification. Other embodiments are described and claimed.

Type: Grant

Filed: July 21, 2006

Date of Patent: August 4, 2009

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, Yen-Kuang Chen

METHOD AND APPARATUS FOR PROVIDING PREDICTION MODE FINE GRANULARITY SCALABILITY

Publication number: 20090034609

Abstract: In an encoding process, video data are represented as a bitstream of a quantized base layer and at least two enhancement layers, with each picture in each layer identified by a start code. The base layer, plus a number of enhancement layers capable of being transmitted by the communication channel's bandwidth, are transmitted on the communication channel.

Type: Application

Filed: October 14, 2008

Publication date: February 5, 2009

Inventors: Wen-Hsiao Peng, Yen-Kuang Chen

System and method for cache coherency in a cache with different cache location lengths

Patent number: 7454576

Abstract: A system and method for the design and operation of a cache system with differing cache location lengths in level one caches is disclosed. In one embodiment, each level one cache may include groups of cache locations of differing length, capable of holding portions of a level two cache line. A state tree may be created from data in a sharing vector. When a request arrives from a level one cache, the level two cache may examine the nodes of the state tree to determine whether the node of the state tree corresponding to the incoming request is already active. The results of this determination may be used to inhibit or permit the concurrent processing of the request.

Type: Grant

Filed: December 27, 2004

Date of Patent: November 18, 2008

Assignee: Intel Corporation

Inventors: Yen-Kuang Chen, Christopher J. Hughes, James M. Tuck, III

Method and apparatus for performing efficient transformations with horizontal addition and subtraction

Patent number: 7392275

Abstract: A method and apparatus for including in a processor instructions for performing horizontal intra-add operations on packed data. One embodiment of the processor is coupled to a memory. The memory has stored therein at least a first packed data. The processor performs operations on data elements in the first packed data to generate a plurality of data elements in a second packed data in response to receiving an instruction. At least two of the plurality of data elements in the second packed data store the results of an intra-add operation, at least one of these results coming from the operation on data elements of the first packed data. One embodiment of a software method utilizes horizontal intra-add instructions for performing butterfly computations as may be employed, for example, in Walsh-Hadamard transforms or in Fast-Fourier Transforms.
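To show why horizontal add and subtract operations suit butterfly computations, the sketch below builds a Walsh-Hadamard transform from nothing but pairwise adds and subtracts of adjacent elements. It is an illustrative sketch, not the patented instruction set: lists stand in for packed data, the helper names are hypothetical, and the stage layout (sums in the first half, differences in the second) is one convenient choice among several.

```python
def horizontal_add(packed):
    """Add adjacent element pairs: [a0+a1, a2+a3, ...]."""
    return [packed[i] + packed[i + 1] for i in range(0, len(packed), 2)]


def horizontal_sub(packed):
    """Subtract adjacent element pairs: [a0-a1, a2-a3, ...]."""
    return [packed[i] - packed[i + 1] for i in range(0, len(packed), 2)]


def walsh_hadamard(data):
    """Unnormalized Walsh-Hadamard transform (natural/Hadamard order).

    Each of the log2(n) stages is one horizontal add and one horizontal
    subtract over the whole vector, i.e. n/2 butterflies.
    """
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(data)
    for _ in range(n.bit_length() - 1):
        out = horizontal_add(out) + horizontal_sub(out)
    return out


coefficients = walsh_hadamard([1, 2, 3, 4])
```

Every stage has the same access pattern, which is part of what makes this formulation a natural fit for a fixed packed-data instruction.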

Type: Grant

Filed: June 30, 2003

Date of Patent: June 24, 2008

Assignee: Intel Corporation

Inventors: William W. Macy, Eric Debes, Minerva Yeung, Yen-Kuang Chen, Patrice Roussel

Method and apparatus for motion estimation

Patent number: 7327787

Abstract: A method and system of motion estimation for video data compression is disclosed. Individual frames of video data are divided into blocks of pixels. One frame is searched to find a block of pixels that matches intensity with the block of pixels from a second frame. In one embodiment, a motion estimator performs a rhombus shaped search of progressively smaller range for a matching block of pixels. In one embodiment, a prediction motion vector is used to reduce the search effort. In one embodiment, the actual shape of the rhombus can be adjusted to the type of motion expected.
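A rhombus shaped search of progressively smaller range can be sketched as a loop that tests the four tips of a diamond around the current best match and halves the diamond when no tip improves. The code below is a simplified sketch under stated assumptions: frames are lists of rows of integer intensities, the match metric is the sum of absolute differences (the abstract only says the blocks "match intensity"), and the names `sad`, `rhombus_search`, `start_range`, and `predicted` are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of `cur`
    at (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def rhombus_search(cur, ref, bx, by, n=4, start_range=4, predicted=(0, 0)):
    """Return (motion_vector, cost) found by a shrinking diamond search."""
    height, width = len(ref), len(ref[0])

    def valid(dx, dy):
        return (0 <= bx + dx and bx + dx + n <= width and
                0 <= by + dy and by + dy + n <= height)

    # Start from the predicted vector when it stays inside the frame.
    best = predicted if valid(*predicted) else (0, 0)
    best_cost = sad(cur, ref, bx, by, best[0], best[1], n)
    step = start_range
    while step >= 1:
        center, improved = best, False
        # The four tips of the rhombus around the current center.
        for ox, oy in ((step, 0), (-step, 0), (0, step), (0, -step)):
            cand = (center[0] + ox, center[1] + oy)
            if valid(*cand):
                cost = sad(cur, ref, bx, by, cand[0], cand[1], n)
                if cost < best_cost:
                    best, best_cost, improved = cand, cost, True
        if not improved:
            step //= 2  # shrink the rhombus
    return best, best_cost


# Synthetic frames: `cur` is `ref` displaced by (2, 1).
ref = [[10 * x + 100 * y for x in range(16)] for y in range(16)]
cur = [[10 * (x + 2) + 100 * (y + 1) for x in range(16)] for y in range(16)]
vector, cost = rhombus_search(cur, ref, bx=6, by=6)
```

Passing a good `predicted` vector lets the search start near the answer, which is the intuition behind using a prediction to cut search effort. Stretching the offsets unequally in x and y would be one way to adjust the rhombus shape.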

Type: Grant

Filed: November 20, 2000

Date of Patent: February 5, 2008

Assignee: Intel Corporation

Inventors: Yen-Kuang Chen, Hong Jiang

Dynamically re-classifying data in a shared cache

Publication number: 20080022049

Abstract: In one embodiment, the present invention includes a method for determining if a state of data is indicative of a first class of data, re-classifying the data from a second class to the first class based on the determination, and moving the data to a first portion of a shared cache associated with a first requester unit based on the re-classification. Other embodiments are described and claimed.

Type: Application

Filed: July 21, 2006

Publication date: January 24, 2008

Inventors: Christopher J. Hughes, Yen-Kuang Chen
