Dual function system and method for shuffling packed data elements
An apparatus and method for performing a shuffle operation on packed data using computer-implemented steps are described. In one embodiment, a first packed data operand having at least two data elements is accessed. A second packed data operand having at least two data elements is accessed. One of the data elements in the first packed data operand is shuffled into a lower destination field of a destination register, and one of the data elements in the second packed data operand is shuffled into an upper destination field of the destination register.
More than one reissue application has been filed for the reissue of U.S. Pat. No. 6,041,404, which is hereby incorporated by reference in its entirety. The reissue applications are application Ser. No. 10/104,205 (the present and parent reissue application) and Ser. No. 14/283,020 which is a reissue continuation of application Ser. No. 10/104,205.
FIELD OF THE INVENTION
The present invention relates in general to the field of computer systems, and in particular, to an apparatus and method for performing multidimensional computations based on a shuffle operation.
BACKGROUND OF THE INVENTION
To improve the efficiency of multimedia applications, as well as other applications with similar characteristics, a Single Instruction, Multiple Data (SIMD) architecture has been implemented in computer systems to enable one instruction to operate on several data elements simultaneously, rather than on a single data element. In particular, SIMD architectures take advantage of packing many data elements within one register or memory location. With parallel hardware execution, multiple operations can be performed with one instruction, resulting in significant performance improvement.
Although many applications currently in use can take advantage of such operations, known as vertical operations, there are a number of important applications which would require the rearrangement of the data elements before vertical operations can be implemented so as to provide realization of the application. Examples of such important applications include the dot product and matrix multiplication operations, which are commonly used in 3-D graphics and signal processing applications.
One problem with rearranging the order of data elements within a register or memory word is the mechanism used to indicate how the data should be rearranged. Typically, a mask or control word is used. The control word must include enough bits to indicate which of the source data fields must be moved into each destination data field. For example, if a source operand has eight data fields, requiring three bits to designate any given data field, and the destination register has four data fields, (3×4) or 12 bits are required for the control word. However, on a processor implementation where fewer than 12 bits are available for the control register, a full shuffle cannot be supported.
Therefore, there is a need for a way to reorganize the order of data elements where less than the full number of bits is available for a control register.
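The control word sizing in the background example can be checked with a short calculation. The following is a minimal sketch, not part of the patent; the function name `control_bits` is hypothetical, and it assumes every destination field may select any one of the source fields.

```python
def control_bits(source_fields: int, dest_fields: int) -> int:
    """Bits needed when each destination field may select any source field.

    Hypothetical helper for illustration; assumes source_fields >= 1.
    """
    # Smallest number of bits that can index source_fields distinct fields.
    bits_per_selector = (source_fields - 1).bit_length()
    return bits_per_selector * dest_fields


# Background example: eight source fields, four destination fields.
print(control_bits(8, 4))  # 3 bits per selector x 4 fields = 12

# Selecting among four elements for each of four destination fields.
print(control_bits(4, 4))  # 2 bits per selector x 4 fields = 8
```

Under these assumptions, restricting each destination field to the four elements of a single source operand brings the control word down to 8 bits.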
SUMMARY OF THE INVENTION
The present invention provides an apparatus and method for performing a shuffle operation on packed data using computer-implemented steps. In one embodiment, a first packed data operand having at least two data elements is accessed. A second packed data operand having at least two data elements is accessed. One of the data elements in the first packed data operand is shuffled into a lower destination field of a destination register, and one of the data elements in the second packed data operand is shuffled into an upper destination field of the destination register.
The present invention is illustrated by way of example and may be better understood by referring to the following description in conjunction with the accompanying drawings, in which like references indicate similar elements and in which:
In the following description, numerous specific details are set forth to provide a thorough understanding of the invention. However, it will be understood by one of ordinary skill in the art that the invention may be practiced without these specific details. In other instances, well-known circuits, structures and techniques have not been shown in detail in order not to obscure the invention.
The present invention provides a way to reorganize the order of data elements where less than the full number of bits is available for a control register. According to one aspect of the invention, a method and apparatus are described for moving data elements in a packed data operand (a shuffle operation). The shuffle operation allows shuffling of certain-sized data into any combination from two source registers or memory into a destination register. The destination register may be the same as a source register. The shuffle instruction is useful in data reorganization and in moving data into different locations of the register to allow, for example, extra storage for scalar operations, or for facilitating the conversion between data formats such as from packed integer to packed floating point and vice versa.
The term “registers” is used herein to refer to the on-board processor storage locations that are used as part of macroinstructions to identify operands. In other words, the registers referred to herein are those that are visible from the outside of the processor (from a programmer's perspective). However, the registers described herein can be implemented by circuitry within a processor using any number of different techniques, such as dedicated physical registers, dynamically allocated physical registers using register renaming, combinations of dedicated and dynamically allocated physical registers, etc.
COMPUTER SYSTEM
In addition to other devices, one or more of a network 130, a TV broadcast signal receiver 131, a fax/modem 132, a digitizing unit 133, a sound unit 134, and a graphics unit 135 may optionally be coupled to bus 115. The network 130 and fax/modem 132 represent one or more network connections for transmitting data over a machine readable media (e.g., carrier waves). The digitizing unit 133 represents one or more devices for digitizing images (e.g., a scanner, camera, etc.). The sound unit 134 represents one or more devices for inputting and/or outputting sound (e.g., microphones, speakers, magnetic storage devices, optical storage devices, etc.). The graphics unit 135 represents one or more devices for generating 3-D images (e.g., graphics card).
The decode unit 140 is shown including packed data instruction set 145 for performing operations on packed data. In one embodiment, the packed data instruction set 145 includes the following instructions: a move instruction(s) 150, a shuffle instruction(s) 155, an add instruction(s) (such as ADDPS) 160, and a multiply instruction(s) 165. The MOVAPS, SHUFPS and ADDPS instructions are applicable to packed floating point data, in which the results of an operation between two sets of numbers having a predetermined number of bits, are stored in a register having the same predetermined number of bits, i.e., the size or configuration of the operand is the same as that of the result register. The operation of each of these instructions is further described herein. While one embodiment is described in which the packed data instructions operate on floating point data, alternative embodiments could alternatively or additionally have similar instructions that operate on integer data.
In addition to the packed data instructions, processor 105 can include new instructions and/or instructions similar to or the same as those found in existing general purpose processors. For example, in one embodiment the processor 105 supports an instruction set which is compatible with the Intel® Architecture instruction set used by existing processors, such as the Pentium® II processor. Alternative embodiments of the invention may contain more or less, as well as different, packed data instructions and still utilize the teachings of the invention.
The registers 141 represent a storage area on processor 105 for storing information, including control/status information, integer data, floating point data, and packed data. It will be understood by one of ordinary skill in the art that one aspect of the invention is the described instruction set for operating on packed data. According to this aspect of the invention, the storage area used for storing the packed data is not critical. The term data processing system is used herein to refer to any machine for processing data, including the computer system(s) described with reference to
While one embodiment of the invention is described in which the processor 105, executing the packed data instructions, operates on 128-bit packed data operands containing four 32-bit single precision floating point values, the processor 105 can operate on packed data in several different packed data formats. For example, in one embodiment, packed data can be operated on in one of three formats: a “packed byte” format (e.g., PADDb), a “packed word” format (e.g., PADDw), or a “packed double word” (dword) format (e.g., PADDd). The packed byte format includes eight separate 8-bit data elements; the packed word format includes four separate 16-bit data elements; the packed dword format includes two separate 32-bit data elements. While certain instructions are discussed below with reference to one or two packed data formats, the instructions may be similarly applied to the other packed data formats of the invention.
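The packed formats above can be illustrated by viewing the same bits at different element widths. The following is a minimal sketch using Python's standard `struct` module; the little-endian byte order and the sample values are assumptions made for illustration and are not stated in the description. The element counts given for the byte, word and dword formats imply a 64-bit operand (8×8 = 4×16 = 2×32 = 64 bits).

```python
import struct

# One 64-bit packed operand, stored little-endian (assumed byte order).
raw = struct.pack("<Q", 0x0807060504030201)

# The same 64 bits viewed in each of the three packed integer formats.
packed_bytes = struct.unpack("<8B", raw)    # eight 8-bit elements
packed_words = struct.unpack("<4H", raw)    # four 16-bit elements
packed_dwords = struct.unpack("<2I", raw)   # two 32-bit elements

print(packed_bytes)
print([hex(w) for w in packed_words])
print([hex(d) for d in packed_dwords])

# A 128-bit operand holding four 32-bit single precision values.
packed_floats = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
print(len(packed_floats) * 8)  # operand width in bits
```

Index 0 of each tuple corresponds to the lowest data element of the operand under the assumed byte order.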
The shuffle instruction of the present invention is part of a family of many different instructions which operate with SIMD architecture. For example,
|X3|X2|X1|X0|
The process S500 then proceeds to process step S520, where numbers Y0, Y1, Y2 and Y3 are stored as data elements in a packed data item 525. For present discussion purposes, each data element is 16-bits wide and is contained in register X1, in the following order:
|Y3|Y2|Y1|Y0|
The process S500 then advances to process step S530, where a shuffle instruction is performed on the contents of register X0 (data item 515) and register X1 (data item 525) to shuffle any one of the four data elements from the first data item 515 to the lower two fields of a destination register 535, and to shuffle any one of the four data elements from the second data item 525 to the upper two fields of the destination register 535. The resulting data item 535 is as follows:
|{Y3, Y2, Y1, Y0}|{Y3, Y2, Y1, Y0}|{X3, X2, X1, X0}|{X3, X2, X1, X0}|
Accordingly, a shuffle operation is performed. Although
An 8-bit immediate value is used as a control word to indicate how data elements should be shuffled. Bits 0,1 of the control word indicate which of the four data elements in the first operand are shuffled into the first or lowest data element of the destination register. Bits 2,3 of the control word indicate which of the four data elements in the first operand are shuffled into the second data element of the destination register. Bits 4,5 of the control word indicate which of the four data elements in the second operand are shuffled into the third data element of the destination register. Bits 6,7 of the control word indicate which of the four data elements in the second operand are shuffled into the fourth data element of the destination register. For example, given a first data operand with four data elements contained in the following order:
|D|C|B|A|
and also given a second data operand with four data elements contained in the following order:
|H|G|F|E|
and also given a shuffle control word of 10001111, the result of the shuffle is as follows:
|G|E|D|D|
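The control word decoding described above can be modeled in a few lines. The following is a minimal software sketch of the four-element case, not the hardware implementation; the function name `shuffle` and the list representation (index 0 holding the lowest data element) are assumptions made for illustration.

```python
def shuffle(first, second, control):
    """Model of the four-element shuffle with an 8-bit control word.

    first, second: sequences of four elements, index 0 = lowest element.
    control: integer 0..255; bits 0,1 and 2,3 select from `first`,
             bits 4,5 and 6,7 select from `second`.
    Returns the destination as a list, index 0 = lowest field.
    """
    if len(first) != 4 or len(second) != 4:
        raise ValueError("this sketch models four-element operands only")
    if not 0 <= control <= 0xFF:
        raise ValueError("control word must fit in 8 bits")
    # Split the control word into four 2-bit selectors, lowest bits first.
    sel = [(control >> (2 * i)) & 0b11 for i in range(4)]
    return [first[sel[0]], first[sel[1]], second[sel[2]], second[sel[3]]]


# Operands from the example, listed lowest element first.
first = ["A", "B", "C", "D"]    # written |D|C|B|A| in the text
second = ["E", "F", "G", "H"]   # written |H|G|F|E| in the text

result = shuffle(first, second, 0b10001111)

# Print highest field first to match the notation used in the text.
print("|" + "|".join(reversed(result)) + "|")  # |G|E|D|D|
```

Passing the same list as both operands models the case in which the first and second packed data operands are the same operand.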
It will be recognized by one of ordinary skill in the art that the size of the shuffle control word may vary without loss of compatibility with the present invention, depending on the number of data elements in the source data operand and the number of fields in the destination register.
The shuffle instruction of the present invention may be used as part of many different applications. For example,
In one embodiment, the computer system 100 shown in
In this embodiment, the digital filter unit 718 is implemented using the processor 105 and the software 136 to perform a digital filter. In this embodiment, the processor 105, executing the software 136, performs the digital filter using shuffle operations, and stores the filtered data 718 in storage device 110. In this manner, the digital filter is performed by the host processor of the computer system, rather than the TV broadcast signal receiver 131. As a result, the complexity of the TV broadcast signal receiver 131 is reduced. In this embodiment, the video decoder 721 may be implemented in any number of different combinations of hardware, software, and/or firmware. The audio and video data 724 can then be stored, and/or displayed on the display 125 and the sound unit 134, respectively.
In one embodiment, the computer system 100 shown in
While several example uses of shuffle operations have been described, it will be understood by one of ordinary skill in the art that the invention is not limited to these uses. In addition, while the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described. The method and apparatus of the invention can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting on the invention.
Claims
1. A computer system comprising:
- a hardware unit to transmit data representing graphics to another computer or a display;
- a processor coupled to the hardware unit; and
- a storage device coupled to the processor and having stored therein an instruction, which when executed by the processor, causes the processor to at least,
- access a first packed data operand having at least two data elements;
- access a second packed data operand having at least two data elements;
- select a first set of data elements from the first packed data operand;
- copy each of the data elements in the first set to specified data fields located in the lower half of a destination operand;
- select a second set of data elements from the second packed data operand; and
- copy each of the data elements in the second set to specified data fields located in the upper half of the destination operand.
2. The computer system of claim 1 wherein the storage device further comprises a packing device for packing floating point data into the data elements.
3. The computer system of claim 1 wherein the storage device further comprises a packing device for packing integer data into the data elements.
4. A system as claimed in claim 1 wherein the first and second packed data operands are the same operand.
5. A method comprising the computer-implemented steps of:
- decoding a single instruction;
- in response to the step of decoding the single instruction,
- accessing a first packed data operand having at least two data elements;
- accessing a second packed data operand having at least two data elements;
- selecting a first set of data elements from the first packed data operand;
- copying each of the data elements in the first set to specified data fields located in the lower half of a destination operand;
- selecting a second set of data elements from the second packed data operand; and
- copying each of the data elements in the second set to specified data fields located in the upper half of the destination operand.
6. The method of claim 5 further comprising the step of packing floating point data into the data elements.
7. The method of claim 5 further comprising the step of packing integer data into the data elements.
8. A method as claimed in claim 5 wherein the first and second packed data operands are the same operand.
9. A method comprising the computer implemented steps of:
- accessing data representative of a first three-dimensional image;
- altering the data using three-dimensional geometry to generate a second three-dimensional image, the step of altering at least including,
- accessing a first packed data operand having at least two data elements;
- accessing a second packed data operand having at least two data elements;
- selecting a first set of data elements from the first packed data operand;
- copying each of the data elements in the first set to specified data fields located in the lower half of a destination operand;
- selecting a second set of data elements from the second packed data operand;
- copying each of the data elements in the second set to specified data fields located in the upper half of the destination operand; and
- displaying the second three-dimensional image.
10. The method of claim 9 wherein the step of altering includes the performance of a three-dimensional transformation.
11. The method of claim 9 wherein the step of altering includes the step of packing floating point data into the data elements.
12. The method of claim 9 wherein the step of altering includes the step of packing integer data into the data elements.
13. A method as claimed in claim 9 wherein the first and second packed data operands are the same operand.
14. A method comprising the computer implemented steps of:
- accessing data representative of a first three-dimensional image;
- altering the data using three-dimensional geometry to generate a second three-dimensional image, the step of altering at least including,
- accessing a first packed data operand having at least two data elements;
- accessing a second packed data operand having at least two data elements;
- selecting a first set of data elements from the first packed data operand;
- copying each of the data elements in the first set to specified data fields located in the lower half of a destination operand;
- selecting a second set of data elements from the second packed data operand;
- copying each of the data elements in the second set to specified data fields located in the upper half of the destination operand; and
- displaying the second three-dimensional image.
15. The method of claim 14 wherein the step of altering includes the performance of a three-dimensional transformation.
16. The method of claim 14 wherein the step of altering includes the step of packing floating point data into the data elements.
17. The method of claim 14 wherein the step of altering includes the step of packing integer data into the data elements.
18. A method as claimed in claim 14 wherein the first and second packed data operands are the same operand.
19. A processor-implemented method for reducing the number of control bits required to shuffle packed data elements from first and second source operands, comprising the steps of:
- decoding a single instruction specifying first and second source operands and a field of control bits; and
- responsive to the field of control bits, generating a resultant packed data operand comprised of packed data elements from the first and second source operands,
- wherein the control bits are limited to specifying for the upper and lower halves of the resultant packed data operand, data elements from the first and second source operands, respectively.
20. The method as claimed in claim 19 wherein the first and second packed data source operands and the resultant packed data operand are comprised of four packed data elements, and the field of control bits is an 8-bit field.
21. The method as claimed in claim 19 wherein the first and second packed data source operands are the same operand.
22. The method as claimed in claim 19 wherein the first and second packed data source operands are packed with floating point data.
23. A processor for performing a shuffle operation in response to a shuffle instruction comprising:
- a decoder which decodes a single instruction specifying first and second source operands and a field of control bits; and
- an execution unit which, responsive to the field of control bits, generates a resultant packed data operand comprised of packed data elements from the first and second source operands,
- wherein the control bits are limited to specifying for the upper and lower halves of the resultant packed data operand, data elements from the first and second source operands, respectively.
24. The processor as claimed in claim 23 wherein the first and second source operands are the same operand.
25. The method as claimed in claim 19 wherein the first and second packed data source operands and the resultant packed data operand are each comprised of at least two packed data elements.
26. The method as claimed in claim 19 wherein the field of control bits is an 8-bit field.
27. The method as claimed in claim 26 wherein an 8-bit immediate to fill the field of control bits is decoded with the single instruction.
28. The processor of claim 23 wherein said field of control bits comprises of an 8-bit immediate value.
29. The processor of claim 23 wherein said field of control bits comprises 8 bits.
30. The processor of claim 29 wherein said first and second source operands comprise of double-precision floating-point values.
31. The processor of claim 29 wherein said first and second source operands comprise single-precision floating-point values.
32. The processor of claim 29 wherein said packed data elements comprise of packed double words.
33. The processor of claim 29 wherein said packed data elements comprise of packed words.
34. The processor of claim 29 wherein said packed data elements comprise of packed bytes.
35. The processor of claim 29 wherein said first and said second operands comprise of 128-bits of packed data.
36. An apparatus comprising:
- a decode unit to decode a shuffle instruction into control signals, said shuffle instruction to include a first operand, a second operand, and a third operand wherein said third operand comprises of an 8-bit immediate value;
- said first operand to identify a first register to hold at least two packed data elements;
- said second operand to identify a memory location to hold at least two packed data elements;
- said third operand is to provide selection bits to indicate which of said packed data elements in said first operand and said second operand to select and copy to a resultant register; and
- an execution unit coupled to said decode unit, said execution unit responsive to said control signals and said selection bits to select a first set of data elements from said first register and to copy said first set of data elements to one or more lower destination fields of said resultant register, said execution unit further responsive to said control signals and said selection bits to select a second set of data elements from said memory location and to copy said second set of data elements to one or more upper destination fields of said resultant register.
37. The apparatus of claim 36 wherein said data elements of said first register and said second register comprise double-precision floating-point values.
38. The apparatus of claim 36 wherein said data elements of said first register and said second register comprise of single-precision floating-point values.
39. The apparatus of claim 36 wherein said packed data elements comprise of packed double words.
40. The apparatus of claim 36 wherein said packed data elements comprise of packed words.
41. The apparatus of claim 36 wherein said packed data elements comprise of packed bytes.
42. The apparatus of claim 36 wherein said first register is also said resultant register.
43. An apparatus comprising:
- an instruction decoder to receive and decode a shuffle instruction, said shuffle instruction to include an immediate operand comprising two or more sets of control bits;
- a first source register to hold a first packed data, said first packed data comprising of a first data element and a second data element;
- a second source register to hold a second packed data, said second packed data comprising of a third data element and a fourth data element;
- a destination register to hold a third packed data;
- an execution unit coupled to said first source register to receive said first packed data, and to said second source register to receive said second packed data; and
- wherein said execution unit is further coupled to said instruction decoder to receive said two or more sets of control bits, said execution unit to select from said first source register at least one of said first and second data elements in response to a first one of said two or more sets of control bits and to copy said selected data element from said first source register to a first data field in a lower half of said destination register, and said execution unit to select from said second source register at least one of said third and fourth data elements in response to a second one of said two or more sets of control bits and to copy said selected data element from said second source register to a second data field in an upper half of said destination register.
44. The apparatus of claim 43 wherein said immediate operand is an 8-bit immediate operand.
45. The apparatus of claim 43 wherein said data elements of said first source register and said second source register comprise of double-precision floating-point values.
46. The apparatus of claim 43 wherein said data elements of said first source register and said second source register comprise of single-precision floating-point values.
47. The apparatus of claim 43 wherein said packed data comprise of packed double words.
48. The apparatus of claim 43 wherein said packed data comprise of packed words.
49. The apparatus of claim 43 wherein said packed data comprise of packed bytes.
50. The apparatus of claim 43 wherein said apparatus is defined by machine readable data on a machine readable medium.
51. The apparatus of claim 43 wherein said first source register is also said destination register.
52. The apparatus of claim 43 wherein said first source register is the same as said second source register.
53. The apparatus of claim 43 wherein said two or more sets of control bits comprise bits 0 and 1 of the immediate operand.
54. The apparatus of claim 44 wherein said 8-bit immediate operand comprises bits 0 and 1 to select from said first source register which data element is copied into the lowest data field in the lower half of the destination register, and bits 4 and 5 to select from said second source register which data element is copied into the lowest data field in the upper half of the destination register.
55. The apparatus of claim 44 wherein said 8-bit immediate operand comprises bits 0 through 3 to select from said first source register which data elements are copied into the lower half of the destination register, and bits 4 through 7 to select from said second source register which data elements are copied into the upper half of the destination register.
56. The apparatus of claim 55 wherein said 8-bit immediate operand comprises bits 2 and 3 to select from said first source register which data element is copied into the highest data field in the lower half of the destination register, and bits 6 and 7 to select from said second source register which data element is copied into the highest data field in the upper half of the destination register.
3711692 | January 1973 | Batcher |
3723715 | March 1973 | Chen et al. |
4139899 | February 13, 1979 | Tulpule et al. |
4161784 | July 17, 1979 | Cushing et al. |
4393468 | July 12, 1983 | New |
4418383 | November 29, 1983 | Doyle et al. |
4498177 | February 5, 1985 | Larson |
4707800 | November 17, 1987 | Montrone et al. |
4771379 | September 13, 1988 | Ando et al. |
4903228 | February 20, 1990 | Gregoire et al. |
4989168 | January 29, 1991 | Kuroda et al. |
5019968 | May 28, 1991 | Wang et al. |
5081698 | January 14, 1992 | Kohn |
5095457 | March 10, 1992 | Jeong |
5168571 | December 1, 1992 | Hoover et al. |
5187679 | February 16, 1993 | Vassiliadis et al. |
5268995 | December 7, 1993 | Diefendorff et al. |
5321810 | June 14, 1994 | Case et al. |
5327543 | July 5, 1994 | Miura et al. |
5390135 | February 14, 1995 | Lee et al. |
5408670 | April 18, 1995 | Davies |
5423010 | June 6, 1995 | Mizukami |
5426783 | June 20, 1995 | Norrie et al. |
5465374 | November 7, 1995 | Dinkjian et al. |
5487159 | January 23, 1996 | Byers et al. |
5497497 | March 5, 1996 | Miller et al. |
5579253 | November 26, 1996 | Lee et al. |
5594437 | January 14, 1997 | O'Malley |
5625374 | April 29, 1997 | Turkowski |
5680161 | October 21, 1997 | Lehman et al. |
5781457 | July 14, 1998 | Cohen et al. |
5802336 | September 1, 1998 | Peleg et al. |
5819117 | October 6, 1998 | Hansen |
5881259 | March 9, 1999 | Glass et al. |
5909572 | June 1, 1999 | Thayer et al. |
5931945 | August 3, 1999 | Yung et al. |
5933650 | August 3, 1999 | van Hook et al. |
6002881 | December 14, 1999 | York et al. |
6041404 | March 21, 2000 | Roussel et al. |
6058465 | May 2, 2000 | Nguyen |
6115812 | September 5, 2000 | Abdallah et al. |
6223277 | April 24, 2001 | Karguth |
6243808 | June 5, 2001 | Wang |
6381690 | April 30, 2002 | Lee |
WO 97/07450 | February 1997 | WO |
WO 97/09671 | March 1997 | WO |
WO 97/32278 | September 1997 | WO |
- “Visual Instruction Set (VIS™) User's Guide”, Version 1.1, Mar. 1997, pp. i-vii & 1-136.
- Mano, Morris M. , Computer System Architecture 1982, Prentice Hall, 2d Ed. pp. 140-144.
- Motorola, MC68020 32-bit Microprocessor User's Manual 1985, Prentice Hall, 2d Ed. pp. B-101 -B-103, B-135, B-136, B-169,B-170.
- Peleg, A, etal, Intel MMX for Multimedia PCs, Jan. 1997, Communications of the ACM, vol. 40, No. 1, pp. 25-38.
- Hansen,C, MicroUnity's MediaProcessor Architecture, 1996, IEEE, pp. 34-41.
- “Visual Instruction Set (VIS) User's Guide”, Version 1.1, Mar. 1997, pp. i-xii & 1-136.
- European Search Report, EP 99 30 2378, Mar. 14, 2000, 3 pages.
- Austrian Search Report, Appln. No. 9901342-7, Oct. 31, 2000, 7 pages.
- Intel Corporation, “Intel Architecture Software Developer's Manual, vol. 2; Instruction Set Reference,” 1999, 26 pages.
- Intel Corporation, “Willamette Processor Developer's Guide,” Manual, Feb. 2000, 16 pages.
- Tri-Media, “TM1000 Preliminary Data Book,” Phillips Electronics No. Amer., 1997, 30 pages.
- Silicon Graphics, “Silicon Graphics Introduces Compact MIPS® RISC Microprocessor Code for High Performance at a Low Cost,” Oct. 21, 1996, 13 pages.
- “MIPS Digital Media Extension,” Set Architecture Specification, http:/—/www.mips.com/MDMXspec.ps (Oct. 21, 1997), 8 pages.
- Hewlet Packard, “64-bit and Multimedia Extensions in the PA-RISC 2.0 Architecture,” Microprocessors Precision Architecture, 1997, 18 pages.
- Sun Microsystems, Ultrasparc™ The Visual Instruction Set (VIS™): On Chip Support for New-Media Processing, Whitepaper 95-022, 1996, 7 pages.
- Kawakami, Y., et al., “A Single-Chip Digital Signal Processor for Voiceband Applications,” IEEE, 1980 International Solid-State Circuits Conference, pp. 40-41.
- U1traSPARC Multimedia Capabilities On-Chip Support for Real0-Time Video and Advanced Graphics; SPARC Technology Business, Sep. 1994, Sun Microsystems, Inc.
- Case, B., “Philips Hopes to Displace DSPs with VLIW, TriMedia Processors Aimed at Future Multimedia Embedded Apps,” Microprocessor Report, Dec. 1994, pp. 12-18.
- Gwennap, L., “New PA-RISC Processor Decodes MPEG Video, HP's PA-7100LC Uses New Instructions to Eliminate Decoder Chip,” Microprocessor Report, Jan. 1994, pp. 16-17.
- TMS320C2x User's Guide, Digital Signal Processing Products, Texas Instruments, 1993, pp. 3-2 to 3-11; 3-28 to 3-34; 4-1 to 4-22; 4-41; 4-103; 4-119; 4-120; 4-122; 4-150; 4-151.
- i860™ Microprocessor Family Programmer's Reference Manual, Intel Corporation, 1992, Chapters 1, 3, 8, and 12.
- Lee, R.B., “Accelerating Multimedia with Enhanced Microprocessors,” IEEE Micro, Apr. 1995, pp. 22-32.
- Pentium Processor's User's Manual, vol. 3: Architecture and Programming Manual, Intel Corporation, 1993, Chapters 1, 3, 4, 6, 8, and 18.
- Margulis, N., “i860 Microprocessor Architecture,” McGraw Hill, Inc., 1990, Chapters 6, 7, 8, 10, and 11.
- Intel i750, i860™, i960 Processors and Related Products, 1993, pp. 1-3.
- Motorola MC88110 Second Generation RISC Microprocessor User's Manual, Motorola, Inc., 1991.
- MC88110 Second Generation RISC Microprocessor User's Manual, Motorola, Inc., Sep. 1992, pp. 2-1 through 2-22, 3-1 through 3-32, 5-1 through 5-25, 10-62 through 10-71, Index 1 through 17.
- Errata to MC88110 Second Generation RISC Microprocessor User's Manual, Motorola, Inc., 1992, pp. 1-11.
- MC88110 Programmer's Reference Guide, Motorola, Inc., 1992, pp. 1-4.
- Shipnes, J., “Graphics Processing with the 88110 RISC Microprocessor,” Motorola, Inc., IEEE, No. 0-8186-26455-0/92, 1992, pp. 169-174.
- Abbott, et al., “Broadband Algorithms with the MicroUnity Mediaprocessor,” MicroUnity Systems Engineering, Inc., Proceedings of Compcon, IEEE, 1996, pp. 349-354.
- Advanced Micro Devices, Inc., “AMD-3D Technology Manual,” Feb. 1998, pp. 1-58.
- Diefendorff, K., et al., “AltiVec Extension to PowerPC Accelerates Media Processing,” IEEE, #0272-1732/00, 2000, pp. 85-95.
- Hansen, C., “Architecture of a Broadband Mediaprocessor,” Proceedings of Compcon, IEEE, 1996, pp. 334-340.
- Hayes, et al., “MicroUnity Software Development Environment,” MicroUnity Systems Engineering, Inc., Proceedings of Compcon, IEEE, 1996, pp. 341-348.
- Intel Corporation, “Intel Architecture Software Developer's Manual, vol. 2; Instruction Set Reference,” 1999, 26 pgs.
- Intel Corporation, “IA-32 Intel® Architecture Software Developer's Manual, vol. I: Basic Architecture,” 2002, 21 pgs. total.
- Intel Corporation, “IA-32 Intel® Architecture Software Developer's Manual, vol. II: Instruction Set Reference,” 2002, 19 pgs. total.
- Intel Corporation, “Intel® Itanium™ Architecture Software Developer's Manual, vol. 3: Instruction Set Reference,” Rev. 2.0, Dec. 2001, 30 pgs. total.
- Intel Corporation, “Intel486™ Microprocessor Family Programmer's Reference Manual,” 1992, 44 pgs. total.
- Intel Corporation, “Pentium® Processor Family Developer's Manual, vol. 3: Architecture and Programming Manual,” 1995, 54 pgs. total.
- Intel Corporation, “Pentium® Processor User's Manual, vol. 3: Architecture and Programming Manual,” 1993, 50 pgs. total.
- Levinthal, et al., “Chap—A SIMD Graphics Processor,” Computer Graphics Project, ACM, vol. 18, No. 3, Jul. 1984, pp. 77-81.
- Levinthal, et al., “Parallel Computers for Graphics Applications,” Proceedings: Second Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems, (ASPLOS II), IEEE, 1987, pp. 193-198.
- Wang, et al., “A Processor Architecture for 3D Graphics Calculations,” Computer Motion, Inc., Goleta, CA, 23 pgs.
- U.S. Appl. No. 13/732,243, filed Dec. 31, 2012.
- U.S. Appl. No. 14/283,020, filed May 20, 2014.
- Craig Hansen, “Architecture of a Broadband Mediaprocessor” 1996 IEEE Proceedings of COMPCON '96, pp. 334-340.
Type: Grant
Filed: Mar 21, 2002
Date of Patent: Apr 7, 2015
Assignee: Intel Corporation (Santa Clara, CA)
Inventors: Patrice Roussel (Portland, OR), Srinivas Chennupaty (Portland, OR), Micheal D. Cranford (Hillsboro, OR), Mohammed A. Abdallah (San Jose, CA), James Coke (Shingle Springs, CA), Katherine Kong (Folsom, CA)
Primary Examiner: Eric Coleman
Application Number: 10/104,205
International Classification: G06F 9/315 (20060101); G06F 15/78 (20060101);