Controlling Access To External Vector Data Patents (Class 712/6)
  • Patent number: 11500778
    Abstract: Embodiments include methods, systems and non-transitory computer-readable computer readable media including instructions for executing a prefetch kernel with reduced intermediate state storage resource requirements. These include executing a prefetch kernel on a graphics processing unit (GPU), such that the prefetch kernel begins executing before a processing kernel. The prefetch kernel performs memory operations that are based upon at least a subset of memory operations in the processing kernel.
    Type: Grant
    Filed: March 9, 2020
    Date of Patent: November 15, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Nuwan S. Jayasena, James Michael O'Connor, Michael Mantor
  • Patent number: 10860316
    Abstract: Aspects for generating a dot product for two vectors in neural network are described herein. The aspects may include a controller unit configured to receive a vector load instruction that includes a first address of a first vector and a length of the first vector. The aspects may further include a direct memory access unit configured to retrieve the first vector from a storage device based on the first address of the first vector. Further still, the aspects may include a caching unit configured to store the first vector.
    Type: Grant
    Filed: October 26, 2018
    Date of Patent: December 8, 2020
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Tian Zhi, Qi Guo, Shaoli Liu, Tianshi Chen, Yunji Chen
  • Patent number: 10614504
    Abstract: Systems, apparatuses, and methods are provided herein for content-based product recommendations. A system for content-based product recommendations comprises a content monitoring device configured to monitor video content viewed by a user, a customer vectors database, a product vectors database; and a control circuit being configured to: detect, via the content monitoring device, a video content being viewed by the user, identify an item associated with a current segment of the video content viewed by the user, determine a product category associated with the item, determine alignments between the customer value vectors and the product characteristic vectors for each of the plurality of products in the product category, select a recommended product from the plurality of products based on the alignments between the customer value vectors and the product characteristic vectors for each of the plurality of products, and initiate an offer of the recommended product to the customer.
    Type: Grant
    Filed: April 14, 2017
    Date of Patent: April 7, 2020
    Assignee: Walmart Apollo, LLC
    Inventors: Bruce W. Wilkinson, Brian G. McHale, Todd D. Mattingly
  • Patent number: 10049082
    Abstract: Systems and methods for calculating a dot product using digital signal processing units that are organized into a dot product processing unit for dot product processing using multipliers and adders of the digital signal processing units.
    Type: Grant
    Filed: September 15, 2016
    Date of Patent: August 14, 2018
    Assignee: ALTERA CORPORATION
    Inventors: Andrew Chaang Ling, Davor Capalija, Tomasz Sebastian Czajkowski, Andrei Mihai Hagiescu Miriste
  • Patent number: 9921837
    Abstract: A processor is described having an instruction execution pipeline. The instruction execution pipeline includes an instruction fetch stage to fetch an instruction. The instruction identifies an input vector operand whose input elements specify one or the other of two states. The instruction execution pipeline also includes an instruction decoder to decode the instruction. The instruction execution pipeline also includes a functional unit to execute the instruction and provide a resultant output vector. The functional unit includes logic circuitry to produce an element in a specific element position of the resultant output vector by performing an operation on a value derived from a base value using a stride in response to one but not the other of the two states being present in a corresponding element position of the input vector operand.
    Type: Grant
    Filed: July 19, 2016
    Date of Patent: March 20, 2018
    Assignee: INTEL CORPORATION
    Inventor: Mikhail Plotnikov
  • Patent number: 9910650
    Abstract: A computer-implemented method for managing loop code in a compiler includes using a conflict detection procedure that detects across-iteration dependency for arrays of single memory addresses to determine whether a potential across-iteration dependency exists for arrays of memory addresses for ranges of memory accessed by the loop code.
    Type: Grant
    Filed: September 25, 2014
    Date of Patent: March 6, 2018
    Assignee: Intel Corporation
    Inventors: Albert Hartono, Nalini Vasudevan, Sara S. Baghsorkhi, Cheng Wang, Youfeng Wu
  • Patent number: 9785436
    Abstract: An apparatus and method are described for performing efficient gather operations in a pipelined processor. For example, a processor according to one embodiment of the invention comprises: gather setup logic to execute one or more gather setup operations in anticipation of one or more gather operations, the gather setup operations to determine one or more addresses of vector data elements to be gathered by the gather operations; and gather logic to execute the one or more gather operations to gather the vector data elements using the one or more addresses determined by the gather setup operations.
    Type: Grant
    Filed: September 28, 2012
    Date of Patent: October 10, 2017
    Assignee: INTEL CORPORATION
    Inventors: Edward T. Grochowski, Dennis R. Bradford, George Z. Chrysos, Andrew T. Forsyth, Michael D. Upton, Lisa K. Wu
  • Patent number: 9767036
    Abstract: A system for managing virtual memory. The system includes a first processing unit configured to execute a first operation that references a first virtual memory address. The system also includes a first memory management unit (MMU) associated with the first processing unit and configured to generate a first page fault upon determining that a first page table that is stored in a first memory unit associated with the first processing unit does not include a mapping corresponding to the first virtual memory address. The system further includes a first copy engine associated with the first processing unit. The first copy engine is configured to read a first command queue to determine a first mapping that corresponds to the first virtual memory address and is included in a first page state directory. The first copy engine is also configured to update the first page table to include the first mapping.
    Type: Grant
    Filed: October 16, 2013
    Date of Patent: September 19, 2017
    Assignee: NVIDIA Corporation
    Inventors: Jerome F. Duluk, Jr., Cameron Buschardt, Sherry Cheung, James Leroy Deming, Samuel H. Duncan, Lucien Dunning, Robert George, Arvind Gopalakrishnan, Mark Hairgrove, Chenghuan Jia, John Mashey
  • Patent number: 9575755
    Abstract: Embodiments relate to vector processing in an active memory device. An aspect includes a method for vector processing in an active memory device that includes memory and a processing element. The method includes decoding, in the processing element, an instruction including a plurality of sub-instructions to execute in parallel. An iteration count to repeat execution of the sub-instructions in parallel is determined. Based on the iteration count, execution of the sub-instructions in parallel is repeated for multiple iterations by the processing element. Multiple locations in the memory are accessed in parallel based on the execution of the sub-instructions.
    Type: Grant
    Filed: August 3, 2012
    Date of Patent: February 21, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair, Daniel A. Prener
  • Patent number: 9535694
    Abstract: Embodiments relate to vector processing in an active memory device. An aspect includes a system for vector processing in an active memory device. The system includes memory in the active memory device and a processing element in the active memory device. The processing element is configured to perform a method including decoding an instruction with a plurality of sub-instructions to execute in parallel. An iteration count to repeat execution of the sub-instructions in parallel is determined. Execution of the sub-instructions is repeated in parallel for multiple iterations, by the processing element, based on the iteration count. Multiple locations in the memory are accessed in parallel based on the execution of the sub-instructions.
    Type: Grant
    Filed: August 8, 2012
    Date of Patent: January 3, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair, Daniel A. Prener
  • Patent number: 9021233
    Abstract: A vector data access unit includes data access ordering circuitry, for issuing data access requests indicated by elements of earlier and a later vector instructions, one being a write instruction. An element indicating the next data access for each of the instructions is determined. The next data accesses for the earlier and the later instructions may be reordered. The next data access of the earlier instruction is selected if the position of the earlier instruction's next data element is less than or equal to the position of the later instruction's next data element minus a predetermined value. The next data access of the later instruction may be selected if the position of the earlier instruction's next data element is higher than the position of the later instruction's next data element minus a predetermined value. Thus data accesses from earlier and later instructions are partially interleaved.
    Type: Grant
    Filed: September 28, 2011
    Date of Patent: April 28, 2015
    Assignee: ARM Limited
    Inventor: Alastair David Reid
  • Publication number: 20150100754
    Abstract: A data processing apparatus and method for performing speculative vector access operations are provided. The data processing apparatus has a reconfigurable buffer accessible to vector data access circuitry and comprising a storage array for storing up to M vectors of N vectors elements. The vector data access circuitry performs speculative data write operations in order to cause vector elements from selected vector operands in a vector register bank to be stored into the reconfigurable buffer. On occurrence of a commit condition, the vector elements currently stored in the reconfigurable buffer are then written to a data store. Speculation control circuitry maintains a speculation width indication indicating the number of vector elements of each selected vector operand stored in the reconfigurable buffer.
    Type: Application
    Filed: August 18, 2014
    Publication date: April 9, 2015
    Inventors: Alastair David REID, Daniel KERSHAW
  • Patent number: 8892848
    Abstract: A device, system and method for assigning values to elements in a first register, where each data field in a first register corresponds to a data element to be written into a second register, and where for each data field in the first register, a first value may indicate that the corresponding data element has not been written into the second register and a second value indicates that the corresponding data element has been written into the second register, reading the values of each of the data fields in the first register, and for each data field in the first register having the first value, gathering the corresponding data element and writing the corresponding data element into the second register, and changing the value of the data field in the first register from the first value to the second value. Other embodiments are described and claimed.
    Type: Grant
    Filed: July 5, 2011
    Date of Patent: November 18, 2014
    Assignee: Intel Corporation
    Inventors: Eric Sprangle, Anwar Rohillah, Robert Cavin, Tom Forsyth, Michael Abrash
  • Patent number: 8850167
    Abstract: Provided is a processor including an instruction issue unit that issues a vector load instruction read from a main memory based on branch target prediction of a branch target in a branch instruction, a data acquisition unit that starts issue of a plurality of acquisition requests for acquiring a plurality of vector data based on the issued vector load instruction from the main memory, a determination unit that determines a success or a failure of the branch target prediction after the branch target is determined, and a vector load management unit that, when the branch target prediction is determined to be a success, acquires all vector data based on the plurality of acquisition requests and then transfers all the vector data to a vector register, and, when the branch target prediction is determined to be a failure, discards the vector data acquired by the issued acquisition requests.
    Type: Grant
    Filed: September 22, 2011
    Date of Patent: September 30, 2014
    Assignee: NEC Corporation
    Inventor: Masao Fukagawa
  • Patent number: 8826252
    Abstract: A system and method of compiling program code, wherein the program code includes an operation on an array of data elements stored in memory of a computer system. The program code is scanned for an equation which operates on data of lengths other than the limited number of vector supported data lengths. The equation is then replaced with vectorized machine executable code, wherein the machine executable code comprises a nested loop and wherein the nested loop comprises an exterior loop and a virtual interior loop. The exterior loop decomposes the equation into a plurality of loops of length N, wherein N is an integer greater than one. The virtual interior loop executes vector operations corresponding to the N length loop to form a result vector of length N, wherein the virtual interior loop includes one or more vector atomic memory operation (AMO) instructions, used to resolve false conflicts.
    Type: Grant
    Filed: June 12, 2009
    Date of Patent: September 2, 2014
    Assignee: Cray Inc.
    Inventor: Terry D. Greyzck
  • Patent number: 8649508
    Abstract: A system and method for implementing the Elliptic Curve scalar multiplication method in cryptography, where the Double Base Number System is expressed in decreasing order of exponents and further on using it to determine Elliptic curve scalar multiplication over a finite elliptic curve.
    Type: Grant
    Filed: September 29, 2008
    Date of Patent: February 11, 2014
    Assignee: Tata Consultancy Services Ltd.
    Inventor: Natarajan Vijayarangan
  • Patent number: 8560808
    Abstract: A method, system and computer readable media for dynamically updating current communication information, for enabling access to current communication based upon biometric information and/or for allowing communication information to be associated with biometric information and then allowing this communication information to be provided to desired recipients.
    Type: Grant
    Filed: January 3, 2012
    Date of Patent: October 15, 2013
    Assignee: International Business Machines Corporation
    Inventors: Sarbajit Kumar Rakshit, Shawn K. Sremaniak, Thomas S. Mazzeo, Barry Allan Kritt
  • Patent number: 7984273
    Abstract: A system and method for assigning values to elements in a first register, where each data field in a first register corresponds to a data element to be written into a second register, and where for each data field in the first register, a first value may indicate that the corresponding data element has not been written into the second register and a second value indicates that the corresponding data element has been written into the second register, reading the values of each of the data fields in the first register, and for each data field in the first register having the first value, gathering the corresponding data element and writing the corresponding data element into the second register, and changing the value of the data field in the first register from the first value to the second value. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 31, 2007
    Date of Patent: July 19, 2011
    Assignees: Intel Corporation
    Inventors: Eric Sprangle, Anwar Rohillah, Robert Cavin, Tom Forsyth, Michael Abrash
  • Patent number: 7895379
    Abstract: Control logic of a node controller receives an input vector and produces an output vector. The control logic includes a plurality of tied control store entries including hard-coded logic to identify unique values of the input vector and to produce the output vector from a hard-coded output vector when the input vector is identified and when the tied control store is enabled. The control logic also includes a plurality of spare control store entries including programmable logic configurable to identify values of the input vector and to produce the output vector from a programmable output vector when the input vector is identified and when the spare control store is enabled. One of the spare control store entries that is configured to identify a value of the input vector that none of the tied control store entries that are enabled by the entry-enables register are configured to identify is enabled.
    Type: Grant
    Filed: December 23, 2008
    Date of Patent: February 22, 2011
    Assignee: Unisys Corporation
    Inventors: Ross M. Weber, David R. Spatafore
  • Patent number: 7783860
    Abstract: Embodiments of the invention provide logic within the store data path between a processor and a memory array. The logic may be configured to misalign vector data as it is stored to memory. By misaligning vector data as it is stored to memory, memory bandwidth may be maximized while processing bandwidth required to store vector data misaligned is minimized. Furthermore, embodiments of the invention provide logic within the load data path which allows vector data which is stored misaligned to be aligned as it is loaded into a vector register. By aligning misaligned vector data as it is loaded into a vector register, memory bandwidth may be maximized while processing bandwidth required to align misaligned vector data may be minimized.
    Type: Grant
    Filed: July 31, 2007
    Date of Patent: August 24, 2010
    Assignee: International Business Machines Corporation
    Inventors: David Arnold Luick, Eric Oliver Mejdrich, Adam James Muff
  • Patent number: 7743223
    Abstract: In a computer system having a plurality of processors connected to a shared memory, a system and method of decoupling an address from write data in a store to the shared memory. A write request address is generated for a memory write, wherein the write request address points to a memory location in shared memory. A write request is issued to the shared memory, wherein the write request includes the write request address. The write request address is noted in the shared memory and addresses in subsequent load and store requests are compared in share memory to the write request address. The write data is transferred to the shared memory and matched, within the shared memory, to the write request address. The write data is then stored into the shared memory as a function of the write request address.
    Type: Grant
    Filed: August 18, 2003
    Date of Patent: June 22, 2010
    Assignee: Cray Inc.
    Inventors: Steven L. Scott, Gregory J. Faanes
  • Patent number: 7743231
    Abstract: Provided are a method, information processing system, and computer readable medium for identifying active bits in a vector. The method comprises receiving a pointer associated with a vector of bits. The pointer is associated with a current bit within the vector of bits. The vector of bits if grouped into groups of a mathematical power of two, which is any non-negative integer powers of two. One or more current groups are determined which are the groups of the mathematical power of two comprising the current bit. The one or more current groups of the power of two are analyzed. A largest group of the power of two is identified in the one or more current groups comprising all empty bits. The pointer is set to point to a bit following a last bit in the identified largest group of the power of two comprising all empty bits.
    Type: Grant
    Filed: February 27, 2007
    Date of Patent: June 22, 2010
    Assignee: International Business Machines Corporation
    Inventors: Scot H. Rider, Todd A. Strader
  • Patent number: 7627744
    Abstract: An integrated circuit comprises an external memory, a plurality of parallel connected Vector Processing Engines (VPEs), and an External Memory Unit (EMU) providing a data transfer path between the VPEs and the external memory. Each VPE contains a plurality of data processing units and a message queuing system adapted to transfer messages between the data processing units and other components of the integrated circuit.
    Type: Grant
    Filed: May 10, 2007
    Date of Patent: December 1, 2009
    Assignee: NVIDIA Corporation
    Inventors: Monier Maher, Jean Pierre Bordes, Christopher Lamb, Sanjay J. Patel
  • Patent number: 7519800
    Abstract: A heterogeneous computer system has multiple interconnected cells, each cell has multiple primary processors of the same Instruction Set Architecture (ISA) type, but different cells may have processors of different ISA types. Each cell has a cell type register readable by a processor external to the cell. The cell type register of each cell is used at system startup time to ensure that all processors of a system partition have compatible ISA types.
    Type: Grant
    Filed: March 27, 2003
    Date of Patent: April 14, 2009
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Scott Lynn Michaelis
  • Patent number: 7480797
    Abstract: Various embodiments of the present invention introduce privilege-level mapping into a computer architecture not initially designed for supporting virtualization. Privilege-level mapping can, with relatively minor changes to processor logic, fully prevent privileged-level-information leaks by which non-privilege code can determine the current machine-level privilege level at which they are executing. In one embodiment of the present invention, a new privilege-level mapping register is introduced, and privilege-level mapping is enabled for all but code invoked by privileged-level-0-forcing hardware events.
    Type: Grant
    Filed: July 31, 2004
    Date of Patent: January 20, 2009
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Bret McKee
  • Patent number: 7404065
    Abstract: In one embodiment, a method for flow optimization and prediction for vector streaming single instruction, multiple data (SIMD) extension (VSSE) memory operations is disclosed. The method comprises generating an optimized micro-operation (?op) flow for an instruction to operate on a vector if the instruction is predicted to be unmasked and unit-stride, the instruction to access elements in memory, and accessing via the optimized ?op flow two or more of the elements at the same time without determining masks of the two or more elements. Other embodiments are also described.
    Type: Grant
    Filed: December 21, 2005
    Date of Patent: July 22, 2008
    Assignee: Intel Corporation
    Inventors: Stephan Jourdan, Per Hammarlund, Michael Fetterman, Michael P. Cornaby, Glenn Hinton, Avinash Sodani
  • Publication number: 20080016317
    Abstract: A data cache memory coupled to a processor including processor clusters are adapted to operate simultaneously on scalar and vectorial data by providing data locations in the data cache memory for storing data for processing. The data locations are accessed either in a scalar mode or in a vectorial mode. This is done by explicitly mapping the data locations that are scalar and the data locations that are vectorial.
    Type: Application
    Filed: June 26, 2007
    Publication date: January 17, 2008
    Applicants: STMicroelectronics S.r.l., STMicroelectronics N.V.
    Inventors: Francesco Pappalardo, Giuseppe Notarangelo, Elena Salurso, Elio Guidetti
  • Patent number: 7219212
    Abstract: A processor can achieve high code density while allowing higher performance than existing architectures, particularly for Digital Signal Processing (DSP) applications. In accordance with one aspect, the processor supports three possible instruction sizes while maintaining the simplicity of programming and allowing efficient physical implementation. Most of the application code can be encoded using two sets of narrow size instructions to achieve high code density. Adding a third (and larger, i.e. VLIW) instruction size allows the architecture to encode multiple operations per instruction for the performance critical section of the code. Further, each operation of the VLIW format instruction can optionally be a SIMD operation that operates upon vector data. A scheme for the optimal utilization (highest achievable performance for the given amount of hardware) of multiply-accumulate (MAC) hardware is also provided.
    Type: Grant
    Filed: February 25, 2005
    Date of Patent: May 15, 2007
    Assignee: Tensilica, Inc.
    Inventors: Himanshu A. Sanghavi, Earl A. Killian, James Robert Kennedy, Darin S. Petkov, Peng Tu, William A. Huffman
  • Patent number: 7100019
    Abstract: A system and method for calculating memory addresses in a partitioned memory in a processing system having a processing unit, input and output units, a program sequencer and an external interface. An address calculator includes a set of storage elements, such as registers, and an arithmetic unit for calculating a memory address of a vector element dependent upon values stored in the storage elements and the address of a previous vector element. The storage elements hold STRIDE, SKIP and SPAN values and optionally a TYPE value, relating to the spacing between elements in the same partition, the spacing between elements in the consecutive partitions, the number of elements in a partition and the size of a vector element, respectively.
    Type: Grant
    Filed: September 8, 2003
    Date of Patent: August 29, 2006
    Assignee: Motorola, Inc.
    Inventors: James M. Norris, Philip E. May, Kent D. Moat, Raymond B. Essick, IV, Brian G. Lucas
  • Patent number: 7069557
    Abstract: A virtual path feature in which several virtual channels share an assigned amount of bandwidth is implemented in a network processor. The network processor maintains a schedule indicative of respective times at which a plurality of virtual channels are to be serviced. An entry is read from the schedule. The entry corresponds to a current transmit cycle and includes a pointer to a channel descriptor for a virtual channel to be serviced in the current transmit cycle. A data cell for the virtual channel to be serviced in the current cycle is transmitted. An entry is added to the schedule to point to a channel descriptor that is pointed to by the channel descriptor for the virtual channel serviced in the current transmit cycle.
    Type: Grant
    Filed: May 23, 2002
    Date of Patent: June 27, 2006
    Assignee: International Business Machines Corporation
    Inventor: Merwin Herscher Alferness
  • Patent number: 7043607
    Abstract: The vector unit 21 outputs a first flash address to the flash address array 24. The vector unit 31 outputs a second flash address to the flash address array 34. In the master unit 2, the flash address array 24 compares an address registered in a cache with the first flash address. In the slave unit 3, the flash address array 34 compares the address registered in the cache with the second flash address. When said first flash address coincides with said address registered in said cache, the flash address array 24 sends a first coincidence address to the address array 25. When said second flash address coincides with said address registered in said cache, the flash address array 34 sends a second coincidence address to the address array 25. A corresponding address of the address array 25 is flashed based on the first address sent from the flash address array 24 and based on the second address sent from the flash address 34.
    Type: Grant
    Filed: June 12, 2003
    Date of Patent: May 9, 2006
    Assignee: NEC Corporation
    Inventor: Kenji Ezoe
  • Patent number: 6963341
    Abstract: The present invention provides efficient ways to implement scan conversion and matrix transpose operations using vector multiplex operations in a SIMD processor. The present method provides a very fast and flexible way to implement different scan conversions, such as zigzag conversion, and matrix transpose for 2×2, 4×4, 8×8 blocks commonly used by all video compression and decompression algorithms.
    Type: Grant
    Filed: May 20, 2003
    Date of Patent: November 8, 2005
    Inventor: Tibet Mimar
  • Patent number: 6957324
    Abstract: A vector computer system includes a plurality of memory banks 40, a vector processor 11, and a plurality of additional processing units 30 each of which is connected to one of the memory banks 40. Each of the additional processing units 30 reads data from the corresponding memory bank 40 by referring to an address designated by the processor 11, and performs a designated operation about the data. Then the additional processing unit 30 stores the result of the operation into the designated address.
    Type: Grant
    Filed: September 28, 2001
    Date of Patent: October 18, 2005
    Assignee: NEC Corporation
    Inventor: Takumi Washio
  • Patent number: 6865517
    Abstract: A method, apparatus and computer product that enables a processor associated with a node in a computer system having various nodes, the nodes having sensors which provide data, and the nodes being connected by a communications facility acquiring local data from the sensor and remote data from other nodes via the data transfer facility. The nodes process data from a local sensor at the node and from remote sensors at other nodes; and analyze the local data, data from other nodes and local decisions made at and received from other nodes to make a local decision for action at the node. A local decision made at a node is in turn communicated to other nodes.
    Type: Grant
    Filed: December 11, 2002
    Date of Patent: March 8, 2005
    Assignee: International Business Machines Corporation
    Inventors: David F. Bantz, John S. Davis, II, Rafah A. Hosn, Nicholas M. Mitchell, Veronique Perret, Daby M. Sow, Jeremy B. Sussman
  • Patent number: 6816960
    Abstract: A vector artchitecture processing unit according to the present invention comprises a vector scatter (VSC) address coincidence detection unit 3 that comprises registers in which an area start address and an area end address of an area specified by an area-specified vector scatter instruction are stored; and a circuit that checks if the addresses specified by the area-specified vector scatter instruction overlap with an address to be accessed by a memory access instruction following the area-specified vector scatter instruction, wherein an instruction issue control unit 1 comprises a hold control circuit that holds the following memory access instruction in response to an address conflict signal from the VSC address conflict detector.
    Type: Grant
    Filed: July 10, 2001
    Date of Patent: November 9, 2004
    Assignee: NEC Corporation
    Inventor: Hisao Koyanagi
  • Patent number: 6813701
    Abstract: A compiler and vector data transfer instructions for use in a vector transfer unit for handling transfers of vector data between a memory and a data processor in a computer system. The compiler identifies the use of vector data in an application program and implements one or more vector instructions for transferring the vector data between memory and registers used to perform calculations on the vector data. A vector is partitioned by the compiler into variable-sized streams which are transferred into and out of the processor as burst transactions. The compiler schedules transfers of vector streams required in a calculation so that calculations on a portion of the vector data are performed while a subsequent portion of the vector data is transferred. A vector buffer pool is partitioned into one or more vector buffers and each vector buffer is used at a specific time.
    Type: Grant
    Filed: August 17, 1999
    Date of Patent: November 2, 2004
    Assignee: NEC Electronics America, Inc.
    Inventor: Ahmad R. Ansari
  • Patent number: 6742106
    Abstract: A vector transfer unit for handling transfers of vector data between a memory and a data processor in a computer system. Vector data transfer instructions are posted to an instruction queue in the vector transfer unit. Program instructions for performing a burst transfer include determining the starting address of the vector data to be transferred, the ending address of the vector data to be transferred, and whether the ending address of the vector data to be transferred is within the same virtual memory page as the starting address. The ending address of the vector data to be transferred is determined based on the number of data elements to be transferred, the stride of the vector data to be transferred, and the width of the vector data elements to be transferred. When the amount of data to be transferred is divisible by a factor of two, the multiplication of the stride and width of the data elements is carried out by shifting.
    Type: Grant
    Filed: January 28, 2003
    Date of Patent: May 25, 2004
    Assignee: NEC Electronics, Inc.
    Inventor: Ahmad R. Ansari
  • Patent number: 6665749
    Abstract: The present invention provides a bus architecture for a data processing system that improves transfers of vector data using a vector transfer unit (VTU). An external bus is coupled between the vector transfer unit and the memory. The external bus includes a system command bus that is used to transmit a data transfer command. The command is based on a corresponding vector transfer instruction in the application program, such as load vector data or store vector data. The commands for transferring the data elements include a burst read command and a burst write command. A variable number of data elements may be transferred, according to the user's requirements. The system command bus is also capable of transmitting a packing ratio that indicates the number of data elements that fit in the width of the external bus. This allows the entire bandwidth of the external bus to be used during vector data transfers.
    Type: Grant
    Filed: August 17, 1999
    Date of Patent: December 16, 2003
    Assignee: NEC Electronics, Inc.
    Inventor: Ahmad R. Ansari
  • Patent number: 6625720
    Abstract: A vector transfer unit for handling transfers of vector data between a memory and a data processor in a computer system. Vector instructions are used for transferring the vector data between memory and registers used to perform calculations on the vector data. The transfers of portions of the vector data required in a calculation are scheduled so that calculations on a portion of the vector data are performed while a subsequent portion of the vector data is transferred. A vector buffer pool is partitioned into one or more vector buffers based on configuration information including the number of vectors buffers required by an application program and the size required for each vector buffer. The vector buffers are allocated for exclusive use by an application program that is executing in the data processor. Vector data transfer instructions are posted in a vector transfer instruction queue and are executed in the order they are posted to the instruction queue.
    Type: Grant
    Filed: August 17, 1999
    Date of Patent: September 23, 2003
    Assignee: NEC Electronics, Inc.
    Inventor: Ahmad R. Ansari
  • Patent number: 6571386
    Abstract: An optimizer (100) comprises a memory (110) and a processor (130). The memory stores a program (200) to be optimized and optimization software (301). Controlled by the optimization software, the processor (120) (a) determines local vectors (“local”) in instructions of the program (200) which indicate the use of resources by the instructions (use-vectors, exh-vectors); (b) scans the program (200) for Single-Entry-Single-Exit (SESE) structures (U, T, V, S); and (c) determines SESE vectors from the local vectors. The SESE vectors indicate the use of resources by the SESE structures and can be combined by the optimizer to obtain a program vector. When some instructions are modified, then optimizer (100) only re-calculates the SESE vector of the corresponding SESE and re-combines the old SESE vector with the modified SESE vector to determine a new program vector.
    Type: Grant
    Filed: March 13, 2000
    Date of Patent: May 27, 2003
    Assignee: Motorola, Inc.
    Inventors: Mikhail Figurin, Mikhail Okrugin, Dmitriy Barmenkov
  • Patent number: 6513107
    Abstract: A vector transfer unit for handling transfers of vector data between a memory and a data processor in a computer system. Vector data transfer instructions are posted to an instruction queue in the vector transfer unit. Program instructions for performing a burst transfer include determining the starting address of the vector data to be transferred, the ending address of the vector data to be transferred, and whether the ending address of the vector data to be transferred is within the same virtual memory page as the starting address. The ending address of the vector data to be transferred is determined based on the number of data elements to be transferred, the stride of the vector data to be transferred, and the width of the vector data elements to be transferred. When the amount of data to be transferred is divisible by a factor of two, the multiplication of the stride and width of the data elements is carried out by shifting.
    Type: Grant
    Filed: August 17, 1999
    Date of Patent: January 28, 2003
    Assignee: NEC Electronics, Inc.
    Inventor: Ahmad R. Ansari
  • Patent number: 6484220
    Abstract: A method for transferring data between devices in a computer system. In a preferred embodiment, a requesting device broadcasts a request for data. Each of a plurality of devices within the computer system responds to the request and indicates the location of the device and whether the device contains the requested data. The data is then transferred to the requesting device from one of the devices containing the data within the plurality of devices to the requesting device. The device selected to transfer the data to the requesting device has the closest logical proximity to the requesting device which results in a quick transfer of data.
    Type: Grant
    Filed: August 26, 1999
    Date of Patent: November 19, 2002
    Assignee: International Business Machines Corporation
    Inventors: Manuel Joseph Alvarez, II, Sanjay Raghunath Deshpande, Kenneth Douglas Klapproth, David Mui
  • Publication number: 20020007449
    Abstract: A vector artchitecture processing unit according to the present invention comprises a vector scatter (VSC) address coincidence detection unit 3 that comprises registers in which an area start address and an area end address of an area specified by an area-specified vector scatter instruction are stored; and a circuit that checks if the addresses specified by the area-specified vector scatter instruction overlap with an address to be accessed by a memory access instruction following the area-specified vector scatter instruction, wherein an instruction issue control unit 1 comprises a hold control circuit that holds the following memory access instruction in response to an address conflict signal from the VSC address conflict detector.
    Type: Application
    Filed: July 10, 2001
    Publication date: January 17, 2002
    Applicant: NEC CORPORATION
    Inventor: Hisao Koyanagi
  • Patent number: 6336179
    Abstract: A first counter sequentially counts a plurality of numbers from respective sources requesting transfer of data. Each of the numbers represents an amount of isochronous data to transfer over the bus from the respective ones of the sources during a frame on a bus. A count value in a second counter is selectably incremented when the first counter is counting, to provide a remaining count value indicative a remaining amount of data to transfer during the frame. The remaining count value in the second counter is decremented for each isochronous transfer on the bus after the remaining amount of data to transfer has been determined from all sources requesting transfer of isochronous data during the frame. A third counter tracks the time remaining in the frame and compares the remaining count value to the time remaining in the frame to determine a priority mode on the bus.
    Type: Grant
    Filed: August 21, 1998
    Date of Patent: January 1, 2002
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Dale E. Gulick
  • Patent number: 6324611
    Abstract: A physical layer interface for a serial bus includes a controller for producing parallel data representing a near-end line state of the serial bus. A line transmitter is connected to the controller for converting the parallel data therefrom into serial data and transmitting the serial data to the serial bus. A line receiver is connected to the serial bus for receiving therefrom serial dtaa and converting the received serial data into parallel data representing a far-end line state of the serial bus. A differential line state of the serial bus is detected from the parallel data of the controller and the parallel data of the line receiver. The detected differential line state is the input to the controller. In a modified embodiment, a far-end line state of the serial bus is detected from the near-end line state of the serial bus and a far-end differential signal received by the line receiver and directly supplied to the controller.
    Type: Grant
    Filed: September 17, 1998
    Date of Patent: November 27, 2001
    Assignee: NEC Corporation
    Inventor: Takayuki Nyu
  • Patent number: 6308250
    Abstract: A method and system for operating a computing system having multiple processing units. According to a new machine instruction, called the iota instruction, the computing system operates on a vector of mask bits to generate an iota vector having a sequence of values. In one form, each value of the iota vector is a sum of a series of the lower order mask bits up to and including the mask bit corresponding to the entry in the iota vector. In another form, each entry in the iota vector is a sum of a series of lower order mask bits but does not include the mask bit corresponding to the particular entry in the iota vector. In order to calculate the iota vector, the multiple processing units of the present invention communicate the mask bits to the other processing units. Advantages of the present invention include the vectorization of software loops having certain data hazards that prevented conventional compilers from vectorizing the software.
    Type: Grant
    Filed: June 23, 1998
    Date of Patent: October 23, 2001
    Assignee: Silicon Graphics, Inc.
    Inventor: Peter Michael Klausler
  • Patent number: 6253304
    Abstract: A first and a second local interrupt controller are disposed on a single integrated circuit. The first and second local interrupt controllers are coupled to controllably provide at least one interrupt request signal, respectively, to a first and second processor. An input/output (I/O) interrupt controller is also on the integrated circuit and coupled to receive an interrupt request from at least one input/output device. A communication circuit on the integrated circuit is coupled to the input/output interrupt controller and the first and second local interrupt controllers. The communication circuit provides for transfer of interrupt information between the first local interrupt controller, the second local interrupt controller and the input/output interrupt controller.
    Type: Grant
    Filed: January 4, 1999
    Date of Patent: June 26, 2001
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Larry Hewitt, David Neal Suggs, Greg Smaus, Derrick R. Meyer
  • Patent number: 6202130
    Abstract: A data processing system includes a data processor (10) coupled to a memory system having a first memory, such as an L1 data cache (16), arranged with a second memory (such as an L2 cache) at a lower hierarchical level. The data processor (10) prefetches data elements of a vector into the first memory prior to processing such data elements. If a requested data element is not present in the first memory, a load request is issued to the second memory and to lower levels of the memory hierarchy until the requested data element is finally retrieved and stored in the first memory. The data processor (10) continues to prefetch subsequent data elements of the vector by considering the length of the data element and the stride of the vector. In one embodiment, the data processor (10) prefetches the vector into the first memory in response to a single data stream touch load (DST) instruction (100).
    Type: Grant
    Filed: April 17, 1998
    Date of Patent: March 13, 2001
    Assignee: Motorola, Inc.
    Inventors: Hunter Ledbetter Scales, III, Keith Everett Diefendorff, Brett Olsson, Pradeep Kumar Dubey, Ronald Ray Hochsprung, Bradford Byron Beavers, Bradley G. Burgess, Michael Dean Snyder, Cathy May, Edward John Silha
  • Patent number: 6141673
    Abstract: A multimedia extension unit (MEU) is provided for performing various multimedia-type operations. The MEU can be coupled either through a coprocessor bus or a local central processing unit (CPU) bus to a conventional processor. The MEU employs vector registers, a vector arithmetic logic unit (ALU), and an operand routing unit (ORU) to perform a maximum number of the multimedia operations within as few instruction cycles as possible. Complex algorithms are readily performed by arranging operands upon the vector ALU in accordance with the desired algorithm flowgraph. The ORU aligns the operands within partitioned slots or sub-slots of the vector registers using vector instructions unique to the MEU. At the output of the ORU, operand pairs from vector source or destination registers can be easily routed and combined at the vector ALU.
    Type: Grant
    Filed: May 25, 1999
    Date of Patent: October 31, 2000
    Assignees: Advanced Micro Devices, Inc., Compaq Computer Corp.
    Inventors: John S. Thayer, John Gregory Favor, Frederick D. Weber
  • Patent number: 6009505
    Abstract: A multimedia extension unit (MEU) is provided for performing various multimedia-type operations. The MEU can be coupled either through a coprocessor bus or a local CPU bus to a conventional processor. The MEU employs vector registers, a vector ALU, and an operand routing unit (ORU) to perform a maximum number of the multimedia operations within as few instruction cycles as possible. Complex algorithms are readily performed by arranging operands upon the vector ALU in accordance with the desired algorithm flowgraph. The ORU aligns the operands within partitioned slots or sub-slots of the vector registers using vector instructions unique to the MEU. At the output of the ORU, operand pairs from vector source or destination registers can be easily routed and combined at the vector ALU. The vector instructions employ special load/store instructions in combination with numerous operational instructions to carry out concurrent multimedia operations on the aligned operands.
    Type: Grant
    Filed: December 2, 1996
    Date of Patent: December 28, 1999
    Assignees: Compaq Computer Corp., Advanced Micro Devices, Inc.
    Inventors: John S. Thayer, Gary W. Thome, Brian E. Longhenry, John G. Favor, Frederick D. Weber