Array Processor Element Interconnection Patents (Class 712/11)
-
Patent number: 10199364
Abstract: A single multichip package is provided, comprising: a substrate having opposing upper and lower surfaces. A first die is mounted on the upper surface of the substrate and includes one or more non-volatile memory devices. A second die is mounted on the upper surface of the substrate, and includes at least one of: (a) a non-volatile memory controller that facilitates transfer of data to/from the one or more non-volatile memory devices, (b) a register clock driver for volatile memory devices, and/or (c) one or more multiplexer switches configured to switch between two or more of the volatile memory devices. A plurality of wire bonds connect the first and second dies. A plurality of solder balls are located on the lower surface of the substrate for mounting the single multichip package to a printed circuit board, the plurality of solder balls electrically coupled to the first die and the second die.
Type: Grant
Filed: May 19, 2017
Date of Patent: February 5, 2019
Assignee: SANMINA CORPORATION
Inventors: Arvindhkumar Lalam, Alec C. Shen
-
Patent number: 10009135
Abstract: According to some embodiments, a network architecture is disclosed. The network architecture includes a plurality of processing network nodes. The network architecture further includes at least one broadcasting medium to interconnect the plurality of processing network nodes where the broadcasting medium includes an integrated waveguide. The network architecture also includes a broadcast and weight protocol configured to perform wavelength division multiplexing such that multiple wavelengths coexist in the integrated waveguide available to all nodes of the plurality of processing network nodes.
Type: Grant
Filed: February 5, 2016
Date of Patent: June 26, 2018
Assignee: THE TRUSTEES OF PRINCETON UNIVERSITY
Inventors: Alexander N. Tait, Mitchell A. Nahmias, Bhavin J. Shastri, Paul R. Prucnal
-
Patent number: 9992133
Abstract: A switching device in a network system for transferring data includes one or more source line cards, one or more destination line cards and a switching fabric coupled to the source line cards and the destination line cards to enable data communication between any source line card and destination line card. Each source line card includes a request generator to generate a request signal to be transmitted in order to obtain an authorization to transmit data. Each destination line card includes a grant generator to generate and send back a grant signal to the source line card in response to the request signal received at the destination line card to authorize the source line card to transmit a data cell to the destination line card.
Type: Grant
Filed: October 21, 2016
Date of Patent: June 5, 2018
Assignee: Juniper Networks, Inc.
Inventors: Pradeep S. Sindhu, Philippe Lacroute, Matthew A. Tucker, John D. Weisbloom, David B. Winters
-
Patent number: 9984026
Abstract: Provided is a parallel computing system that has scalability and is capable of performing data transfer between desired PEs. Also provided is a computer system that utilizes the parallel computing system described above, and enables radiosity processing on small-scale mobile terminal devices. An HXNet is implemented in a VLSI, and data transfer between VLSIs is possible using additional BMs. Scalability is realized that enables selection of any number of VLSIs, and radiosity processing is enabled on small-scale mobile terminal devices.
Type: Grant
Filed: May 11, 2015
Date of Patent: May 29, 2018
Assignee: Nakaikegami Koubou Co., Ltd.
Inventor: Ryuji Murakami
-
Patent number: 9971720
Abstract: An island-based integrated circuit includes a configurable mesh data bus. The data bus includes four meshes. Each mesh includes, for each island, a crossbar switch and radiating half links. The half links of adjacent islands align to form links between crossbar switches. A link is implemented as two distributed credit FIFOs. In one direction, a link portion involves a FIFO associated with an output port of a first island, a first chain of registers, and a second FIFO associated with an input port of a second island. When a transaction value passes through the FIFO and through the crossbar switch of the second island, an arbiter in the crossbar switch returns a taken signal. The taken signal passes back through a second chain of registers to a credit count circuit in the first island. The credit count circuit maintains a credit count value for the distributed credit FIFO.
Type: Grant
Filed: May 29, 2015
Date of Patent: May 15, 2018
Assignee: Netronome Systems, Inc.
Inventors: Gavin J. Stark, Steven W. Zagorianakos, Ronald N. Fortino
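The credit mechanism in this abstract can be illustrated with a minimal sketch. It is a toy model, not the patented circuit: the class name `CreditLink` and its methods are hypothetical, and the register chains that delay the data and the taken signal are left out.

```python
from collections import deque


class CreditLink:
    """Toy model of one direction of a credit-based link (hypothetical names)."""

    def __init__(self, depth):
        self.credits = depth   # credit count kept on the sending island
        self.fifo = deque()    # input FIFO on the receiving island

    def send(self, value):
        """Push a value only if the sender still holds a credit."""
        if self.credits == 0:
            return False       # no credit left, so the sender has to wait
        self.credits -= 1
        self.fifo.append(value)
        return True

    def take(self):
        """The receiving crossbar consumes a value; the taken signal returns a credit."""
        value = self.fifo.popleft()
        self.credits += 1
        return value
```

With a depth of 2, a third `send` is refused until `take` has returned a credit, so the receiving FIFO cannot overflow.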
-
Patent number: 9875171
Abstract: A technique for estimating a format of a log message (LM) according to the present invention includes creating a first directed graph structure by dividing a first LM by predetermined characters to define divided portions as nodes and arranging the nodes in order from the beginning of the first LM; creating a second directed graph structure by performing on a second LM the same processing as that performed on the first LM; comparing nodes in the first directed graph structure with nodes in the second directed graph structure to detect nodes other than nodes including a corresponding character string; adding to the first directed graph structure the node detected in the second directed graph structure among the detected nodes as a first branch node; and estimating the format, based on the first directed graph structure including the first branch node added thereto.
Type: Grant
Filed: August 27, 2015
Date of Patent: January 23, 2018
Assignee: International Business Machines Corporation
Inventor: Masayoshi Mizutani
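A simplified sketch of the idea follows. It is an illustration under stated assumptions, not the patented method: the graph is reduced to a list of positions, each holding a set of alternative tokens (the branch nodes), both messages are assumed to split into the same number of tokens, and the function names are hypothetical.

```python
def build_graph(message, sep=" "):
    # One node per token, in message order; each node holds a set of alternatives.
    return [{token} for token in message.split(sep)]


def merge(graph, message, sep=" "):
    # Add the tokens of a second message as branch alternatives where they differ.
    tokens = message.split(sep)
    if len(tokens) != len(graph):
        raise ValueError("this sketch only handles messages with equal token counts")
    for node, token in zip(graph, tokens):
        node.add(token)
    return graph


def estimate_format(graph, sep=" "):
    # A node with more than one alternative is treated as a variable field.
    return sep.join(next(iter(node)) if len(node) == 1 else "*" for node in graph)
```

Merging "user alice logged in" with "user bob logged in" yields the estimated format "user * logged in".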
-
Patent number: 9875045
Abstract: A device for matching, in input data, a regular expression with back-references, represented by a finite automaton (FA). The device comprises a plurality of parallel processing elements (PPEs), an interconnection network for interconnecting the PPEs with each other, and a memory for receiving and storing input data. The PPEs process the input data stored in the memory, based on backtracking to process the back-references, and implement FA next state logic to generate new active FA configurations or mark themselves as available to receive active FA configurations. The interconnection network retrieves active FA configurations from the PPEs and allocates the active FA configurations to available PPEs. The PPEs are configured to match a regular expression in the input data.
Type: Grant
Filed: July 27, 2015
Date of Patent: January 23, 2018
Assignee: International Business Machines Corporation
Inventors: Kubilay Atasu, Silvio Dragone
-
Patent number: 9778856
Abstract: The subject disclosure is directed towards one or more parallel storage components for parallelizing block-level input/output associated with remote file data. Based upon a mapping scheme, the file data is partitioned into a plurality of blocks in which each may be equal in size. A translator component of the parallel storage may determine a mapping between the plurality of blocks and a plurality of storage nodes such that at least a portion of the plurality of blocks is accessible in parallel. Such a mapping, for example, may place each block in a different storage node allowing the plurality of blocks to be retrieved simultaneously and in its entirety.
Type: Grant
Filed: August 30, 2012
Date of Patent: October 3, 2017
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Bin Fan, Asim Kadav, Edmund Bernard Nightingale, Jeremy E. Elson, Richard F. Rashid, James W. Mickens
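One possible block-to-node mapping can be sketched as follows. The abstract does not name a specific scheme, so the round-robin placement and the function names here are assumptions chosen for illustration.

```python
def partition(data: bytes, block_size: int):
    # Split file data into equal-sized blocks (the last block may be shorter).
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def map_blocks(num_blocks: int, num_nodes: int):
    # Assumed round-robin scheme: block i is placed on storage node i % num_nodes.
    return {block: block % num_nodes for block in range(num_blocks)}
```

When the number of blocks does not exceed the number of nodes, every block lands on a different node, which is the case the abstract describes as retrievable simultaneously.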
-
Patent number: 9780978
Abstract: A system for an orthogonal frequency division multiplexed (OFDM) equalizer, said system comprising a program memory, a program sequencer and a processing unit connected to each other, wherein the processing unit comprises an input selection unit, an arithmetic logic unit (ALU), a coprocessor and an output selection unit; further wherein the program sequencer schedules the processing of one or more symbol-carrier pairs input to said OFDM equalizer using multiple threads; retrieves, for each of the one or more symbol-carrier pairs, multiple program instructions from said program memory; generates multiple expanded instructions corresponding to said retrieved multiple program instructions; and further wherein said ALU performs said processing of the one or more symbol-carrier pairs using the multiple threads across multiple pipeline stages, wherein said processing comprises said ALU executing arithmetic operations to process said expanded instructions using said multiple threads across the multiple pipeline stag
Type: Grant
Filed: December 15, 2016
Date of Patent: October 3, 2017
Assignee: Redline Communications Inc.
Inventor: Octavian Valeriu Sarca
-
Patent number: 9710469
Abstract: Methods and systems for providing content are disclosed. An example method can comprise identifying a first plurality of data fragments of a media file. An example method can also comprise identifying a second plurality of data fragments of the media file. An example method can comprise generating a manifest file. The manifest file can comprise information for playback of the second plurality of data fragments on a device without access to the first plurality of data fragments.
Type: Grant
Filed: March 15, 2013
Date of Patent: July 18, 2017
Assignee: Comcast Cable Communications, LLC
Inventor: Michael Chen
-
Patent number: 9690734
Abstract: A plurality of data links interconnects a number (N) of nodes of a large-scale, parallel system with minimum data transfer latency. A maximum number (K) of the data links connect each node to the other nodes. The number (N) of the nodes is related to the maximum number (K) of the data links by the expression: N=2^K. An average distance (A) of the shortest distances between all pairs of the nodes, and a diameter (D), which is a largest of the shortest distances, are minimized.
Type: Grant
Filed: September 8, 2015
Date of Patent: June 27, 2017
Inventor: Arjun Kapoor
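The two metrics named in this abstract, average distance (A) and diameter (D), can be computed for any candidate topology with a breadth-first search. The sketch below uses a binary hypercube, a well-known graph with 2^K nodes and K links per node, purely as a reference case; the abstract does not say which topology the patent claims, and all function names are hypothetical.

```python
from collections import deque


def hypercube(k):
    # Reference graph only: 2**k nodes, each linked to the k nodes that differ in one bit.
    return {v: [v ^ (1 << bit) for bit in range(k)] for v in range(1 << k)}


def distances_from(adj, src):
    # Breadth-first search gives shortest hop counts from src to every reachable node.
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def average_and_diameter(adj):
    # A is the mean shortest distance over all ordered pairs; D is the largest one.
    total = pairs = diameter = 0
    for src in adj:
        for dst, d in distances_from(adj, src).items():
            if dst != src:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return total / pairs, diameter
```

For K = 3 the hypercube has 8 nodes, diameter 3, and average distance 12/7; a topology that improves on the hypercube would show smaller values for the same N and K.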
-
Patent number: 9575756
Abstract: Embodiments relate to vector processor predication in an active memory device. An aspect includes a system for vector processor predication in an active memory device. The system includes memory in the active memory device and a processing element in the active memory device. The processing element is configured to perform a method including decoding an instruction with a plurality of sub-instructions to execute in parallel. One or more mask bits are accessed from a vector mask register in the processing element. The one or more mask bits are applied by the processing element to predicate operation of a unit in the processing element associated with at least one of the sub-instructions.
Type: Grant
Filed: August 8, 2012
Date of Patent: February 21, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
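Predication by mask bits can be illustrated with a small sketch. It assumes per-element masking of a vector add, which is one common reading of predication; the abstract itself only says that mask bits predicate the operation of a unit, and the function name is hypothetical.

```python
def predicated_add(a, b, dest, mask):
    # Element i is written only when mask bit i is set; otherwise dest keeps its old value.
    return [x + y if bit else old for x, y, old, bit in zip(a, b, dest, mask)]
```

With mask `[1, 0, 1, 0]`, only the first and third elements of the destination are updated.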
-
Patent number: 9569211
Abstract: Embodiments relate to vector processor predication in an active memory device. An aspect includes a method for vector processor predication in an active memory device that includes memory and a processing element. The method includes decoding, in the processing element, an instruction including a plurality of sub-instructions to execute in parallel. One or more mask bits are accessed from a vector mask register in the processing element. The one or more mask bits are applied by the processing element to predicate operation of a unit in the processing element associated with at least one of the sub-instructions.
Type: Grant
Filed: August 3, 2012
Date of Patent: February 14, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
-
Patent number: 9531781
Abstract: In a stream computing application, data may be transmitted between operators using tuples. However, the receiving operator may not evaluate these tuples as they arrive but instead wait to evaluate a group of tuples, i.e., a window. A window is typically triggered when a buffer associated with the receiving operator reaches a maximum window size or when a predetermined time period has expired. Additionally, a window may be triggered by monitoring a tuple rate, i.e., the rate at which the operator receives the tuples. If the tuple rate exceeds or falls below a threshold, a window may be triggered. Further, the number of exceptions, or the rate at which an operator throws exceptions, may be monitored. If either of these parameters satisfies a threshold, a window may be triggered, thereby instructing an operator to evaluate the tuples contained within the window.
Type: Grant
Filed: December 10, 2012
Date of Patent: December 27, 2016
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Michael J. Branson, John M. Santosuosso, Brandon W. Schulz
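The trigger conditions listed in this abstract can be combined into a single predicate. The sketch below is only an illustration: the parameter names and the choice to treat "satisfies a threshold" as "greater than or equal to" for the exception count are assumptions.

```python
def should_trigger(buffered, max_window, elapsed, max_wait,
                   tuple_rate, low_rate, high_rate,
                   exceptions, max_exceptions):
    # Conventional triggers: the buffer is full or the time period has expired.
    if buffered >= max_window or elapsed >= max_wait:
        return True
    # Rate-based trigger: the tuple rate rose above or fell below its thresholds.
    if tuple_rate > high_rate or tuple_rate < low_rate:
        return True
    # Exception-based trigger (assumed here to fire at or above the limit).
    return exceptions >= max_exceptions
```

Any one condition is enough to make the operator evaluate the tuples currently held in the window.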
-
Patent number: 9524178
Abstract: Systems and methods for executing non-native instructions in a computing system having a processor configured to execute native instructions are provided. A dynamic translator uses instruction code translation in parallel with just-in-time (JIT) compilation to execute the non-native instructions. Non-native instructions may be interpreted to generate instruction codes, which may be stored in a shadow memory. During a subsequent scheduling of a non-native instruction for execution, the corresponding instruction code may be retrieved from the shadow memory and executed, thereby avoiding reinterpreting the non-native instruction. In addition, the JIT compiler may compile instruction codes to generate native instructions, which may be made available for execution, further speeding up the execution process.
Type: Grant
Filed: December 30, 2013
Date of Patent: December 20, 2016
Assignee: Unisys Corporation
Inventors: Andrew T Jennings, Charles R Caldarale, Maurice Marks, Kevin Harris
-
Patent number: 9513963
Abstract: A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.
Type: Grant
Filed: December 17, 2014
Date of Patent: December 6, 2016
Assignee: Imagination Technologies Limited
Inventors: John Howson, Jonathan Redshaw, Yoong Chert Foo
-
Patent number: 9509780
Abstract: A node includes a sending unit that sends a signal to another node; a receiving unit that receives a signal from another node; a determining unit that determines, when the sending unit sends a signal to the other node, that synchronization has been established with the other node, that determines, when the receiving unit receives a signal from another node, that synchronization has been established with the other node, and that determines, when a node in which synchronization has already been established with the other two nodes in each of which synchronization has been established, that synchronization has been established with the nodes; and a selecting unit that selects an information processing apparatus that is not determined, by the determining unit, that synchronization has been established as the other node at the sending destination for the signal.
Type: Grant
Filed: July 28, 2014
Date of Patent: November 29, 2016
Assignee: FUJITSU LIMITED
Inventors: Naoto Fukumoto, Akira Naruse, Kohta Nakashima
-
Patent number: 9507394
Abstract: A monolithically integrated circuit with one or more supply overrides without need of an override control pin to the IC is presented. The internal circuitry to control such an override is presented and various override conditions are also presented.
Type: Grant
Filed: March 29, 2013
Date of Patent: November 29, 2016
Assignee: Peregrine Semiconductor Corporation
Inventor: Robert Mark Englekirk
-
Patent number: 9465675
Abstract: An arithmetic processing device executes a program, and gives first sequence information to a first start time when a first process included in the program starts a first interprocess communication. Then, the first start time and the first sequence information are written in a main storage device. When second sequence information given to a second start time when a second process starts a second interprocess communication is newer than the first sequence information, an operational circuit in a communication control device does not carry out an operation using the first start time. On the other hand, when the second sequence information corresponds to the first sequence information, the operational circuit carries out an operation using the first start time and the second start time and outputs the operation result.
Type: Grant
Filed: November 12, 2014
Date of Patent: October 11, 2016
Assignee: FUJITSU LIMITED
Inventors: Hideki Miwa, Ikuo Miyoshi
-
Patent number: 9405724
Abstract: A reconfigurable tree apparatus with a bypass mode and a method of using the reconfigurable tree apparatus are disclosed. The reconfigurable tree apparatus uses a short-circuit register to selectively designate participating agents for such operations as barriers, multicast, and reductions. The reconfigurable tree apparatus enables an agent to initiate a barrier, multicast, or reduction operation, leaving software to determine the participating agents for each operation. Although the reconfigurable tree apparatus is implemented using a small number of wires, multiple in-flight barrier, multicast, and reduction operations can take place. The method and apparatus have low complexity, easy reconfigurability, and provide the energy savings necessary for future exa-scale machines.
Type: Grant
Filed: June 28, 2013
Date of Patent: August 2, 2016
Assignee: INTEL CORPORATION
Inventors: Jianping Xu, Asit K. Mishra, Joshua B. Fryman, David S. Dunning
-
Patent number: 9405647
Abstract: Some implementations provide techniques and arrangements for detecting a register value having a life longer than a threshold period based, at least in part, on at least one code segment of a code being translated by a binary translator. For a register value detected as having a life longer than a threshold period, at least one instruction to cause an access of the detected register value during the life of the register value may be included in at least one translated code segment to be output by the binary translator.
Type: Grant
Filed: December 30, 2011
Date of Patent: August 2, 2016
Assignee: Intel Corporation
Inventors: Xavier Vera, Javier Carretero Casado, Matteo Monchiero, Tanausu Ramirez, Enric Herrero
-
Patent number: 9390057
Abstract: An array processor includes processing elements arranged in clusters to form a rectangular array. Inter-cluster communication paths are mutually exclusive. Due to the mutual exclusivity of the data paths, communications between the processing elements of each cluster may be combined in a single inter-cluster path, thus eliminating half the wiring required for the path. The length of the longest communication path is not directly determined by the overall dimension of the array, as in conventional torus arrays. Rather, the longest communications path is limited by the inter-cluster spacing. Transpose elements of an N×N torus may be combined in clusters and communicate with one another through intra-cluster communications paths. Transpose operation latency is eliminated in this approach. Each PE may have a single transmit port and a single receive port. Thus, the individual PEs are decoupled from the array topology.
Type: Grant
Filed: September 14, 2012
Date of Patent: July 12, 2016
Assignee: Altera Corporation
Inventors: Gerald George Pechanek, Charles W. Kurak, Jr.
-
Patent number: 9368489
Abstract: Embodiments of the invention relate to processor arrays, and in particular, a processor array with interconnect circuits for bonding semiconductor dies. One embodiment comprises multiple semiconductor dies and at least one interconnect circuit for exchanging signals between the dies. Each die comprises at least one processor core circuit. Each interconnect circuit corresponds to a die of the processor array. Each interconnect circuit comprises one or more attachment pads for interconnecting a corresponding die with another die, and at least one multiplexor structure configured for exchanging bus signals in a reversed order.
Type: Grant
Filed: February 28, 2013
Date of Patent: June 14, 2016
Assignee: International Business Machines Corporation
Inventors: Rodrigo Alvarez-Icaza Rivera, John V. Arthur, John E. Barth, Andrew S. Cassidy, Subramanian S. Iyer, Bryan L. Jackson, Paul A. Merolla, Dharmendra S. Modha, Jun Sawada
-
Patent number: 9363137
Abstract: Embodiments of the invention relate to faulty recovery mechanisms for a three-dimensional (3-D) network on a processor array. One embodiment comprises a multidimensional switch network for a processor array. The switch network comprises multiple switches for routing packets between multiple core circuits of the processor array. The switches are organized into multiple planes. The switch network further comprises a redundant plane including multiple redundant switches. Multiple data paths interconnect the switches. The redundant plane is used to facilitate full operation of the processor array in the event of one or more component failures.
Type: Grant
Filed: August 6, 2015
Date of Patent: June 7, 2016
Assignee: International Business Machines Corporation
Inventors: Rodrigo Alvarez-Icaza Rivera, John V. Arthur, John E. Barth, Jr., Andrew S. Cassidy, Subramanian Iyer, Paul A. Merolla, Dharmendra S. Modha
-
Patent number: 9330060
Abstract: A method and device for encoding and decoding video image data. An MPEG decoding and encoding process using data flow pipeline architecture implemented using complete dedicated logic is provided. A plurality of fixed-function data processors are interconnected with at least one pipelined data transmission line. At least one of the fixed-function processors performs a predefined encoding/decoding function upon receiving a set of predefined data from said transmission line. Stages of pipeline are synchronized on data without requiring a central traffic controller. This architecture provides better performance in smaller size, lower power consumption and better usage of memory bandwidth.
Type: Grant
Filed: April 15, 2004
Date of Patent: May 3, 2016
Assignee: NVIDIA CORPORATION
Inventor: Eric Kwong-Hang Tsang
-
Patent number: 9330230
Abstract: Validating a cabling topology in a distributed computing system comprised of cabled nodes connected using data communications cables, each cabled node characterized by cabling dimensions, each cable corresponding to one of the cabling dimensions, includes: receiving a selection from a user of at least one cabled node for topology validation; identifying, for each cabling dimension for each selected cabled node, a shortest cabling path; determining, for each cabling dimension, whether the number of cabled nodes in the shortest cabling path for each selected cabled node match; and if, for each cabling dimension, the number of cabled nodes in the shortest cabling path for each selected cabled node match: selecting, for each cabling dimension, the number of cabled nodes in the shortest cabling path as a representative value for the cabling dimension, calculating a product of the representative values, and determining whether the product equals the number of selected cabled nodes.
Type: Grant
Filed: April 19, 2007
Date of Patent: May 3, 2016
Assignee: International Business Machines Corporation
Inventors: Charles J. Archer, Mark G. Megerian
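The final checks in this abstract reduce to a short calculation once the shortest cabling paths are known. The sketch below assumes those path lengths have already been measured and are passed in as a dictionary; the data layout and the function name are hypothetical.

```python
from math import prod


def validate_topology(path_nodes):
    """path_nodes maps node -> {dimension: number of nodes in its shortest cabling path}."""
    nodes = list(path_nodes)
    representative = {}
    for dim in path_nodes[nodes[0]]:
        counts = {path_nodes[node][dim] for node in nodes}
        if len(counts) != 1:
            return False                  # the selected nodes disagree in this dimension
        representative[dim] = counts.pop()
    # The product of the per-dimension values must equal the number of selected nodes.
    return prod(representative.values()) == len(nodes)
```

For four nodes cabled as a 2 x 2 arrangement, every node reports 2 in each dimension and 2 * 2 = 4 matches the node count, so the topology passes.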
-
Patent number: 9323716
Abstract: A reconfigurable hierarchical computer architecture having N levels, where N is an integer value greater than one, wherein said N levels include a first level including a first computation block including a first data input, a first data output and a plurality of computing nodes interconnected by a first connecting mechanism, each computing node including an input port, a functional unit and an output port, the first connecting mechanism capable of connecting each output port to the input port of each other computing node; and a second level including a second computation block including a second data input, a second data output and a plurality of the first computation blocks interconnected by a second connecting means for selectively connecting the first data output of each of the first computation blocks and the second data input to each of the first data inputs and for selectively connecting each of the first data outputs to the second data output.
Type: Grant
Filed: July 11, 2014
Date of Patent: April 26, 2016
Assignee: STMICROELECTRONICS SA
Inventor: Joël Cambonie
-
Patent number: 9282037
Abstract: A multiprocessor computer system comprises a dragonfly processor interconnect network that comprises a plurality of processor nodes and a plurality of routers. The routers are operable to route data by selecting from among a plurality of network paths from a target node to a destination node in the dragonfly network based on one or more routing tables.
Type: Grant
Filed: November 7, 2011
Date of Patent: March 8, 2016
Assignee: INTEL CORPORATION
Inventors: Mike Parker, Steve Scott, Albert Cheng, Robert Alverson
-
Patent number: 9219832
Abstract: A portable handheld device includes an image sensor for capturing an image; and a one-chip microcontroller having integrated therein a CPU for processing a script language and a multi-core processor for processing an image captured by the image sensor. The multi-core processor includes therein multiple processing units connected in parallel by a crossbar switch. Each processing unit includes an arithmetic and logic unit (ALU). Each ALU includes a first register set for accepting data from the first crossbar switch, and a second register set for loading data to the crossbar switch.
Type: Grant
Filed: September 15, 2012
Date of Patent: December 22, 2015
Assignee: Google Inc.
Inventor: Kia Silverbrook
-
Patent number: 9189447
Abstract: Algorithm selection for data communications in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI, including associating in the PAMI data communications algorithms and bit masks; receiving in an origin endpoint of the PAMI a collective instruction, the instruction specifying transmission of a data communications message from the origin endpoint to a target endpoint; constructing a bit mask for the received collective instruction; selecting, from among the associated algorithms and bit masks, a data communications algorithm in dependence upon the constructed bit mask; and executing the collective instruction, transmitting, according to the selected data communications algorithm from the origin endpoint to the target endpoint, the data communications message.
Type: Grant
Filed: October 24, 2012
Date of Patent: November 17, 2015
Assignee: International Business Machines Corporation
Inventor: Daniel A. Faraj
-
Patent number: 9158737
Abstract: To improve processing efficiency of a SIMD processor that divides two-dimensional data into blocks, each having a width of PE number N, to store the data in a local memory of each of PEs by a lateral direction priority method. When designating a local address of N pieces of data arranged in a row direction from head data whose coordinate values in two-dimensional data are (X,Y) to a PE array 110, the N pieces of data being stored in local memories, a CP 150 broadcasts a local address A1, a local address A2, and a threshold number Z obtained by an address calculation unit. Each of the PEs compares a magnitude relation between the threshold number Z and its own number, and selects one of the local address A1 and the local address A2 according to the comparison result.
Type: Grant
Filed: July 30, 2012
Date of Patent: October 13, 2015
Assignee: Renesas Electronics Corporation
Inventor: Shorin Kyo
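The per-PE address selection can be sketched in a few lines. The abstract does not state which side of the comparison maps to which address, so the rule below (PEs numbered Z or higher use A1, the rest use A2) is an assumption, as is the function name.

```python
def select_addresses(num_pes, a1, a2, z):
    # Assumed rule: PEs numbered z or higher read local address a1, the others read a2.
    return [a1 if pe >= z else a2 for pe in range(num_pes)]
```

Under this assumption, an unaligned row that starts in PE 1 of a 4-PE array would be read with `select_addresses(4, 100, 101, 1)`, giving address 101 to PE 0 and address 100 to PEs 1 through 3, so only three values need to be broadcast instead of one address per PE.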
-
Patent number: 9135201
Abstract: The invention is directed to a switching device (Sij) adapted to connect parts of a computer interconnection network, having N input ports (Ia-Ih) and N output ports (Oa-Oh), the device adapted for routing data packets by means of direct crosspoints (CPxy), the direct crosspoints configured for enabling direct connectivity between each of the N input ports to a subset m<N of the output ports only, in accordance with connectivity needs of the computer interconnection network. Preferably, it further comprises an additional circuitry (L) and additional crosspoints (APx,L, APL,y) configured such that at least some of the input ports of the switching device can be indirectly connected to at least some of the output ports of the switching device, through the additional circuitry. The invention further concerns an interconnection network and a method for routing data.
Type: Grant
Filed: May 25, 2011
Date of Patent: September 15, 2015
Assignee: International Business Machines Corporation
Inventors: Francois Abel, Mitch Gusat, Cyriel Minkenberg
-
Patent number: 9100354
Abstract: A system comprises a resource, such as an interconnection, for example, of the Network-on-Chip (NoC) type, having an overall bandwidth available for allocation to a set of initiators that compete for allocation of the overall bandwidth. The system includes a communication arbiter for allocating the overall bandwidth to the initiators according to respective values of bandwidth requested (RBW) by the initiators. A control device (50) is configured to detect the deviation between the value of bandwidth allocated to the initiators and the respective value of requested bandwidth and allocate the overall bandwidth to the initiators in a dynamic way minimizing the mean value of the deviation.
Type: Grant
Filed: November 30, 2012
Date of Patent: August 4, 2015
Assignees: STMICROELECTRONICS SRL, STMICROELECTRONICS (GRENOBLE 2) SAS
Inventors: Daniele Mangano, Ignazio Antonino Urzi, Giovanni Strano
-
Patent number: 9088582
Abstract: Token-based flow control of messages in a parallel computer, the parallel computer including a plurality of compute nodes, each compute node including one or more computer processors, including: allocating, by a token administration module to a plurality of the computer processors in the parallel computer, a number of data communications tokens; identifying all communicators executing on each computer processor, where each communicator is participating in a distinct parallel operation executing on the parallel computer; allocating, to the communicators, the data communications tokens; determining, by a communicator attempting to send data to the destination, whether the communicator has enough available data communications tokens to send the data to the destination; and responsive to determining that the communicator has enough available data communications tokens to send the data, sending, by the communicator, the data to the destination.
Type: Grant
Filed: February 13, 2013
Date of Patent: July 21, 2015
Assignee: International Business Machines Corporation
Inventors: Charles J. Archer, James E. Carey, Philip J. Sanders, Brian E. Smith
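The allocate-then-check flow in this abstract can be sketched as follows. The even split of tokens among communicators and the function names are assumptions made for illustration; the abstract does not say how the tokens are divided or how they are returned.

```python
def allocate_tokens(total_tokens, communicators):
    # Assumed policy: split the processor's tokens evenly among its communicators.
    share = total_tokens // len(communicators)
    return {name: share for name in communicators}


def try_send(tokens, communicator, needed):
    # Send only if the communicator holds enough tokens; spend them when it does.
    if tokens[communicator] < needed:
        return False
    tokens[communicator] -= needed
    return True
```

A communicator given 5 tokens can send a 3-token message once; a second 3-token message is refused until tokens are replenished.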
-
Patent number: 9075767
Abstract: A synchronization apparatus includes a receiver that receives data from a synchronization apparatus of another node that performs synchronization with its own node from among the plurality of synchronization apparatuses and extracts synchronization information from the received data, a transmitter that transmits the data to the synchronization apparatus of the other node, a receiving state register that stores the extracted synchronization information, a delay unit that delays the received data by a specified period of time, and a controller that stores the extracted synchronization information and synchronization information from its own controller in the reception state register and causes the transmitter to transmit the data to the other node and returns the data to its own node back to its own controller via the delay unit when the extracted synchronization information and the synchronization information from its own controller are stored in the reception state register.
Type: Grant
Filed: December 13, 2011
Date of Patent: July 7, 2015
Assignee: FUJITSU LIMITED
Inventors: Tomohiro Inoue, Yuichiro Ajima, Shinya Hiramoto
-
Patent number: 9069649
Abstract: An island-based integrated circuit includes a configurable mesh data bus. The data bus includes four meshes. Each mesh includes, for each island, a crossbar switch and radiating half links. The half links of adjacent islands align to form links between crossbar switches. A link is implemented as two distributed credit FIFOs. In one direction, a link portion involves a FIFO associated with an output port of a first island, a first chain of registers, and a second FIFO associated with an input port of a second island. When a transaction value passes through the FIFO and through the crossbar switch of the second island, an arbiter in the crossbar switch returns a taken signal. The taken signal passes back through a second chain of registers to a credit count circuit in the first island. The credit count circuit maintains a credit count value for the distributed credit FIFO.
Type: Grant
Filed: February 17, 2012
Date of Patent: June 30, 2015
Assignee: NETRONOME SYSTEMS, INCORPORATED
Inventors: Gavin J. Stark, Steven W. Zagorianakos, Ronald N. Fortino
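The credit mechanism in this abstract follows the general pattern of credit-based flow control, which the short Python sketch below models for one direction of a link. It is a simplification under stated assumptions: the register chains are treated as zero-latency, the initial credit count is assumed to equal the receiving FIFO depth, and the class and method names (`CreditLink`, `send`, `take`) are hypothetical.

```python
from collections import deque


class CreditLink:
    """One direction of a link: a sender-side credit count and a receiver-side FIFO."""

    def __init__(self, depth: int):
        self.credits = depth      # credit count kept in the first island (assumed = FIFO depth)
        self.rx_fifo = deque()    # FIFO at the input port of the second island

    def send(self, value) -> bool:
        """The sender may push a value only while it holds at least one credit."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.rx_fifo.append(value)
        return True

    def take(self):
        """The receiving crossbar consumes a value; the 'taken' signal returns one credit."""
        value = self.rx_fifo.popleft()
        self.credits += 1
        return value
```

Because the sender spends a credit before each transfer, the receiving FIFO cannot overflow, even if the returning taken signal were delayed by several cycles.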
-
Patent number: 9064092
Abstract: An integrated circuit comprises compute nodes arranged in an array; a torus topology network-on-chip interconnecting the compute nodes; and a network extension unit at each end of each row or column of the array, inserted in a network link between two compute nodes. The extension unit has a normal mode establishing the continuity of the network link between the two corresponding compute nodes, and an extension mode dividing the network link in two independent segments that are accessible from outside the integrated circuit.
Type: Grant
Filed: August 10, 2012
Date of Patent: June 23, 2015
Assignee: KALRAY
Inventor: Michel Harrand
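The two modes of the extension unit can be pictured with a small Python sketch of one row of a torus. It is illustrative only: it assumes the extension unit sits on the wrap-around link that joins the last column to the first, and the function name `east_neighbor` and the `None` marker for an off-chip segment are hypothetical.

```python
from typing import Optional


def east_neighbor(col: int, width: int, extension_mode: bool) -> Optional[int]:
    """Return the column reached by the east link of `col` in a row of `width` nodes.

    Normal mode: the wrap-around link is continuous, so the last column
    connects back to column 0 (torus behavior).
    Extension mode: the wrap-around link is split into two segments that
    leave the chip, modeled here as None.
    """
    if col < width - 1:
        return col + 1          # interior link, unaffected by the extension unit
    return None if extension_mode else 0
```

In this model, setting `extension_mode` on every row and column would expose the cut segments at the chip edges, which is one way several chips could be joined into a larger array.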
-
Patent number: 9055078
Abstract: Token-based flow control of messages in a parallel computer, the parallel computer including a plurality of compute nodes, each compute node including one or more computer processors, including: allocating, by a token administration module to a plurality of the computer processors in the parallel computer, a number of data communications tokens; identifying all communicators executing on each computer processor, where each communicator is participating in a distinct parallel operation executing on the parallel computer; allocating, to the communicators, the data communications tokens; determining, by a communicator attempting to send data to the destination, whether the communicator has enough available data communications tokens to send the data to the destination; and responsive to determining that the communicator has enough available data communications tokens to send the data, sending, by the communicator, the data to the destination.
Type: Grant
Filed: January 10, 2013
Date of Patent: June 9, 2015
Assignee: International Business Machines Corporation
Inventors: Charles J. Archer, James E. Carey, Philip J. Sanders, Brian E. Smith
-
Patent number: 9043802
Abstract: Embodiments provide various techniques for dynamic adjustment of a number of threads for execution in any domain based on domain utilizations. In a multiprocessor system, the utilization for each domain is monitored. If a utilization of any of these domains changes, then the number of threads for each of the domains determined for execution may also be adjusted to adapt to the change.
Type: Grant
Filed: January 8, 2014
Date of Patent: May 26, 2015
Assignee: NetApp, Inc.
Inventors: Gokul Nadathur, Manpreet Singh, Grace Ho
-
Patent number: 9037833
Abstract: A High Performance Computing (HPC) node comprises a motherboard, a switch comprising eight or more ports integrated on the motherboard, and at least two processors operable to execute an HPC job, with each processor communicably coupled to the integrated switch and integrated on the motherboard.
Type: Grant
Filed: December 12, 2012
Date of Patent: May 19, 2015
Assignee: RAYTHEON COMPANY
Inventors: James D. Ballew, Gary R. Early
-
Patent number: 9038073
Abstract: Efficient data processing apparatus and methods include hardware components which are pre-programmed by software. Each hardware component triggers the other to complete its tasks. After the final pre-programmed hardware task is complete, the hardware component issues a software interrupt.
Type: Grant
Filed: August 13, 2009
Date of Patent: May 19, 2015
Assignee: QUALCOMM Incorporated
Inventors: Mathias Kohlenz, Irfan Anwar Khan, Sathyanarayan Madhusudan, Shailesh Maheshwari, Srividhya Krishnamoorthy, Sandeep Urgaonkar, Thomas Klingenbrunn, Tim Tynghuei Liou, Idreas Mir
-
Publication number: 20150106589
Abstract: A computing platform comprising a small form factor high performance computer for mobile high performance computing is provided. The computing platform uses a small form factor design with a 64-core microprocessor/co-processor. The small form factor high performance computer may include 64-core microprocessor/co-processors based on the ANNI Stem Cell HPC multicore datacenter chipset cluster of REMTEC.
Type: Application
Filed: October 16, 2014
Publication date: April 16, 2015
Inventors: Tommy Xaypanya, Richard E. Malinowski
-
Publication number: 20150039855
Abstract: A system for pipelining signal flow graphs by a plurality of shared memory processors organized in a 3D physical arrangement with the memory overlaid on the processor nodes that reduces storage of temporary variables. A group function formed by two or more instructions to specify two or more parts of the group function. A first instruction specifies a first part and specifies control information for a second instruction adjacent to the first instruction or at a pre-specified location relative to the first instruction. The first instruction when executed transfers the control information to a pending register and produces a result which is transferred to an operand input associated with the second instruction. The second instruction specifies a second part of the group function and when executed transfers the control information from the pending register to a second execution unit to adjust the second execution unit's operation on the received operand.
Type: Application
Filed: August 2, 2014
Publication date: February 5, 2015
Inventor: Gerald George Pechanek
-
Publication number: 20140359254
Abstract: A logic cell array having a number of logic cells and a segmented bus system for logic cell communication, the bus system including different segment lines having shorter and longer segments for connecting two points in order to be able to minimize the number of bus elements traversed between separate communication start and end points.
Type: Application
Filed: May 28, 2013
Publication date: December 4, 2014
Applicant: PACT XPP TECHNOLOGIES AG
Inventors: Martin Vorbach, Frank May, Dirk Reichardt, Frank Lier, Gerd Ehlers, Armin Nückel, Volker Baumgarte, Prashant Rao, Jens Oertel
-
Publication number: 20140359255
Abstract: A data processor having a plurality of coarse-grained data processing elements arranged in rows and columns, an interconnect structure comprising both global and direct interconnects, the global interconnects interconnecting the coarse-grained data processing elements globally and the direct interconnects interconnecting adjacent data processing elements.
Type: Application
Filed: August 19, 2014
Publication date: December 4, 2014
Applicant: PACT XPP TECHNOLOGIES AG
Inventors: Martin Vorbach, Alexander Thomas
-
Patent number: 8904148
Abstract: There is described a processor architecture, comprising: a plurality of first bus pairs, each first bus pair including a respective first bus running in a first direction (for example, left to right) and a respective second bus running in a second direction opposite to the first direction (for example right to left); a plurality of second bus pairs, each second bus pair including a respective third bus running in a third direction (for example downwards) and a respective fourth bus running in a fourth direction opposite to the third direction (for example upwards), the third and fourth buses intersecting the first and second buses; a plurality of switch matrices, each switch matrix located at an intersection of a first and a second pair of buses; a plurality of elements arranged in an array, each element being arranged to receive data from a respective first or second bus, and transfer data to a respective first or second bus.
Type: Grant
Filed: July 5, 2011
Date of Patent: December 2, 2014
Assignee: Intel Corporation
Inventors: Anthony Peter John Claydon, Anne Patricia Claydon
-
Patent number: 8898432
Abstract: Systems and methods for folding a single instruction multiple data (SIMD) array include a newly defined processing element group (PEG) that allows interconnection of PEGs by abutment without requiring a row or column weave pattern. The interconnected PEGs form a SIMD array that is effectively folded at its center along the North-South axis, and may also be folded along the East-West axis. The folding of the array provides for north and south boundaries to be co-located and for east and west boundaries to be co-located. The co-location allows wrap-around connections to be done with a propagation distance reduced effectively to zero.
Type: Grant
Filed: October 25, 2011
Date of Patent: November 25, 2014
Assignee: Geo Semiconductor, Inc.
Inventor: Woodrow L. Meeker
-
Patent number: 8892806
Abstract: An integrated circuit, a memory device, a method of operating an integrated circuit and a method of designing an integrated circuit are provided. An integrated circuit comprises a plurality of logical elements and a bus carrying signals for said plurality of logical elements. The integrated circuit also comprises a routing unit having an input coupled to said bus and a plurality of outputs to route signals received at said input to at least one of said outputs. The integrated circuit also comprises a plurality of lines coupled to said plurality of outputs to conduct said signals from said routing unit to at least one of said plurality of logical elements, wherein at least one of said plurality of lines couples said routing unit to only one of said logical elements.
Type: Grant
Filed: March 7, 2007
Date of Patent: November 18, 2014
Assignee: Intel Mobile Communications GmbH
Inventor: Hans Joachim Janssen
-
Patent number: 8830829
Abstract: Disclosed are methods, systems, paradigms and structures for processing data packets in a communication network by a multi-core network processor. The network processor includes a plurality of multi-threaded core processors and special purpose processors for processing the data packets atomically, and in parallel. An ingress module of the network processor stores the incoming data packets in the memory and adds them to an input queue. The network processor processes a data packet by performing a set of network operations on the data packet in a single thread of a core processor. The special purpose processors perform a subset of the set of network operations on the data packet atomically. An egress module retrieves the processed data packets from a plurality of output queues based on a quality of service (QoS) associated with the output queues, and forwards the data packets towards their destination addresses.
Type: Grant
Filed: December 2, 2013
Date of Patent: September 9, 2014
Assignee: Unbound Networks, Inc.
Inventors: Damon Finney, Ashok Mathur
-
Patent number: 8832413
Abstract: A processing system includes processors and dynamically configurable communication elements (DCCs) coupled together in an interspersed arrangement. A source device may transfer a data item through an intermediate subset of the DCCs to a destination device. The source and destination devices may each correspond to different processors, DCCs, or input/output devices, or mixed combinations of these. In response to detecting a stall after the source device begins transfer of the data item to the destination device and prior to receipt of all of the data item at the destination device, a stalling device is operable to propagate stalling information through one or more of the intermediate subset towards the source device. In response to receiving the stalling information, at least one of the intermediate subset is operable to buffer all or part of the data item.
Type: Grant
Filed: May 29, 2013
Date of Patent: September 9, 2014
Assignee: Coherent Logix, Incorporated
Inventors: Michael B. Doerr, William H. Hallidy, David A. Gibson, Craig M. Chase