Method and apparatus for scalable and super-scalable information processing using binary gate circuits structured by code-selected pass transistors
A processing space comprises an array of transistors empowered by forming connections through circuit pass transistors to power and data input/output means and connections therebetween through signal pass transistors. By structuring the needed circuits at the site(s) of the data the von Neumann bottleneck is eliminated, which increases the computing power of the apparatus substantially, thus to enable non-stop Information Processing on steady streams of data and code, with no repetitive instruction and data transfers required. That code will identify the physical locations of every transistor in the processing space, and will enable only the pass transistors therein needed to structure the circuits of any arithmetical/logical algorithm in a processing space of any size, speed, and level of computer power. By joining one processing space to another the apparatus also exhibits super-scalability.
This Application is a Divisional Application of U.S. patent application Ser. No. 11/542,773, filed Oct. 2, 2006, which application is relied upon for priority and by this reference is deemed incorporated herein as though fully set forth herein. Applicant is the sole inventor of the invention that is the subject of the above-cited patent application and of the present invention, and has resided in Lincoln City, Oreg., USA, throughout the creation and filing of the present patent application and the above-cited patent application.
RESERVATION OF COPYRIGHT
This patent document contains text and images that are subject to copyright protection. The copyright owner, who is the present inventor and author of this patent application, has no objection to facsimile, electronic, or other means by which anyone might copy the patent document herein or the patent, or parts thereof, as these appear in the U.S. Patent and Trademark Office files or records, or to copying in accordance with any contractual agreements executed by that owner, but otherwise reserves all domestic and foreign copyright rights whatsoever, all of which rights are fully reserved.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
Not applicable
REFERENCE TO A “SEQUENCE LISTING”
Not applicable
BACKGROUND OF THE INVENTION
It has often been the case in both science and engineering that solutions to long-known problems are found at the most primitive level of the subject matter. Such was the case here. This patent application relates to “Information Processing” (IP) in general, and specifically to methods and apparatus that carry out that IP by the use of procedures that eliminate the “von Neumann Bottleneck” (vNB). The circuits required for each step of an algorithm are electronically structured within a “Processing Space” (PS) in advance of the entry or formation of the data that are to be operated upon, and after such usage those circuits are de-structured for other uses of the elements involved. As a result, the vNB is removed, and a continuous, uninterrupted flow of IP is brought about.
A principal characteristic of the apparatus is its fine-grained nature, wherein the operational components of that PS (such as single operational transistors) are to be interconnected through Pass Transistors (PTs) both within the bounds of a particular PS and along the peripheries thereof when two or more PSs are joined together. The former process exhibits scalability, while super-scalability is brought out through the latter process.
The above-cited U.S. patent application Ser. No. 11/542,773, filed Oct. 2, 2006, and fully incorporated herein by this reference, traces out a short history of the development of computers, and further contains the results of searching both the patent and the technical literature, from which no document or other evidence of the basic “IL” procedure set out herein having been practiced, or even suggested or contemplated, could be found.
The course of computer development had been set until now by Babbage in his adoption of the only procedure that was available to him with a mechanical computer. In seeking to convert such a single-function kind of device into a general purpose computer, it was natural enough simply to expand upon that Babbage concept, i.e., to provide a number of different calculating capabilities within the device, and then add instructions by which the user could select which functions were to be used in each step. Even if just a single instruction had been needed at the outset, that instruction would have selected just some one process to be carried out; once that development path had been started, the need for both data and instruction transfers was initiated. A part of the operations carried out by the device would then no longer be devoted entirely to the making of binary logic decisions (or decimal calculations, if that were the case), but instead to the task of transmitting data and (possibly) instructions back and forth between the operating circuits and memory.
With larger and more complex algorithms later to be developed, there would be particular operational sequences that would be the same for any number of different algorithms, and in any event it would not have been possible to provide a single complex circuit that would carry out those more complex operations from beginning to end, so the desired circuit sequences came to be broken up into smaller segments. That ultimately led to the operational functions and instruction sets of microprocessors. That economy in terms of hardware needs, however, necessitated the production of intermediate results that would then have to be saved in memory and brought back later for use when required. The need to transfer data and instructions was then multiplied by the number of different segments into which what would by that time have come to be called a program had been divided, with further effort now devoted to matching instructions with the data, thus to increase even further the amount of computer time devoted to operations other than making arithmetic/logical decisions. The limits to which that process can be expanded are now coming to be recognized.
To begin this development, in 1822 Charles Babbage had conceived his “Difference Engine,” and then in 1834 his “Analytical Engine,” in which “ . . . numbers would be brought from the store to the arithmetic mill for processing, and the results of the computation would be returned to the store.” M. Campbell-Kelly and W. Aspray, Computer: A History of the Information Machine. New York: Basic Books, 1996, p. 55. In the Difference Engine, those data had to be entered manually, but then with the cranking of a wheel the calculations would proceed. One of Babbage's goals, an automatic calculator, had thus, to an extent, been achieved.
A “limited” general purpose computer was then achieved in the Analytical Engine (on paper only, since unfortunately Babbage never got one fully built), the basic process of which is seen in the “A” entry of
It is a startling fact that the logical and physical separation of the Store and Mill (memory and central processor) is a fundamental feature of the modern electronic digital computer. * * * This layout, which became known as ‘von Neumann architecture’, has dominated computer design to the present day, and is incorporated in just about every computer around. A feature of this article is the separation of the central processor from the memory—a feature explicitly used by Babbage over a century earlier.
Then in 1950, Alan M. Turing gave an example of an instruction as “add the number stored in position 6809 to that in 4302 and put the result back into the latter storage position.” A. M. Turing, “Computing Machinery and Intelligence,” MIND, Vol. 59 (October, 1950), pp. 433-460. In preparing his PhD thesis, the present inventor followed the “Op Code-Address A-Address B-Address C” method described by M. W. Wrubel in A Primer of Programming for Digital Computers (McGraw-Hill Book Co., Inc., New York, 1959), pp. 22-23, using the famous “IBM cards” by which the Princeton University IBM 650 executed programs. The procedure described by Turing and that set out by Wrubel differed only in that for Turing the result was stored back into the address of one of the operands, while for Wrubel (and the present author), using that same procedure, the result went to a separate “C” address. The basic processes of this “von Neumann computer” are shown in the “B” entry of
In that “A” entry in
The oppositely-directed circular arrows between the “Data In” and “Store” boxes in the “Babbage Paradigm” and between the “Data In” and “Memory” boxes of the “von Neumann Paradigm” both constitute that vNB. If one were to construct a super-computer using 10,000 or more microprocessors in which that same architecture is exhibited, such as in a “Massively Parallel Processing” (MPP) design, G. J. Lipovski and M. Malek, Parallel Computing: Theory and Comparisons (John Wiley & Sons, New York, 1987), p. 10, one would then have installed 10,000 or more vNBs, and the time required for all those data and instruction transfers to be carried out would then be wasted 10,000 or more times over. (There would have been a “traffic jam,” of course, but this was ultimately at least ameliorated by providing different busses for the data and for instructions. Even so, one problem was still to be faced, in which different programs or parts of a single program would want access to the same memory address at the same time.)
Developments along that electronic Babbage path to the current processes of entry B in
All that was followed by the basic transistor at Bell Labs in 1947, the stored program in Eckert and Mauchly's 1951 UNIVAC and ultimately putting the data and the program in the same memory with the 1952 EDVAC (Ceruzzi, Ibid.), also bit-parallel arithmetic in the EDVAC (Raúl Rojas and Ulf Hashagen, Eds., The First Computers: History and Architectures (The MIT Press, Cambridge, Mass., 2002), p. 7), and the use of hardware floating point arithmetic in the IBM 704 in 1955. The vNB was of course carried through in all of these developments.
Then there was the first fully transistor-based computer in 1959, MOSFET transistors in the 1960s, cache memory in 1961, ICs in 1965, active human-computer interaction in the mid-1960s (Ceruzzi, supra, p. 14), the use of semiconductor memory chips in the SOLOMON (ILLIAC IV) computer in 1966, the bit slice or orthogonal architecture in 1972, and LSI for the logic circuits of the CPU by Amdahl in 1975.
The pipelined CRAY-1 with vector registers came in 1976 (R. W. Hockney and C. R. Jesshope, Parallel Computers 2: Architecture, Programming and Algorithms (Adam Hilger, Bristol, England, 1988), pp. 18-19), and finally there were the modular microprocessor-based computers with the Cm* computer of Carnegie-Mellon in 1977 (Id., pp. 35-36), the single chip microprocessor in the early 1970s, and VLSI with the AMT “Distributed Array Processor” DAP 500, in which the memory was placed on the same chip as the logic, all of which had followed the Babbage path, from which no departure has yet been found. As noted above, the pathway over which those data, both initial and intermediate, and the instructions that selected which of the functions built into the microprocessor would be used at any particular time, were transferred came to be known as the “von Neumann bottleneck” (vNB). (von Neumann is not to be blamed for that circuitry, however, since as has clearly been shown innumerable times, it fell to him to be the first to show and describe fully the stored program computer, but not to have initiated the use of a separate data store and the functional circuitry interconnected by busses.)
The procedure described herein, dubbed “Instant Logic” (IL), is represented by entry “C” of
Now to consider current practice, there are two major efforts being taken to gain greater speed of operation, which center firstly on developing smaller transistors with shorter connections between them and thus quicker response times from both aspects, thus to provide more computing power per cm2 of real estate. Another effort is parallel processing, which stands out because in this effort to develop paralleled methodologies as a means of eliminating the sequential processes of the “von Neumann computer,” the principal feature of that computer, what has come to be called the “von Neumann Bottleneck” (vNB), seems to have been left mostly intact. Indeed, if by “Massively Parallel Processing” (MPP) is meant having large numbers of microprocessors all lined up in parallel, and what is then left to do is find a way to have all those microprocessors operate cooperatively, the situation would seem to have been made worse. The use of multiple microprocessors all acting in parallel with one another would in this model seem to have exacerbated that vNB problem rather than resolved it. The gathering together of 100 or so microprocessors entails the introduction of 100 or so vNBs, but the wiring of a network by which those microprocessors are caused to work cooperatively does not add any “Computing Power” (CP), so with the added vNBs and “inactive” network lines the fraction of the effort in the circuitry that is devoted to alphanumeric calculations will decrease.
Applicant's thought then turned to the pass transistor, from which a way in which that vNB might be eliminated came to be suggested. If those pass transistors could be used to structure temporary logic gates using standard operational transistors, with those gates to carry out the desired IP and then be de-structured upon completing whatever arithmetical/logical task had been imposed upon them, then instead of transferring data and instructions to the circuitry that would operate on those data, the circuitry would be transferred to—or rather structured at the locations of—the data. Such encoding and de-structuring would run in parallel with but ahead of the data, in a continuous stream, even as the data were passing through the gates that had so been structured and then de-structured, ready for the next IP task to come along. The data would flow in the same kind of constant stream as did the code by which the circuits were structured and de-structured, and on the face of it, that process would seem to constitute an enormous leap ahead in the quest for faster and much more voluminous information processing, with any number of algorithms coursing along at the same time. IL is then the result of those musings, made up of the parent patent application hereto along with the present one, in an effort to put the concepts so derived into a working system.
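The structure-operate-de-structure rhythm described above can be sketched in software. The following Python model is purely illustrative: the invention is hardware, and the class, method, and variable names here are hypothetical, not taken from the application. It shows only the intended cycle, in which pass transistors are enabled to structure a gate at the location of the data, the data pass through, and the elements are then freed for reuse.

```python
# Illustrative software model of a Processing Space: each entry in
# enabled_pts stands for a set of enabled pass transistors (PTs) wiring
# two operational transistors into a gate. Structuring happens just
# before the data arrive; de-structuring frees the elements afterwards,
# so no data or instructions are ever shuttled to a distant processor.

class ProcessingSpace:
    def __init__(self, size):
        self.size = size
        self.enabled_pts = {}          # (node, neighbor) -> gate function

    def structure(self, node, neighbor, gate):
        """Enable the PTs that wire `node` to `neighbor` as `gate`."""
        self.enabled_pts[(node, neighbor)] = gate

    def de_structure(self, node, neighbor):
        """Disable those PTs again, freeing both nodes for reuse."""
        self.enabled_pts.pop((node, neighbor), None)

    def step(self, node, neighbor, a, b):
        """One algorithm step: operate on data with the structured gate."""
        gate = self.enabled_pts[(node, neighbor)]
        return gate(a, b)

ps = ProcessingSpace(16)
ps.structure(0, 1, lambda a, b: a & b)     # series connection: AND
print(ps.step(0, 1, 1, 1))                 # -> 1
ps.de_structure(0, 1)                      # nodes 0 and 1 free again
```

The essential point the sketch captures is that the "program" is the stream of structure/de-structure calls running just ahead of the data, not a list of instructions fetched alongside the data.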
Considering the array of
Turning back now to scalability explicitly, one difficulty with evaluating the presence or not of that feature rests in how to decide what was to be increased in size to determine whether or not the CP had increased accordingly. If the network had to do only with bringing about cooperation between separate processing devices, there is already an inherent problem. In bringing two elements together, that network has only to cause one element to cooperate with another. If then a third element is added, both of those first elements would then have to cooperate with two other elements, and the network would have to be expanded accordingly. Then, with a fourth element added, the network must see to the cooperation of each element with three other elements, and so on: the more the size is increased the more the network must grow, so the fraction of the apparatus devoted solely to CP cannot grow in step with the size of the IP part of the apparatus. It must therefore be shown exactly what components make up that part.
Consequently, in designing an IL system that would be scalable, there are three features that such a system would need to have: (1) each element would have to be operationally separate from and independent of every other element; (2) each element would need to have its own power and control; and (3) each element would have to interact cooperatively with the other elements as a matter of course, with no more hardware being needed to bring that about. (That there would also be no software goes without saying: the ILA would not recognize a program of instructions, and would not “know how” to deal with one.) Surprisingly enough, such is the case with the IL element of
In IL, the feature of scalability derives from that fine-grained structure. If the smallest possible operational element that can carry out an IP function is made to be the basic foundation of the apparatus, and is provided with its own power and control, the one aspect of the apparatus that still remains to be shown in order to achieve scalability is that those operational elements would have to operate cooperatively as a matter of course, with there being no need for any additional circuitry, such as a “network” of a parallel processing apparatus, to bring about that cooperation. Individual operational elements of the type shown in
As to “acting cooperatively,” distinction must be made between what is required to prepare an operational element so as to be put to use, and the actions of some group of operational elements that are acting cooperatively when all of those preparations have been completed, and the elements are in fact being put to use. To take a simple example, in an AND or OR gate scenario, the operational elements involved, which are connected in series and in parallel, respectively, must only act together in response to incoming data so as to execute the respective operations through their cooperative behavior. They do so because the operator, by way of the code, has directed them to do so, so that the respective series and parallel circuits are formed, and that is all that is needed to have that cooperative action take place. If there had been no such code entered, those elements would have had no interaction whatever, since there would be no connection between them. (As an example, each of two such elements, even if adjacent, could be separately encoded as inverters, and here there would be no cooperative action, since there had been no interconnection of those elements that would require such cooperation. Such an operation would yield two outputs, neither of which could have any connection or relationship with either an AND gate output or an OR gate output, or indeed any relationship whatever with what the other element was doing.)
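The series/parallel cooperation just described can be illustrated with a minimal sketch, assuming each operational element behaves as a simple switch; the function names below are illustrative only and do not appear in the application. A series pair conducts only when both inputs are high (AND), a parallel pair conducts when either input is high (OR), and two separately encoded inverters produce two unrelated outputs, exactly as described above.

```python
# Hedged model: operational elements as switches. Cooperation exists
# only because the operator's code has connected them in series or in
# parallel; elements encoded independently never interact.

def series(a, b):
    """Two elements in series: current flows only if both are on -> AND."""
    return int(bool(a) and bool(b))

def parallel(a, b):
    """Two elements in parallel: current flows if either is on -> OR."""
    return int(bool(a) or bool(b))

def inverter(a):
    """A single element encoded alone: no cooperation with a neighbor."""
    return int(not a)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", series(a, b), "OR:", parallel(a, b))

# Two adjacent elements separately encoded as inverters: two outputs
# with no relationship to any AND or OR result.
print(inverter(0), inverter(1))            # -> 1 0
```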
Another “hidden” advantage of IL should also be noted. In current IP circuitry in general, while the processing elements thereof are not vacuum tubes burning up tremendous amounts of energy in their cathode filaments, nevertheless the transistors in the microprocessors will all be on and operating, whether there are programs then being executed or not, and even that will constitute a significant use of power. In IL, no matter how large an array of “Processing Space Integrated Circuits” (PSICs) there may be, although there will be auxiliary apparatus such as monitors, printers, code routers, etc., quietly humming, no electrical power will be getting used up in the PSIC itself unless an algorithm is actually being executed, which in itself should amount to a significant conservation of energy compared even to the quiescent energy consumption of current IP apparatus.
The feature of scalability arises from the fact that the operational elements of the system are individually controlled, without reference to whatever control may be applied to any other operational element, whether distant or “right next door,” and, if interpretable as data, positionally significant. When one operational element has been prepared for use, how the next operational element is to be prepared (if indeed it is) is solely up to the operator, e.g., whether to connect a second, adjacent operational element in series with the first in order to structure an AND gate, in parallel so as to structure an OR gate, and so on. The two operational elements will then have become interdependent and would indeed act cooperatively, in either of those circuits or in any others that might be contemplated, but that interdependence would have been imposed by the operator and is a feature of the circuit selected for use, not a feature of the elements themselves.
What may not be readily apparent, and thus remains to be explained, is what the ability to structure binary logic circuits by interconnecting the operational elements thereof has to do with scalability. The answer is that nowhere is it hinted that more elements could not be added, and if the means of so doing (i.e., by adding more processing space through additional interconnection possibilities) uses the very aspect of the ILA that defines its CP, scalability arises as a matter of definition. More exactly, the CP of a PS is measured by the number of inter-element connections that it has available, and the structuring and operation of a binary logic circuit using those inter-element connection possibilities merely corroborates that availability, and likewise corroborates the truth of a statement that such and such PS (and hence ILA) has so much CP.
That is the definition of being scalable: more operational elements can be added at will (of course, if the size of the substrate is such as to accommodate more operational elements), and as more elements would have been added, in that measure alone the CP would have increased in the same fashion and at the same rate. Put another way, the question is whether, in order to increase the CP of an apparatus, there is anything more that one must do than add more of the elements that carry out the IP, in which of course “adding an element” includes adding the circuitry by which an LN 12 could be empowered to operate and then encoded to do so in such and such a fashion. If nothing more is needed, and one is free to add those new elements at will, the apparatus is scalable. In an ILA, the size of the “Processing Space Integrated Circuit” (PSIC), together with the power and control circuitry individually associated with each element thereof, determines the CP of the resultant ILA, and hence is scalable. (In the context of IL, there is no such thing as being “almost scalable,” or that “this device is more scalable than that device,” etc.: a device is either scalable or it is not, and there is no half way or any other fraction. But as will be shown below, scalability can in fact be exceeded, in super-scalability.)
That circumstance may be loosely referred to as constituting a “limited” scalability (which does not contradict the previous parenthetical statement—“scalable” means that the CP is exactly linear with the size of the processing space, but of course one can “run out of” processing space), since an integrated circuit can only accommodate so many operational elements, but it will be shown below that such integrated circuits can be joined together, also at will (and while so doing bringing out the presence of super-scalability), so in theory there is no limit to the size of a completed apparatus, other than that of cost, power, and space in which to hold the final product.
In order to establish the existence of both scalability and super-scalability, it is necessary to provide precise definitions of (1) “Computing Power” (CP) and (2) “Size.” There are no prototypes of an ILA, so there is no way in which the FPS of any prototype could be measured. However, there is an unambiguous measure of what that throughput must be for any theoretical ILA of some given size. This would not, of course, be expressed in FPS, but rather in the CP potential of a given ILA. The size of an ILA can be expressed in terms of how many PEs it has, where a PE is an operational transistor together with the pass transistors used to structure the desired binary logic gates and the power and control means by which the former devices are made to carry out their IP functions, while the CP of that same ILA can be expressed in terms of how many inter-LN 12 connections it has. As previously noted, the “CP” of a device is not an inherent property, as in having some particular mass, but must be treated as depending upon how it is measured. The basis upon which the above assertions concerning the CP, inter-LN 12 connections, and ILA size were made will be described in detail in the BEST MODE FOR CARRYING OUT THE INVENTION.
SUMMARY OF THE INVENTION
The present application is concerned with the scalability and super-scalability aspects of IL. In order then to apply these properties to the “IL Apparatus” (ILA) it will first be necessary to define those terms exactly, starting with the meaning of the term “computing power.” A common measure of that term is the number of floating point operations per second (FPS) that an information processing apparatus, or computer, can carry out. A general definition of being scalable is that the computing power of an apparatus having that property will double if the size of that apparatus is doubled. That definition can be expressed in more common mathematical terms by the assertion that the computing power varies linearly with the “size” of the apparatus or computer, which then leaves the task of specifying how that “size” is measured.
On the other hand, the crux of the present invention is simply that scalability derives from the fact that the basic “Processing Elements” (PEs) that make up the ILA as a whole (1) contain within themselves the entirety of the total “Computing Power” (CP) that may be present in the ILA at any given time, and (2) operate in complete independence of one another. The operating portion of the ILA (see
Then, while the scalable aspect of IL derives from the independent nature of the basic PE, the super-scalability of IL derives from the architecture of those PEs. To increase the size of an IL apparatus is really to increase the number of those PEs, but as it turns out, by a simple rule of geometry that architecture will increase the number of inter-operational element connections at a faster rate than that by which that size is increased.
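That geometric claim can be checked with a small calculation under one assumed model, namely a rectangular grid in which every element connects to its nearest neighbors; this grid model is an illustration only, not the connection scheme of the application's figures. Joining two processing spaces edge-to-edge doubles the number of elements but also adds the connections across the seam, so the connection count, which is the measure of CP used herein, grows by more than a factor of two.

```python
# Arithmetic check of super-scalable growth under an assumed
# nearest-neighbor grid model (hypothetical, for illustration).

def links(rows, cols):
    """Nearest-neighbor connections in a rows x cols grid:
    horizontal links per row plus vertical links per column."""
    return rows * (cols - 1) + cols * (rows - 1)

n = 8
single = links(n, n)            # one n x n processing space: 112
joined = links(n, 2 * n)        # two spaces joined along one edge: 232
print("elements:", n * n, "->", 2 * n * n)       # size exactly doubles
print("connections:", single, "->", joined)
print("growth factor:", joined / single)         # exceeds 2: super-scalable
```

Two separate 8 x 8 spaces would have only 224 connections between them; joining them adds the 8 seam links, which is why the connection count outpaces the element count.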
It was noticed that the basic processing element that had been conceived for the purpose of eliminating the vNB turned out to be a PE of which the operational part is shown in
Then in developing from the circuit of
It can be seen in
In 2004, in “A Perspective on the Future of Massively Parallel Computing: Fine-Grain vs. Coarse-Grain Parallel Models,” Proc. CF '04, Apr. 14-16, 2004, P. T. Tosic proposed that instead of having an “evolutionary” (pp. 488-489) advancement in the IP art along the same general principles as he then saw them, what was needed was a “revolutionary” (p. 489) advancement into “new parallel computing frontiers.” The issue was of course the “von Neumann bottleneck” (vNB), J. Backus, “Can Programming be Liberated from the von Neumann Style? A Functional Style and its Algebra of Programs,” Comm. ACM, pp. 613-641 at 615, August, 1978, that has been a concern in the IP art for some 180-odd years (as will be shown below).
This author suggests that the present text provides that “new frontier,” and sets out a new approach to IP that starts out in the electronics art at the most fundamental level possible, and is evidently the first discussion to elucidate both the origin of that bottleneck and then the means by which to eliminate it. The common sense practicality of that solution then brings out advantages beyond that simple elimination of the vNB that turn out to be inherent in the resultant architecture, and that somewhat astonishingly could lead also to an Information Processing Apparatus (IPA) of essentially unlimited power and scope. That device is the “Instant Logic Apparatus” (ILA).
To carry out any kind of IP, the first essential step, designated herein as an “Operational Joinder” (OJ), is to bring together the data to be operated upon and the apparatus that will carry out those operations. To effect that result there are only two possible ways: by sending the operands (the data) into specific locations within that apparatus where the circuitry that could operate on those data is located, as done by Babbage and everyone else ever since, or by providing the “processors” (operators) that will operate on those data at the location(s) of those data. That first historical OJ procedure is of course the origin of the vNB—sending data and instructions to the required circuitry and then back to memory, etc.—and the computer industry has been burdened with that procedure ever since.
To eliminate the vNB, IL simply reverses the Babbage/von Neumann paradigm and provides the operators (or, more exactly, the requisite circuitry) at the site(s) of the data. The places within a “Processing Space” (PS 10) at which the ILA operator wishes to have the IP take place are then identified as those at which the data will appear, wherever that may be, and the code will then identify those same places for the structuring of the required circuitry. Operationally, the code for structuring that circuitry will be entered just before the data are caused to enter PS 10. (A table of the components and the number codes attached thereto is provided at the end of this text, just before the claims.)
There is thus no need to send the results of that first operation anywhere, since the circuitry needed for the next step in the algorithm will already have been structured just before the creation of those resultant data, and would then be facing directly onto the circuitry that would participate in the next step of the algorithm. There will then be a continuous flow of code that will structure new circuitry on each step, followed by new data, thus to carry out the IP in a continuous, uninterrupted manner. By way of that circuit structuring, the PS 10 will provide all of the “data-relevant” circuitry needed, by which is meant the circuitry that has data bits passing through the terminals thereof. By data “passing through” is of course meant the input data and then whatever those data may be changed into, joined with, and so on, through the full course of executing the algorithm. Those circuits may be made to wend their way through the PS 10 in directions that the user can select arbitrarily, proceeding in circles or across and down the PS 10 like a typed paper, except that upon reaching a side of the PS 10 the structuring may drop down a row and then be carried back in the reverse direction, thus to “zig-zag” over the PS 10.
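The zig-zag structuring order just described amounts to a boustrophedon walk over the rows of the PS 10: left to right across one row, down one row, then right to left back. A minimal sketch follows; the grid indices are purely illustrative and carry no relation to the LN 12 addressing scheme of the application.

```python
# Boustrophedon ("zig-zag") visit order over a small grid of locations,
# modeling the structuring path described in the text: even-numbered
# rows are traversed left-to-right, odd-numbered rows right-to-left.

def zigzag(rows, cols):
    order = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in cs)
    return order

print(zigzag(3, 4))
# Row 0 runs left-to-right, row 1 right-to-left, row 2 left-to-right,
# so each step's circuitry is structured adjacent to the previous one.
```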
Regions of the PS 10 that had just been used in circuits can be put to other uses at once, however the algorithm may require, thus keeping as much of the area of PS 10 at work structuring new circuits as possible, perhaps carrying out some number of similar, parallel operations on some large array of data requiring the same treatment. In any event, the complete execution of the required algorithms in the least amount of time will rest on the ingenuity of the user, but at the end will yield the results needed, after a length of time that should be considerably less than would have been required using current von Neumann computers, even if those von Neumann computers had been using “Massively Parallel Processing”: every time one adds another microprocessor to a system one has also added another von Neumann bottleneck.
(It should be evident that the critical remarks herein concerning “Massively Parallel Processing” are directed solely towards apparatus that use CPUs and microprocessors, and exhibit the vNB; an “Instant Logic Apparatus” (ILA) is nothing if it is not “Massively Parallel”—as mentioned several times herein, it is intended to have the entire processing space of the ILA “packed solid” with as many algorithms in the process of being executed in parallel as space allows, all operating simultaneously, which may not quite fit the definition of “parallel” that applies to CPU-based apparatus, but it does fit the common meaning of the term.)
(Given the objective of having as much of the PS 10 “real estate” in operation at the same time as possible, and considering that such goal may require as wide a variety of inter-LN 12 connections as possible, it may be wondered why, in
Besides the code itself, what is further needed for the operation of the ILA is the circuitry by which that code causes the desired PTs to be enabled, starting with the process of locating the particular LN 12 to the PTs of which the code that follows will be applied. All of that circuitry has been shown and described in detail in the parent application, which has been incorporated into the present application by reference as expressed earlier, and thus does not need repetition here. The circuitry to be shown hereafter will entirely be circuits that have been structured by the application of the code. The following description of the code has also been provided in that parent application, but the demonstration of scalability and super-scalability rests on specific applications of that code, and for convenience, the code is provided here so as to be close at hand. (The ID numbers used may vary somewhat from application to application, but that should present no problem so long as the code being used relates to the application at hand.)
To bring all that about will require mastery of the code first, and then the circuitry to be structured using that code. The former task requires first the code identification of the components of which the circuit of
(As described in the parent application, there can also be an embodiment that has a second physical level of PS 10 aligned above the first level, and reached through a Post controlled by a “Post Pass Transistor” (PPT) that connects like terminals in the two levels using at least one terminal, but that embodiment is not further addressed herein.)
For purposes of clarity in the drawings, the imposition of a code that will enable the desired PTs is shown and described in the figures simply as “1” bits, but the actual process in fact is to apply an enabling voltage sufficient to render conductive each relevant PT of each LN 12 in a circuit that was to participate in the particular step then at hand. That voltage is removed when the circuit has completed its operation on the data for that step, i.e., each of some number of the data bits that had come in to the one or more LNs 12 that were to participate in the particular step of the algorithm, either as initial input or as a result of the operation of a preceding LN 12, had been fully processed, with some resultant one or more bits then passing on to the next LN(s) 12.
If the same function was to be carried out repetitively, as in adding a column of numbers, to accommodate the continuing arrival of new data bits the PTs so enabled could simply be left in that conductive state until the complete data stream had been operated upon. That process would be carried out with respect to whatever amount of circuitry would be common to whatever sequence of numbers was to enter as input, and likewise at any other location within the algorithm at which repetitive processes were to be carried out, or as to any other repetitive function than addition.
One reason for using that fixed voltage source is to ensure that with issues of fanout sometimes arising, the voltage would suffice to enable every PT as needed for a circuit. A second reason is to avoid problems of logic racing. As will be seen below, there is one place in the XOR gate where two signal sources serve as inputs to a 2-bit output AND gate and hence must arrive at the same time, but the pathways followed in reaching that AND gate differ between the two branches. To use a “1 bit” (i.e., that applied voltage) of too short duration leaves the possibility that the “1” bit (if any) on one side would have begun to decay by the time that the other “1” bit (if any) had been received. In this and other similar circumstances, some “fine tuning” of the process might be required.
A Toggle flip-flop (not shown) could be used to turn on the enabling voltage when the code therefor had been received, then turning that voltage off upon completion of the operation, using a second entry of that same code. The times of those two events would be adjusted so that the enabling voltage would be present on the particular SPTs 14 at the times that the signals reached both inputs to that output AND gate, and in other such similar circumstances (which could be much more complex). It would only be necessary that the signal bits were of sufficient duration to act at the same time, but if necessary the timing and duration of those could also be controlled either by toggle switches or by the clock. It is clear that every complex algorithm would need to be controlled by its own individual clock, which is provided for each LN 12 in any event.
It should be stressed, however, that the manner in which these enabling voltages are handled has nothing to do with the speed of operation in carrying out IP, but only with maximizing the amount of available PS 10 space, and thus to affect the throughput only indirectly, i.e., more space taken up by LNs 12 with enabled SPTs 14 that were not yet in use would mean fewer algorithms in operation. The circuitry for an algorithm could have been structured in advance of the operation, and simply have been left available, so what is at stake is simply the question of how little PS 10 space could be used for the algorithm, thereby to “free up” as much space as possible for other algorithms.
The questions are: (1) how soon before any data arrive must the codes that would structure the circuitry that would operate on those data be entered; and (2) how soon after the data have arrived can the enabling bits that had structured the required circuitry be removed. The purpose in resolving these issues is again to minimize the amount of circuitry needed to be present as to each cycle of the operation, as well as the length of time during which that circuitry must remain in place. In other words, no more LNs 12 are to be in a “structured state” as part of a circuit at any one time than is necessary, in order that those LNs 12 that need not be structured at each particular time would remain available for use in the circuits of other algorithms. These adjustments would constitute the “fine tuning” of the code, and could be a rather time consuming operation for the user. As to the issue of whether to use a single clock and encode all of the algorithms at the same time or provide a separate clock for each algorithm and execute all such algorithms independently of one another, it is clear that independent operation would be highly preferable, indeed almost essential, in order to permit the timing adjustments and other “fine tuning” exercises noted above to be carried out.
It is presumed that the user, having identified an IP task needing treatment, would have found or developed an algorithm in algebraic form that would solve that problem, and would then, knowing the circuit equivalent of each of the algebraic terms, would have converted that algebraic formula into a “circuit equivalent,” i.e., would have laid out a series of circuits that expressed the sense of all of the terms of the algorithm, and thus would have set out an extensive circuit by which the algorithm could be executed. (The circuit of
What is then required is simply to take a blank PS 10 form (on paper or on screen), i.e.,
In Table II below, after that first “SPT 14 No.” column (1), the next three columns show (Col. 2) the identity of the terminal on the originating LN 12 (OLN) to which the proximal end of the SPT 14 at the OLN is connected; (Col. 3) the direction from that OLN in which the SPT 14 extends (thereby to identify the receiving LN 12 (RLN)); and then (Col. 4) the terminal of the RLN to which the distal end of the SPT 14 connects. The last two columns show the two codes that could be used (of course with the connections from memory to the PTs on each LN 12 in PS 10 being set up accordingly), with the Vector Code (Col. 5) being that which rests on the identification of the terminals and the direction as just stated, and then the Binary Code (Col. 6) simply uses the binary form of the identification numbers of the PTs as shown in
In greater detail now, the leftmost column in Table II shows the assigned number of each of the SPTs 14 as shown in
In the “Vector” column 5 of Table II, the first two digits express the code for the originating terminal on the OLN (DR=01; GA=10; SO=11) for the proximal end of the SPT 14 line; the next code is two-digit and specifies the direction of the SPT from the OLN to the RLN, rightward=01, upward=10, inward=11, for a 3-D embodiment, but the 1-bit code shown in parentheses in Col. 5 should be used for the 2-D embodiment employed herein, for which rightward=0, upward=1; and the last two digits in Column 5 identify the destination terminal of the RLN to which the distal end of the SPT 14 connects, using the same code as that which was used for the proximal end of the SPT 14, i.e., DR=01; GA=10; SO=11.
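Under the scheme just described, the 2-D Vector Code for each SPT 14 amounts to a five-bit field: two bits for the originating terminal, one bit for the direction, and two bits for the destination terminal. A minimal sketch in Python (the function and table names here are illustrative only, not part of the apparatus):

```python
# 2-D Vector Code for a single SPT 14, per the scheme above.
# Terminal codes: DR=01, GA=10, SO=11; 2-D direction: rightward=0, upward=1.

TERMINAL = {"DR": "01", "GA": "10", "SO": "11"}  # drain, gate, source
DIRECTION_2D = {"right": "0", "up": "1"}         # 1-bit direction code for 2-D

def vector_code_2d(origin, direction, dest):
    """Return the five-bit 2-D Vector Code for one SPT 14:
    origin terminal (2 bits) + direction (1 bit) + destination (2 bits)."""
    return TERMINAL[origin] + DIRECTION_2D[direction] + TERMINAL[dest]

# An SPT running from the drain of the OLN rightward to the gate of the RLN:
print(vector_code_2d("DR", "right", "GA"))  # → 01010
```

In a 3-D embodiment the direction field would simply widen to the two-bit code given above (rightward=01, upward=10, inward=11), leaving the terminal fields unchanged.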
The binary code method of encoding the SPTs 14 is simply to use the binary equivalent of the assigned numbers of the SPTs 14 that are shown in the first column. That would also require only five bits, but is not used even so, since the use of that binary method would require the encoder (user) to have a decimal-to-binary conversion table at hand (or else develop those codes mentally, which process would be prone to error), while the vector method develops the code directly from looking at the circuit that the encoder is using, if the foresight to write out that circuit had been exercised.
The resultant full code using the 2-D Vector method would then appear as
- iiiiiiiiiicccds1s1s2s2s3s3
for enabling a single SPT 14 on a single LN 12, wherein the 10-bit “iiiiiiiiii” code expresses the INj value of the LN 12, the “ccc” code expresses the CPT code, “d” is the direction code, and for a single SPT 14 the code is “s1s1s2s2s3s3”. If two SPTs 14 were to be enabled on that single LN 12, the full code would appear as - iiiiiiiiiicccds1s1s2s2s3s3s1s1s2s2s3s3.
Of course, the length of the “iii . . .” code is set to accommodate whatever the size of the PS 10 being used might be.
- iiiiiiiiiicccds1s1s2s2s3s3
Before setting out to encode any PTs, however, it is necessary first to identify the LNs 12 to be used in structuring the circuit. Whatever may be the number of LNs 12 that are involved in a particular step of the algorithm, all of the PTs needed to structure the corresponding circuits for that step are enabled at the same time, by a single “Code Line” (CL). As can be seen in the above code, for each LN 12 the INj code is placed at the front of the code for that LN 12, and then all the CPT, direction, and SPT 14 codes, in that order. After the equivalent of a carriage return (which in this case will put a double space between the codes for separate LNs 12), the INj code for the next LN 12 involved in that same step of the algorithm is brought in, again followed by the relevant CPT, direction, and SPT 14 codes, with that process then continuing until the INj, CPT, direction, and SPT 14 codes for all of the LNs 12 that participate in the circuitry for that particular step of the algorithm have been assembled into the CL for that step. The CL is then dumped as a whole into memory (specifically, a “Code Cache,” which is preferably separate from the main memory). Preferably, a “test run” of the CL would be carried out before the CL was saved in that Code Cache for future use.
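The assembly of a CL as just described can be sketched as follows, assuming for illustration the 10-bit INj, 3-bit CPT, 1-bit direction, and six-bit per-SPT codes given earlier; the function names and the example code values are hypothetical:

```python
def ln_code(inj, cpt, direction, spt_codes):
    """Concatenate the codes for one LN 12: the 10-bit INj first,
    then the CPT, direction, and SPT 14 codes, in that order."""
    return format(inj, "010b") + cpt + direction + "".join(spt_codes)

def code_line(entries):
    """Assemble the per-LN codes for one step of the algorithm into a
    single CL, with a double space between the codes for separate LNs 12."""
    return "  ".join(ln_code(*entry) for entry in entries)

# A CL enabling one SPT 14 on LN 13 and two SPTs 14 on LN 14:
cl = code_line([
    (13, "001", "0", ["011010"]),            # hypothetical code values
    (14, "010", "1", ["100111", "010110"]),
])
print(cl)  # → 00000011010010011010  00000011100101100111010110
```

A CL so assembled would then be dumped as a whole into the Code Cache, as described above.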
As noted above, where in the PS 10 the circuitry will be structured for an algorithm can be anywhere that would not already be in use at the same time, but even so it is necessary to establish the “Index Number” (INj) for entry (as the first part of a CL) when the encoding is to take place, with those INj being the binary versions of the “Location Indicators” (LIi's) assigned to each LN 12. In the gross 5×5 array of LNs 12 shown in
Absolute LIi value determinations. Generally speaking, it would be easy enough to establish the x, y, and z coordinates of an LN 12 that was even deep down within the PS 10 somewhere, but by itself that would not disclose the LIi value. Even so, an LIi value can easily be found from those coordinates. Since the x, y, and z coordinates can be found by inspection, the LIi can then be found from the following equations, wherein the LIi values are based on absolute terms, i.e., solely on the specific coordinate values of the LN 12. (If the following equations are solved by a computer then the terms therein will necessarily be in binary form already, so to get an actual LIi value as literally shown in the equations would require conversion back from the binary form, but since it is the binary form INj that is ultimately being sought in any event, one can simply ignore that added process and simply take that INj value as it will naturally emerge from the calculation.) The following equations will of course be very similar to those used in the “Relative LIi value determinations” section further below:
For a 1-D array,
LIi=LI(x)=x, (1)
where x is the coordinate of the LN 12 for which the LIi value is being sought.
For a 2-D array,
LIi=LI(x,y)=XM(y−1)+x, (2)
where XM is the length of the x axis and x and y are the coordinates of the LN 12 for which the LIi value is being sought. The row in which the reference LN 12 is located has the value y=1, so a point that was in that same row would not involve the addition or subtraction of the XM row length, but a point in one row downward would add the full length of that row, and then any further displacement of the “target” LN 12 would be added, as indeed it should to get the correct LIi (and thus the INj) value. That is, the originating row is row 1, one row down is row 2, so XM (y−1)=XM(2−1)=XM, and the result would become LIi=XM+x. More generally, LIi=±kXM±x, where k is the number of rows down (or up), where down is positive and up is negative.
For a 3-D array,
LIi(x,y,z)=±XM(YM(z−1)+y−1)+x, (3)
where XM is again the length of the x axis, YM is the length of the y axis, and x, y, and z are the coordinates of LN 12 for which the value is being sought. In this case, a z value of 2 would add in the product XMYM, which would be the LN 12 content of one full plane of LNs 12. The “±” term is used as before, if either XM or YM is leftward or outward, respectively.
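As a minimal sketch of Eqs. 1-3 (using 1-based coordinates as in the text, a 5×5 array by default, and illustrative function names), the absolute LIi, and from it the binary INj, can be computed as:

```python
def li_absolute(x, y=1, z=1, XM=5, YM=5):
    """Eq. 3; it reduces to Eq. 2 when z = 1 and to Eq. 1 when y = z = 1."""
    return XM * (YM * (z - 1) + (y - 1)) + x

def inj(li, width=10):
    """The INj is simply the LIi value expressed in binary (here 10 bits)."""
    return format(li, f"0{width}b")

# In a 5x5 2-D array, the LN at x=3, y=2 lies one full row (XM=5) past
# the first row: LIi = 5*(2-1) + 3 = 8.
print(li_absolute(3, 2))       # → 8
print(inj(li_absolute(3, 2)))  # → 0000001000
```

The “±” terms of Eqs. 2 and 3 are not modeled here; as in the text, rows downward add row lengths and planes inward add plane areas.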
(Rather than use the equations, in using a computer it might be worthwhile to have prepared tables in which the LIi values were filled in for all of the x, y, and z coordinates of a PS 10 of each size that your institution uses. The INj values could also be shown, although their length might preclude having a readable graph, especially if the PS 10 were 3-D, and very large.)
Relative LIi Value Determinations: Those LIi values can also be found from the positions of the LNs 12 in question relative to an LN 12 for which the LIi value is known. This procedure is important in the development of “Code Modules” (CMs) that will provide the code necessary for the structuring of as many different circuits of a particular type as may be required. (The same applies also within functions such as addition, where there will be partial sums scattered all over the PS 10, thus to require the insertion of a good many repetitions of a half-adder and the linkages between them.) Such a CM will have been developed using some particular set of input LIi values in an exemplar circuit, but when using that CM, as a rule the LN 12 locations at which the circuit needed to be structured would be different. Using the formulae given below, the code for the desired circuit can be obtained by having prepared a set of those formulae for each circuit type to be structured, wherein the constants XM and YM in the formulae would have been taken from the dimensions of the PS 10 then in use, so the full code for the desired circuit (for each step of the algorithm) can be determined simply by identifying the circuit type and entering a LI1 reference number.
The relevant parameters (defined below) are picked up out of the CM data used in the formulae below and taken in hand for each LN 12 of the particular circuit, as specified in the CM. By such means, various versions of the circuit could also be selected by way of additional indicia beyond just the name, e.g., in selecting whether the circuit was to extend in the horizontal or vertical direction, and whether to proceed continuously or to turn off at some point, and as a result there would be different directions in which to proceed within the circuit itself, thus to take account of whatever space constrictions might arise from there being other circuits present for other algorithms at the time that the circuit that was then to be structured was needed.
(Needless to say, the circuit that had been “drawn out” for use in developing the code therefor would preferably have an electronic version, along with electronic versions of whatever circuits for other algorithms were then present (or were about to be), so that the circuitry then being built up could be mapped around those circuits. Those other circuits for other algorithms would of course need to be changing with each click of the clock, since each click would signal the need to structure a new set of circuits for the next step of the algorithm.)
It should be noted that where there appears to be a “collision,” i.e., where the LN 12 that one would like to use is already in use at the time in question, that usage may disappear on the next cycle. Each “mapping” of the LNs 12 to be used should then be tied to a particular time frame, since a shifting in time may make a desired circuit structuring of the algorithm then being mapped perfectly possible, where by “mapping” is meant the identification of all of the LNs pertinent to a cycle of the particular algorithm or part thereof. In other words, collisions may be avoided by a shift by either the algorithm then being encoded or the algorithm already encoded, in both space and time. A delay in the completion of some particular cycle need not delay the ultimate output time unless the branch of the calculation sought to be delayed was already the slowest branch.
Using one CM rather than another (for the different versions of the circuitry to be structured) would result in different values for the relative locations of the LNs 12 that a particular CM had defined, but regardless of which CM was used, the relative formulae would remain as follows:
For a 1-D array,
LIi=LI1±ri, (4)
where LI1 is the reference LIi value and ri is the distance along the x axis, in either direction away from the location of the reference LI1, of the LN 12 for which the LIi value is being sought.
For a 2-D array, moving up or down subtracts or adds a number to the LI1 that is equal to some multiple of that row length, depending upon how many rows away from the reference LN 12 that next LN 12 was located, i.e.,
LIi=LI1±ri±kiXM, (5)
where LI1, ri, and XM have the meanings as before and ki is the number of rows above or below the row containing the reference LI1 in which the LIi in question is located.
For a 3-D array, for a location that is one or more planes away from that which contains the LI1, for each plane moved away there must also be added or subtracted the area of a plane in the array, i.e., XMYM, thus to yield
LIi=LI1±ri±kiXM±miXMYM, (6)
where LI1, ri, XM, and ki have the meanings as before, YM is the length of the y axis, and mi is the number of planes along the z axis by which the LIi in question is removed from the original plane of the reference LI1.
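Eqs. 4-6 can be sketched the same way, with signed ri, ki, and mi values standing in for the “±” terms of the equations (the function name is illustrative):

```python
def li_relative(LI1, ri=0, ki=0, mi=0, XM=5, YM=5):
    """Eq. 6; with mi = 0 it reduces to Eq. 5, and with ki = mi = 0 to Eq. 4.
    Signed ri, ki, and mi carry the +/- of the equations."""
    return LI1 + ri + ki * XM + mi * XM * YM

# From a reference LI1 = 8 in a 5x5 array, the LN two places rightward
# and one row down has LIi = 8 + 2 + 5 = 15.
print(li_relative(8, ri=2, ki=1))  # → 15
```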
When using a step-wise manner of determining the LIi for the LNs 12 involved in the algorithm, there are two ways in which to proceed: (1) using the same LI1 as the reference throughout all of the steps of the algorithm and for all of the sequential data; or (2) obtaining a new LIi value following each step of the algorithm, and then using that new value as a new LI1 reference. If the original LI1 were used as the reference throughout, the LIi values could be anything, and would depend both upon how far away in the circuit the particular LN 12 was located and how far that circuit had been displaced; the complete Eq. 6 would then have to be used. If using the new LI1 reference, the numbers away from that reference would be smaller and easier to accommodate, but the user would need to set aside the original LI1 frame of reference. Either way would have its error traps, but those could be overcome with just a bit of concentration.
The question then arises: what does all of that have to do with scalability and super-scalability? The long answer is that the above formulae can be used for at least five purposes: (1) to determine the LIi and hence INj values of the remaining LNs 12 in a circuit based on a drawing of that circuit, using the value of an arbitrary one of the LNs 12 of that circuit (typically one of the inputs) as a LI1 reference; (2) to track the course of additional instances of a circuit based on there being some repetitive process, as in the successive half-adders in an ADD process; (3) to develop a library of CMs for some set of standard circuits for future reference purposes; (4) to track the sequential steps of an algorithm to identify where the next LNs 12 can be put to use; and lastly (5) to reconstruct the values of INj when an additional PSIC 18 was, or was to be, joined to an existing PSIC 18. It is in this last process that super-scalability is found. It is not found in new, unexpected LNs 12, but rather in LNs 12 that were able to make inter-LN 12 connections that had not previously been possible.
It is not the purpose here to describe the details of those first four processes, which are not directly relevant to the present issues as to which claims are made, but only the last. What has been said so far, however, will be needed in order to recognize the way in which super-scalability comes about, so that subject will now be addressed.
In a 2-D environment, any circuit to be structured by IL can be defined by setting out the ri, ki, XM, and mi values that are applicable to each LN 12 within the circuit (and within the PSIC 18), and such a collection of those values for all of the LNs 12 involved in each step of the circuit will constitute a CM. Values can be identified for all of the LNs 12 that are parts of some particular circuit, or for another new, identical circuit that is displaced from an initial circuit. With that information it is not necessary to have any knowledge of the circuit being structured; in addition to that information one need only know the LI1 value from which those ri, ki, XM, and mi values were found. It then only becomes necessary to express those values in binary form and carry out those calculations, while assuring that the “iiiiiiiiii” value that results remains associated with the code that defines each SPT 14, i.e., the “cccdssssss . . .” code as had been associated with the original LI1.
Once all of the “iiiiiiiiii” codes for the circuit LNs 12 have been determined, any movement of the circuit, or more likely, a repetition of the circuit at another location, would have “moved” every LN 12 in the circuit by the same amount, so by using that fact repetitive calculations relative to the LI1 location can be avoided. However, motion being relative, displacing a circuit is equivalent to re-defining the system by which the INj are specified, which is what usually happens when one PSIC 18 is joined to another PSIC 18, as seen in
If initially installing an algorithm, the above-listed values would be entered “by hand,” based on the structure of the circuit as drawn out on paper or preferably on screen, and determining therefrom the ri, ki, and mi values for use along with the LI1 and XM values, the former of those last two being selected by the user (on the basis of which circuit is to be structured and where there is space enough to structure that circuit) and the latter being set by the x dimension of PSIC 18. If using a CM, the XM value would already be contained within that module, and it would only be necessary to enter the LI1, ri, ki, and mi values appropriate to the first LN 12 of the circuit, and then a code sub-routine would have been set up to calculate the location of each of the other LNs 12 of the circuit relative to the position of that first LN 12. The locations of the LNs 12 of a circuit being moved could be determined by applying the applicable one of Eqs. 4-6 to each of those LNs 12 using the same LI1 value, i.e., by determining the different ri, ki, and mi values that each LN 12 would have by virtue of having different positions within the circuit, or preferably by applying a single LI1 value to obtain a reference LI1 value as to one of the LNs 12 in the circuit, and then using those equations again with that LI1 value serving as the reference, along with the structure of the circuit, to obtain the LIi=LI2, LI3, etc., and then from those the INj values for the rest of the LNs 12 within the circuit. (Of course, if the equations were executed in the computer, using the circuitry described below and not by hand, all of the terms would already have been converted to binary, and it is not an LIi that would emerge from a calculation but a binary INj.)
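Since a moved or repeated circuit displaces every LN 12 by the same amount, the new LIi (and hence INj) values follow from a single offset between the old and new LI1 references, as this illustrative sketch suggests (names hypothetical):

```python
def relocate(circuit_lis, old_li1, new_li1):
    """Shift every LIi in a circuit by the displacement of its LI1
    reference; the per-LN ri, ki, and mi offsets are unchanged."""
    offset = new_li1 - old_li1
    return [li + offset for li in circuit_lis]

# A three-LN circuit referenced to LI1 = 8, repeated with its reference
# moved to LI1 = 33:
print(relocate([8, 9, 14], 8, 33))  # → [33, 34, 39]
```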
Which of those two sides down
That “+/−1” 38 entry point connects to one input of a 2-bit XNOR gate 40 on each side of the circuit, with the second input to XNOR gate 40 connecting to a “Reference Register” (RR) 42. As it serendipitously happens, one of those RR 42 can hold the ASCII code for a “+” sign and the other the ASCII code for a “−” sign, since these differ only by the “+” code having a “01” in the third and second leftward positions, while the ASCII code for a “−” sign has a “10” at those two positions. (When employing the blank fields of the formula, the user will follow the easy path and simply enter either a “+” or a “−” symbol just as it appears in the formula, given that only the two bit positions indicated are needed to distinguish between the two.) The particular XNOR 40 gate for which the RR 42 code and the code entered at the +/−1 38 entry point are the same will yield a “1” bit, thus to select either the “Addition” (ADD 44) circuit or the “Subtraction” (SUB 46) circuit. This first step serves only to select which mathematical formula will come to be used, i.e., which of the two alternative versions of the first step of Eq. 5 will be used, and nothing else will occur until what turns out to be either a subtrahend or an addend is entered in the third step.
(This procedure is not the most efficient that could be used, but has been designed to be as “user friendly” as possible, which was thought to be that procedure in which the user need only have the above Eq. 5 (in this case of a 2-D PSIC 18) at hand (preferably in an on-screen list, where the correct entry need only be copied and then pasted into an empty, properly labeled blank field, with a copy of the circuit drawing also being displayed), with all of the OTs labeled by those left-to-right, top-to-bottom LIi numbers, and then enter in succession the terms of that formula just as expressed therein. Again, this process is intended for use only in the initial installation of an algorithm “by hand” from a circuit drawing, while in using a CM more efficient electronic means would be used.)
In step 3 the ri value, as the next term in the equation, is entered at another blank field, as shown by the term “r1” and the label “48” on both sides of the circuit and labeled by a “3,” which connects to both of the r1 entry points, the respective outputs of which are shown to be LI1+r1 or LI1−r1, depending upon whether “+” or “−” had been entered at location 38 in Step 2. As in the case of the LI1 entry, there is only one blank field, this time labeled “r1,” and only one physical entry will be carried out, but the value entered will appear in both the ADD and SUB branches on the two sides of
It should be noted that the lines that connect to the outputs, if any, of both the ADD 44 and the SUB 46 sides are bidirectional, as shown by the double headed arrows. As a consequence, if the first operation had been an ADD 44 to form the quantity LI1+r1, that quantity will be transferred both to that second ADD 54 and to that second SUB 56. That is necessary because the second operation of an ADD 54/SUB 56 pair could just as well be a SUB 56, and without that bidirectional transfer there would be nothing on that second SUB 56 side from which to subtract that kixM value, and similarly if the first operation were a SUB 56 and the second an ADD 54.
In the next part of
Now unlike the first determination in which the quantity to be added or subtracted was entered immediately after the selection between those two operations was made, in the present case the quantity to be entered must first be calculated. As a consequence, there are actually two next steps, the fifth and sixth steps in the terms of
As the next steps in the
How a user may break up the encoding task so as to know at particular times that “that job” is done is up to that user, but in any event, when a user had located an INj value for one LN 12, the process just described would be repeated for another LN 12 until that task was “done.” The kinds of circuits that would lend themselves to being used as code modules would also be those for which one could develop all of the values therefor and then turn to the rest of the code, so we can assume that such would be the case here, i.e., to make any “breaks” in the job take place upon the completion of the encoding for those LNs 12 that would make up a complete CM.
With the task of encoding the locations with INj values for some particular IL circuit having been completed, it remains only to extract those INj values so that they can be joined up with the rest of the code. That constitutes step 7 in
But before getting into the actual circuit structuring, it seems appropriate to take note of a disadvantage of IL, perhaps obvious from
There are several limitations on how many LNs 12 could be incorporated on a single IC. It may be noted first off that except for removing the rounded portion of a circular wafer there is no reason for slicing the wafer into chips. To form a PSIC 18 the wafer could incorporate a single IC, since the object, after all, is to have as many LNs 12 disposed on a single square chip as possible, and for that purpose the fewer the cuts the better. (The cutting process itself can introduce faults, besides taking up space.) In order to make that number of LNs 12 as large as possible, there is again the question of how small the transistors can be made, but IL introduces another limitation in how small the hardware that makes that top-down connection can be made. In particular, as will be shown below, the actual connection is made by way of a pin, and there is a practical limit on how small such an element can be made and still be subject to dependable and accurate manipulation for proper placement. That is a matter that can only be determined by experiment, no doubt using a micromanipulator (that would also no doubt be used in the actual fabrication of the PSICs 18), so it is not possible at present to give an accurate estimate of how many LNs 12 could be placed on a chip. (One may surmise, however, that it will be this pin issue and not how small the LNs 12 and SPTs 14 can be fabricated that will place the upper limit on that number.)
The approach then taken to this off-chip connection problem was that “top-down” method of making external connections to the various terminals within the PSIC 18. The structure of the chip itself has been described in detail in the parent application, incorporated herein by reference, but some aspects of that procedure bear directly on the issues of scalability and super-scalability, and hence the topics that relate directly to those aspects of IL are set out herein again, with the relationships to scalability and super-scalability being explicitly pointed out.
It would seem that the issue of scalability had already been encompassed, at least in part. If the operation of an LN 12 (i.e., the SPTs 14 associated therewith) depends only upon the code sent thereto, and each LN 12 has a full contingent of its own code arriving thereto, each LN 12 will then operate fully independently of any other LN 12, with all of the power necessary for full IL operation. Of course, the fully independent LNs 12 have to work together if any IP is to be carried out, but that kind of cooperation is a function of the algorithms and the user's choices, and is not anything inherent in the PSIC 18 itself. There is a limit to the size to which a PSIC 18 can be built, however, so while the independent operations of the LNs 12 suffice to establish at least a “limited” scalability, it remains then to eliminate that limit.
A way to do so would be through the joining together of any number of "basic" PSICs 18, by which is meant PSICs 18 of some convenient fixed size that could be replicated without limit and then joined up together as far as one wished to go. If by way of such joinders more and more fully operable LNs 12 could be added to an existing PSIC 18 at will, then IL would again show itself to be fully scalable. The viability of that statement depends upon the fact that independent encoding circuits exist for every LN 12 in each of those basic PSICs 18; as shown in the parent application, every LN 12 has associated therewith a complete and independent array of encoding circuits, so an unlimited scalability would seem to have been shown.
For actually reaching each of the LNs 12 in an array, in order to avoid having to search through an excessive number of LNs 12 in order to find a particular one, the circuit that carries out that task was described in terms of an array having 32 LNs 12, which was used as a sort of mean-sized PSIC 18 to act as a PSIC 18 module (but mostly to make a size of PSIC 18 as shown herein in which the labels of the parts thereof would be readable in the published version of the patent). That size of 32 could easily be increased to 36 or reduced to 25 in order to have a square PSIC 18 for easier determination of INj values. For purposes at present, a 5×5=25 PSIC 18 (for which the transistor array is shown in
With that issue in mind,
PSIC 18 in
Four different flows of signal are shown, with three paths in each as shown by the small rightward- and upward-pointing arrows contained therein, which three paths are the “Drain Signal Line” (DSL) 28, the “Gate Signal Line” (GSL) 30, and the “Source Signal Line” (SSL) 32, labeled respectively as No. 1 (outgoing upward); clockwise to No. 2 (outgoing to the right); another clockwise, No. 3, (incoming upward); and finally a last clockwise, No. 4 (incoming rightward). Those lines are labeled in both the horizontal and vertical directions as “DSL 28,” “GSL 30,” and “SSL 32,” respectively. In addition, the letters “D,” “G,” and “S” are shown within small circles at or near the respective points of intersection of the horizontal and vertical SLs, at which points those two directions of the SLs are joined.
It is of course essential that the DSL 28, GSL 30, and SSL 32 lines should not come into contact with one another. Although not visible in
The LN 12 is shown just to the left of and slightly below the center of
Besides the top-down method of contacting the PT terminals, the PSIC 18 has a second unusual feature, which is laying out the LNs 12 at an angle to the geometry of the array grid (see
In order to carry out the desired circuit structuring, it is necessary for each OT terminal to be able to connect to all of the terminals of neighboring OTs, both rightward and upward, and to receive that many inward connections as well. As already noted, an exception to that rule is that among all of the basic logic circuits, nowhere were there found either a GA 22 to SO 24 connection or an SO 24 to GA 22 connection, and it may be noticed that those connections are not shown in
The manner in which those connections are separated out, but yet converge into single DR 28, GA 30, and SO 32 SLs, is shown in
For purposes of understanding the operation of these PSICs 18, especially as to the super-scalability, it is important to notice now why it is that the order of the SPTs differs between the horizontal and the vertical versions of the inter-LN 12 lines. As described earlier, the OT was set at an angle so as to permit the signal lines to avoid lying in parallel with any lower components any more than was necessary. That was of course not an entirely successful scheme, but nevertheless there was some minimization of those inter-level capacitive effects. (The sizes of those lines are of course much exaggerated for purposes of easier viewing in the drawing, so the capacitive effects will not in practice be as large as would seem from the drawing.) As it then turned out, that minimization was brought about by having the horizontal lines laid in the order G, D, and S, reading top to bottom, while the vertical lines had the order D, G, and S, reading left to right. It was only in that way that the SLs were able to be fitted past the INj and most of the terminal lines extending therefrom with a minimum of capacitive interaction. The significance of following those orders of arrangement exactly will be seen when it comes to joining one PSIC 18 to another.
In what now follows a number of standard gate circuits will be shown as having been structured by IL methods, together with the code by which the particular circuit was structured. Each such IL structure will be preceded by a drawing of the current equivalent of that IL circuit. An arbitrary placement of the circuits within a gross 10×10 PSIC 18 will be used to illustrate the use of both the LIi and INj values as was noted above, followed by the rest of the code needed to enable the proper LNs 12, CPTs, SPTs 14, and the I/O 16, when used, to form the circuits being structured.
The LIi values are the numbers of the LNs 12 as assigned in a left-to-right count along each row and then downward through the rows in the same manner, and the INj values are the binary versions of those LIi values. The locations of the various circuits within the PSIC 18 will be selected arbitrarily, as will be the destinations of those circuit outputs. The "iiiiiiiiii" INj value will use a 10-bit code, which of course must be consistent throughout all of the LNs 12 of an algorithm and throughout all bit sizes of data regardless of the size of any particular number, since otherwise one would not know where the IN code left off and the PT codes began. Only 2-D embodiments will be shown, but just for purposes of illustration the direction code will be of 2 bits so as to accommodate a third dimension.
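The LIi count and the LIi-to-INj conversion just described can be sketched as follows. (This is an illustrative sketch only; the function names are hypothetical, and the 7-bit INj width matches the worked examples given later rather than the 10-bit "iiiiiiiiii" form.)

```python
def li_from_position(row, col, xm):
    """LIi from a left-to-right count along each row and then downward
    through the rows, for an array that is xm LNs wide (row and col
    counted from 1)."""
    return (row - 1) * xm + col

def inj_from_li(li, bits=7):
    """INj as the binary version of the LIi, zero-padded to a fixed
    width so that the IN code and the PT codes can be told apart."""
    return format(li, "0{}b".format(bits))

# In a 10-wide array, the 82nd LN sits in row 9, column 2:
assert li_from_position(9, 2, 10) == 82
# Its INj is the 7-bit binary form of 82:
assert inj_from_li(82) == "1010010"
```

The fixed padding is the essential point: without a fixed INj width there would be no way to tell where the IN code ends and the PT codes begin.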
Now in determining the overall design of an ILA so as to meet some range of IP tasks, these LIi and INj values, as well as the process of determining them, relate directly to the issues of scalability and super-scalability: if a second PSIC 18 had to be added to a first in order to have enough space for some algorithm, then those values for much of the resultant composite PSIC 18 would have to be re-determined, without which neither of those features would be operable or could be demonstrated. A rapid way of restructuring the values of all of the INj thus becomes quite pertinent.
As will be the case with all of the LN 12-structured circuits to be shown here, only those PTs that have been enabled so as to be a part of the circuit being structured are shown. Unless blocked from so doing, the LIi will be shown to the right of and a bit above the “IN” circle for the LN 12, and the INj will be shown as the leading (most significant bit positions) bits of the full code, which will be located just below the GND symbol, or below the “IN” circle if no GND is shown. (If the indicated space is blocked, that full code will be placed elsewhere, but be very visible in any event, so there should be no difficulty in locating any of these code elements. In the latch circuit shown below, the codes for the top row of LNs 12 are placed above those LNs 12.)
This sequence of circuits will start with the simple "BYPASS gate" (BYP), since it is the simplest possible "circuit" (the BYP, like the inverter, is of course not a "gate" in any strict sense, but is so termed herein simply to maintain a consistent terminology) and is quite ubiquitous, with a very useful function in resolving a geometric problem brought about by the inflexible, orthogonal nature of the LN 12 array as seen in
The latch (memory node) will be shown to illustrate that IL serves also to carry out sequential logic. That memory capability could be important in the event that an algorithm was to bring about any instances of data dependence. From this central core of circuits there would seem to be no algorithm for which a binary logic solution could not be developed that would accommodate the IL methodology. Finally, an XOR gate will be shown in such manner as to illustrate both scalability and super-scalability.
Thus,
Again as to the BYP gate specifically, in the horizontal version of
In
The next IL-structured circuit, a BRANCH gate, of which a conventional version is shown in
Starting at the BYP gate at the top and proceeding downward through the PSIC 18, the full code for the 82 LN 12 is generated as being 1010010 for the INj, followed by three "0"s for the "ccc" code, a "01" to indicate that the proximal end of the 5 SPT 14 connects to the DR 20 terminal of the 82 LN 12, another "01" to indicate that the 5 SPT 14 extends to the right, and then a "10" to show that the distal end of the 5 SPT 14 connects to the GA 22 terminal of the RLN, whatever it may be. The full code for the 82 LN 12 is thus 1010010000010110, for a total of 16 bits.
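The field-by-field assembly just described can be checked mechanically. (This is a sketch with a hypothetical helper; the terminal codes 01=DR, 10=GA, 11=SO and direction codes 01=rightward, 10=upward follow the usage in the text.)

```python
def ln_code(inj, ccc, spt_codes):
    """Full code for one LN: the INj bits, then the 3-bit "ccc" CPT
    code, then one 6-bit signal code (proximal terminal, direction,
    distal terminal) per enabled SPT."""
    return inj + ccc + "".join(p + d + t for p, d, t in spt_codes)

# The 82 LN 12 of the BYP gate: no CPTs ("000"), one SPT from the
# DR 20 terminal (01), rightward (01), to the GA 22 of the RLN (10).
code = ln_code("1010010", "000", [("01", "01", "10")])
assert code == "1010010000010110" and len(code) == 16
```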
The lower LN 12 in
The complete code comes out as 1011100101010110011001 of 22 bits for the BRANCH gate as a whole, and is made by concatenating that for the 92 LN 12 onto the right side of that for the 82 LN 12, reading downward through the PSIC 18. In operation, that code forms a CL, but would also have to include the codes for any other LNs 12 that participated in that same step of that same algorithm. (It does not matter in what order the two SPTs 14 are entered, but of course the six bits that make up a signal code must be entered intact, without mixing any of the code between the two SPTs 14. When there are two SPT 14 codes to be entered, this applicant has made a habit of entering the horizontal code first, and then the vertical. It should be recognized that there can be no more than one SPT 14 connecting out to a particular LN 12, so if there are two codes to be entered, it is evident that one must go out horizontally and the other vertically, so as to go to two different LNs 12.)
The SPTs 14 that are enabled are shown in the BYPASS and BRANCH gates of
The first “real” circuit to be shown here, meaning that a bit is actually to be operated upon rather than just being moved, is the NOT gate, or inverter, which consists of a single LN 12 having an input at the GA 22 terminal thereof and then an output of opposite sense at the DR 20 terminal. Following the conventional inverter shown in
The next circuit is a 2-bit AND gate, the conventional version of which is shown in
The connection between the two LNs 12 is made by an SPT 14 located on the lower LN 12, which has an LIi of 76 (again 10 higher than the LN 12 above) and the INj code of 1001100. The following "ccc" code is "011," indicating that CPT2 is enabled along with the connection through CPT3 to GND. The signal code has the 13 SPT 14 connected at the proximal end to the DR 20 terminal by a 01 code, then a 10 to indicate the vertical structuring, and then finally a 11 code to accomplish the connection to the SO 24 terminal of the RLN. (It should be understood that these code entries do not merely show the operator what connections are being made; selection of the 13 SPT 14 physically requires the enabling of that SPT 14, from which the operator is able to see that the SPT 14 in use connects upwardly from the DR 20 terminal. For example, it would not be possible to select the 13 SPT 14 and then select a 01 code to indicate a horizontal structuring, since it is that 10 for upward structuring that selects the 13 SPT 14 (or the 11 or 12, depending upon which RLN terminal was being selected to be the terminus of the connection).) This is the first IL circuit herein to show an external signal input, which connects to the I/O 16 terminal.
The full code for the 76 LN 12 is shown beneath that for the 66 LN 12, and turns out to be 1001100011011011, again of a 16 bit length. That length was suggested above to be used to show where the code for a first LN 12 has been completed and that for a second is ready to start. That might be useful to the operator in "proofreading" the code entries, but it is not necessary for operational purposes: with a fixed INj code length of 7 bits the LN 12 code length will be fixed at 16 bits if one SPT 14 is enabled, 22 if two SPTs 14 are enabled on the same LN 12 (there can be no more than two outward directions from an LN 12), or 32 if, as in this AND gate, two LNs 12 are fully encoded. (In theory, with two LNs 12 and two SPTs 14 on each, the CL would be 44 bits long.)
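Those fixed lengths follow directly from the field widths: each fully encoded LN contributes its INj bits, a 3-bit "ccc" code, and 6 bits per enabled SPT. A brief sketch (the helper name is hypothetical):

```python
def cl_length(spts_per_ln, inj_bits=7):
    """Bit length of a CL given, for each participating LN, the number
    of SPTs enabled on it: INj bits + 3-bit ccc + 6 bits per SPT."""
    return sum(inj_bits + 3 + 6 * s for s in spts_per_ln)

assert cl_length([1]) == 16      # one LN, one SPT
assert cl_length([2]) == 22      # one LN, two SPTs
assert cl_length([1, 1]) == 32   # two LNs, as in this AND gate
assert cl_length([2, 2]) == 44   # the theoretical two-LN maximum
```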
An OR gate is shown next, the conventional version thereof being shown in
The full code for the rightward LN 12 practically writes itself and is 1011001101010110, which comes from INj=1011001, “ccc”=“101,” showing that in the “ccc” code connections are made to Vdd and to GND, and then the signal code is “01” for the proximal end of the SPT 14 being on the DR 20 terminal of the 89 LN 12, a “01” for the rightward direction of the signal, and finally a “10” for the distal end connection to the GA 22 terminal of the RLN. The input to the rightward LN 12 must evidently be coming in from below that LIi=89 LN 12 as shown by the dashed arrow line (i.e., from the 99 LN 12), since the CPT2 on that 89 LN 12 is not enabled for an external input through I/O 16, and the only other source for such signal, which is the leftward 88 LN 12, is already acting as the other input to the 2-bit OR gate. (The notation “Data In/Out” and the two arrows in
The NAND gate will of course simply be an AND gate followed by an inverter, and likewise an OR gate followed by an inverter forms a NOR gate, so it was not deemed necessary to show those circuits in detail, especially since the next "gate" shown, which is a memory latch, will contain two NAND gates. The simple SR ("Set-Reset") flip-flop will be used as an example of a sequential circuit, so an iconic version of the conventional NAND-based SR flip-flop is shown in
This next IL circuit, the SR flip-flop, presents some interesting opportunities in the means by which the circuit is structured. In the first place, it has been the standard practice in treating this latch to show the two NAND gates as being side-by-side, and as shown in the iconic drawing in
Another consequence of the latch circuitry is that the reversed NAND gate used here requires the circuit structuring to proceed in a direction opposite to that of the signal flow, which of course, with pass transistors being bidirectional, is a perfectly legitimate procedure. In
Returning now to the RS flip-flop of
For access to these quantities, S and R are already on I/O 16 outputs, while Q and Q-bar, having both appeared on a DR 20 terminal at a corner of the circuit, could easily be passed on to a GA 22 terminal of an adjacent LN 12 and then through CPT2 to an I/O 16 site (e.g., through the 5 SPT 14 of the 46 LN 12 to the GA 22 terminal of the 47 LN 12 and then the CPT2 of the 47 LN 12 to connect that "Q" site to the I/O 16 site, and then the 7 SPT 14 of the 53 LN 12 from the GA 22 terminal thereof to the DR 20 terminal of the 54 LN 12, which is the "Q-bar" position; CPT2 of the 53 LN 12 would then connect that point to I/O 16).
The detailed description of the RS flip-flop can begin with the 44 LN 12, as the top left LN 12 in
That signal is passed on to the SO 24 terminal of the 45 LN 12, for which INj=0101101; the "ccc" code is 110, meaning that both CPT1 and CPT2 are enabled; the SPT 14 code is first a 01 for the proximal end of the SPT 14 being on the DR 20 terminal of the 45 LN 12, then another 01 for going rightward, and then a 10 for the distal end of the SPT 14 connecting to the GA 22 terminal of the RLN after passing through the 5 SPT 14, that RLN being the 46 LN 12. Thus, while the 44 LN 12 had the GND but not the Vdd CPT enabled, being the "bottom" LN 12 of the AND gate, the 45 LN 12 has the Vdd but not the GND CPT enabled, as being the "top" LN 12 in the AND gate, the two LNs 12 then being connected in series.
In order to complete the structuring of a NAND gate, that AND gate output must pass through an inverter, which must be the 46 LN 12. Since that LN 12 receives the signal from the 45 LN 12 at its GA 22 terminal, to provide the inverter the 46 LN 12 needs only pass that signal on from its DR 20 terminal, as it indeed then does. For that 46 LN 12, the INj=0101110, ccc=101 for enabling Vdd and GND, and then there is no SPT 14 code, or rather the SPT 14 code is 000000, since connection to that DR 20 must be made from the LN 12 that is below that 46 LN 12, i.e., from the 56 LN 12. The full code then becomes 0101110101000000. This will be the start of a series of reverse coding, in which the SPT 14 to be used is one on the RLN that then reaches back to the OLN. That connection in this case is through the 14 SPT 14 of the 56 LN 12.
The 56 LN 12 code begins with INj=0111000, followed by ccc=001 to indicate that only the CPT3 to GND is enabled; a 10 code for the SPT 14 that has the proximal end thereof on the GA 22 terminal of that LN 12; then another 10 code indicating that the SPT 14 extends upward, and finally a 01 code for connecting the distal end of the SPT 14 onto the DR 20 terminal of the 56 LN 12. That leads to a full code of 0111000001101001. That connection is the first one of two that connects the output of one NAND gate, in this case from the 46 LN 12 to an input of the other, i.e., the 56 LN 12.
Again going leftward, connection must first be made from the 56 LN 12 back up to the 46 LN 12, which is accomplished by the 14 SPT 14, of which the proximal end connects to the GA 22 terminal of the 56 LN 12 and the distal end extends upward to the DR 20 terminal of the 46 LN 12, thus to form the 46 LN 12 into the inverter that converts the 44-45 AND gate into a NAND gate. The encoding of the 56 LN 12 itself, just derived above, yields the full code 0111000001101001.
Connection must then be made from the SO 24 terminal of the upper (of the AND gate) 55 LN 12 back to the DR 20 terminal of the lower 56 LN 12 of the AND gate, which would be through the 16 SPT 14 of the 55 LN 12. The arrowheads on the signal line point downward from the 46 to the 56 LN 12, then leftward from the 56 to the 55 LN 12, and similarly thereafter to the 54 LN 12. The full code of the 55 LN 12 is 0110111110110101, consisting of an INj code of 0110111, ccc=110 to show that CPT1 and CPT2 were enabled, and then 11 for a connection of the proximal end of the SPT 14 to the SO 24 terminal of the 55 LN 12, a 01 code to show that the SPT 14 extends to the right, and then a 01 to reflect the connection of the distal end of the SPT 14 to the DR 20 terminal of the 56 LN 12.
The 56 and 55 LNs 12 constitute the AND gate, with the 55 LN 12 being the higher of the two (closest to Vdd), and then the 54 LN 12 as an inverter will convert that AND gate into a NAND gate. That means a connection through the 7 SPT 14 from the GA 22 terminal of the 54 LN 12 back to the DR 20 terminal of the 55 LN 12. The location of the 54 LN 12 is given by INj=0110110, and then ccc=101 to show the enabling of CPT1 and CPT3, and then for the 7 SPT 14 code a 10 for the proximal end of the SPT on the GA 22 terminal of the 54 LN 12, a 01 for the rightward direction of the 7 SPT 14, and then another 01 for the connection of the distal end of that 7 SPT 14 to the DR 20 terminal of the 55 LN 12. A second SPT 14 is enabled for the connection of the output of this second NAND gate to an input of the first NAND gate, which is the 12 SPT 14 of the 54 LN 12 connecting at the proximal end thereof to the DR 20 terminal of the 54 LN 12, shown by a 01 code, then a 10 showing the upward direction of the 12 SPT 14, then a 10 to show the connection of the distal end of that 12 SPT 14 to the GA 22 terminal of the 44 LN 12, thus to complete the encoding of this SR flip-flop. As indicated, this lower NAND gate is structured going in the opposite direction to that of the upper NAND gate so as to avoid needing to cross the two output-input connections. This RS flip-flop latch is probably unique in the wide variety of IL encoding practices in being made of so many different parts in such a small circuit. (Except possibly for aesthetic reasons, it is not clear why this latch is usually drawn with the two NAND gates being "pointed" in the same direction. In fact, having those gates going in opposite directions and without the crossed wires as done here would be more suggestive of how the device operates, in showing the paired output-to-input relationship more clearly.)
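As a cross-check on the flip-flop encoding above, the three full codes stated for the 46, 56, and 55 LNs 12 can be reassembled from the fields given in the text. (A sketch; the helper is hypothetical.)

```python
def ln_code(inj, ccc, *signal_codes):
    """INj bits + 3-bit ccc + one 6-bit signal code per SPT."""
    return inj + ccc + "".join(signal_codes)

# 46 LN 12: Vdd and GND enabled (101), no SPT of its own (000000)
assert ln_code("0101110", "101", "000000") == "0101110101000000"
# 56 LN 12: GND only (001); 14 SPT from GA (10), upward (10), to DR (01)
assert ln_code("0111000", "001", "101001") == "0111000001101001"
# 55 LN 12: CPT1 and CPT2 (110); 16 SPT from SO (11), rightward (01),
# to DR (01)
assert ln_code("0110111", "110", "110101") == "0110111110110101"
```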
Operations

One important aspect of IL is its presumed low use of power. In current electronics practice, turning on an instrument essentially involves turning on everything. In IL, by contrast, no transistors receive power until an algorithm begins to be executed, and at any given time only those transistors that are involved in the one step of the algorithm then being executed will be receiving any power. (Of course, there will be power to the monitor and various peripherals in both cases.) It is as though the algorithm were being executed using only the power of enough LNs 12 to structure the circuitry of a single step. That kind of energy economy should be a boon in those markets that depend upon battery-operated devices such as cell phones, remotes, laptops, etc.
At the high end of the energy consumption scale, there are some circumstances in which the maximum use of as many LNs 12 as possible would be sought, as in the maintenance and interpretation of the Gross National Product data base, the conduct of the census, the IRS files, the FBI files, the Department of Defense, the Department of Homeland Security, the Patent and Trademark Office data bases, the content of the National Archives, etc., or on the scientific side the data pouring out of the Hubble telescope, the data bases of the Centers for Disease Control, the Department of Health, or the data pouring out of the Large Hadron Collider at a rate of 15 petabytes per year, etc. There being no limit to the upper size or numbers of the PSICs 18 making up the system for treating any of those or similar massive data handling tasks except for the practical factors of space, time, and money, and with a lower limit of one LN 12, there seems to be no data handling problem now existent that could not be handled by IL.
The small-grained character of IL also allows for smaller embodiments thereof, so that where appropriate IL could replace ASICs in many of the embedded types of usage. An advantage as to the personal computer is that it would no longer be necessary to await the loading in of some huge program in order to begin some desired operation: to start an algorithm in IL requires only the encoding of a first small set of the operational elements that will constitute the first step of the algorithm, and indeed as the first cycle of the operation, with the following operational elements being encoded "Just In Time" to execute each next step of the algorithm, whence the name "Instant Logic." (Actually, provided the PS 10 were large enough, nothing would prevent loading all of the CLs of the algorithm at once, but it would be wasteful of energy to do so.) It would seem possible to apply IL to the full range of consumer products, and it would also seem that there is no task now carried out using microprocessors and FPGAs, etc., that could not be carried out using IL. It would only be questionable whether the adoption of IL would be worthwhile as to the many tasks now carried out by the variety of small ASICs now in use. (There is also the question of whether the prospective greater speed of IL procedures is actually needed in many of the ASIC applications: probably so in motor vehicles and elevators, but perhaps not in a washing machine.) Even so, IL may be chosen not for speed gains but for purposes of conserving energy and ease of maintenance.
On that matter of maintenance, that should be quite an easy task: control of the IL operations would not be spread out through the system, but would generally be local, and if a fault in some step of the process were identified, it would be just a matter of replacing the module in which the step takes place as a permanent fix, or on a temporary basis the step operations could easily be sent through a different set of LNs 12, thus to bypass whatever LNs 12 were not properly functioning, as a matter of re-writing the code (i.e., the relevant code lines).
In conceptual terms, the ILA is quite simple, but in practical terms it is quite complex, requiring creative encoding in order to get the various algorithms, and steps of algorithms, functioning in a coherent manner, without "collisions" in which two or more algorithm steps were vying for the same LN 12. Of course, one simply steers the paths of circuit encoding away from each other, but if there is not sufficient space to do so, the paths of other algorithms might also need to be re-directed; thus, although the avoidance of collisions is conceptually a simple matter, actually succeeding in doing so could become complicated. The development of Code Modules that do not collide internally, but are yet compact so as not to have "hoarded" too much PS 10 space, is a labor-intensive task.
Of course, cost would remain an issue in any of those contexts, given the complexity of the PSIC 18 and the means for entering code and data into the PSIC 18. However, that IC is substantially less complex than many that can be seen in the literature. Indeed, the highly repetitive nature of the PS circuit augurs well for mass production methods, and the simplicity of the IL equivalent of programming would seem to offset any extra cost of the IL PSIC 18, even if there is such an extra cost, which seems doubtful. The "encoders" (those persons who would encode the IL algorithms) would need to wipe from their minds all thoughts of C++, PASCAL, instructions, CPUs, ALUs, networks, loops (but not conditional transfers), pointers, dlls, etc., and think only in terms of what binary gates would be needed to carry out each next step of whatever the algorithm might be. A code module developed for "performance" computers might still be easily transferred to an ASIC context, run through procedures such as that set out in
For that purpose, various code modules that would structure such more complex circuits as ADD, SUBTRACT, MULTIPLY, and DIVIDE circuits (all of which would require extensive routing of partial sums, etc., to gather the bits together), SORT, SEARCH, and SEQUENCE circuits, and circuits for a wide range of mathematical operations, such as MATRIX INVERSION, FFTs, BESSEL FUNCTIONS, MATHIEU FUNCTIONS, hyperbolic cosines, etc., could be prepared in advance, the use of which would only need to be specialized to the particular algorithm by the need to identify the INj values of the LNs 12 from which the data to be processed would be acquired, and as described above, the code module (CM) would then identify the INj values for all of the other LNs 12 that would be used in the algorithm.
Like the capability of executing complex algorithms, the data handling capability of IL is also essentially unlimited, as would be the scope of those SORT, SEARCH, and SEQUENCE operations. This means that the data handling requirements of such institutions as the FBI or the USPTO could be met, as could those of even more challenging, highly data-dense undertakings such as the Large Hadron Collider; as to relatively fixed data, indeed the entirety of both the scientific and other types of literature throughout the entire history of mankind could be stored in a readily searchable and accessible manner, thus helping to accommodate the "Information Explosion."
Any projection of all of the tasks to which IL could be applied would be a very questionable enterprise, but the isotropic, continuous, array-like construction of the area in which the actual IP takes place suggests that there are even very complicated tasks for which the advantages of IL would very likely be substantial. One such task that is both obvious and in great need of improvement is that of image processing, in which the ability of IL to carry out large numbers of repetitions of some simple task at a very rapid rate would no doubt stand out.
Another such application would be that of Artificial Intelligence, for which one can see two different advantages. As to the first of these, the essentially unlimited size of the PS would provide an equally unlimited data-handling capacity, thus to permit identification of relationships between even the most distantly related concepts. (The parent application sets out a data-sorting process (using conventional electronics) in which the depth of the variables addressed (i.e., of the sub-, sub- sub-, etc. variety) can be increased without limit.)
Another advantage of that array-like construction relates to the efforts being carried out in the Artificial Intelligence community to mimic the structure of the brain. As understood by this author, current thought in Artificial Intelligence is that for the most part, even though certain regions of the brain have been identified as carrying out particular tasks (sight, language, etc.), the more abstract mental processes seem to take place by way of synapses that are highly distributed, taking place at locations that are spread out through large areas of the brain. It would seem that presently, the Processing Space of IL could provide the kind of tabula rasa or “blank slate” on which models of that kind of brain function could be examined (which is not, of course, to say that the brain acts on the whole as a tabula rasa, which it most assuredly does not). The brain has an irregular array of cells that have multiple connections to other such cells, while the IL array has a regular array of cells that have multiple connections to other such cells.
Scalability and Super-Scalability

Turning now specifically to the matter of scalability and super-scalability, this first requires a view of another aspect of the PSIC 18, which is the all-important connection of the internal workings of a PS 10 with the outside world, of which, for that purpose, a portion is shown in
The forward face of the CuL shows the DSL 28, GSL 30, and SSL 32, as well as the Vdd and GND connections, and the TEAs 58 at the top for all SPT 14 connections for one full PE, and then the DSL 28, GSL 30, Vdd, and three TEAs 58 as a partial representation for a second LN 12. As noted earlier, all of these that appear on the front surface of the integrated circuit chip are caused to protrude a slight bit from the body of the PSIC 18, which is required for the joining of one PSIC 18 to another. The top of the chip is used for connections from the encoding circuitry to the PTs to which code must be sent, and thus constitutes an “Operations” surface of the chip, while the facing front surface of the chip, as seen in
The far side of the PSIC 18 contains an effective “mirror image” of the connections shown in
A conventional iconic form of the XOR gate is shown in
With an x-axis width on PS1 of 20 LNs 12, an LN 12 that was in the second row and in the 18th rightward position would have an LIi of 38 in that original circumstance, which value is shown in
With regard to the third row (the second row of the XOR gate): in the original situation, with no PS2 present, the LIi values of the C, D, and F LNs 12 would have been XM=20 higher than the values of the LN 12 just above each of those LNs 12, i.e., 58, 59, and 60, respectively, again shown in parentheses just above and to the right of each LN 12, with no value shown for the H LN 12 since it was not yet present. With the addition of PS2, those values would instead be based upon XM=40 and the higher values of the A, B, E, and G LNs 12, or 58+40=98 for the C LN 12, followed by 99, 100, and 101, respectively, for the D, F, and H LNs 12, also shown in brackets above the earlier values, if any.
Rather than having to pick one's way through these LNs 12, however, a pattern begins to emerge. That is, the LIi values of a lower row are the same as the values of the row above with PS2 added, e.g., the LIi=60 of the F LN 12 without the PS2 is the same as the 60 value of the E LN 12 one row above with PS2. Of course, that is as it should be, since either adding a 20×20 PS2 or moving down a row would each add 20 to the LIi. It then seems possible to enter LIi values by simple inspection of the circuit drawing.
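As a minimal sketch of that rule of inspection, the following helper (the function name and the 1-based row/column convention are illustrative assumptions, inferred from the worked example above) reproduces the LIi values quoted in the text:

```python
def lli(row, col, xm):
    # LIi appears to follow a simple rule: each row down adds the x-axis
    # width XM, and each rightward step within a row adds 1 (rows and
    # columns numbered from 1).
    return (row - 1) * xm + col

# PS1 alone (XM = 20): the LN in the 2nd row, 18th rightward position
assert lli(2, 18, 20) == 38
# one row down adds XM = 20
assert lli(3, 18, 20) == 58
# with PS2 joined (XM = 40), the same physical LNs re-index
assert lli(2, 18, 40) == 58
assert lli(3, 18, 40) == 98
```

This also illustrates why adding PS2 and moving down a row are interchangeable in the original 20-wide space: both add 20 to the LIi.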
That having been done, it remains only to enter the corresponding INj codes into
One thing that can be noticed immediately is that adding to the size of the PSIC 18 has no effect on the circuit or signal codes, as shown just below the INj codes in
Now as to the XOR circuit itself, as is usually the case, a circuit the size of this XOR gate will be found to be made up of some combination of smaller gates, and such is the case here. The LNs 12 have been assigned letter designations for ease in identifying these sub-circuits, and turn out to be as follows (there are other combinations of such sub-circuits that will also yield an XOR gate): LNs 12 A and B form an OR gate; LNs 12 C and D form an AND gate; the E LN 12 forms a BYPASS gate; the F LN 12 forms a NOT gate; and the G and H LNs 12 form the output AND gate. This circuit demonstrates one vital role for the BYPASS gate in that the outputs of the OR gate and the NOT gate or inverter need to enter the output AND gate as the two inputs thereto, which is to say they must enter the G and H LNs 12 at the same time, but those OR gate and inverter outputs are not in the vertical alignment that such entry into the AND gate requires. The BYPASS gate (BYP) then acts to resolve that problem in geometry by placing the two LNs 12 of the output AND gate in vertical alignment.
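The sub-gate decomposition just described can be checked directly. The sketch below (gate and function names are illustrative; the BYPASS gate is modeled as a simple pass-through, since its role here is geometric alignment rather than logic) composes those sub-gates into the XOR truth table:

```python
def OR(a, b):  return a | b   # LNs A and B
def AND(a, b): return a & b   # LNs C and D (and the output pair G and H)
def NOT(a):    return 1 - a   # LN F, the inverter
def BYP(a):    return a       # LN E: pass-through used only for alignment

def xor(a, b):
    # the output AND (G, H) combines the OR path (carried through the
    # BYPASS) with the inverted AND path
    return AND(BYP(OR(a, b)), NOT(AND(a, b)))

for a, b in ((0, 0), (0, 1), (1, 0), (1, 1)):
    assert xor(a, b) == a ^ b   # matches XOR for all four input pairs
```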
The BYP is useful for a lot more than just making possible the completion of the structuring of some circuit, however. We take it from all of the foregoing that the scalability of the ILA has been shown since, after all, nothing precludes the addition of yet one more PSIC 18 to whatever number may already be at hand. The execution of some operations such as addition leaves a number of scattered partial sums, and as shown in detail in the original application, the BYP can serve to bring those partial sums together, when joined with inverter pairs that will maintain the necessary voltage levels. What
As noted earlier, the “Computing Power” (CP) of an ILA is effectively measured by the size of the PSIC 18 in terms of the number of LNs 12. That is the case because it is only through inter-LN 12 connections that IP can be carried out, and an LN 12 has just two outward connections that can be employed. (Incoming connections are not counted because they would already be counted as outgoing connections for the leftward and downward neighbor LNs 12, if any.) On the face of it, a 20×20 PSIC 18 would have 20×20×2=800 connections, but there are 20 LNs 12 along each of the side and top of the PSIC 18 that have no neighboring LN 12 to which connection could be made, thus to define a “Periphery Deficit” (PD) for this PSIC 18 of 2×20=40 connections. The resultant connection count is thus 800−40=760 connections, and the ratio of actual connections to that gross count is 760/800=0.95.
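The connection count and Periphery Deficit arithmetic can be stated compactly. This sketch (the function name is an illustrative assumption) generalizes the 20×20 figures to any rectangular Processing Space under the two-outward-connection rule:

```python
def connections(width, height):
    gross = 2 * width * height   # two outward connections per LN
    pd = width + height          # Periphery Deficit: one side plus the top
    return gross, pd, gross - pd

# the 20x20 PSIC worked in the text
gross, pd, actual = connections(20, 20)
assert (gross, pd, actual) == (800, 40, 760)
assert actual / gross == 0.95
```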
With both PS1 and PS2 at hand and an X width of 40, the CP under the rule of scalability and a gross count would be 2(40×20)=1600. The PD for the PS1/PS2 of
So looking at
It is worth noting that the ability of the ILA to exhibit super-scalability is actually a matter of mathematical (geometric) certainty. The area of a square is d², where d is the length of a side, and the periphery of the square is 4d, with the periphery/area ratio being given by 4d/d²=4/d. With increasing d, the periphery increases only linearly while the area increases geometrically, i.e., as the square of the length d. That exact relationship is of course changed by the calculation of a PD, but the underlying relationship will remain essentially the same: the effect of having a periphery (and hence fewer connections) will decrease in proportion to the size as the object (the PSIC 18) gets larger. Doubling the size of the object will give more connections than just twice the original, and a like calculation would be true of any other multiple.
With the joinder, the area of the composite PS1-PS2 PSIC 18 becomes d×2d=20×40=800, or 1600 prospective connections, and the Periphery Deficit (PD) becomes 3d=3(20)=60, so the number of actual connections becomes 1600−60=1540, and the ratio of the actual connections to the gross number is 1540/1600=0.9625, as compared to the ratio of 0.95 for the original PS1. It is like blowing up a balloon: the volume of a sphere increases faster than the circumference as the balloon is blown up. (With a PSIC 18 of 1,000 square, the PD would be 2,000 and the gross count would be 2(1,000×1,000)=2,000,000, to yield (2,000,000−2,000)/2,000,000=0.999.)
A measure of the effect of adding the second PSIC 18 to the first can be obtained by comparing the ratios of the evident number of connections before and after that addition: for the composite PS1-PS2 PSIC 18 that ratio is 1540/1600=0.9625, while for PS1 alone it is 760/800=0.95, and the ratio of those two is 0.9625/0.95=1.013. In other words, the PS1-PS2 PSIC 18 is about 1.3 percent more effective in providing actual connections than is the single PS1, which is the effect of super-scalability. The size of the increase in the connections count is thus relatively small, but even the slightest gain in the connections count suffices to constitute a degree of super-scalability.
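The super-scalability figures quoted above all follow from the same Periphery Deficit arithmetic. This sketch (the function name is an illustrative assumption) reproduces the 0.95, 0.9625, and 1.013 values, and shows the ratio approaching 1 as the Processing Space grows:

```python
def connection_ratio(width, height):
    # fraction of prospective connections actually realized, after
    # subtracting the Periphery Deficit (one side plus the top)
    gross = 2 * width * height
    return (gross - (width + height)) / gross

r_ps1 = connection_ratio(20, 20)    # PS1 alone
r_comp = connection_ratio(40, 20)   # composite PS1-PS2
assert r_ps1 == 0.95 and r_comp == 0.9625
assert round(r_comp / r_ps1, 3) == 1.013   # about a 1.3 percent gain
# the periphery effect shrinks with size: a 1,000-square space realizes 99.9%
assert connection_ratio(1000, 1000) == 0.999
```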
The number of LNs 12 in the PS1/PS2 composite is exactly twice the number in the original PS1, so the only way in which the CP could have been increased beyond that doubling is if some LNs 12 in PS1 that could make no connections prior to the joinder of PS2 had somehow gained the ability to do so when PS2 was added. Such is the case, and as has already been pointed out, the LNs 12 that have “magically” gained that connection power are those along the “Processing Space Cut Line” (PSCL) shown in
As can best be determined by this inventor, the concepts of IL stand poised to bring about a substantial change in the way in which the electronics industry, particularly as to computers, conducts its business. In using the methodology of IL, some real world problem would first have been identified and modeled by an algorithm, then expressed by an electronic circuit (in the present embodiment) made up of a series of properly interconnected binary logic gates. That circuit would then be encoded into a CODE block in memory that, when applied, would carry out those operations that were needed to execute that algorithm. Each operation would rest on a continuous flow of enabling bits into the PTs, together with a like flow of data bits, either internally or by way of a number of I/Os 16, whether from memory or at some points from a foreign source as to any part of an algorithm that required external data input. From that input a constant flow of output data, based on the algorithm and any such input data, would emerge. After the first input stage of the algorithm, each step will operate on the data created by the preceding circuits and any added data.
In current practice, an effort has been made in the industry to be “all things to all people,” i.e., to provide in one apparatus what was almost a turnkey machine for a wide range of different end user types, those end users having been provided with any number of complex (yet not particularly effective) programs that take up a lot of space, will often never be used, but still had to be paid for. There has thus grown up a “cottage industry”—actually a variety of different cottage industries specializing in different computer operations—of software-based companies that specialize in particular tasks, such as CAD, data bases, word processing, multimedia, and so on, with the end user soon scanning the internet, the Yellow Pages, or the local software outlets looking for that one truly effective program that the end user wishes had been in his computer in the first place.
Instead, one can envision a marketplace in which computers come in a CAD model, a data base model, a word processing model, etc., in which each model comes with a “top of the line” version of a program that treats the particular task at the highest level possible. These different types could also be directed towards particular professions, such as an attorney version, an EE version, a physicist version, etc. There would of course be substantial overlap, but the end user would be invited to select the particular features desired, with some differences between, say, a chemical engineer and a mechanical engineer, particularly in the kind of supporting data bases and mathematical programs that would be included in the different engineering packages.
However, back to IL, one can instead envisage that, first impressions perhaps to the contrary, it would certainly fall within the purview of any competent electronics engineer to absorb the essentials of the IL procedures, and to develop code lines for any algorithm that could be written for some particular task. For that reason, one can also envision an IL product made up of a number of PSIC modules of a convenient size, a Control Cable and wiring harness (see parent application), a Logic Node Locator, and the requisite number of Circuit Code Selectors and Signal Code Selectors for each module, and a tower or laptop with mother board to include all of the above in plug-in form. The end user would construct his own IL computer, or negotiate with a local dealer to have that done, and the result would be a computer tailored to the task, unfettered by the presence of myriad programs that have no use to the present user and act only to slow down the desired work.
As to what model this IL version would constitute, the answer is: “All of the above.” The end user would know his or her own specialty, what the problems were, what kinds of algorithms could be written and what were the circuits that would be needed to execute those algorithms, what binary logic gate circuits would be needed, etc., etc. Each end user would design his or her own special model of an IL system.
That investment is justified by what has been said herein. If by “speed” one means the throughput, the unlimited scalability of IL leaves a freedom to incorporate in an ILA as many modules of Processing Space as may be required to achieve any given throughput. But as to “speed” in the strict sense (i.e., FPS), with the von Neumann Bottleneck having been removed the successive steps of the algorithm will follow each other without interruption, cycle by cycle, so except for cases of data dependence there will be no “wait states” during which no IP would be taking place. And absent the unlikely event of data dependence (the timing of the encoding of the PTs would generally forestall any data dependence), the basic operating speed of IL should be significantly faster than anything now achieved using microprocessor-based apparatus with their inherent inclusion of the von Neumann Bottleneck. In terms of the underlying methodology, this would seem to be the fastest way possible for IP to be carried out, although technologies other than semiconductor electronics will likely end up making the process even faster. As a consequence, relative to the commercial and industrial applicability of IL, it would seem that there is little if any doubt that this procedure provides a pathway to a scalable and super-scalable supercomputer that would have as much speed and data-handling capability as might be desired, without limit.
As one particular example in the Artificial Intelligence field, the brain is thought to be an array of multiply interconnected synapses, and the PS of IL (in the example used herein) is an array of multiply interconnected transistors. The IL connections shown herein do not, of course, represent any limit to the kinds of interconnections that could be made, so it seems nothing would prevent there being developed a more complete model of the brain. In terms of natural arrays, such as the various crystal classes, the model used herein is based on the simple cubic structure. Should the problem of off-chip connection to the on-chip PTs that serve to structure the required circuitry be better solved, much more sophisticated designs might well be considered. For example, while this IL model is only a 2-D embodiment taken from the simple cubic crystal class, other crystal structures present 8 or 12 facets, and at least using discrete transistors those geometric structures could be modeled just as well, perhaps thereby to yield better models of the brain.
Whatever may be the validity of the foregoing projections, however, it can be reasonably asserted that the potential scope of IL use extends throughout the full range of current microprocessor or FPGA, etc., usage, and in terms of speed (throughput), volume (how much data could be handled), and precision (how many bits could there be in a number expression) would likely surpass anything that those current devices could accomplish.
IL seems to provide the “breakthrough”—as the “revolutionary” rather than the “evolutionary” advance—that Tosic had sought, and would justify at least the kind of investment that went into the ENIAC. (The ENIAC investment can be seen in the size of the apparatus, which occupied 1,800 ft² and consumed 174 kW of power, at an approximate cost of $750,000 for the basic system plus approximately $30,000 for magnetic storage, all in 1940's dollars.) Nancy Stern, From ENIAC to UNIVAC: An Appraisal of the Eckert-Mauchly Computers (Digital Press, Digital Equipment Corporation, Bedford, Mass., 1981), p. 51. Another sense of the size of the ENIAC can be extracted from the photographs of the machine on pages 31, 35, 40, and 43, and from its weight of over 30 tons, p. 73. Of course, much of that size comes from using vacuum tube circuitry, but that is still a huge and costly apparatus. An investment of that size, compared to the gain achieved, suggests that a sufficient investment in IL to produce a fully operational apparatus would also be well justified, looking at the same kinds of parameters. The evolution of the ENIAC to the UNIVAC and onward would no doubt be repeated here, by a like evolution resolving some of the fairly obvious problems in this current IL model, from this basic IL Apparatus to the more sophisticated apparatus noted above, or such like or even better apparatus.
In short, the principal conclusion to be derived from the appearance of IL is that the electronics industry can now see new life, with the appearance of new kinds of faster and all-around better products, more easily tailored to the needs of a wide range of new markets, such as to provide a near explosion in new employment opportunities and particularly new intellectual challenges that have not yet been seen. The development, construction, and beta testing of an IL prototype now stands as a critical step in the future utilization of that technology.
What stands in the way of such an expansion, besides the “NIH” (Not Invented Here) syndrome, is the industry mindset, and the kind of complacency that locks out any real progress. This inventor was told by an upper level manager in one top level company that “we have millions of lines of code written here, and that code will continue to be in use long after you and I are long gone.” It amounts to protecting the investment, even as the return on that investment continues to shrink, and even as the possibility of continued growth is being blocked by the laws of physics that pertain to integrated circuit design, so that all that can be done within the current paradigm, as suggested by Tosic, is to nibble away at the edges.
COMPONENT PLACEMENTS, NUMBERING AND ABBREVIATIONS
- CPT1: DRR to Vdd
- CPT2: GA to I/O
- CPT3: SO to GND
- 10 Processing Space (PS)
- 12 Logic Node (LN)
- 14 Signal Pass Transistor (SPT)
- 16 I/O
- 18 PS Integrated Circuit (PSIC)
- 20 Drain Terminal (DR)
- 22 Gate Terminal (GA)
- 24 Source Terminal (SO)
- 26 Pedestal (PED)
- 28 Drain Signal Line (DSL)
- 30 Gate Signal Line (GSL)
- 32 Source Signal Line (SSL)
- 34 INj Locator
- 36 LIi Entry
- 38 +/−1 Entry
- 40 XNOR
- 42 Reference Register1 (RR1)
- 44 r1 Entry
- 46 +/−2 Entry
- 48 XNOR
- 50 RR2
- 52 ki Entry
- 54 XM Entry
- 56 INj Output
- 58 Top Entry Aperture
- 60 Contact Pin (CP)
- 62 Contact Aperture
- 64 Bit Entry Rod (BER)
Claims
1. A scalable and super-scalable information processing apparatus, comprising:
- At least a first processing space comprising an array of independently controllable energy transmitting information processing elements that encompass a first area having a first periphery, said information processing elements having access to power means and data input means, with said first area encompassing a planar surface of said at least a first processing space, said at least a first processing space having a pre-determined dimensionality and pressure contact means on each of a number of sides thereof that comport with said dimensionality and are disposed normally to said planar surface, with said information processing elements being disposed in rows and columns within said first area, said disposition of said information processing elements placing said information processing elements in repetitive side-by-side relationships in each said dimension;
- Control means by which code may be entered into said processing spaces to bring about connections between said energy transmitting information processing elements and from said energy transmitting information processing elements to said power means and said data input means;
- Wherein, by the entry of code through said control means, information processing can be rendered possible by accessing said power means and said data input means, and selectively interconnecting said energy transmitting information processing elements in such a number of directions as is defined by said dimensionality of said at least a first processing space;
- Wherein having one said energy transmitting information processing element in a side-by-side relationship with another said energy transmitting information processing element forms a connection possibility, said connection possibilities being limited in number only by the number of said energy transmitting information processing elements being present; with
- The number of separate connection possibilities present within said at least a first processing space establishing the computing power of said information processing apparatus; wherein
- Upon making connection from said power means to said at least a first said energy transmitting information processing element and from said at least a first said energy transmitting information processing element to another said energy transmitting information processing element constitutes an information processing possibility;
- Upon creating at least one said information processing possibility and entering a signal into said information processing possibility through said data input means or from a preceding information processing element, with said signal then passing on from said information processing possibility to at least one more of said energy transmitting information processing elements and eliciting a response from said at least one more energy transmitting information processing element constitutes an information processing event;
- Whereby, through addition to said at least one said information processing element of an arbitrary number of more said information processing elements, said computing power of said at least a first processing space will increase accordingly,
- Thus to exhibit scalability.
2. The information processing apparatus of claim 1; wherein
- Said energy transmitting information processing elements that lie within a row or column bordering said periphery have at least one direction in which no energy transmitting information processing element to which connection could be made is present, and are thus not able to provide a connection opportunity;
- However, adding to said first processing space at least one more said processing space of like structure as said first processing space by interconnection of at least one said energy transmitting information processing element that lies within said periphery of said first processing space to a like said energy transmitting information processing element that is disposed within a periphery of said at least one more processing space will produce a composite processing space having a new area and a new periphery; and
- Upon connecting said at least one more said processing space to said at least a first processing space through said energy transmitting information processing elements that lie within a row or column bordering said periphery and said pressure contact means to form said composite processing space, the number of connection opportunities present in said composite processing space will be greater than the sum of the connection opportunities in said at least a first processing space and said at least one more processing space, and consequently,
- The computing power of said composite processing space will be greater than the sum of the computing powers of said at least a first processing space and said at least one more processing space, thus to exhibit super-scalability.
3. Apparatus for information processing, comprising:
- An array of passive energy transmitting devices, each having a number of connectible terminals thereon disposed along directions as defined by the dimensionality of said array, each of said passive energy transmitting devices being capable of being transformed into a corresponding active energy transmitting device capable of receiving energy packets having information contained therein and performing information processing on said energy packets;
- An array of active energy transmitting devices having proximal and distal ends, said active energy transmitting devices being capable of passing through energy packets upon the imposition thereto of an enabling signal, with said proximal ends of said active energy transmitting devices being connected respectively to different ones of said connectible terminals on said passive energy transmitting devices, and said distal ends of said active energy transmitting devices being connected respectively to:
- An energy source, an entry location for energy packets, an energy sink, and said number of connectible terminals disposed on at least one other of said passive energy transmitting devices; and
- Addressing means by which enabling signals can be directed to selected ones of said active energy transmitting devices; whereupon
- The imposition of an enabling signal onto one or more of said active energy transmitting devices connected to one or more of said passive energy transmitting devices that await the entry therein of said energy packets will transform said one or more passive energy transmitting devices into corresponding active energy transmitting devices that will perform information processing upon the entry of energy packets into said entry location for energy packets.
4. The information processing apparatus of claim 2 wherein:
- Said energy comprises electronic energy;
- Said energy transmitting information processing elements comprise operational transistors having a number of terminals connected thereto;
- An array of pass transistors that connect respectively between said terminals of said operational transistors and said terminals of at least one other said operational transistor, said power means, and said data input means;
- Said power means comprises connection to Vdd on one side of said operational transistor and to GND on an opposite side of said operational transistor;
- An array of code selectors respectively connected to said pass transistors that are connected to said operational transistor; whereby
- An enabling of selected ones of said pass transistors would cause the structuring of an operable binary logic circuit;
- An entry of a data bit into said operational transistors would bring about an information processing event, whereby
- The full processing of said data would bring about the execution of an algorithm that expressed a particular arithmetical/logical problem.
5. The information processing apparatus of claim 4 further comprising:
- An integrated circuit having terminal lines connected to said operational transistors including circuit lines that connect to said power means and to an input/output terminal and signal lines that connect through pass transistors to terminals of said another said operational transistor;
- Said pressure contact means comprises extensions of said Vdd, GND, and signal lines beyond the periphery of the integrated circuit sufficiently to permit a firm electrical contact to be made between said extensions of a first said integrated circuit and of at least one more integrated circuit;
- Said signal lines render possible the joinder of one said processing space to another said processing space in electrical continuity; thereby
- To permit the occurrence of information processing events through the joinder of one said at least a first processing space to said at least one more processing space; thereby
- To permit the structuring of binary logic circuits both within said at least a first processing space and said at least one more processing space and within the joinder of said at least a first processing space and said at least one more processing space; thus
- To permit the structuring of a composite processing space of unlimited size, speed, and data handling capacity, and thereby to exhibit super-scalability.
6. The information processing apparatus of claim 5 wherein
- Said information processing apparatus is fabricated on an integrated circuit chip having four edges; and
- Each said edge comprises a cut line along which are disposed a row or column of operational transistors designated as peripheral transistors, with pass transistors connected thereto and to said terminals of said at least one more processing space, wherein
- An enabling of said pass transistors serves both to permit the joinder of one said at least a first processing space and said at least one more processing space, and the structuring of a binary logic circuit that permits the occurrence of an information processing event; thereby
- To permit the structuring of binary logic circuits both within said processing space and through the joinder of said at least a first processing space and said at least one more processing space; thus
- To permit the structuring of a composite processing space of unlimited size, speed, and data handling capacity.
7. The super-scalable information processing apparatus of claim 6 wherein said energy is electronic energy, said passive energy transmitting information processing elements are operational transistors, and said active energy transmitting information processing elements are pass transistors.
8. The super-scalable information processing apparatus of claim 7 wherein, by the enabling of selected ones of said connections, binary logic circuits can be structured that are capable of receiving operational data and processing said data such that an algorithm that expresses a particular arithmetic/logical problem can be executed.
9. The super-scalable information processing apparatus of claim 8 wherein processing spaces are formed on separate integrated circuits, and said integrated circuits each comprise a first array of passive transistors and a second array of active transistors connected at proximal ends thereof to respective passive transistors, wherein respective ones of said array of active transistors also connect at distal ends thereof to an energy source, a source of external data input, and an energy sink, respectively, thus to convert said passive transistors into active transistors, and respective ones of said active transistors of said second array serve to make connection between said terminals of at least one said active transistor of at least one processing space and said terminals of another active transistor of at least one more processing space.
10. The super-scalable information processing apparatus of claim 1 further comprising:
- Code controlled switching means within said connections between said one or more of said terminals of one said processing element to one or more said terminals of an adjacent said processing element, whereby:
- By opening or closing said connections so as to structure said one or more processing elements into a desired circuit or part thereof; and
- Providing code to said switching means that will bring about such opening or closing thereof, thus to structure desired circuits that are applicable to the execution of one or more algorithms; and
- Employing data entrance means to enter such data as may be necessary to execute said algorithms,
- Thus to carry out information processing.
Type: Application
Filed: Jan 7, 2011
Publication Date: Jun 2, 2011
Inventor: William Stuart Lovell (Lincoln City, OR)
Application Number: 12/930,496
International Classification: G06F 15/80 (20060101); G06F 9/30 (20060101);