Patents Assigned to Ab Initio Technology LLC
-
Patent number: 9576028
Abstract: In one aspect, in general, a method of generating a dataflow graph representing a database query includes receiving a query plan from a plan generator, the query plan representing operations for executing a database query on at least one input representing a source of data, producing a dataflow graph from the query plan, wherein the dataflow graph includes at least one node that represents at least one operation represented by the query plan, and includes at least one link that represents at least one dataflow associated with the query plan, and altering one or more components of the dataflow graph based on at least one characteristic of the at least one input representing the source of data.
Type: Grant
Filed: February 23, 2015
Date of Patent: February 21, 2017
Assignee: Ab Initio Technology LLC
Inventors: Ian Schechter, Glenn John Allin
-
Patent number: 9569189
Abstract: An approach to automatically specifying, or assisting with the specification of, a parallel computation graph involves determining data processing characteristics of the linking elements that couple data processing elements of the graph. The characteristics of the linking elements are determined according to the characteristics of the upstream and/or downstream data processing elements associated with the linking element, for example, to enable computation by the parallel computation graph that is equivalent to computation of an associated serial graph.
Type: Grant
Filed: November 14, 2011
Date of Patent: February 14, 2017
Assignee: Ab Initio Technology LLC
Inventor: Craig W. Stanfill
-
Patent number: 9569528
Abstract: Among other aspects disclosed are a method and system for detecting confidential information. The method includes reading stored data and identifying strings within the stored data, where each string includes a sequence of consecutive bytes which all have values that are in a predetermined subset of possible values. For each of at least some of the strings, determining if the string includes bytes representing one or more format matches, wherein a format match includes a set of values that match a predetermined format associated with confidential information. For each format match, testing the values that match the predetermined format with a set of rules associated with the confidential information to determine whether the format match is an invalid format match that includes one or more invalid values and calculating a score for the stored data, based at least in part upon the ratio of a count of invalid format matches to a count of other format matches.
Type: Grant
Filed: October 3, 2008
Date of Patent: February 14, 2017
Assignee: Ab Initio Technology LLC
Inventor: David Fournier
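The detection steps described in the abstract of patent 9569528 can be illustrated with a minimal Python sketch. It is not the patented implementation. The allowed byte subset (digits, spaces, hyphens), the 16-digit card-number format, and the Luhn checksum used as the validation rule are all assumptions chosen for illustration, and the names `scan`, `luhn_valid`, and `invalid_ratio` are hypothetical.

```python
import re

# Assumption: the "predetermined subset" of byte values is digits, space, hyphen.
ALLOWED_RUN = re.compile(rb"[0-9 \-]{13,}")
# Assumption: the "predetermined format" is 16 digits in optional groups of four.
CARD_FORMAT = re.compile(rb"(?<![0-9])(?:[0-9]{4}[ \-]?){3}[0-9]{4}(?![0-9])")


def luhn_valid(digits: str) -> bool:
    """Assumed validation rule: the Luhn checksum over a string of digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan(data: bytes):
    """Return (valid, invalid) counts of format matches found in data."""
    valid = invalid = 0
    for run in ALLOWED_RUN.finditer(data):            # strings of allowed bytes
        for match in CARD_FORMAT.finditer(run.group()):  # format matches
            digits = re.sub(rb"[^0-9]", b"", match.group()).decode("ascii")
            if luhn_valid(digits):
                valid += 1
            else:
                invalid += 1
    return valid, invalid


def invalid_ratio(valid: int, invalid: int) -> float:
    """Ratio of invalid format matches to the other format matches."""
    if valid == 0:
        return float("inf") if invalid else 0.0
    return invalid / valid
```

A high ratio suggests the format matches are mostly coincidental digit runs rather than real confidential values; how the ratio feeds into a final score is left open here, as the abstract only says the score is based "at least in part" on it.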
-
Patent number: 9563411
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for flow analysis. In one aspect, a method includes modifying a dataflow graph, the dataflow graph including a plurality of paths connecting at least one entry point and at least one exit point, including adding components to the dataflow graph that add flow units to data records and remove flow units from data records, each flow unit identifying a segment of a path traversed by the data record. The method also includes identifying execution paths based on flow units obtained by processing a plurality of data records using the modified dataflow graph. The method also includes determining a subset of the plurality of data records, wherein a selected set of execution paths are represented by the subset.
Type: Grant
Filed: January 5, 2012
Date of Patent: February 7, 2017
Assignee: Ab Initio Technology LLC
Inventor: Andrew F. Roberts
-
Patent number: 9563721
Abstract: In one aspect, in general, a method is described for managing an archive for determining approximate matches associated with strings occurring in records. The method includes: processing records to determine a set of string representations that correspond to strings occurring in the records; generating, for each of at least some of the string representations in the set, a plurality of close representations that are each generated from at least some of the same characters in the string; and storing entries in the archive that each represent a potential approximate match between at least two strings based on their respective close representations.
Type: Grant
Filed: July 7, 2014
Date of Patent: February 7, 2017
Assignee: Ab Initio Technology LLC
Inventor: Arlen Anderson
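One way to picture the archive described in the abstract of patent 9563721 is the Python sketch below. It assumes that a "close representation" is the string itself or the string with one character deleted; the abstract does not say which representations are used, so this choice and the names `close_representations` and `build_archive` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def close_representations(s: str) -> set:
    """Assumed close representations: s itself plus every one-character deletion."""
    return {s} | {s[:i] + s[i + 1:] for i in range(len(s))}


def build_archive(strings) -> set:
    """Return pairs of distinct strings that share at least one close representation."""
    by_variant = defaultdict(set)
    for s in set(strings):
        for variant in close_representations(s):
            by_variant[variant].add(s)

    archive = set()
    for group in by_variant.values():
        # Every pair in the group is a potential approximate match.
        for a, b in combinations(sorted(group), 2):
            archive.add((a, b))
    return archive
```

For example, `build_archive(["smith", "smyth", "smit", "jones"])` pairs "smith" with "smyth" (both reduce to "smth") and "smit" with "smith", while "jones" matches nothing. The entries are only candidates; a later scoring step would decide which pairs are true matches.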
-
Patent number: 9547638
Abstract: At least one rule specification is received for a graph-based computation having data processing components connected by linking elements representing data flows. The rule specification defines rules that are each associated with one or more rule cases that specify criteria for determining one or more output values that depend on input data. A transform is generated for at least one data processing component in the graph-based computation based on the received rule specification, including providing an interface for configuring characteristics of a log associated with the generated transform. At least one data flow is transformed using the generated transform, including: tracing execution of the data processing components in the graph-based computation at run time, generating log information based on the traced execution according to the configured log characteristics, and storing or outputting the generated log information.
Type: Grant
Filed: June 30, 2009
Date of Patent: January 17, 2017
Assignee: Ab Initio Technology LLC
Inventors: Scott Studer, Joel Gould, David Phillimore
-
Patent number: 9507682
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for dynamic graph performance monitoring. One of the methods includes receiving multiple units of work that each include one or more work elements. The method includes determining a characteristic of the first unit of work. The method includes identifying, by a component of the first dataflow graph, a second dataflow graph from multiple available dataflow graphs based on the determined characteristic, the multiple available dataflow graphs being stored in a data storage system. The method includes processing the first unit of work using the second dataflow graph. The method includes determining one or more performance metrics associated with the processing.
Type: Grant
Filed: November 16, 2012
Date of Patent: November 29, 2016
Assignee: Ab Initio Technology LLC
Inventors: Mark Buxbaum, Michael G. Mulligan, Tim Wakeling, Matthew Darcy Atterbury
-
Patent number: 9477786
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for metadata management. One of the methods includes receiving user input selecting a first node. The method includes receiving a first data lineage of a first object, the first object having a type, the first data lineage describing relationships between the first object and one or more datasets or transforms. The method includes receiving user input selecting a second node. The method includes receiving a second data lineage of a second object, the second object having the same type as the first object. The method includes performing a comparison of the first node and the first data lineage to the second node and the second data lineage. The method includes generating a report based on the comparison.
Type: Grant
Filed: March 13, 2014
Date of Patent: October 25, 2016
Assignee: Ab Initio Technology LLC
Inventors: Gregg Yost, Dusan Radivojevic
-
Patent number: 9449057
Abstract: A data storage system stores at least one dataset including a plurality of records. A data processing system, coupled to the data storage system, processes the plurality of records to produce codes representing data patterns in the records, the processing including: for each of multiple records in the plurality of records, associating with the record a code encoding one or more elements, wherein each element represents a state or property of a corresponding field or combination of fields as one of a set of element values, and, for at least one element of at least a first code, the number of element values in the set is smaller than the total number of data values that occur in the corresponding field or combination of fields over all of the plurality of records in the dataset.
Type: Grant
Filed: January 27, 2012
Date of Patent: September 20, 2016
Assignee: Ab Initio Technology LLC
Inventor: Arlen Anderson
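The pattern codes described in the abstract of patent 9449057 can be sketched in Python as follows. The three element values used here ("N" for null, "B" for blank, "P" for populated) are an assumption made for illustration, as are the function names; the abstract only requires that the set of element values be smaller than the number of distinct data values in the field.

```python
from collections import Counter


def element_value(value) -> str:
    """Map a raw field value to one of three assumed element values."""
    if value is None:
        return "N"   # null or missing
    if isinstance(value, str) and value.strip() == "":
        return "B"   # blank
    return "P"       # populated


def pattern_code(record: dict, fields: list) -> str:
    """Build the code for one record, one element per listed field."""
    return "".join(element_value(record.get(f)) for f in fields)


def code_counts(records: list, fields: list) -> Counter:
    """Count how many records share each pattern code."""
    return Counter(pattern_code(r, fields) for r in records)
```

With fields `["name", "phone"]`, a record `{"name": "Ann", "phone": ""}` gets the code "PB". Because many distinct data values collapse onto a few element values, the code counts summarize the population patterns of a dataset compactly.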
-
Publication number: 20160248643
Abstract: A method for supporting communication between a client and a server includes receiving a first message from a client. The method also includes creating an object in response to the first message. The method also includes sending a response to the first message to the client. The method also includes receiving changes to the object from a server. The method also includes storing the changes to the object. The method also includes receiving a second message from the client. The method also includes sending the stored changes to the client with a response to the second message.
Type: Application
Filed: March 11, 2016
Publication date: August 25, 2016
Applicant: Ab Initio Technology LLC
Inventors: Jennifer M. Farver, Joshua Goldshlag, David W. Parmenter, Ian Robert Schechter, Tim Wakeling
-
Patent number: 9418095
Abstract: Managing changes to a collection of records includes storing a first set of records in a data storage system, the first set of records representing a first version of the collection of records, and validating a proposed change to the collection of records specified by an input received over a user interface. The data storage system is queried based on validation criteria associated with the proposed change, and a first result is received in response to the querying. A second set of records is processed representing changes not yet applied to the collection of records to generate a second result. The first result is updated based on the second result to generate a third result. The third result is processed to determine whether the proposed change is valid according to the validation criteria.
Type: Grant
Filed: January 13, 2012
Date of Patent: August 16, 2016
Assignee: Ab Initio Technology LLC
Inventors: Joel Gould, Timothy Perkins, Adam Weiss
-
Patent number: 9413542
Abstract: Managing data units broadcast from a data feed, without requiring re-transmission by a source of the data feed, includes: at a first node in a network, receiving at least a portion of a data feed including a plurality of data units; at a second node in the network, receiving at least a portion of the data feed; identifying an interruption in receiving the data feed at the first node; determining an extent of a data lacuna extending between a last data unit received by the first node prior to the interruption and a first data unit received by the first node after the interruption; and sending a request from the first node for results saved by the second node, the results saved by the second node corresponding to the data lacuna.
Type: Grant
Filed: August 7, 2014
Date of Patent: August 9, 2016
Assignee: Ab Initio Technology LLC
Inventor: Craig W. Stanfill
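The lacuna recovery described in the abstract of patent 9413542 can be pictured with the Python sketch below. It assumes that each data unit carries a consecutive integer sequence number, which the abstract does not state, and the `Node` class and `find_lacuna` function are hypothetical names used only for illustration.

```python
def find_lacuna(last_before: int, first_after: int) -> list:
    """Sequence numbers missing between the last unit received before the
    interruption and the first unit received after it (assumed integer IDs)."""
    return list(range(last_before + 1, first_after))


class Node:
    """A feed receiver that saves one result per data unit it processes."""

    def __init__(self):
        self.results = {}

    def receive(self, seq: int, payload: str) -> None:
        # Stand-in for real processing: the saved "result" is the payload itself.
        self.results[seq] = payload

    def request_results(self, seqs: list) -> dict:
        """Serve saved results for the requested sequence numbers."""
        return {s: self.results[s] for s in seqs if s in self.results}


def recover(first: Node, second: Node, last_before: int, first_after: int) -> list:
    """Fill the first node's gap from the second node, not from the feed source."""
    missing = find_lacuna(last_before, first_after)
    first.results.update(second.request_results(missing))
    return missing
```

The point of the design is that the first node asks a peer for the saved results covering the gap, so the source of the feed never has to re-transmit.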
-
Patent number: 9411712
Abstract: Generating test data includes: reading values occurring in at least one field of multiple records from a data source; storing profile information including statistics characterizing the values; generating a model of a probability distribution for the field based on the statistics; generating multiple test data values using the generated model such that a frequency at which a given value occurs in the test data values corresponds to a probability assigned to that given value by the model; and storing a collection of test data including the test data values in a data storage system.
Type: Grant
Filed: June 9, 2010
Date of Patent: August 9, 2016
Assignee: Ab Initio Technology LLC
Inventor: Carl Richard Feynman
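The steps in the abstract of patent 9411712 (profile, model, generate) can be sketched in Python as shown below. This sketch assumes the simplest possible model, an empirical distribution over the observed values, and the function names are hypothetical. The patent may cover richer models; nothing here is taken from its claims.

```python
import random
from collections import Counter


def profile(values) -> Counter:
    """Profile information: how often each value occurs in the field."""
    return Counter(values)


def build_model(stats: Counter) -> dict:
    """Assumed model: the empirical probability of each observed value."""
    total = sum(stats.values())
    return {value: count / total for value, count in stats.items()}


def generate(model: dict, n: int, seed=None) -> list:
    """Draw n test values so that frequencies follow the model's probabilities."""
    rng = random.Random(seed)
    values = list(model)
    weights = [model[v] for v in values]
    return rng.choices(values, weights=weights, k=n)
```

For a field whose source values are `["A", "A", "A", "B"]`, the model assigns 0.75 to "A" and 0.25 to "B", so a large generated sample contains roughly three times as many "A" values as "B" values without copying any source record.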
-
Patent number: 9411531
Abstract: Processing a plurality of data units to generate result information, includes: performing a data operation for each data unit of a first subset of data units from the plurality of data units, and storing information associated with a result of the data operation in a first set of one or more data structures stored in working memory space of a memory device; after an overflow condition on the working memory space is satisfied, storing information in overflow storage space of a storage device; and repeating an overflow processing procedure multiple times during the processing of the plurality of data units, the overflow processing procedure including: updating a new set of one or more data structures stored in the working memory space using at least some information stored in the overflow storage space.
Type: Grant
Filed: December 9, 2015
Date of Patent: August 9, 2016
Assignee: Ab Initio Technology LLC
Inventors: Muhammad Arshad Khan, Stephen G. Rybicki, Joel Gould
-
Publication number: 20160188442
Abstract: Among other things, a method includes, at a computer system on which one or more computer programs are executing, receiving a specification defining types of state information, receiving an indication that an event associated with at least one of the computer programs has occurred, the event associated with execution of a function of the computer program, collecting state information describing the state of the execution of the computer program when the event occurred, generating an entry corresponding to the event, the entry including elements of the collected state information, the elements of state information formatted according to the specification, and storing the entry. The log can be parsed to generate a visualization of computer program execution.
Type: Application
Filed: March 9, 2016
Publication date: June 30, 2016
Applicant: Ab Initio Technology LLC
Inventors: Joseph Stuart Wood, Robert Freundlich
-
Patent number: 9361355
Abstract: Received data records, each including one or more values in one or more fields, are processed to identify a matched data cluster. The processing includes: for selected data records, generating a query from one or more values; identifying one or more candidate data records from the received data records using the query; determining whether or not the selected data record satisfies a cluster membership criterion for at least one candidate data cluster of one or more existing data clusters containing the candidate records; and selecting the matched data cluster from among one or more candidate data clusters based at least in part on a growth criterion for the candidate data clusters, or initializing the matched data cluster with the selected data record if the selected data record does not satisfy a cluster membership criterion for any of the existing data clusters or based on a result of the growth criterion.
Type: Grant
Filed: November 15, 2012
Date of Patent: June 7, 2016
Assignee: Ab Initio Technology LLC
Inventors: Arlen Anderson, Kamil Trojan
-
Patent number: 9354981
Abstract: A memory module stores working data that includes data units. A storage system stores recovery data that includes sets of one or more data units. Transferring data units between the memory module and the storage system includes: maintaining an order among the data units included in the working data, the order defining a first contiguous portion and a second contiguous portion; and, for each of multiple time intervals, identifying any data units accessed from the working data during the time interval, and adding to the recovery data a set of two or more data units including: one or more data units from the first contiguous portion including any accessed data units, and one or more data units from the second contiguous portion including at least one data unit that has been previously added to the recovery data.
Type: Grant
Filed: September 26, 2014
Date of Patent: May 31, 2016
Assignee: Ab Initio Technology LLC
Inventor: Joseph Skeffington Wholey, III
-
Publication number: 20160125197
Abstract: A method includes automatically determining a component of a security label for each first record in a first table of a database having multiple tables, including: identifying a second record related to the first record according to a foreign key relationship; identifying a component of the security label for the second record; and assigning a value for the component of the security label for the first record based on the identified component of the security label for the second record. The method includes storing the determined security label in the record.
Type: Application
Filed: November 5, 2015
Publication date: May 5, 2016
Applicant: Ab Initio Technology LLC
Inventor: Christopher J. Winters
-
Patent number: 9323824
Abstract: An interface is provided on a computing device for interacting with data stored in a data repository. Input is received including information identifying two or more attributes, and information indicating an order for the identified attributes. A hierarchical data structure is stored, with an order of hierarchy levels corresponding to the indicated order. Multiple attribute values for the attributes are determined. The method includes assigning to each node of a first level at least one of the attribute values of a first attribute, and assigning to each node of a second level at least one of the attribute values of a second attribute, each of the nodes of the second level also being assigned respective ones of the attribute values assigned to one or more nodes of preceding levels. The interface is displayed including displaying interface elements associated with each of the nodes.
Type: Grant
Filed: September 12, 2011
Date of Patent: April 26, 2016
Assignee: Ab Initio Technology LLC
Inventor: Joyce L. Vigneau
-
Patent number: 9323749
Abstract: Profiling data includes processing an accessed collection of records, including: generating, for a first set of distinct values appearing in a first set of one or more fields, corresponding location information; generating, for the first set of fields, a corresponding list of entries identifying a distinct value from the first set of distinct values and the location information for the distinct value; generating, for a second set of one or more fields, a corresponding list of entries, with each entry identifying a distinct value from a second set of distinct values appearing in the second set of fields; and generating result information, based at least in part on: locating at least one record of the collection using the location information for at least one value appearing in the first set of fields, and determining at least one value appearing in the second set of fields of the located record.
Type: Grant
Filed: October 22, 2013
Date of Patent: April 26, 2016
Assignee: Ab Initio Technology LLC
Inventor: Arlen Anderson