Patents Assigned to SAS Institute
  • Publication number: 20150029213
    Abstract: A method of visualizing high-cardinality data is provided. A graph is presented on a display. The graph includes a first axis, a second axis, and a plurality of value markers. The first axis includes a minimum value and a maximum value and the second axis includes a plurality of category values. A selection indicator identifying selection of a first value marker of the plurality of value markers is received. The first value marker indicates a value for a category value of the plurality of category values. A second plurality of category values is determined based on the category value. The graph and a second graph are presented on the display. The second graph includes a third axis, a fourth axis, and a second plurality of value markers. The third axis includes a second minimum value and a second maximum value.
    Type: Application
    Filed: March 12, 2014
    Publication date: January 29, 2015
    Applicant: SAS Institute Inc.
    Inventors: Jordan Riley Benson, David J. Caira, Douglas R. Dotson, Lisa Hope Everdyke, Nascif A. Abousalh-Neto
  • Publication number: 20150019554
    Abstract: A method of determining a number of clusters for a dataset is provided. Centroid locations for a defined number of clusters are determined using a clustering algorithm. Boundaries for each of the defined clusters are defined. A reference distribution that includes a plurality of data points is created. The plurality of data points are within the defined boundary of at least one cluster of the defined clusters. Second centroid locations for the defined number of clusters are determined using the clustering algorithm and the reference distribution. A gap statistic for the defined number of clusters based on a comparison between a first residual sum of squares and a second residual sum of squares is computed. The processing is repeated for a next number of clusters to create. An estimated best number of clusters for the received data is determined by comparing the gap statistic computed for each iteration of the number of clusters.
    Type: Application
    Filed: March 4, 2014
    Publication date: January 15, 2015
    Applicant: SAS Institute Inc.
    Inventors: Patrick Hall, Ilknur Kaynar Kabul, Warren Sarle, Jorge Silva
  • Patent number: 8918410
    Abstract: Systems and methods are provided for identifying data variable roles during initial data exploration. In one example, a computer-implemented method of determining a role for a data variable is disclosed. The method comprises identifying to a plurality of data nodes a set of data records containing data values assigned to each data node, a maximum number of levels to record in a sorted data structure at the data nodes, and the data node responsible for each of a plurality of variables. The method further comprises receiving for each variable from the data node responsible for the variable a plurality of unique data values for the variable, a count for each of the unique data values and an overflow count for the variable, wherein the number of unique data values does not exceed the maximum number of levels. A role for a variable can be determined based upon the unique data values, counts and overflow count for the variable.
    Type: Grant
    Filed: February 21, 2013
    Date of Patent: December 23, 2014
    Assignee: SAS Institute Inc.
    Inventors: Georges H. Guirguis, Scott Pope
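The bounded level counting in the abstract of patent 8918410 can be illustrated with a small single-node sketch. It assumes the overflow count tallies observations whose value arrived after the level limit was reached; the function names and the role rules in `guess_role` are hypothetical and are not taken from the patent.

```python
def summarize_levels(values, max_levels):
    """Count distinct values, recording at most max_levels of them.

    Observations whose value is new once the limit is reached are
    tallied in a single overflow count (an assumption of this sketch).
    """
    counts = {}
    overflow = 0
    for v in values:
        if v in counts:
            counts[v] += 1
        elif len(counts) < max_levels:
            counts[v] = 1
        else:
            overflow += 1
    # Return the recorded levels in sorted order, as a sorted structure would.
    return dict(sorted(counts.items())), overflow


def guess_role(counts, overflow):
    """Hypothetical role rules based only on the level summary."""
    if overflow > 0:
        return "high-cardinality"
    if len(counts) == 2:
        return "binary"
    return "nominal"


counts, overflow = summarize_levels(["a", "b", "a", "c", "d"], max_levels=3)
role = guess_role(counts, overflow)
```

Capping the number of recorded levels keeps the per-variable summary small no matter how many distinct values the data contains.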
  • Publication number: 20140372090
    Abstract: A method of selecting a one-class support vector machine (SVM) model for incremental response modeling is provided. Exposure group data generated from first responses by an exposure group receiving a request to respond is received. Control group data generated from second responses by a control group not receiving the request to respond is received. A response is either positive or negative. A one-class SVM model is defined using the positive responses in the control group data and an upper bound parameter value. The defined one-class SVM model is executed with the identified positive responses from the exposure group data. An error value is determined based on execution of the defined one-class SVM model. A final one-class SVM model is selected by validating the defined one-class SVM model using the determined error value.
    Type: Application
    Filed: March 6, 2014
    Publication date: December 18, 2014
    Applicant: SAS Institute Inc.
    Inventors: Taiyeong Lee, Ruiwen Zhang, Yongqiao Xiao, Jared Langford Dean
  • Publication number: 20140351196
    Abstract: Systems and methods for determining an optimal splitting scheme for a node in a classification decision tree. A computing system may receive input data related to a decision tree to be generated from a data set. The input data identifies a target attribute of the data set and a set of candidate attributes of the data set to be used as nodes in the decision tree. The computing system may determine, using a clustering algorithm and the set of candidate attributes, a number of potential splitting schemes to be used to split a node in the decision tree. The computing system may calculate a splitting measurement for each of the plurality of potential splitting schemes. The computing system may select an optimal splitting scheme from the plurality of potential splitting schemes for each node in the decision tree based on the splitting measurement.
    Type: Application
    Filed: May 21, 2014
    Publication date: November 27, 2014
    Applicant: SAS Institute Inc.
    Inventors: Xiangqian Hu, Xunlei Wu, Xiangxiang Meng, Oliver Schabenberger
  • Patent number: 8887128
    Abstract: Systems and methods are provided for automated generation of a customized software product. A system includes a computer-readable medium encoded with a project parameters data structure, where the project parameters data structure includes a plurality of project requirement records, and a project prototype. One or more data processors are configured to process a plurality of initial characteristics for the customized software product, populate the project parameters data structure at least based on the initial characteristics, and generate the project prototype based on the project parameters data structure. The one or more data processors are further configured to output a requirements matrix data structure at least based on the project parameters data structure and the project prototype and to generate the customized software product at least based on the requirements matrix data structure and the project prototype.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: November 11, 2014
    Assignee: SAS Institute Inc.
    Inventors: Brian Oneal Miles, Julius Alton King, Adheesha Sanjaya Arangala, Jin-Whan Jung
  • Publication number: 20140330536
    Abstract: Techniques to simulate statistical tests are described. An apparatus may comprise a simulated data component to generate simulated data for a statistical test, where statistics of the statistical test are based on parameter vectors to follow a probability distribution, a statistic simulator component to generate statistics for the parameter vectors from the simulated data, each parameter vector represented with a single point in a grid of points, the statistic simulation component to distribute portions of the simulated data or simulated statistics across multiple nodes of a distributed computing system in accordance with a column-wise or column-wise-by-group distribution algorithm, and a code generator component to create a computational representation arranged to generate an approximate probability distribution for each point in the grid of points from the simulated statistics, the approximate probability distribution to comprise an empirical cumulative distribution function (CDF).
    Type: Application
    Filed: May 6, 2014
    Publication date: November 6, 2014
    Applicant: SAS Institute Inc.
    Inventors: Xilong Chen, Mark Roland Little
  • Publication number: 20140330826
    Abstract: Systems and methods for data reduction of a data set are included. A computing system may group data points in a data set into a number of data point bubbles represented by a number of representative points. A data point bubble may include one or more data points from the data set and a representative point from the data set. The computing system may calculate a cluster assignment for the representative point by executing a clustering algorithm using the number of representative points.
    Type: Application
    Filed: May 5, 2014
    Publication date: November 6, 2014
    Applicant: SAS Institute Inc.
    Inventors: Xiangqian Hu, Xunlei Wu, Xiangxiang Meng, Oliver Schabenberger
  • Publication number: 20140330884
    Abstract: This disclosure describes methods, systems, computer-readable media, and apparatuses for calculating a summary statistic. Calculating the summary statistic can be performed by identifying multiple subsets of a set of variable observations and assigning the subsets to grid-computing devices such that no two of the subsets are assigned to a same one of the grid-computing devices. A parallel processing operation that involves multiple processing phases at each of the grid-computing devices is then coordinated. The parallel processing operation includes each of the grid-computing devices inventorying the respectively assigned subset and generating inventory information representative of the respectively assigned subset. Subsequently, the inventory information generated by the grid-computing devices is received, and a summary statistic is determined by synthesizing the received inventory information.
    Type: Application
    Filed: May 5, 2014
    Publication date: November 6, 2014
    Applicant: SAS Institute Inc.
    Inventor: Gang Meng
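The inventory-then-synthesize pattern in the abstract of publication 20140330884 can be sketched as follows. This is a simplified illustration, not the claimed method: it assumes the "inventory information" is a count, a sum, and a sum of squares, and it simulates the grid-computing devices with plain lists. The function names are hypothetical.

```python
def inventory(subset):
    """Per-device step: reduce one subset to (count, sum, sum of squares)."""
    n = len(subset)
    s = sum(subset)
    ss = sum(x * x for x in subset)
    return n, s, ss


def synthesize(inventories):
    """Coordinator step: combine the inventories into summary statistics."""
    n = sum(inv[0] for inv in inventories)
    s = sum(inv[1] for inv in inventories)
    ss = sum(inv[2] for inv in inventories)
    mean = s / n
    # Sample variance; defined only when there are at least two observations.
    variance = (ss - n * mean * mean) / (n - 1) if n > 1 else 0.0
    return {"n": n, "mean": mean, "variance": variance}


# Each inner list stands in for the subset assigned to one device.
subsets = [[1, 2], [3, 4], [5]]
summary = synthesize([inventory(s) for s in subsets])
```

Only the small inventory tuples travel to the coordinator, so the raw observations never leave the device that holds them.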
  • Publication number: 20140324738
    Abstract: A computer-program causing a computing device to perform an association measurement between a target variable and each non-target variable of a data set; select non-target variables for inclusion in a visualization based on the degree of association; perform correspondence analysis between target values of the target variable and non-target values of each selected non-target variable; order target value markers within a target row based on the degrees of closeness; order non-target value markers within each non-target row based on the degrees of closeness; determine a width of each target value marker based on a frequency of occurrence of its target value in the data set; determine a width of each non-target value marker based on a frequency of occurrence of its non-target value in the data set; and cause generation of the visualization with connection markers emanating from the target value markers and extending among the non-target value markers.
    Type: Application
    Filed: April 24, 2014
    Publication date: October 30, 2014
    Applicant: SAS Institute Inc.
    Inventors: Krishnan PR, Prasad Pawar
  • Publication number: 20140324762
    Abstract: A method of determining a false and/or a true positive rate is provided. A true count value and a false count value are initialized for probability bins. For a plurality of records, a truth of event occurrence and a probability of occurrence are read; a probability bin that includes the probability of occurrence is determined; the true count value of the determined probability bin is incremented when the truth of event occurrence indicates true; and the false count value of the determined probability bin is incremented when the truth of event occurrence indicates false. A true positive rate and a false positive rate are computed for each probability bin based on the true count value, the false count value, a determined total number of true event occurrences, and a determined total number of false event occurrences.
    Type: Application
    Filed: March 13, 2014
    Publication date: October 30, 2014
    Applicant: SAS Institute Inc.
    Inventor: Lawrence E. Lewis
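The binning procedure in the abstract of publication 20140324762 can be sketched in a few lines. The sketch assumes equal-width probability bins and assumes that the rates for a bin are cumulative over that bin and all higher-probability bins, treating the bin's lower edge as the classification threshold. The abstract does not spell out either choice, and the function name is hypothetical.

```python
def rates_by_bin(records, n_bins=10):
    """records: iterable of (truth, probability) pairs.

    Returns a list of (bin_lower_edge, true_positive_rate,
    false_positive_rate), one entry per probability bin.
    """
    true_counts = [0] * n_bins
    false_counts = [0] * n_bins
    for truth, prob in records:
        # Find the bin that contains the probability; 1.0 goes in the last bin.
        b = min(int(prob * n_bins), n_bins - 1)
        if truth:
            true_counts[b] += 1
        else:
            false_counts[b] += 1

    total_true = sum(true_counts)
    total_false = sum(false_counts)

    rates = []
    cum_true = cum_false = 0
    # Walk from the highest-probability bin downward, accumulating counts.
    for b in range(n_bins - 1, -1, -1):
        cum_true += true_counts[b]
        cum_false += false_counts[b]
        tpr = cum_true / total_true if total_true else 0.0
        fpr = cum_false / total_false if total_false else 0.0
        rates.append((b / n_bins, tpr, fpr))
    rates.reverse()
    return rates


records = [(True, 0.9), (True, 0.6), (False, 0.4), (False, 0.1)]
rates = rates_by_bin(records, n_bins=2)
```

Counting into bins takes one pass over the records and avoids sorting them by probability.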
  • Publication number: 20140297997
    Abstract: Various embodiments are generally directed to techniques for reducing syntax requirements in application code to cause concurrent execution of multiple iterations of at least a portion of a loop thereof to reduce overall execution time in solving a large scale problem. At least one non-transitory machine-readable storage medium includes instructions that when executed by a computing device, cause the computing device to parse an application code to identify a loop instruction indicative of an instruction block that includes instructions that define a loop of which multiple iterations are capable of concurrent execution, the instructions including at least one call instruction to an executable routine capable of concurrent execution; and insert at least one coordinating instruction into an instruction sub-block of the instruction block to cause sequential execution of instructions of the instruction sub-block across the multiple iterations based on identification of the loop instruction.
    Type: Application
    Filed: December 30, 2013
    Publication date: October 2, 2014
    Applicant: SAS Institute Inc.
    Inventors: Jack Joseph Rouse, Leonardo Bezerra Lopes, Robert William Pratt
  • Publication number: 20140280220
    Abstract: A method of determining a storage device on which to store received data is provided. Data is received. A score indicating a value associated with the received data is computed. A storage device is determined from a plurality of types of storage devices on which to store the received data based on the computed score. The received data is sent to the determined storage device.
    Type: Application
    Filed: March 12, 2014
    Publication date: September 18, 2014
    Applicant: SAS Institute Inc.
    Inventors: Gary Spakes, Scott Meredith Chastain, Bryan Christopher Harris
  • Publication number: 20140280343
    Abstract: A method of determining a similarity between records in a data set is provided. Data organized into a plurality of records is received. First characters associated with a field and a first record of the plurality of records are selected. The selected first characters are encoded and subdivided into a first sliding series of a defined number of characters. Second characters associated with the field and a second record of the plurality of records are selected. The selected second characters are encoded and subdivided into a second sliding series of the defined number of characters. Whether or not the first sliding series and the second sliding series are similar is determined by comparing the encoded and subdivided first characters to the encoded and subdivided second characters using a fuzzy matching algorithm.
    Type: Application
    Filed: September 3, 2013
    Publication date: September 18, 2014
    Applicant: SAS Institute Inc.
    Inventors: James Edward Georges, David Lee Kuhn, Edward Lew Rowe, John Michael Kichak, Karcsi Fritz Lehr
  • Publication number: 20140280331
    Abstract: A method of performing a query on a cube of data is provided. An access key associated with a user is created at a computing device. The access key defines the user's access to a cube of data distributed onto a plurality of computing devices with each computing device of the plurality of computing devices storing a different portion of the cube of data. A plurality of access masks is stored in association with the portion of the cube of data stored on the computing device. A process space associated with the user is created. A query on the cube of data is received by the computing device. The query is associated with the user. The query is processed while masking the created access key with the stored plurality of access masks, wherein the masking controls access to the stored portion of the cube of data. A result of the processed query is sent to a requesting computing device.
    Type: Application
    Filed: August 6, 2013
    Publication date: September 18, 2014
    Applicant: SAS Institute Inc.
    Inventors: Stacey M. Christian, Donald James Erdman
  • Publication number: 20140280239
    Abstract: A method of determining a similarity between records in a data set is provided. Data organized into a plurality of records is received. First characters associated with a field and a first record of the plurality of records are selected. The selected first characters are subdivided into a first sliding series of a defined number of characters. Second characters associated with the field and a second record of the plurality of records are selected. The selected second characters are subdivided into a second sliding series of the defined number of characters. A similarity score between the first sliding series and the second sliding series is calculated. Whether or not the first sliding series and the second sliding series are similar is determined based on the calculated similarity score.
    Type: Application
    Filed: August 8, 2013
    Publication date: September 18, 2014
    Applicant: SAS Institute Inc.
    Inventors: James Edward Georges, David Lee Kuhn, Edward Lew Rowe, John Michael Kichak, Karcsi Fritz Lehr
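The sliding-series comparison in the abstract of publication 20140280239 can be illustrated with character n-grams. The abstract does not name a scoring formula, so this sketch assumes the Dice coefficient over the sets of n-grams and a fixed threshold of 0.5. The function names, the lowercasing step, and the threshold are all hypothetical.

```python
def sliding_series(text, n=2):
    """Split text into overlapping runs of n characters."""
    text = text.lower()
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def similarity_score(first, second, n=2):
    """Dice coefficient between the two sets of n-grams (an assumption)."""
    a = set(sliding_series(first, n))
    b = set(sliding_series(second, n))
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def is_similar(first, second, n=2, threshold=0.5):
    """Decide similarity by comparing the score to a threshold."""
    return similarity_score(first, second, n) >= threshold


score = similarity_score("smith", "smyth")
```

Here "smith" and "smyth" share the n-grams "sm" and "th" out of four each, so a single-character difference lowers the score without driving it to zero.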
  • Publication number: 20140280986
    Abstract: A method of acknowledging receipt of an event block object is provided. First connection information for connecting to an event stream processing (ESP) engine executing at a first computing device is received. A first connection to the ESP engine is established using the received first connection information. Second connection information for connecting to a publishing client executing at a second computing device is received. A second connection to the publishing client is established using the received second connection information, wherein the first connection differs from the second connection. An event block object is received from the ESP engine using the established first connection, wherein the event block object includes a unique identifier for the event block object. Successful processing of the event block object is determined.
    Type: Application
    Filed: August 2, 2013
    Publication date: September 18, 2014
    Applicant: SAS Institute Inc.
    Inventors: Gerald Donald Baulier, Scott J. Kolodzieski, Vincent L. Deters
  • Publication number: 20140280330
    Abstract: A method of performing a query on a cube of data is provided. A cube of data is distributed onto a plurality of computing devices with each computing device of the plurality of computing devices storing a different portion of the cube of data. A perturbation rule configured for application to the cube of data and associated with a user is received. A process space associated with the user is created. The received perturbation rule is compiled in association with the created process space. A query on the portion of the cube of data stored at the computing device is received. The received query is associated with the created process space. The query is processed while applying the compiled perturbation rule to data extracted from the portion of the cube of data stored at the computing device. A result of the processed query is sent to a requesting computing device.
    Type: Application
    Filed: July 24, 2013
    Publication date: September 18, 2014
    Applicant: SAS Institute Inc.
    Inventors: Stacey M. Christian, Donald James Erdman, Scott T. Gray
  • Publication number: 20140282246
    Abstract: Various embodiments are generally directed to techniques for increasing the accuracy with which list items may be selected on a touch screen. A machine-readable storage medium includes instructions that when executed cause a computing device to present a list of multiple list items on a touch screen, each associated with a touch area and including a wide area marking a location of the touch area and a narrow area narrower than the wide area, the wide and narrow areas defining a presentation area wherein the wide areas of adjacent first and second list items are positioned at different first and second widthwise positions, respectively, and wherein the touch areas of the first and second list items coincide with the wide areas of the first and second list items, respectively. Other embodiments are described and claimed.
    Type: Application
    Filed: February 11, 2014
    Publication date: September 18, 2014
    Applicant: SAS Institute Inc.
    Inventors: Qing Gong, Huifang Wang
  • Publication number: 20140279527
    Abstract: Methods, systems, computer-readable media, and apparatuses for detecting unauthorized activity are disclosed. Detecting unauthorized activity is done by accessing first data that represents activity involving a first service provided to a customer, and accessing second data that represents activity involving a second service provided to a customer. The activity involving the second service and the activity involving the first service both include authorized customer activity, and the activity associated with the second service further includes unauthorized activity. The first data is filtered using a filtering criteria and a portion of the first data is selected to be retained. The second data and the retained portion of the first data are analyzed, and the analysis includes classifying the activity associated with the second service in a way that distinguishes the unauthorized activity from the authorized activity associated with the second service.
    Type: Application
    Filed: October 24, 2013
    Publication date: September 18, 2014
    Applicant: SAS Institute Inc.
    Inventors: Brian Lee Duke, Paul C. Dulany, Vijay Desai, Kannan Shashank Shah