Data Indexing; Abstracting; Data Reduction (epo) Patents (Class 707/E17.002)
  • Publication number: 20130311435
    Abstract: A method, computer product, and computer system of minimizing surprisal data comprising: at a source, reading and identifying characteristics of a genetic sequence of an organism; receiving an input of rank of at least two identified characteristics of the genetic sequence of the organism; generating a hierarchy of ranked, identified characteristics based on the rank of the at least two identified characteristics of the genetic sequence of the organism; comparing the hierarchy of ranked, identified characteristics to a repository of reference genomes; and if at least one reference genome from the repository matches the hierarchy of ranked, identified characteristics, breaking the matched reference genomes into pieces, combining pieces associated with the identified characteristics from at least one matched reference genome to form a filter pattern to be compared to the nucleotides of the genetic sequence of the organism, to obtain differences and create surprisal data.
    Type: Application
    Filed: June 8, 2012
    Publication date: November 21, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Robert R. Friedlander, James R. Kraemer
  • Publication number: 20130311428
    Abstract: Embodiments are directed towards managing data within a cluster environment having a plurality of indexers for data storage using redundancy, the data being managed using a generation identifier, such that a primary indexer is designated for a given generation of data. When a master device for the cluster fails, data may continue to be stored using redundancy, and data searches may still be performed.
    Type: Application
    Filed: October 26, 2012
    Publication date: November 21, 2013
    Applicant: SPLUNK INC.
    Inventors: Vishal Patel, Mitchell Neuman Blank, JR., Sundar Rengarajan Vasan, Stephen Phillip Sorkin
  • Publication number: 20130311427
    Abstract: Embodiments are directed towards managing data within a cluster environment having a plurality of indexers for data storage using redundancy, the data being managed using a generation identifier, such that a primary indexer is designated for a given generation of data. When a master device for the cluster fails, data may continue to be stored using redundancy, and data searches may still be performed.
    Type: Application
    Filed: October 9, 2012
    Publication date: November 21, 2013
    Applicant: SPLUNK INC.
    Inventors: Vishal Patel, Mitchell Neuman Blank, JR., Sundar Rengarajan Vasan, Stephen Phillip Sorkin
  • Publication number: 20130304702
    Abstract: A method, system and computer program product for controlling enterprise data on mobile devices. Data on a mobile device is tagged as being associated with either enterprise data or with personal data. Upon identifying the storage location of the tagged data and the identifier of the application that generated the tagged data, the tag, the storage location of the tagged data and the identifier of the application are stored in an index. A mobile agent residing on the mobile device may be directed by a mobile device management server of the enterprise to perform various actions (e.g., deleting, encrypting, backing-up) on the enterprise data using the index. In this manner, the enterprise has the ability to control its applications and data that reside on employees' mobile devices to ensure that such data is not lost or used in a manner that is contrary to the wishes of the employer.
    Type: Application
    Filed: May 14, 2012
    Publication date: November 14, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Shalini Kapoor, Palanivel A. Kodeswaran, Sridhar R. Muppidi, Nataraj Nagaratnam, Vikrant Nandakumar
  • Publication number: 20130297613
    Abstract: The present invention is a fast indexing technique that builds an indexing structure based on multi-level key ranges, typically for large data storage systems. The invention is explained based on the B+-tree. It is designed to reside in main memory. Point searches and range searches are helped by early termination of searches for non-existent data. Range searches can be processed depth-first or breadth-first. One group of multiple searches can be processed with one pass on the indexing structure to minimize total cost. Implementation options and strategies are explained to show the flexibility of this invention for easy adaptation and high efficiency. Each branch of any level has exact and clear key boundaries, so that it is very easy to build or cache a partial index for various purposes. The inventive indexing structure can be tuned to speed up queries directed at popular ranges of the index or index ranges of particular interest to the user.
    Type: Application
    Filed: May 4, 2012
    Publication date: November 7, 2013
    Applicant: MONMOUTH UNIVERSITY
    Inventor: Cui Yu
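    The key-range idea in the abstract above can be illustrated with a minimal sketch. This is not the structure described in the filing: it assumes only two levels, and the class name `RangeIndex`, the `branch_size` parameter, and the leaf layout are hypothetical. It shows how exact per-branch key boundaries allow a point search for a non-existent key to stop before any leaf is scanned.

```python
from bisect import bisect_left, bisect_right


class RangeIndex:
    """Two-level in-memory index; each branch keeps exact [low, high] key boundaries.

    Hypothetical layout for illustration only, not the filing's structure.
    """

    def __init__(self, keys, branch_size=4):
        ordered = sorted(set(keys))
        # Split the sorted keys into fixed-size leaves.
        self.leaves = [ordered[i:i + branch_size]
                       for i in range(0, len(ordered), branch_size)]
        self.lows = [leaf[0] for leaf in self.leaves]
        self.highs = [leaf[-1] for leaf in self.leaves]

    def contains(self, key):
        # Locate the last branch whose low boundary is <= key.
        i = bisect_right(self.lows, key) - 1
        if i < 0 or key > self.highs[i]:
            # Early termination: the key lies outside every branch range.
            return False
        leaf = self.leaves[i]
        j = bisect_left(leaf, key)
        return j < len(leaf) and leaf[j] == key

    def range(self, lo, hi):
        # Visit only the branches whose boundaries overlap [lo, hi].
        found = []
        start = bisect_left(self.highs, lo)
        for leaf in self.leaves[start:]:
            if leaf[0] > hi:
                break
            found.extend(k for k in leaf if lo <= k <= hi)
        return found


index = RangeIndex([5, 1, 9, 14, 20, 22, 30, 41, 57])
```

    With these keys the branches cover 1 to 14, 20 to 41, and 57, so a lookup for 17 is rejected from the boundaries alone.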
  • Publication number: 20130297573
    Abstract: A system, method, and computer program product for character data compression for reducing data storage requirements in a database system are described. Embodiments include identifying data of a particular character type in a full data page, and identifying usage frequency of each character of the particular character type. Each character is encoded based on the identified usage frequency and stored, with storage requirements for the most frequently used characters being reduced.
    Type: Application
    Filed: May 7, 2012
    Publication date: November 7, 2013
    Applicant: Sybase Inc.
    Inventors: Xu-dong QIAN, ZhiPing Xiong
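    As an illustration of frequency-based character encoding, the sketch below uses Huffman coding, a well-known technique swapped in here because the abstract does not name the encoding used in the filing. The function names are hypothetical. Characters that occur more often receive shorter bit strings.

```python
import heapq
from collections import Counter


def build_codes(text):
    """Return a {character: bit string} table built from character frequencies."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct character still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, unique tie breaker, {char: code so far}).
    heap = [(n, i, {ch: ""}) for i, (ch, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        # Merge the two least frequent groups, prefixing one bit to each side.
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    reverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:  # prefix-free codes make this unambiguous
            out.append(reverse[current])
            current = ""
    return "".join(out)
```

    A page of character data would be scanned once to build the table and then stored as the encoded bit string plus the table.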
  • Publication number: 20130290253
    Abstract: A data indexing system including a plurality of servers and a tracked resource set client is provided. Each of the servers includes a plurality of resources that are part of a resource set. Each of the servers also includes a tracked resource set corresponding to the resource set. The tracked resource set describes the plurality of resources located in the resource set. The server identifies the plurality of resources using rules of linked data. The tracked resource set client is in communication with the plurality of servers. The tracked resource set client has a data index. The data index is built and kept up to date using the tracked resource set of each of the plurality of servers.
    Type: Application
    Filed: April 30, 2012
    Publication date: October 31, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Frank J. Budinsky, James J. Des Rivieres, Martin P. Nally
  • Publication number: 20130290281
    Abstract: The processing load when rewriting portions of compressed data is alleviated. A storage apparatus comprises a storage unit which stores data which is read/written by the host apparatus, a compression/expansion unit which compresses the data using a predetermined algorithm to generate compressed data, and expands the compressed data, and a control unit which controls writing of data to the storage unit, wherein the control unit manages, as compression block units, divided compressed data which is obtained by dividing compressed data compressed by the compression/expansion unit into predetermined units, and padding data.
    Type: Application
    Filed: April 27, 2012
    Publication date: October 31, 2013
    Inventors: Nobuhiro Yokoi, Masanori Takada, Nagamasa Mizushima, Hiroshi Hirayama, Akira Yamamoto
  • Publication number: 20130290299
    Abstract: Content-based navigation of an electronic device includes receiving supplemental content to an electronic book. The supplemental content is created separately from the electronic book. The content-based navigation also includes associating an identifier of the electronic book with the supplemental content, storing the supplemental content with the identifier in a storage device, and creating an index to the supplemental content that is searchable by the identifier of the electronic book. The content-based navigation further includes providing end user devices with access to the supplemental content in the storage device via the index.
    Type: Application
    Filed: April 25, 2012
    Publication date: October 31, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Guillaume Hoareau, Althea Hookens, John Musial, Sandeep R. Patil
  • Publication number: 20130290276
    Abstract: Data reduction in a storage system comprises determining attributes of data for storage in the storage system and determining expected data reduction effectiveness for the data based on said attributes. Said effectiveness indicates the benefit that data reduction is expected to provide for the data based on said attributes. The data reduction further comprises applying data reduction to the data based on the expected data reduction effectiveness and performance impact, to improve resource usage efficiency.
    Type: Application
    Filed: April 30, 2012
    Publication date: October 31, 2013
    Applicant: International Business Machines Corporation
    Inventors: David D. Chambliss, Mihail C. Constantinescu, Joseph S. Glider, Maohua Lu
  • Publication number: 20130290275
    Abstract: Apparatus, methods, and other embodiments associated with object synthesis are described. One example apparatus includes logic for identifying a block in a data de-duplication repository and for identifying a reference to the block. The apparatus also includes logic for representing a source object using a first named, organized collection of references to blocks in the data de-duplication repository and logic for representing a target object using a second named, organized collection of references. The apparatus is configured to synthesize the target object from the source object. Since synthesis may be complicated by edge cases, the apparatus is configured to account for conditions including a block in the target object needing less than all the data in a source object block, data to be used to synthesize the target object residing in a sparse hole in a data stream, and the target object needing data not present in the source object.
    Type: Application
    Filed: April 30, 2012
    Publication date: October 31, 2013
    Applicant: Quantum Corporation
    Inventors: Timothy STOAKES, Andrew Leppard
  • Publication number: 20130290301
    Abstract: Techniques for indexing file paths of items in a content repository may include querying, by at least one processor, a content repository stored on at least one computer readable storage medium for one or more items that qualify for file path indexes, do not have the file path indexes, and have a parent folder that has a file path index, wherein the querying does not depend on results from previous queries, and wherein the file path index indicates an associated item's location in a folder tree, creating, by the at least one processor, the file path indexes for resulting items from the querying, and, if the querying results in at least one resulting item, repeating the querying of the content repository and the creating of the file path indexes until the querying results in zero resulting items.
    Type: Application
    Filed: April 30, 2012
    Publication date: October 31, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: David B. Victor
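    The repeat-until-empty querying described in the abstract above can be sketched as follows. This is a simplified in-memory stand-in for a content repository; the dictionary layout and the function name `build_path_indexes` are assumptions made for illustration. Each pass selects items that lack a path index but whose parent folder already has one, and the loop ends when a pass selects nothing.

```python
def build_path_indexes(items):
    """Fill in missing path indexes and return the number of passes needed.

    items maps an id to {"name": str, "parent": id or None, "path": str or None}.
    """
    passes = 0
    while True:
        # The "query": it reads only the current repository state,
        # not the result set of any earlier pass.
        ready = [item_id for item_id, item in items.items()
                 if item["path"] is None
                 and item["parent"] is not None
                 and items[item["parent"]]["path"] is not None]
        if not ready:
            return passes  # zero resulting items: stop repeating
        for item_id in ready:
            parent_path = items[items[item_id]["parent"]]["path"]
            items[item_id]["path"] = parent_path.rstrip("/") + "/" + items[item_id]["name"]
        passes += 1


repo = {
    "root": {"name": "", "parent": None, "path": "/"},
    "docs": {"name": "docs", "parent": "root", "path": None},
    "report": {"name": "report.txt", "parent": "docs", "path": None},
}
passes_needed = build_path_indexes(repo)
```

    Each pass indexes one more level of the folder tree, so the number of passes tracks the depth of the unindexed subtree.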
  • Publication number: 20130282671
    Abstract: Various embodiments for preserving data redundancy in a data deduplication system in a computing environment are provided. At least one virtual device out of a volume set is designated as not subject to a deduplication operation.
    Type: Application
    Filed: April 23, 2012
    Publication date: October 24, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Rahul M. FISKE, Carl Evan JONES, Subhojit ROY
  • Publication number: 20130282670
    Abstract: Various embodiments for preserving data redundancy of identical data in a data deduplication system in a computing environment are provided. A selected range of virtual addresses of a virtual storage device in the computing environment is designated as not subject to a deduplication operation. Other system and computer program product embodiments are disclosed and provide related advantages.
    Type: Application
    Filed: April 23, 2012
    Publication date: October 24, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Rahul M. FISKE, Carl Evan JONES, Subhojit ROY
  • Publication number: 20130282669
    Abstract: Various embodiments for preserving data redundancy in a data deduplication system in a computing environment are provided. An indicator is configured. The indicator is provided with a selected data segment to be written through the data deduplication system to designate that the selected data segment must not be subject to a deduplication operation, such that repetitive data can be written and stored on physical locations despite being identical.
    Type: Application
    Filed: April 23, 2012
    Publication date: October 24, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Rahul M. FISKE, Carl Evan JONES, Subhojit ROY
  • Publication number: 20130282672
    Abstract: The present invention not only reduces the load but also enhances the accuracy of de-duplication in a storage apparatus which performs in-line de-duplication processing and post-process de-duplication processing. A storage apparatus comprises a storage device and a controller. The controller receives multiple files, and by performing in-line de-duplication processing under a prescribed condition, detects from among the multiple files a file which is duplicated with a file received in the past, stores in the temporary storage area a file other than the detected file of the multiple files, and partitions the stored file into multiple chunks, and by performing post-process de-duplication processing, detects from among the multiple chunks a chunk which is duplicated with a chunk received in the past, and stores in the transfer-destination storage area a chunk other than the detected chunk of the multiple chunks.
    Type: Application
    Filed: April 18, 2012
    Publication date: October 24, 2013
    Applicants: HITACHI COMPUTER PERIPHERALS CO., LTD., HITACHI, LTD.
    Inventors: Naomitsu Tashiro, Mikito Ogata
  • Publication number: 20130282493
    Abstract: Embodiments are directed towards collecting, aggregating and indexing unique and non-unique user data from a plurality of users. The result for a query of this indexed aggregation of user data is provided in a plurality of sub-sets of aggregated user data. Each subset of aggregated user data corresponds to a particular portion of the plurality of users. Also, each of these particular portions of the users is set at least large enough to provide general anonymity for the individual users. User data may be collected by one or more user data suppliers and provided to a user data aggregator. In some embodiments, user data may be collected as unique user data, non-unique user data, or any combination thereof. In some embodiments, user data may be aggregated by zip code, expanded zip code, and/or one or more attributes.
    Type: Application
    Filed: April 24, 2012
    Publication date: October 24, 2013
    Applicant: BLUE KAI, INC.
    Inventors: Lucian Vlad Lita, Omar Tawakol
  • Publication number: 20130275397
    Abstract: Data is converted into a minimized data representation using a suffix tree by sorting data streams according to symbolic representations for building table boundary formation patterns. The converted data is fully reversible for reconstruction while retaining minimal header information.
    Type: Application
    Filed: April 16, 2012
    Publication date: October 17, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan AMIT, Lilia DEMIDOV, Nir HALOWANI
  • Publication number: 20130275392
    Abstract: Computer program products and systems determine solutions to a problem experienced by a data processing system user. A query is received from the user. The query includes a problem description of the problem experienced by the user with respect to the data processing system. One or more keywords are extracted from the received problem description. An index of problems and associated solutions is searched using the one or more extracted keywords. The index of problems and associated solutions is created by analyzing a document collection describing problems and associated solutions with a text analytics application. One or more documents are returned that contain words or phrases that are similar to the keywords used for searching the index of problems and associated solutions. The documents relevant for the problem and associated solutions are presented to the user.
    Type: Application
    Filed: April 12, 2012
    Publication date: October 17, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Dhruv A. Bhatt, Kristin E. McNeil, Nitaben A. Patel
  • Publication number: 20130275429
    Abstract: A system for enabling contextual recommendations and collaboration recommendations, based on a user's current work, comprising a plurality of content collector software applications adapted to interface with a plurality of content management applications, an indexing engine software application, an expanded social network graph database, and a predictive content intelligence software application.
    Type: Application
    Filed: July 17, 2012
    Publication date: October 17, 2013
    Inventors: Graham York, Lee Henry Burgess
  • Publication number: 20130275398
    Abstract: Systems and methods enabling file actions to be performed on a folder structure in a cloud-based service are disclosed. In one aspect, embodiments of the present disclosure include a method, which may be implemented on a system, for representing the folder structure in a user interface to the cloud-based service as a file and enabling file actions to be performed on the file representing the folder structure in the user interface to the cloud-based service. In one embodiment, the folder structure and associated content is stored on a server which provides the cloud-based service in a compressed file format which is able to preserve the metadata associated with the folder structure which indicates its representation as the file in the user interface.
    Type: Application
    Filed: September 14, 2012
    Publication date: October 17, 2013
    Applicant: Box, Inc.
    Inventors: Griffin Dorman, Satish Asok, Matthew Self
  • Publication number: 20130275396
    Abstract: Storage systems and methods to improve space saving from data compression by providing a plurality of compression processes, and optionally, one or more parameters for controlling operation of the compression processes and selecting from the plurality of compression processes and the parameters to satisfy resource limits, such as CPU usage and memory usage. In one embodiment, the method takes into account the content type, such as text file or video file, and selects the compression process and parameters that provide the greatest space savings for that content type while also remaining within a defined resource-usage limit.
    Type: Application
    Filed: April 11, 2012
    Publication date: October 17, 2013
    Applicant: NetApp, Inc.
    Inventors: Michael N. Condict, Fei Xie, Sandip Shete
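    A minimal sketch of choosing among several compression processes under a resource limit follows. It uses Python's standard `zlib`, `bz2`, and `lzma` modules as the candidate processes; the relative CPU cost attached to each candidate is an illustrative guess rather than a measurement, and the name `pick_compressor` is hypothetical. The filing's actual selection logic is not given in the abstract.

```python
import bz2
import lzma
import zlib

# (name, compress function, assumed relative CPU cost).
# The cost figures are placeholders for illustration, not benchmarks.
CANDIDATES = [
    ("zlib-1", lambda data: zlib.compress(data, 1), 1),
    ("zlib-9", lambda data: zlib.compress(data, 9), 3),
    ("bz2", lambda data: bz2.compress(data), 6),
    ("lzma", lambda data: lzma.compress(data), 10),
]


def pick_compressor(sample, cpu_limit):
    """Return (name, compressed size) of the best candidate within cpu_limit.

    Returns None when no candidate fits the limit.
    """
    best = None
    for name, compress, cost in CANDIDATES:
        if cost > cpu_limit:
            continue  # candidate would exceed the resource limit
        size = len(compress(sample))
        if best is None or size < best[1]:
            best = (name, size)
    return best


sample = b"timestamp=2012-04-11 level=INFO msg=ok\n" * 500
choice = pick_compressor(sample, cpu_limit=3)
```

    Running the selection on a representative sample of each content type would give a per-type choice, in the spirit of the content-type embodiment mentioned in the abstract.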
  • Publication number: 20130275434
    Abstract: A system enables metadata to be gathered about a data store beginning from the creation and generation of the data store, through subsequent use of the data store. This metadata can include keywords related to the data store and data appearing within the data store. Thus, keywords and other metadata can be generated without owner/creator intervention, with enough semantic meaning to make a discovery process associated with the data store much easier and more efficient. Usage of or communication regarding a data store is monitored and keywords are extracted from the usage or communication. The keywords are then written to or otherwise associated with metadata of the data store. During searching, keywords in the metadata are made available to be used to attempt to match query terms entered by a searcher.
    Type: Application
    Filed: April 11, 2012
    Publication date: October 17, 2013
    Applicant: Microsoft Corporation
    Inventors: John C. Platt, Surajit Chaudhuri, Lev Novik, Henricus Johannes Maria Meijer
  • Patent number: 8559724
    Abstract: An apparatus and method for generating additional information about moving picture content, including: comparing image feature information about each image frame in moving picture content with image feature information about each image frame in web information, searching for an image frame in the moving picture content, the image frame matching the image frame in the web information, determining location information about the found image frame in the moving picture content, and generating additional information by use of the determined location information and the web information.
    Type: Grant
    Filed: February 24, 2010
    Date of Patent: October 15, 2013
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Yoon-hee Choi, Il-hwan Choi, Hee-seon Park
  • Publication number: 20130268501
    Abstract: A computer-based monitoring system and monitoring method implemented in computer software for detecting, estimating, and reporting the condition states, their changes, and anomalies for many assets. The assets are of the same type, are operated over a period of time, and are outfitted with data collection systems. The proposed monitoring method accounts for variability of working conditions for each asset by using a regression model that characterizes asset performance. The assets are of the same type but not identical. The proposed monitoring method accounts for asset-to-asset variability; it also accounts for drifts and trends in the asset condition and data. The proposed monitoring system can perform distributed processing of massive amounts of historical data without discarding any useful information where moving all the asset data into one central computing system might be infeasible. The overall processing includes distributed preprocessing of data records from each asset to produce compressed data.
    Type: Application
    Filed: April 9, 2012
    Publication date: October 10, 2013
    Applicant: MITEK ANALYTICS LLC
    Inventor: Dimitry Gorinevsky
  • Publication number: 20130268497
    Abstract: Exemplary embodiments for increased in-line deduplication efficiency in a computing environment are provided. In one embodiment, by way of example only, hash values are calculated in nth iterations on data samples from fixed size data chunks extracted from an object requested for in-line deduplication. For each of the nth iterations, the calculated hash values for the data samples from the fixed size data chunks are matched in an nth hash index table with a corresponding hash value of existing objects in storage. The nth hash index table is exited upon detecting a mismatch during the matching. The mismatch is determined to be a unique object and is stored. A hash value for the object is calculated. A master hash index table is updated with the calculated hash value for the object and the calculated hash values for the unique object.
    Type: Application
    Filed: April 5, 2012
    Publication date: October 10, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Duane Mark BALDWIN, Nilesh P. BHOSALE, John Thomas OLSON, Sandeep Ramesh PATIL
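    The iterative matching with early exit described in the abstract above can be sketched roughly as follows. This is an interpretation under stated assumptions, not the filing's method: the chunk and sample sizes, the use of SHA-256, and the class name `InlineDedup` are all hypothetical. One hash table is kept per iteration, and matching stops at the first sample hash that is not found.

```python
import hashlib

CHUNK = 8   # fixed chunk size in bytes (tiny, for demonstration)
SAMPLE = 4  # bytes sampled from the start of each chunk


def _digest(data):
    return hashlib.sha256(data).hexdigest()


class InlineDedup:
    def __init__(self):
        self.level_tables = []    # level_tables[n]: sample hashes seen at iteration n
        self.master = set()       # hashes of whole stored objects
        self.hashes_computed = 0  # sample hashes computed while matching

    def _sample_hashes(self, obj):
        for offset in range(0, len(obj), CHUNK):
            yield _digest(obj[offset:offset + SAMPLE])

    def write(self, obj):
        """Return True if obj is stored as new, False if it is a duplicate."""
        mismatch = False
        for n, sample_hash in enumerate(self._sample_hashes(obj)):
            self.hashes_computed += 1
            if n >= len(self.level_tables) or sample_hash not in self.level_tables[n]:
                mismatch = True
                break  # exit the nth table on the first mismatch
        if not mismatch and _digest(obj) in self.master:
            return False
        # Treated as unique: record its sample hashes and its object hash.
        for n, sample_hash in enumerate(self._sample_hashes(obj)):
            while len(self.level_tables) <= n:
                self.level_tables.append(set())
            self.level_tables[n].add(sample_hash)
        self.master.add(_digest(obj))
        return True
```

    The early exit means an object that differs in its first chunk costs a single sample hash during matching, rather than a hash of the whole object.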
  • Publication number: 20130268496
    Abstract: Exemplary method, system, and computer program product embodiments for increased in-line deduplication efficiency in a computing environment are provided. In one embodiment, by way of example only, hash values are calculated in nth iterations for accumulative data chunks extracted from an object requested for in-line deduplication. For each of the nth iterations, the calculated hash values for the accumulative data chunks are matched in an nth hash index table with a corresponding hash value of existing objects in storage. The nth hash index table is exited upon detecting a mismatch during the matching. The mismatch is determined to be a unique object and is stored. A hash value for the object is calculated. A master hash index table is updated with the calculated hash value for the object and the calculated hash values for the unique object. Additional system and computer program product embodiments are disclosed and provide related advantages.
    Type: Application
    Filed: April 5, 2012
    Publication date: October 10, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Duane Mark BALDWIN, Nilesh P. BHOSALE, John Thomas OLSON, Sandeep Ramesh PATIL
  • Publication number: 20130268498
    Abstract: A reference counter corresponding to a base chunk of a plurality of chunks of a deduplicated data object is maintained, where the reference counter is incremented in response to an insertion of any chunk that references the base chunk, and where the reference counter is decremented in response to a deletion of any chunk that references the base chunk. A queue is defined for processing dereferenced chunks of the plurality of chunks. The dereferenced chunks in the queue are processed in a predefined order to free storage space.
    Type: Application
    Filed: April 6, 2012
    Publication date: October 10, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael G. Sisco, Yu Meng Li
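    The reference counting and queued cleanup described in the abstract above can be sketched as below. The class and method names are hypothetical, and first-in, first-out order is assumed as the "predefined order" because the abstract does not say which order the filing uses.

```python
from collections import deque


class ChunkStore:
    def __init__(self):
        self.refcount = {}         # base chunk id -> number of referencing chunks
        self.free_queue = deque()  # dereferenced base chunks awaiting cleanup
        self.freed = []            # chunks whose storage has been released

    def add_reference(self, base_id):
        # Called when a chunk that references base_id is inserted.
        self.refcount[base_id] = self.refcount.get(base_id, 0) + 1

    def remove_reference(self, base_id):
        # Called when a chunk that references base_id is deleted.
        self.refcount[base_id] -= 1
        if self.refcount[base_id] == 0:
            self.free_queue.append(base_id)

    def process_queue(self):
        # Process dereferenced chunks in FIFO order (an assumed order).
        while self.free_queue:
            base_id = self.free_queue.popleft()
            # Re-check: the chunk may have been referenced again while queued.
            if self.refcount.get(base_id, 0) == 0:
                self.refcount.pop(base_id, None)
                self.freed.append(base_id)
        return list(self.freed)
```

    Deferring the release to a queue keeps deletions cheap and lets a chunk that is referenced again before cleanup survive.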
  • Publication number: 20130262430
    Abstract: Architecture that computes a dominant image from one or more images on a webpage. A dominant image classifier scans webpages in an offline-created index to identify the prominent images in the webpages. In a more specific implementation the image selected is the image associated with a name query. Face detection technology can be utilized to identify which of the images on a given webpage contain faces. A query classifier identifies queries that contain people names. In the context of search engines and search result pages, the web results for name queries can further include prominent people face images as thumbnail images. Additional facts (structured data) can further be included that together with the results elements of caption title, snippet and attribute (uniform resource locator (URL)) provide an improved summary of the person on the page.
    Type: Application
    Filed: March 29, 2012
    Publication date: October 3, 2013
    Applicant: Microsoft Corporation
    Inventors: Krishnan Thazhathekalam, David D. Ahn, Andrea Burbank, Taroon Mandhana, David Simpson, Yi-An Lin
  • Publication number: 20130262407
    Abstract: For multiplexer classification for column compression of tabular data, similar type data segments are classified into classes for grouping the data segments into compression streams associated with each one of the classes. The compression streams are encoded based on a class-specific optimized encoding operation. The compression streams into one output buffer, wherein the compression streams are extracted.
    Type: Application
    Filed: March 27, 2012
    Publication date: October 3, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan AMIT, Lilia DEMIDOV, Nir HALOWANI, Sergey Marenkov
  • Publication number: 20130262438
    Abstract: A conversation server system having one or more processors and memory stores a plurality of index components in an index, a respective index entry corresponding to a respective term and having a plurality of index components, a respective index component of the respective index entry identifying a message that is associated with the respective term. The server receives a first message, associates the first message with a conversation having at least one other message and stores, in the index, a plurality of first-message index components that each include an identifier of the first message. The first-message index components include one or more index components indicative of a plurality of message terms in the first message and one or more index components indicative of one or more conversation terms in the conversation, the one or more conversation terms comprising one or more terms that are not in the first message.
    Type: Application
    Filed: August 29, 2011
    Publication date: October 3, 2013
    Inventor: Andrew J. Palay
  • Publication number: 20130262409
    Abstract: For column compression of tabular data, similar type data segments are classified into classes for grouping the data segments into compression streams associated with each one of the classes. The compression streams are encoded based on a class-specific optimized encoding operation. The compression streams into one output buffer, wherein the compression streams are extracted.
    Type: Application
    Filed: June 29, 2012
    Publication date: October 3, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan AMIT, Lilia DEMIDOV, Nir HALOWANI, Sergey MARENKOV
  • Publication number: 20130262404
    Abstract: A method performed in a system that has a plurality of volumes stored to storage hardware, the method including generating, for each of the volumes, a respective space saving potential iteratively over time and scheduling space saving operations among the plurality of volumes by analyzing each of the volumes for space saving potential and assigning priority of resources based at least in part on space saving potential.
    Type: Application
    Filed: March 30, 2012
    Publication date: October 3, 2013
    Applicant: NETAPP, INC.
    Inventors: Vinod Kumar Daga, Craig Anthony Johnston, Ling Zheng
  • Publication number: 20130262408
    Abstract: One or more transformation functions can be used in connection or together with one or more compression/decompression techniques. A transformation function can transform data (e.g., a data object) into a form more suitable for compression and/or decompression. As a result, data can be compressed and/or decompressed more effectively. In addition, multiple data objects can be associated with various transformation functions and/or compression/decompression techniques. As a result, different approaches can be taken with respect to compression and decompression of data objects in an effort to find an optimum approach for compression of data objects that may vary significantly from each other and change over time. It will be appreciated that the objects can be associated with transformation functions in a dynamic manner to accommodate changes to data. Also, an extendible and/or extensible system can allow for growth and adaption of new data in forms not currently present or expected.
    Type: Application
    Filed: May 23, 2012
    Publication date: October 3, 2013
    Inventors: David Simmen, Shant Hovsepian, Jeffrey Davis
  • Publication number: 20130246435
    Abstract: A knowledge extraction framework may iteratively enrich an ontology that is used to classify structured knowledge obtained from web pages based on structured knowledge previously acquired from other web pages. The framework may enable a user to define the ontology for extracting structured knowledge from a plurality of web pages. The framework applies the ontology using a supervised extraction algorithm to extract seed information from a set of web pages. The framework further applies an unsupervised extraction algorithm to extract the structured knowledge from an additional set of web pages. The framework subsequently maps the structured knowledge to the ontology based on the seed information to enrich the ontology.
    Type: Application
    Filed: March 14, 2012
    Publication date: September 19, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Jun Yan, Lei Ji, Edward W. Wild, Yi Li, Ning Liu, Zheng Chen
  • Publication number: 20130246384
    Abstract: Provided are a computer program product, system, and method for providing access to documents in an online document sharing community in a network environment including a plurality of participant computers operated by participants in the online document sharing community and a storage system. Document content is processed to add search terms for the document and a document identifier to a search index accessible through a search engine over the network to participants not under an obligation of confidentiality to the owner with respect to the document. Access is provided to the content of the document to the participants in the online document sharing community. A determination is made of a publication time at which the document was included in the search index and made accessible to the participant computers operated by participants not under the obligation of confidentiality to the owner of the document content.
    Type: Application
    Filed: March 19, 2012
    Publication date: September 19, 2013
    Inventor: David W. VICTOR
  • Publication number: 20130246436
    Abstract: A system and method for parsing a machine-readable document having associated drawing figures with components labeled by references, to identify the occurrence of the references for generating a dynamic reference index table, and for either automatically annotating the references in the associated drawing figures with descriptive words or phrases cross-referenced to the references within the generated dynamic reference index table, or generating a reference usage report identifying inconsistencies and/or errors within the document associated with the identified reference occurrences.
    Type: Application
    Filed: March 19, 2012
    Publication date: September 19, 2013
    Inventor: Russell E. Levine
  • Publication number: 20130246375
    Abstract: The present invention relates to a method and system for facilitating access to recorded data. The system comprises an interface and a processing device. The interface is arranged to receive data and the processing device is arranged to separate the received data into data subsets, compress each data subset and assign an identifier to each compressed data subset, thereby creating data units each comprising a compressed data subset and an associated identifier, the processing device further being arranged to establish an index on the basis of the assigned identifiers.
    Type: Application
    Filed: March 14, 2012
    Publication date: September 19, 2013
    Inventors: Max Roy PRAKOSO, Andi R. Hakim, Robert Lang
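    The split, compress, identify, and index steps in this abstract can be sketched as follows. This is only an illustration under stated assumptions: fixed-size subsets, zlib as the compressor, sequential integers as identifiers, and the names `build_units` and `read_subset` are all hypothetical and not taken from the application.

```python
import zlib


def build_units(data: bytes, subset_size: int = 1024):
    """Split data into subsets, compress each one, and index the results.

    Assumptions (not from the abstract): fixed-size subsets, zlib
    compression, and sequential integer identifiers.
    """
    units = []   # each data unit is (identifier, compressed subset)
    index = {}   # identifier -> (position in units, original offset, original length)
    for offset in range(0, len(data), subset_size):
        subset = data[offset:offset + subset_size]
        identifier = len(units)
        units.append((identifier, zlib.compress(subset)))
        index[identifier] = (len(units) - 1, offset, len(subset))
    return units, index


def read_subset(units, index, identifier) -> bytes:
    """Decompress only the data unit named by the identifier."""
    position, _offset, _length = index[identifier]
    return zlib.decompress(units[position][1])
```

    Because the index maps each identifier to one data unit, a reader can decompress a single subset without touching the rest of the recording.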
  • Patent number: 8538969
    Abstract: A data format is optimized for storing data such as website traffic data. The data format enables easy access to and filtering of data, for example in generating website traffic reports. The data format also provides significant data compression. A method for generating a data file according to the data format employs linear compression and indexing to efficiently store the data. Data stored according to the format can be easily retrieved, particularly when a known value is specified and particular entries matching the known value are sought.
    Type: Grant
    Filed: November 14, 2005
    Date of Patent: September 17, 2013
    Assignee: Adobe Systems Incorporated
    Inventor: Michael Paul Bailey
  • Publication number: 20130238627
    Abstract: Methods, systems, and computer-storage media having computer-usable instructions embodied thereon, for integrating searches are provided. An entity index may be compiled that includes entity files for a plurality of identified entities such that any information known about a single entity is contained in a single entity file and is easily accessible. Web indexes, including web page information, may be referenced in order to associate web pages with entities, or entity files. Once identified as related to an entity, a web page may be associated with an entity identifier that is associated with the related entity such that a search query for the identified entity results in both entity information for the entity and web pages associated with the entity.
    Type: Application
    Filed: March 6, 2012
    Publication date: September 12, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: RICHARD QIAN, ANDREW SHUMAN, DERRICK CONNELL, ROBERT FIRBY, STEVEN MACBETH, TAROON MANDHANA
  • Publication number: 20130238629
    Abstract: A programmed hardware network configuration file repository indexer is configured with a network-configuration-specific index-operation rule set. In another example, a network-configuration-specific index-operation rule set can be used in generating an index to a network configuration file repository. In the latter example, the index and the index-operation rule set are used in searching the network configuration file repository.
    Type: Application
    Filed: March 8, 2012
    Publication date: September 12, 2013
    Inventors: Ram Kumar Kosuri, Swamy Jagannadha Mandavilli, Murali Mohan Dingari
  • Publication number: 20130238568
    Abstract: Various embodiments for processing data in a data deduplication system are provided. For data segments previously deduplicated by the data deduplication system, a supplemental hot-read link is established for those of the data segments determined to be read on at least one of a frequent and recently used basis. Other system and computer program product embodiments are disclosed and provide related advantages.
    Type: Application
    Filed: March 6, 2012
    Publication date: September 12, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Allen Keith BATES, Louie Arthur DICKENS, Stephen Leonard SCHWARTZ, Daniel James WINARSKI
  • Publication number: 20130232124
    Abstract: A storage node receives a file. The storage node determines whether the file is stored on the storage node by comparing a hash value computed for content of the received file to hash values for content stored on the storage node. The storage node transfers a name and address of the file to a directory node.
    Type: Application
    Filed: March 5, 2012
    Publication date: September 5, 2013
    Inventor: Blaine D. GAITHER
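    A minimal sketch of the hash comparison described in this abstract is given below. The class names, the choice of SHA-256, and the use of the content hash as the file's address are assumptions made for illustration; the abstract does not specify any of them.

```python
import hashlib


class DirectoryNode:
    """Hypothetical directory node that records file name -> address."""

    def __init__(self):
        self.entries = {}

    def register(self, name, address):
        self.entries[name] = address


class StorageNode:
    """Hypothetical storage node that detects duplicate content by hash."""

    def __init__(self, directory):
        self.blobs = {}  # content hash -> content
        self.directory = directory

    def receive(self, name, content: bytes) -> bool:
        """Store the file if its content is new; return True if it was a duplicate."""
        digest = hashlib.sha256(content).hexdigest()  # assumed hash function
        already_stored = digest in self.blobs
        if not already_stored:
            self.blobs[digest] = content
        # Assumption: the content hash doubles as the address sent to the directory.
        self.directory.register(name, digest)
        return already_stored
```

    Two files with identical content then occupy one stored blob while the directory keeps both names.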
  • Publication number: 20130226930
    Abstract: A method, medium, and apparatus are disclosed for indexing multimedia content by a computer. The method comprises segmenting the multimedia content into a plurality of segments. For each segment, the method identifies one or more features present in the segment, wherein the features are of respective media types. The method then identifies, for each identified feature in each segment, one or more respective keywords associated with the identified feature. Then, the method determines, for each identified keyword associated with an identified feature in a given segment, a respective relevance of the keyword to the given segment. The respective relevance is dependent on a weight associated with the respective media type of the identified feature.
    Type: Application
    Filed: February 29, 2012
    Publication date: August 29, 2013
    Applicant: Telefonaktiebolaget L M Ericsson (publ)
    Inventors: Tommy ARNGREN, Joakim Söderberg, Marika Stålnacke
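    The media-type weighting in this abstract might look roughly like the sketch below. The weight values, the additive scoring rule, and the input layout (segment id mapped to a list of `(media_type, keywords)` pairs) are hypothetical choices for illustration only.

```python
from collections import defaultdict

# Hypothetical weights per media type; the abstract gives no values.
MEDIA_WEIGHTS = {"speech": 1.0, "text": 0.8, "image": 0.6}


def index_segments(segments, weights=MEDIA_WEIGHTS):
    """Build keyword -> {segment_id: relevance}.

    `segments` maps a segment id to a list of (media_type, keywords) pairs,
    one pair per feature identified in that segment. Relevance here is the
    sum of the weights of the features that produced the keyword (an assumed rule).
    """
    index = defaultdict(dict)
    for segment_id, features in segments.items():
        for media_type, keywords in features:
            for keyword in keywords:
                previous = index[keyword].get(segment_id, 0.0)
                index[keyword][segment_id] = previous + weights[media_type]
    return index
```

    A keyword detected in both the speech and the image track of one segment therefore scores higher for that segment than a keyword seen in the image track alone.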
  • Publication number: 20130218896
    Abstract: A conversation server system having one or more processors and memory stores a plurality of index components in an index. The server receives a first message, associates the first message with a conversation having one or more other messages and identifies quoted text in the message based on text that occurs in one or more of the other messages. The server stores, in the index, a plurality of first-message index components including one or more index components that correspond to terms in original text of the first message and one or more index components that correspond to terms that occur in the quoted text, where the first-message index components for original text of the first message are distinguished from the first-message index components for quoted text of the first message in the index.
    Type: Application
    Filed: August 29, 2011
    Publication date: August 22, 2013
    Inventor: Andrew J. Palay
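    One way to picture index components that keep original and quoted text apart is sketched below. The line-matching heuristic (a line counts as quoted if, after stripping a leading `>` marker, it appears in an earlier message of the conversation), the `(kind, term)` key layout, and the function names are assumptions, not the method claimed in the application.

```python
from collections import defaultdict


def normalize(line: str) -> str:
    """Strip quote markers and surrounding whitespace, and lowercase the line."""
    return line.lstrip("> ").strip().lower()


def index_message(index, message_id, text, earlier_texts):
    """Add index components for one message, tagging each term as
    'original' or 'quoted' (assumed heuristic: whole-line match against
    earlier messages in the same conversation)."""
    earlier_lines = {
        normalize(line)
        for earlier in earlier_texts
        for line in earlier.splitlines()
        if line.strip()
    }
    for line in text.splitlines():
        norm = normalize(line)
        if not norm:
            continue
        kind = "quoted" if norm in earlier_lines else "original"
        for term in norm.split():
            index[(kind, term)].add(message_id)
```

    A search restricted to `("original", term)` keys would then skip messages that merely quote the term from an earlier message.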
  • Publication number: 20130218847
    Abstract: Provided is a file server apparatus 4 that processes a plurality of stored files in response to an I/O request. When entity data of a plurality of files has a common portion, the apparatus generates a consolidation file that holds the common entity data as consolidated data, and manages each of the plurality of files as a de-duplication file that does not hold the consolidated data. When there is an I/O request to at least one of the plurality of files, the apparatus acquires the consolidated data, performs processing in response to the I/O request, and holds difference data generated by that processing.
    Type: Application
    Filed: February 16, 2012
    Publication date: August 22, 2013
    Inventor: Nobuyuki Saika
  • Publication number: 20130218897
    Abstract: A conversation server system having one or more processors and memory stores a plurality of index components in an index. The server associates a first message having a first term with a conversation that includes at least a second message. The first term is not included in the second message and the second message includes a second term that is not included in the first message. The server stores, in the index, a plurality of index components for a same referenced object, including an index component indicative of the first term and an index component indicative of the second term. In some embodiments the same referenced object is associated with index components for a first sender of the first message and a second sender of the second message, so that a search for a conversation with messages from the first sender and the second sender retrieves the referenced object.
    Type: Application
    Filed: August 29, 2011
    Publication date: August 22, 2013
    Inventor: Andrew J. Palay
  • Publication number: 20130218898
    Abstract: Metadata search is enhanced by utilizing relationship data indicating relationships between metadata items. A server generates an index mapping metadata items to terms associated with the metadata items and a graph describing relationships between each of the metadata items. When the server receives a search request, the server locates a candidate set of the metadata items based on the search term(s) and the index. The server performs a link analysis of the graph to determine a relationship score for each metadata item. For each particular metadata item in the candidate set of the metadata items, the server calculates a ranking score based at least on the relationship score for the particular metadata item. The server generates a ranked result set based on comparing the ranking scores for the candidate set of metadata items. The server then provides information indicating the ranked result set in response to the search request.
    Type: Application
    Filed: February 16, 2012
    Publication date: August 22, 2013
    Applicant: Oracle International Corporation
    Inventors: Nikhil Raghavan, Ravi Murthy, Aman Naimat
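    The search flow in this abstract (candidate lookup through an index, then ranking by a graph-derived relationship score) can be sketched as below. The abstract does not name the link-analysis algorithm; a PageRank-style power iteration is swapped in here as an assumption, and the function names and the rule of ranking by relationship score alone are hypothetical.

```python
def relationship_scores(graph, damping=0.85, iterations=50):
    """PageRank-style scores for a graph given as {item: [linked items]}.

    Every linked item must also appear as a key of `graph`.
    """
    nodes = list(graph)
    score = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * score[node] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # An item with no outgoing links spreads its score evenly.
                for target in nodes:
                    new[target] += damping * score[node] / len(nodes)
        score = new
    return score


def search(term, index, graph):
    """Return candidate metadata items for `term`, best relationship score first."""
    candidates = index.get(term, set())
    scores = relationship_scores(graph)
    return sorted(candidates, key=lambda item: scores[item], reverse=True)
```

    A fuller ranking score would presumably combine the relationship score with term-match quality; the abstract only says the ranking is based "at least" on the relationship score.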
  • Patent number: 8515964
    Abstract: Method, system, and programs for computing similarity. Input data is first received from one or more data sources and then analyzed to obtain an input feature vector that characterizes the input data. An index is then generated based on the input feature vector and is used to archive the input data, where the value of the index is computed based on an improved Johnson-Lindenstrauss transformation (FJLT) process. With the improved FJLT process, first, the sign of each feature in the input feature vector is randomly flipped to obtain a flipped vector. A Hadamard transformation is then applied to the flipped vector to obtain a transformed vector. An inner product between the transformed vector and a sparse vector is then computed to obtain a base vector, based on which the value of the index is determined.
    Type: Grant
    Filed: July 25, 2011
    Date of Patent: August 20, 2013
    Assignee: Yahoo! Inc.
    Inventors: Shanmugasundaram Ravikumar, Anirban Dasgupta, Tamas Sarlos
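    The three steps named in this abstract (random sign flip, Hadamard transformation, inner product with a sparse vector) can be sketched in plain Python as below. This is not the patented process: the seed handling, the number of nonzero entries, the Gaussian values in the sparse vector, and the function names are assumptions, and the final mapping from the projected value to an index value is left out because the abstract does not describe it.

```python
import random


def hadamard(vec):
    """Unnormalized fast Walsh-Hadamard transform (length must be a power of two)."""
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(vec)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def fjlt_projection(features, seed=0, nonzeros=2):
    """Project a feature vector to one value: flip signs, transform, sparse inner product."""
    rng = random.Random(seed)  # same seed -> same flips and same sparse vector
    n = len(features)
    flipped = [x * rng.choice((-1, 1)) for x in features]
    transformed = hadamard(flipped)
    sparse = [0.0] * n
    for position in rng.sample(range(n), nonzeros):
        sparse[position] = rng.gauss(0.0, 1.0)  # assumed distribution
    return sum(t * s for t, s in zip(transformed, sparse))
```

    Repeating the projection with several seeds would give several coordinates, which is one plausible reading of the "base vector" the abstract mentions.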
  • Publication number: 20130204853
    Abstract: A computer-implemented method for use in maintaining currency of a projection index of a plurality of database objects. The computer-implemented method includes creating the projection index representative of a connection between a first database object and at least a second database object, determining an entity dependency between the first database object and at least the second database object, determining a path dependency between the first database object and at least the second database object, and updating the projection index in response to a modification of one or both of the entity dependency and the path dependency.
    Type: Application
    Filed: February 7, 2012
    Publication date: August 8, 2013
    Applicant: DASSAULT SYSTEMES ENOVIA CORPORATION
    Inventors: David Edward Tewksbary, Clark David Milliken