Description of the filehash tag

The filehash package implements a simple key-value style database in which character string keys are associated with data values stored on disk. A simple interface is provided for inserting, retrieving, and deleting data from the database. Utilities are provided that allow filehash databases to be treated much like the environments and lists already used in R. These utilities permit interactive and exploratory analysis of large datasets.
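The insert, retrieve, and delete interface described above can be sketched as follows. This is a minimal illustration, assuming the filehash package is installed; the temporary file path and the keys `a`, `b`, and `scores` are made up for the example.

```r
library(filehash)

# Create a database in a temporary location (illustrative path) and open it
db_path <- tempfile("filehash_demo_")
dbCreate(db_path)
db <- dbInit(db_path)

# Insert values under character string keys
dbInsert(db, "a", 1:10)
dbInsert(db, "b", "hello")

# Retrieve a value and list the keys currently stored
a_value <- dbFetch(db, "a")
keys <- sort(dbList(db))

# List- and environment-style access: `$` assignment and with()
db$scores <- c(2.5, 3.5)
scores_mean <- with(db, mean(scores))

# Delete a key and confirm that it is gone
dbDelete(db, "b")
b_exists <- dbExists(db, "b")

# Remove the database file when finished
dbUnlink(db)
```

Only the key-to-location map is kept in memory; values are read from disk when they are fetched, which is what makes this usable for objects that would not all fit in memory together.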

Working with large datasets in R can be cumbersome because of the need to keep objects in physical memory. While many might generally see that as a feature of the system, the need to keep whole objects in memory creates challenges for those who want to work interactively with large datasets. Here we take a simple definition of “large dataset” to be any dataset that cannot be loaded into R as a single R object because of memory limitations. For example, a very large data frame might be too large for all of its columns and rows to be loaded at once. In such a situation, one might load only a subset of the rows or columns, if that is possible.
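One way to work with only some columns of a data frame is to store each column under its own key, which is what `dumpDF` does. The sketch below uses a tiny stand-in data frame named `big` and a temporary path, both invented for illustration; with a genuinely large dataset the data frame would still need to be readable once, or written to the database in pieces.

```r
library(filehash)

# Small stand-in for a large data frame (hypothetical data)
big <- data.frame(x = 1:5, y = c(2, 4, 6, 8, 10))

# Write each column of the data frame to the database as a separate key
df_path <- tempfile("filehash_df_")
dumpDF(big, dbName = df_path)
db <- dbInit(df_path)

# The keys are the column names
cols <- sort(dbList(db))

# Read a single column from disk without touching the others
y_mean <- mean(db$y)

# Expose the columns through an environment and fit a model against it;
# columns are read from disk only when the model code asks for them
env <- db2env(db)
fit <- with(env, lm(y ~ x))
slope <- unname(coef(fit)["x"])

dbUnlink(db)
```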

The filehash package provides a full read-write implementation of a key-value database for R. The package does not depend on any external packages (beyond those provided in a standard R installation) or software systems and is written entirely in R, making it readily usable on most platforms. The filehash package represents a database as an instance of an S4 class and operates directly on the S4 object via various methods.
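The S4 representation can be inspected directly. The following sketch assumes the default "DB1" storage format and an illustrative temporary path; it only checks that the database handle is an S4 object inheriting from the `filehash` class and that `dbInsert` is a generic function.

```r
library(filehash)

# Create and open a database using the "DB1" format (illustrative path)
s4_path <- tempfile("filehash_s4_")
dbCreate(s4_path, type = "DB1")
db <- dbInit(s4_path, type = "DB1")

# The handle is an S4 object whose class extends "filehash"
is_s4 <- isS4(db)
is_filehash <- is(db, "filehash")

# Operations such as dbInsert are S4 generics dispatched on the database class
insert_is_generic <- isGeneric("dbInsert")

dbUnlink(db)
```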

Text adapted from: Peng, Roger, "Interacting with Data Using the filehash Package for R" (June 2006). Johns Hopkins University, Dept. of Biostatistics Working Papers, Working Paper 108. http://biostats.bepress.com/jhubiostat/paper108 and http://cran.r-project.org/web/packages/filehash/vignettes/filehash.pdf