Projects
SW-Store
The goal of the Semantic Web vision is to free Web data from the applications that control it, so that the data can be easily described and exchanged. This is accomplished by supplementing natural language and other data found on the Web with machine-readable metadata in statement form (e.g., X is-a person, X has-name "Joe", X has-age "35") and by enabling descriptions of data ontologies, so that data from different applications can be integrated through ontology mapping. One ultimate goal is to turn the Web into a giant database, against which one could issue structured queries and receive structured answers in response.
SW-Store is a recently launched project whose goal is to manage and query Semantic Web data. We are starting from a clean slate and designing a DBMS specifically for this type of data and for the prevalent Semantic Web data model, the Resource Description Framework (RDF). We explore how common Semantic Web queries and applications, such as reasoning and biological data integration, can be built into the database. This work builds on a recent publication that won "Best Paper" at VLDB in September.
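As a rough illustration of the statement form described above, the sketch below stores statements as (subject, predicate, object) tuples and answers a simple structured query over them. It is not SW-Store's design: the `match` and `names_of_people` helpers are hypothetical, and the second subject (`Y`, named "Ann") is made-up sample data added so the query has more than one answer.

```python
# Hypothetical illustration: statements as (subject, predicate, object) tuples.
# "X" and its values come from the example in the text; "Y" is invented sample data.
triples = [
    ("X", "is-a", "person"),
    ("X", "has-name", "Joe"),
    ("X", "has-age", "35"),
    ("Y", "is-a", "person"),
    ("Y", "has-name", "Ann"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return every triple matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in store
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def names_of_people(store):
    """Structured query: the names of all subjects declared to be a person."""
    people = {s for (s, _, _) in match(store, predicate="is-a", obj="person")}
    return sorted(o for (s, _, o) in match(store, predicate="has-name") if s in people)
```

Calling `names_of_people(triples)` joins the "is-a" and "has-name" statements on their shared subject, which is the kind of structured question that plain text search cannot answer directly.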
|
|
| |
|
C-Store
As companies increasingly use analytic data marts and data warehouses for their customer relationship management and business intelligence applications, the use of column-oriented DBMS technology is growing. Column-oriented databases store tables column-by-column (instead of row-by-row) and tend to perform better on analytical applications: these applications usually touch only a subset of table attributes at a time, so a column-oriented layout reads only the needed columns and is thus more I/O efficient. Examples of these types of analytic applications are:
- An application that evaluates the best offer to give a customer while they are on the phone with a call center
- An application that looks for correlations in products that customers buy in the same transaction
- An application that looks at customers' history to evaluate credit risk.
Due to the increasing popularity of column-stores, a number of venture-capital-backed start-up companies built on this technology have formed in recent years, including Vertica, ParAccel, and Calpont, while column-oriented databases that have been around a little longer (such as Sybase IQ, Sand/DNA Analytics, and SenSage) are also growing in popularity. Column-stores have also recently seen great momentum in the research community, with a number of recent publications. The C-Store project has built an academic prototype of a column-oriented database, and this prototype has led to a great deal of important research exploring the architectural design differences between row-oriented and column-oriented databases. Many important questions remain unanswered; however, early performance results are very encouraging, with data warehouse queries consistently running one to two orders of magnitude faster. This project is a collaboration between Yale, MIT, Brown, Brandeis, and UMass Boston.
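The I/O argument for column-oriented storage can be made concrete with a toy comparison. The sketch below is not C-Store code: the table, its attribute names, and the `values_read` counter are all hypothetical, and counting values touched is only a crude stand-in for real disk I/O.

```python
# Hypothetical table with three attributes; the data is invented for illustration.
ROWS = [
    {"customer_id": 1, "region": "east", "amount": 20},
    {"customer_id": 2, "region": "west", "amount": 35},
    {"customer_id": 3, "region": "east", "amount": 15},
    {"customer_id": 4, "region": "west", "amount": 30},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per attribute."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def sum_row_store(rows, attr):
    """Row layout: each whole row is fetched to reach a single attribute."""
    total = 0
    values_read = 0
    for row in rows:
        values_read += len(row)  # every attribute of the row is read
        total += row[attr]
    return total, values_read


def sum_column_store(columns, attr):
    """Column layout: only the requested column is fetched."""
    column = columns[attr]
    return sum(column), len(column)
```

Both functions return the same sum for `"amount"`, but the row layout touches all three attributes of every row while the column layout touches only one, which is the effect the text attributes to analytical queries that focus on a subset of attributes.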
|
|
| |
|
H-Store: A High-Performance OLTP Database
Current OLTP database designs, which date largely from the 1970s, are based on several assumptions about the architecture of database
applications and hardware that are less true today than they were 30 years ago. For example, all but the very largest OLTP applications
can fit in the main memory of a modern shared-nothing cluster of server machines. On a single node with a memory-resident database, OLTP
transactions take only a few microseconds to execute. Additionally, many applications carefully construct database transactions so they
have no user stalls. Taken together, these two points mean there is a large class of OLTP applications for which a single-threaded
execution engine with no concurrency control performs very well, avoiding the need for high-overhead, locking-based pessimistic
concurrency control protocols designed to keep CPUs busy during disk and user stalls. Further, the cost of computers has dropped so
dramatically in the past thirty years that paying for a dedicated database administrator has become one of the dominant costs in running a
database system, such that tools that automate design and tuning have great value. Finally, the architecture of a server node has also
shifted -- the number of cores available to process data is proliferating. The goal of the H-Store project is to investigate how these
architectural and application shifts affect the performance of OLTP databases, and to study what performance benefits would be possible
with a complete redesign of OLTP systems in light of these trends. Our early results show that a simple prototype built from scratch using
modern assumptions can outperform current commercial DBMS offerings by around a factor of 80 on OLTP workloads. We are currently working
to build a full-featured system that demonstrates these performance wins in a more robust prototype.
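The idea of a single-threaded execution engine with no concurrency control can be sketched as follows. This is a toy under stated assumptions, not the H-Store prototype: the `SingleThreadedEngine` class and the `deposit` and `transfer` transactions are hypothetical, data lives in an in-memory dictionary, and durability, partitioning, and rollback are left out.

```python
from collections import deque


class SingleThreadedEngine:
    """Toy in-memory store that runs queued transactions one at a time."""

    def __init__(self):
        self.data = {}        # the entire database is memory resident
        self.queue = deque()  # pending transactions, in arrival order

    def submit(self, txn, *args):
        """Queue a transaction: a function with no user stalls inside it."""
        self.queue.append((txn, args))

    def run(self):
        """Run each transaction to completion before starting the next.

        Since only one transaction is ever active, no locks or latches
        are needed to keep transactions isolated from one another.
        """
        results = []
        while self.queue:
            txn, args = self.queue.popleft()
            results.append(txn(self.data, *args))
        return results


def deposit(data, account, amount):
    """Add money to an account and return the new balance."""
    data[account] = data.get(account, 0) + amount
    return data[account]


def transfer(data, src, dst, amount):
    """Move money between accounts; refuse if funds are insufficient."""
    if data.get(src, 0) < amount:
        return False
    data[src] -= amount
    data[dst] = data.get(dst, 0) + amount
    return True
```

A serial loop like this only makes sense when each transaction is short and never waits on disk or on a user, which is exactly the set of conditions the paragraph above argues now holds for many OLTP applications.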
|
|
| |
|
Chunky-Store
As shown in the C-Store project, storing data in columns seems to be the best-performing solution at the storage layer for analytical applications. Meanwhile, storing data in rows performs better for transactional applications. Further, scientific database applications prefer to store data in multidimensional arrays. Chunky-Store looks at how multiple storage layer options can be integrated into a single database system, so that data can be stored in the way best suited to how it is expected to be used. Various hybrid storage layer designs are also being explored. This work is a collaboration between MIT and Yale.
|
|
| |
|
NanoDB: A microkernel-based database system
Current large-scale database systems are based on a monolithic
architecture. While they may be designed for extensibility in particular
areas such as indexing methods and user-defined types, changes to other
aspects of the system require vast re-architecting of the system's code
base. The recent proposals of novel database designs -- including C-Store,
H-Store, and Chunky-Store -- highlight the need for an easier means to
implement new ideas in database systems.
NanoDB is a new relational database engine consisting of a small kernel plus
a number of modules that implement the main database functionality. The
database engine can be configured to use different modules to support a
range of target applications, each with different requirements. By
separating the database functionality into modules, we allow the user to
configure the database engine in a variety of different ways, including the
selection of different relational input languages, storage mechanisms, query
optimizers, and file storage structures.
NanoDB can be used to construct specific data management systems fine-tuned
for particular applications. NanoDB will also be useful as a testbed for
experimenting with new ideas in data management.
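One way to picture the kernel-plus-modules arrangement described above is a small registry that maps a role (such as storage) to an interchangeable implementation. The sketch below is purely illustrative and does not reflect NanoDB's actual interfaces: the `Kernel` class, the role names, and the `RowStorage` and `ColumnStorage` placeholder modules are all assumptions.

```python
class Kernel:
    """Small core that only registers modules and activates the chosen ones."""

    def __init__(self):
        self._registry = {}  # (role, name) -> factory that builds the module
        self._active = {}    # role -> module instance currently in use

    def register(self, role, name, factory):
        """Make a module available under a role, e.g. ("storage", "row")."""
        self._registry[(role, name)] = factory

    def configure(self, **choices):
        """Activate one module per role, e.g. configure(storage="column")."""
        for role, name in choices.items():
            if (role, name) not in self._registry:
                raise KeyError(f"no module {name!r} registered for role {role!r}")
            self._active[role] = self._registry[(role, name)]()

    def module(self, role):
        """Return the active module for a role."""
        return self._active[role]


class RowStorage:
    """Placeholder storage module (hypothetical)."""
    layout = "row"


class ColumnStorage:
    """Placeholder storage module (hypothetical)."""
    layout = "column"


def build_kernel():
    """Register the placeholder storage modules on a fresh kernel."""
    kernel = Kernel()
    kernel.register("storage", "row", RowStorage)
    kernel.register("storage", "column", ColumnStorage)
    return kernel
```

With this shape, switching the engine from one storage mechanism to another is a configuration change (`configure(storage="column")`) and leaves the kernel code untouched; the same pattern would extend to roles such as input language or query optimizer.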
|
|
| |
|
PANACEA: A new integrated data storage management and query system
The management of local storage is a mess: hierarchical file systems are not
sufficient for modern storage needs. Too much effort is involved in storage
and search of files -- file data and metadata are not easily accessible in an
integrated manner. Currently available tools use crawling to mine the system
for data, consuming substantial computational resources to do so, but still do not
support fully structured search.
Likewise, web search is also a mess. Most data found on the web does not
have enough machine readable metadata for deep search, while existing
text-based search methods are clearly inadequate. Furthermore, there is no
easy way to integrate the results of multiple searches across different
domains, nor is there a consistent access control system to provide security
across sites.
The goal of the PANACEA project is to provide a unified system that will
allow us to store, manage, and query different types of data in an
integrated manner. We seek to integrate a wide range of data, including
unstructured, semi-structured (XML), and fully-structured (relational) data;
and data stored on the local machine and across the network.