MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine-learning methods for structured and unstructured data.
The MADlib mission: to foster widespread development of scalable analytic skills, by harnessing efforts from commercial practice, academic research, and open-source development.
May 9th 2013: MADlib v0.7 is out! Binary packages are available for CentOS/RedHat and for Mac OS X. On other platforms, MADlib can be built from source. Our Wiki provides detailed instructions for deploying MADlib on PostgreSQL and Greenplum installations. For a list of new features, bug fixes, and known issues, please refer to the Release Notes. As always, the MADlib forum is open for questions and discussions. Try it out and let us know what you think!
April 9th 2013: MADlib v0.6 is out!
November 19th 2012: MADlib v0.5 is out!
August 29th 2012: Chris Ré, Florian Schoppmann, and Caleb Welton present the paper "The MADlib Analytics Library or MAD Skills, the SQL" at the 38th International Conference on Very Large Data Bases (VLDB 2012) in Istanbul, Turkey. Slides from the talk are available.
July 11th 2012: Xixuan (Aaron) Feng presents a poster about MADlib at the Workshop on Algorithms for Modern Massive Data Sets (MMDS 2012) at Stanford.
June 18th 2012: MADlib v0.4 is out!
June 17th 2012: Gavin Yang talks about MADlib at the PostgreSQL Conference China 2012 (in Chinese). Some pictures are available on the organizer's blog.
May 22nd 2012: Our technical report about MADlib's architecture and design patterns has been accepted to the industrial track of VLDB 2012. Meet us in person in Istanbul!
May 17th 2012: Hitoshi Harada talks about MADlib at the PostgreSQL Conference.
April 6th 2012: Joe Hellerstein writes about MADlib for the ACM SIGMOD blog.
MADlib grew out of discussions between database-engine developers, data scientists, IT architects, and academics who were interested in new approaches to scalable, sophisticated in-database analytics. These discussions were written up in a paper at VLDB 2009 that coined the term "MAD Skills" for data analysis. The MADlib software project began the following year as a collaboration between researchers at UC Berkeley and engineers and data scientists at EMC/Greenplum.
Binary packages of the latest MADlib release (v0.7):
• Mac OS X 10.6 and higher:
Greenplum 4.1, 4.2 / PostgreSQL 9.0, 9.1, 9.2 (64-bit)
• CentOS / Red Hat 5 and higher (64-bit):
Greenplum 4.1, 4.2 / PostgreSQL 9.0, 9.1, 9.2
Source Code:
• Snapshot of development repository: .zip, .tar.gz
• Latest stable release (v0.7): .zip, .tar.gz
Installation guides can be found in the MADlib Wiki.
Documentation for the latest release (v0.7):
• Users: http://doc.madlib.net
• Developers: http://devdoc.madlib.net
Pre-release documentation generated from the development repository:
• Users: http://doc.madlib.net/master
• Developers: http://devdoc.madlib.net/master
• MADlib Wiki
• Project Roadmap
• Contribution Guide
• User mailing list:
List Information and Subscriptions, Browse Recent Posts
• Developer mailing list:
List Information and Subscriptions, Browse Recent Posts
• Bug reporting and feature requests: