Schema Extraction – Divesh Srivastava – Thursday, October 22

Thursday, October 22 – 11:00-12:30

Meeting room, first floor

Via della Vasca Navale, 79, Roma


Abstract: Increasingly complex databases need ever more sophisticated tools to help users understand their schemas and interact with the data. This is challenging because complex databases often have thousands of tables and inadequate schemas, with little indication of which tables are important or what the main concepts are. We address these challenges and describe techniques to extract an understandable schema from a complex database. We first present a robust algorithm to discover foreign/primary key relationships between tables, based on a general rule termed Randomness. We then describe an information-theoretic approach that, given a set of tables linked by foreign/primary keys, identifies the important tables and clusters the tables into the main concepts of the schema. Finally, we propose summary graphs that meet specified size constraints and preserve the most informative join paths between tables of user interest.

Bio: Divesh Srivastava is the head of Database Research at AT&T Labs-Research. He is an ACM Fellow, serves on the board of trustees of the VLDB Endowment, and is the managing editor of the Proceedings of the VLDB Endowment (PVLDB) and an associate editor of the ACM Transactions on Database Systems (TODS). His research interests and publications span a variety of topics in data management.