Designing Databases with NoSQL Data Models

News

06/06/2015 - From Hadoop to Spark 2/2
26/05/2015 - From Hadoop to Spark 1/2
29/05/2015 - from 9-11 AngularJS Tutorial
14/05/2015 - published Lab Document-Oriented
11/05/2015 - published lecture on document oriented databases
11/05/2015 - published case studies list

Home

Information technology has led us into an era in which producing, sharing, and using information is part of everyday life, often without our being aware of it. It is now almost impossible not to leave a digital trail of the things we do every day, for example through photos, videos, blog posts, and everything that revolves around social networks (Facebook and Twitter in particular). On top of this, the "Internet of Things" brings a growing number of devices, such as watches, bracelets, thermostats, and many other objects, that connect to the network and generate large data streams.

This explosion of data gave rise to the term Big Data: data produced in large quantities, at high speed, and in different formats, whose processing requires technologies and resources that go well beyond conventional data management and storage systems. It quickly becomes clear that (1) data storage based on the relational model and (2) processing systems based on stored procedures and grid computing are not adequate in these contexts.

Regarding point 1, RDBMSs, widely used for a great variety of applications, run into problems when the amount of data grows beyond certain limits. Scalability and cost of deployment are only part of the disadvantages: very often, when dealing with big data, variability (the lack of a fixed structure) is also a significant problem. This has given a boost to the development of NoSQL databases. The website NoSQL Databases defines them as "Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open source and horizontally scalable."

These databases are distributed, open source, horizontally scalable, free of a fixed schema (key-value, column-oriented, document-based, and graph-based), easy to replicate, generally without full ACID guarantees, and able to handle large amounts of data. They are often integrated with processing tools based on the MapReduce paradigm, which Google published in 2004. MapReduce, together with the open source Hadoop framework, represents the new model for distributed processing of large amounts of data, replacing techniques based on stored procedures and computational grids (point 2). The relational model taught in introductory database design courses has many limitations with respect to the demands of new Big Data applications, which use NoSQL databases to store data and MapReduce to process it.

This course aims to

introduce the different NoSQL database models

Key-Value: Riak, Redis, Aerospike, LevelDB
Column-Oriented: HBase, Cassandra, and Hypertable
Document-Oriented: MongoDB, CouchDB, Couchbase, RethinkDB
Graph-Based: Neo4j, TitanDB, OrientDB, and Twitter FlockDB
Index-Based: Elasticsearch

develop skills in designing and building databases with next-generation models based on NoSQL

Data Model

develop skills in processing Big Data using MapReduce and Spark
set up elastic, scalable services: Docker, Vagrant, and Mesos
develop code in a team: Git, SBT, and Maven
drink lots of coffee

Skills

Database Design

NoSQL Databases

Big Data

Database as (Micro) Service

Data Analytics

Docker containers

Vagrant

Git

SBT, Maven

Scala Lang

Schedule

Room 1A (Aula 1A), first floor, Department of Computer Science, Uniba

- Tuesday (Martedì) from 9:00 to 12:00

- Thursday (Giovedì) from 15:00 to 17:00 (Bring your laptop)

Instructor

PhD in Computer Science. Expert in Web Mining and Big Data; software analyst, designer, and developer. Expert in NoSQL.

Personal Information

Email: fabio.fumarola@gmail.com, fabio.fumarola@uniba.it
Phone: 080 544 32 69
Address: Via E. Orabona 4, Department of Informatics, 5th floor, room 509

Technical Skills

Operating systems: Linux (mostly Ubuntu), Mac OS X
Main programming languages: Scala, Java
Other programming, scripting and query languages: SQL, Ruby, HTML/XML, CSS, Node.js, R, Python
Frameworks and libraries: Akka.io, JUnit, Apache Hadoop, Apache Spark, Play2, AngularJS, ...
Other software: Maven 3, Git, SBT, Gradle, Perforce, Hansoft, Apache Tomcat, Eclipse, Netty, ...
Other tools: NLTK, Weka, MOA, HBase, TitanDB, Docker, Vagrant

Repositories

Syllabus

1. Course Introduction (first week)

Topics: evolution of enterprise computing, from business to decision support ('60s, '80s, '90s, 2000s), scaling up databases, data variety, connectivity, P2P knowledge, concurrency, cloud, RDBMS issues, introduction to NoSQL databases, impedance mismatch, attack of the clusters

Slides Download

2. Linux Containers and Docker (second week)

Topics: The Evolution of IT, The Solutions: Virtual Machines vs Vagrant vs Docker, Differences, Examples: Vagrant, Boot2Docker, Docker, Docker Hub, Orchestrating Docker, Mesosphere and CoreOS

Slides Download

3. An Introduction to Git (second week)

Topics: What is Version Control? (And why use it?), What is Git? (And why Git?), How Git works, Create a repository, Branches, Add a remote, How data is stored

Slides Download

4. How to manage dependencies (third week)

Topics: How to create a Java project without an IDE, How to manage dependencies in a standard way, How to execute tasks to build a project

Slides Download

5. NoSQL based Data Models

Topics: Data Model Evolution, Relational Model vs Aggregate Model, Consequences of Aggregate Models, Aggregates and Transactions, Aggregate Models in NoSQL, Key-Value and Document, Column-Family Stores, Summarizing Aggregate-Oriented Databases

Slides Download
Domain-Driven Design
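
To make the "Relational Model vs Aggregate Model" topic concrete, here is a minimal sketch in Python. It is not taken from the slides: the table names, field names, and the helper functions `to_aggregates` and `order_total` are all hypothetical. It joins a few relational-style rows into one self-contained document per order, which is roughly the shape a key-value or document store would keep under a single key.

```python
from collections import defaultdict

# Hypothetical relational rows: one "table" per entity, linked by keys.
orders = [{"order_id": 1, "customer_id": 10}]
order_lines = [
    {"order_id": 1, "product": "book", "qty": 2, "price": 12.5},
    {"order_id": 1, "product": "pen", "qty": 1, "price": 1.5},
]
customers = [{"customer_id": 10, "name": "Ann"}]


def to_aggregates(orders, order_lines, customers):
    """Join the rows into one self-contained document (aggregate) per order."""
    names = {c["customer_id"]: c["name"] for c in customers}
    lines_by_order = defaultdict(list)
    for line in order_lines:
        lines_by_order[line["order_id"]].append(
            {"product": line["product"], "qty": line["qty"], "price": line["price"]}
        )
    return {
        o["order_id"]: {
            "customer": {"id": o["customer_id"], "name": names[o["customer_id"]]},
            "lines": lines_by_order[o["order_id"]],
        }
        for o in orders
    }


def order_total(aggregate):
    """Compute the total from the aggregate alone, with no further joins."""
    return sum(line["qty"] * line["price"] for line in aggregate["lines"])
```

Calling `to_aggregates(orders, order_lines, customers)[1]` returns the whole order as one nested structure. The trade-off is the one the lecture title points at: reads of a whole order become a single lookup, while queries that cut across aggregates (for example, all orders containing a given product) get harder.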

6. More on NoSQL based Data Models

Topics: How to deal with relationships: Graph Databases, Materialized Views, Modeling for Data Access, Distribution Models (Single Server, Sharding, Master-Slave, Peer-to-Peer)

Slides Download

7. Key-Value Data Store and Case Study

Topics: Key-value introduction, Major Key-Value Databases, Dynamo DB: how it is implemented, Background, Partitioning: Consistent Hashing, High Availability for Writes: Vector Clocks, Handling Temporary Failures: Sloppy Quorum, Recovering from Failures: Merkle Trees, Membership and Failure Detection: Gossip Protocol

Slides Download
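
As an illustration of the "Partitioning: Consistent Hashing" topic above, the following is a minimal Python sketch of a hash ring with virtual nodes. It is a teaching sketch under assumptions, not the implementation used by any particular database: the class name `ConsistentHashRing`, the `vnodes` parameter, and the choice of MD5 as the hash function are all hypothetical.

```python
import bisect
import hashlib


def _position(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical hash ring: each node owns several points (virtual nodes)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_position(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        """Return the node owning the first point at or after the key's position."""
        if not self._ring:
            raise KeyError("ring is empty")
        idx = bisect.bisect_left(self._ring, (_position(key),))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring
```

The property worth noticing is that removing a node only reassigns the keys that node owned; keys owned by the remaining nodes stay where they were, which is what makes this scheme attractive for partitioning.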

8. Column-Oriented Data Store and Case Study

Topics: Bigtable, Cassandra, column-oriented stores, designing NoSQL databases, HBase, Hypertable, immutability, NoSQL, SSTable, tablet server

Slides Download

9. Document-Oriented Database in depth

Topics: Introduction, What is a Document, DocumentDBs, MongoDB, Data Model, Indexes, CRUD, Scaling, Pros and Cons

Slides Download

10. Graph-Oriented Database

Topics: Introduction, The Lack of Relationships in RDBMS and NoSQL, Graph Databases: Features, Relations, Query Language, Data Modeling with Graphs, and Conclusions

Slides Download

11. From Hadoop to Spark 1/2

Topics: Aggregate and Cluster, Scatter-Gather and MapReduce, MapReduce, Why Spark?, Spark (Example, tasks and stages), Docker Example, Scala and Anonymous Functions, Next Topics in 2/2

Slides Download
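
The MapReduce topic in this lecture can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce phases, not Hadoop or Spark code; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

For example, `word_count(["a b a", "b c"])` counts `a` and `b` twice and `c` once. In a real cluster the map calls run in parallel on different machines and the shuffle moves data across the network, but the contract between the phases is the same.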

Introduction to AngularJS + demo

General introduction to Single Page Applications using AngularJS with a final demo. Thanks to Nicola Sanitate and Francesco Abbattista

Slides Download

11. From Hadoop to Spark 2/2

Topics: spark-shell, pyspark, HDFS, how to copy files to HDFS, Spark transformations, Spark actions, Spark SQL (Shark), Spark Streaming, stateless vs stateful streaming transformations, sliding windows, examples

Slides Download
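
To give an idea of what a stateful, sliding-window transformation does, here is a small sketch in plain Python. It deliberately avoids the Spark Streaming API and only mimics the idea on a list of micro-batches; the function `windowed_counts` and its parameters `window_length` and `slide` (both measured in batches) are hypothetical names.

```python
from collections import Counter, deque


def windowed_counts(batches, window_length=3, slide=1):
    """Count items over the last `window_length` batches, every `slide` batches."""
    window = deque(maxlen=window_length)  # state kept between batches
    results = []
    for i, batch in enumerate(batches, start=1):
        window.append(Counter(batch))  # the oldest batch drops out automatically
        if i % slide == 0:
            total = Counter()
            for counts in window:
                total.update(counts)
            results.append(dict(total))
    return results
```

A stateless transformation would look at each batch in isolation; here the result for a batch depends on the batches that came before it, which is why the window has to be kept as state.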

Third-year course: Bachelor's degree (Laurea triennale) in Informatica e Tecnologie per la Produzione del Software (D.M. 270)

Università degli Studi di Bari

Links

Recommended Books

NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence

Big data. Architettura, tecnologie e metodi per l'utilizzo di grandi basi di dati

Suggested Books

Big Data: Principles and Best Practices of Scalable Realtime Data Systems

HBase in Action

Hadoop: The Definitive Guide