Home

Information technology has led us into an era in which producing, sharing and using information is part of everyday life, and in which we are often unaware participants. It is now almost impossible not to leave a digital trail of many of our daily actions, for example through digital content such as photos, videos, blog posts and everything that revolves around social networks (Facebook and Twitter in particular). In addition, with the "Internet of Things" we see a growing number of devices such as watches, bracelets, thermostats and many other objects that can connect to the network and therefore generate large data streams.

This explosion of data explains the birth of the term Big Data: it denotes data produced in large quantities, at remarkable speed and in different formats, whose processing requires technologies and resources that go far beyond conventional data management and storage systems. It is immediately clear that (1) data storage models based on the relational model and (2) processing systems based on stored procedures and grid computations are not well suited to these contexts.

As regards point 1, RDBMSs, widely used for a great variety of applications, run into problems when the amount of data grows beyond certain limits. Scalability and implementation cost are only part of the disadvantages: very often, when dealing with big data, variability, that is, the lack of a fixed structure, is also a significant problem. This has given a boost to the development of NoSQL databases. The website NoSQL Databases defines them as "Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open source and horizontally scalable."

These databases are distributed, open source, horizontally scalable, free of a predetermined schema (key-value, column-oriented, document-based and graph-based), easy to replicate, generally without full ACID guarantees, and able to handle large amounts of data. They are integrated, or can be integrated, with processing tools based on the MapReduce paradigm proposed by Google in 2004. MapReduce, together with the open source Hadoop framework, represents the new model for distributed processing of large amounts of data, which is supplanting techniques based on stored procedures and computational grids (point 2). The relational model taught in basic database design courses has many limitations with respect to the demands of new Big Data applications, which use NoSQL databases to store data and MapReduce to process it.
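As a rough illustration of the MapReduce paradigm mentioned above, the following is a minimal single-process sketch in Python of the classic word count. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are assumptions made for this example, and a real framework would run the map and reduce steps in parallel across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input records, one string per document.
    docs = ["big data needs big clusters", "NoSQL stores big data"]
    print(word_count(docs))
```

Because each map call looks at one record and each reduce call looks at one key, both phases can be distributed over many machines without shared state, which is the property that makes the model suitable for large data sets.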

This course aims to

Skills

Database Design
NoSQL Databases
Big Data
Database as a (Micro)Service
Data Analytics
Docker containers
Vagrant
Git
Sbt, Maven
Scala Lang

Schedule

Room 1A (Aula 1A), first floor, Department of Computer Science, Uniba

- Tuesday from 9:00 to 12:00

- Thursday from 15:00 to 17:00 (bring your laptop)

Instructor

Fabio Fumarola
Instructor
PhD in Computer Science. Expert in Web Mining and Big Data; software analyst, designer and developer. Expert in NoSQL.
Personal Information
  • Email: fabio.fumarola@gmail.com, fabio.fumarola@uniba.it
  • Phone: 080 544 32 69
  • Address: Via E. Orabona 4, Department of Informatics, 5th floor, room 509

Technical Skills
  • Operating systems: Linux (mostly Ubuntu), Mac OS X
  • Main programming languages: Scala, Java
  • Other programming, scripting and query languages: SQL, Ruby, HTML/XML, CSS, Node.js, R, Python
  • Frameworks and libraries: Akka.io, JUnit, Apache Hadoop, Apache Spark, Play2, AngularJS, ...
  • Other software: Maven3, Git, SBT, Gradle, Perforce, Hansoft, Apache Tomcat, Eclipse, Netty, ...
  • Other tools: NLTK, Weka, Moa, HBase, TitanDB, Docker, Vagrant
Repositories

Syllabus

1. Course Introduction (first week)

Topics: evolution of enterprise computing, from business to decision support ('60s, '80s, '90s, 2000s), scaling up databases, data variety, connectivity, P2P knowledge, concurrency, cloud, RDBMS issues, NoSQL databases intro, impedance mismatch, attack of the clusters


Slides Download

2. Linux Containers and Docker (second week)

Topics: The Evolution of IT, The Solutions: Virtual Machines vs Vagrant vs Docker, Differences, Examples: Vagrant, Boot2Docker, Docker, Docker Hub, Orchestrate Docker, Mesosphere and CoreOS


Slides Download

3. An Introduction to Git (second week)

Topics: What is Version Control? (And why use it?), What is Git? (And why Git?), How Git works, Create a repository, Branches, Add remote, How data is stored


Slides Download

4. How to manage dependencies (third week)

Topics: How to create a Java project without an IDE, How to manage dependencies in a standard way, How to execute tasks to build a project


Slides Download

5. NoSQL-based Data Models

Topics: Data Model Evolution, Relational Model vs Aggregate Model, Consequences of Aggregate Models, Aggregates and Transactions, Aggregate Models in NoSQL, Key-Value and Document, Column-Family Stores, Summarizing Aggregate-Oriented Databases
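To make the relational versus aggregate contrast in these topics concrete, here is a small Python sketch. The tables, field names and the order data are hypothetical and exist only for this example; prices are stored in integer cents to avoid floating-point rounding. It shows one order kept as normalized rows and then assembled into a single nested document, the unit that a key-value or document store would read and write as a whole.

```python
# Relational view: one order is spread across normalized tables (lists of rows here).
customers = [{"customer_id": 1, "name": "Ann"}]
orders = [{"order_id": 99, "customer_id": 1}]
order_lines = [
    {"order_id": 99, "product": "book", "price_cents": 3245, "qty": 1},
    {"order_id": 99, "product": "ebook", "price_cents": 4000, "qty": 2},
]


def build_order_aggregate(order_id):
    """Join the rows belonging to one order into a single nested document."""
    order = next(o for o in orders if o["order_id"] == order_id)
    customer = next(
        c for c in customers if c["customer_id"] == order["customer_id"]
    )
    lines = [
        {"product": l["product"], "price_cents": l["price_cents"], "qty": l["qty"]}
        for l in order_lines
        if l["order_id"] == order_id
    ]
    return {
        "_id": order_id,
        "customer": {"id": customer["customer_id"], "name": customer["name"]},
        "lines": lines,
    }


def order_total(aggregate):
    """Compute the total from the aggregate alone, with no further lookups."""
    return sum(l["price_cents"] * l["qty"] for l in aggregate["lines"])


if __name__ == "__main__":
    aggregate = build_order_aggregate(99)
    print(aggregate)
    print(order_total(aggregate))
```

The trade-off is the one the topic list names: the aggregate is convenient to fetch, shard and update atomically as one unit, but queries that cut across aggregates (for example, all orders containing a given product) become harder than in the relational form.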


Slides Download
Domain-Driven Design

6. More on NoSQL-based Data Models

Topics: How to deal with relationships: Graph Databases, Materialized Views, Modeling for Data Access, Distribution Models (Single Server, Sharding, Master-Slave, Peer-to-Peer)


Slides Download

7. Key-Value Data Store and Case Study

Topics: Key-value introduction, Major Key-Value Databases, Dynamo DB: how it is implemented, Background, Partitioning: Consistent Hashing, High Availability for writes: Vector Clocks, Handling temporary failures: Sloppy Quorum, Recovering from failures: Merkle Trees, Membership and failure detection: Gossip Protocol
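As a hedged illustration of the partitioning topic above, the following Python sketch implements a basic consistent hashing ring with virtual nodes. It is a teaching sketch, not the implementation used by any particular database: the class name `ConsistentHashRing`, the `vnodes` parameter, the choice of SHA-256 and the node names are all assumptions made for this example.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """A hash ring where each node owns several positions (virtual nodes)."""

    def __init__(self, nodes=(), vnodes=8):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        """Map a string to a large integer position on the ring."""
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        """Place `vnodes` positions for the node on the ring."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        """Drop every position owned by the node."""
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def get_node(self, key):
        """Return the node owning the first position at or after the key's hash."""
        if not self._ring:
            raise ValueError("ring is empty")
        # (h,) sorts before any (h, node), so this finds the first position >= h.
        idx = bisect.bisect_left(self._ring, (self._hash(key),))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


if __name__ == "__main__":
    ring = ConsistentHashRing(["A", "B", "C"])
    print(ring.get_node("user:42"))
```

The point of the design is that removing or adding a node only remaps the keys that fall in that node's ring segments; keys owned by the other nodes stay where they were, unlike with `hash(key) % number_of_nodes`.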


Slides Download

8. Column-Oriented Data Store and Case Study

Topics: Bigtable, Cassandra, column-oriented, design NoSQL databases, HBase, Hypertable, immutability, NoSQL, SSTable, tablet server.


Slides Download

9. Document-Oriented Database in depth

Topics: Introduction, What is a Document, Document DBs, MongoDB, Data Model, Indexes, CRUD, Scaling, Pros and Cons.


Slides Download

10. Graph-Oriented Database

Topics: Introduction, The lack of relationships in RDBMS and NoSQL, Graph Databases: Features, Relations, Query Language, Data Modeling with Graphs, and Conclusions


Slides Download

11. From Hadoop to Spark 1/2

Topics: Aggregate and Cluster, Scatter-Gather and MapReduce, MapReduce, Why Spark?, Spark (example, tasks and stages), Docker Example, Scala and Anonymous Functions, Next Topics in 2/2


Slides Download

Introduction to AngularJS + demo

General introduction to Single Page Applications using AngularJS, with a final demo. Thanks to Nicola Sanitate and Francesco Abbattista.


Slides Download

11. From Hadoop to Spark 2/2

Topics: spark-shell, pyspark, HDFS, how to copy files to HDFS, Spark transformations, Spark actions, Spark SQL (Shark), Spark Streaming, stateless vs stateful streaming transformations, sliding windows, examples


Slides Download