Information technology has led us into an era in which producing, sharing and using information are part of everyday life, and in which we are often unaware actors: it is now almost impossible not to leave a digital trail of many of the actions we perform every day, for example through digital content such as photos, videos, blog posts and everything that revolves around social networks (Facebook and Twitter in particular). In addition, with the "internet of things" we see a growing number of devices such as watches, bracelets, thermostats and many other items that can connect to the network and therefore generate large data streams.
This explosion of data explains the birth of the term Big Data: it denotes data produced in large quantities, at remarkable speed and in different formats, whose processing requires technologies and resources that go far beyond conventional data management and storage systems. It is immediately clear that 1) data storage models based on the relational model, and 2) processing systems based on stored procedures and grid computing are not adequate in these contexts.
As regards point 1, RDBMSs, widely used for a great variety of applications, run into problems when the amount of data grows beyond certain limits. Scalability and cost of implementation are only part of the disadvantages: very often, when faced with managing big data, variability (that is, the lack of a fixed structure) is also a significant problem. This has given a boost to the development of NoSQL databases. The website NoSQL Databases defines NoSQL databases as "Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open source and horizontally scalable."
NoSQL databases are distributed, open source, horizontally scalable, schema-less (with key-value, column-oriented, document-based and graph-based data models), easily replicable, generally without full ACID guarantees, and able to handle large amounts of data. They embed or are integrated with processing tools based on the MapReduce paradigm proposed by Google in 2004. MapReduce, together with the open source Hadoop framework, represents the new model for distributed processing of large amounts of data, and it is supplanting techniques based on stored procedures and computational grids (point 2). The relational model, taught in basic database design courses, has many limitations with respect to the demands of new Big Data applications, which use NoSQL databases to store data and MapReduce to process large amounts of it.
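To make the MapReduce paradigm mentioned above more concrete, here is a minimal single-process sketch of a word count in Python. It only illustrates the map, shuffle and reduce steps; it is not how Hadoop is implemented, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical names chosen for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # In a real cluster each document (or split) would be mapped on a
    # different node; here everything runs sequentially in one process.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big systems", "data streams"]
    print(word_count(docs))
```

Because each call to map_phase and reduce_phase depends only on its own input, a framework can in principle run them on different machines and only needs to coordinate the shuffle step between them.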
Topics: evolution of enterprise computing, from business to decision support ('60s, '80s, '90s, 2000s), scaling up databases, data variety, connectivity, P2P knowledge, concurrency, cloud, RDBMS issues, NoSQL databases intro, impedance mismatch, attack of the cluster
Topics: The Evolution of IT, The Solutions: Virtual Machines vs Vagrant vs Docker, Differences, Examples: Vagrant, Boot2Docker, Docker, Docker Hub, Orchestrate Docker, Mesosphere and CoreOS
Topics: What is Version Control? (And why use it?), What is Git? (And why Git?), How Git works, Create a repository, Branches, Add remote, How data is stored
Topics: How to create a Java project without an IDE, How to manage dependencies in a standard way, How to execute tasks to build a project
Topics: Data Model Evolution, Relational Model vs Aggregate Model, Consequences of Aggregate Models, Aggregates and Transactions, Aggregate Models in NoSQL, Key-Value and Document, Column-Family Stores, Summarizing Aggregate-Oriented Databases
Topics: How to deal with relationships – Graph Databases, Materialized Views, Modeling for Data Access, Distribution Models (Single Server, Sharding, Master-Slave, Peer-to-Peer)
Topics: Key-value introduction, Major Key-Value Databases, Dynamo DB: how it is implemented, Background, Partitioning: Consistent Hashing, High Availability for writes: Vector Clocks, Handling temporary failures: Sloppy Quorum, Recovering from failures: Merkle Trees, Membership and failure detection: Gossip Protocol
Topics: Bigtable, Cassandra, column-oriented, NoSQL database design, HBase, Hypertable, immutability, NoSQL, SSTable, tablet server
Topics: Introduction, What is a Document, Document DBs, MongoDB, Data Model, Indexes, CRUD, Scaling, Pros and Cons
Topics: Introduction, The lack of relationships in RDBMS and NoSQL, Graph Databases: Features, Relations, Query Language, Data Modeling with Graphs, Conclusions
Topics: Aggregate and Cluster, Scatter-Gather and MapReduce, MapReduce, Why Spark?, Spark (example, tasks and stages), Docker Example, Scala and Anonymous Functions, Next Topics in 2/2
General introduction to Single Page Applications using AngularJS with a final demo. Thanks to Nicola Sanitate and Francesco Abbattista
Topics: spark-shell, pyspark, HDFS, how to copy a file to HDFS, Spark transformations, Spark actions, Spark SQL (Shark), Spark Streaming, stateless vs stateful streaming transformations, sliding windows, examples