Sciences

Subject: BIG DATA ANALYTICS (Academic Year 2022/2023)

Master's degree course in COMPUTER SCIENCE

Course year: 1
CFU: 9
Teaching units: Unit Big Data Analytics
Information Technology (lesson)
  • TAF: Compulsory subjects, characteristic of the class; SSD: INF/01; CFU: 9
Teachers: Riccardo MARTOGLIA, Federica MANDREOLI
Exam type: oral
Evaluation: final grade
Teaching language: Italian

Overview

The course covers the main techniques for managing and analyzing large amounts of information in the context of Big Data and Data Science.
The specific objectives are:
- present principles of modeling and manipulation of NoSQL data;
- present Big Data architectures;
- present the data life cycle in Data Analytics and the main techniques;
- present text analytics techniques;
- present graph analytics techniques;
- provide the ability to identify the appropriate tools for solving an information management problem in Big Data contexts;
- provide the ability to use existing techniques and devise innovative techniques for solving data analytics problems.

Admission requirements

Basics of relational database management and full-text manipulation and search; fundamental algorithms and data structures.

Course contents

The course includes 63 hours of classroom teaching (9 CFU). Each topic includes both the presentation of the theoretical aspects and the execution of various use cases and practical exercises.

The breakdown of the contents in hours is purely indicative; it may change during the course in light of student feedback and participation.

Introduction (2 hours)

Big Data (28 hours)
- Introduction, sources and data types
- NoSQL data models: key-value, document-based, column-based, graph-based
- Big Data architectures: CAP theorem, Hadoop, Spark
- MongoDB: schema design, CRUD commands, indexing, aggregation, replication
- Neo4j: overview, Cypher language, selection, aggregation, node creation, indexing operations

Data Analytics (10 hours)
- Data life cycle
- Data preparation
- Exploratory data analysis
- The pandas package for manipulating tabular data

Text analytics (12 hours)
- Introduction and use cases
- Text classification: definition, classification methods, supervised machine learning, Naive Bayes classifier (formalization, learning), classification evaluation (accuracy, precision, recall, F-measure)
- Sentiment analysis: definition, techniques, advanced tasks, sentiment lexicon
- Relation extraction: definition, techniques, use of patterns and named entities
- Use cases for classification, sentiment analysis and relation extraction with the NLTK and spaCy libraries

Graph analytics (11 hours)
- Introduction and use cases
- Graph-structured data
- Graph analytics tasks, tools and algorithms: path analysis, connectivity analysis, centrality analysis, community analysis
- Use of Neo4j and application to case studies

Teaching methods

In addition to covering the theory and the techniques proposed, lessons include a series of practical and design activities that give hands-on experience with the main technological solutions seen in class. At the end of the course, the student will have a complete view of how to design, structure and implement data-centric applications that involve data analytics. Ordinarily, these activities are carried out face to face in classrooms and laboratories; in the event of a COVID-19 health emergency, lessons will be held remotely, in both asynchronous and synchronous virtual mode, to enable teacher-student interaction. Questions, contributions and student participation are welcome and encouraged. Attendance is not compulsory, but strongly recommended. The course is held in Italian. All technical and organizational information on the course, as well as the teaching material, will be uploaded to the moodle.unimore.it platform. Students are invited to register and consult this platform regularly.

Assessment methods

The course includes a series of activities to be carried out individually on the topics covered during the course. The practical nature of the activities, which involve in-depth analysis of the techniques seen in class and their application in the context of a real data-centric application, makes it possible to assess the student's ability to respond to specific information management requirements with effective and efficient solutions. Activity reports and code must be delivered before the exam. The exam consists of the presentation and oral discussion of the activities carried out (approximate duration 20-25 minutes). The final grade is communicated via email, typically within one week of the exam date. The oral discussion may take place in person or remotely, depending on the evolution of the COVID-19 situation.

Learning outcomes

Knowledge and understanding: Through the lessons, the student will acquire solid knowledge and understanding in the field of big data and data analytics.

Ability to apply knowledge and understanding: Through practical activities, in-depth analysis of existing platforms and technologies, and the completion of individual activities, the student will be able to apply the knowledge acquired to the modeling, design and implementation of data analytics projects, including in the context of big data.

Making judgments: By carrying out the proposed activities, the student will be able to evaluate, present and critically discuss the design choices adopted in a real data analytics project.

Communication skills: Writing the reports on the proposed activities and presenting them will enable the student to organize and present the results of their work clearly and concisely, using appropriate technical language. Moreover, the practical work will allow the student to become more familiar with technical terminology in English.

Learning skills: The activities described will allow the student to acquire the methodological tools needed to keep their knowledge up to date independently, which is particularly crucial in an area such as information management, where technologies are in continuous evolution.

Readings

Lecture notes in English prepared by the teachers, available on the course website.
The course material will also include a list of books and scientific articles for each of the topics covered, recommended for further individual study.