Subject: BIG DATA ANALYTICS (Academic Year 2020/2021)
Unit: Big Data Analytics
Information Technology (lesson)
The course covers the main techniques for managing and analyzing large amounts of information in the context of Big Data and Data Science.
The specific objectives are:
- present the principles of modeling and manipulating NoSQL data;
- present Big Data architectures;
- present the data life cycle in Data Analytics and its main techniques;
- present text analytics techniques;
- present graph analytics techniques;
- provide the ability to identify the appropriate tools for solving an information management problem in Big Data contexts;
- provide the ability to use existing techniques and devise innovative ones for solving data analytics problems.
Basics of relational database management and full-text manipulation and search; fundamental algorithms and data structures.
Introduction, sources and data types
NoSQL data models: key-value, document-based, column-based, graph-based
Big Data architectures: CAP theorem, Hadoop, Spark
MongoDB: schema design, CRUD commands, indexing, aggregation, replication
Neo4j: overview, Cypher language, selection, aggregation, node creation, indexing operations
Data life cycle
Exploratory data analysis
The pandas package for manipulating tabular data
Introduction and use cases
Text classification: definition, classification methods, supervised machine learning, Naive Bayes classifier (formalization, learning), classification evaluation (accuracy, precision, recall, F-measure)
Sentiment analysis: definition, techniques, advanced tasks, sentiment lexicons
Relation extraction: definition, techniques, use of patterns and named entities
Use cases for classification, sentiment analysis and relation extraction with the NLTK and spaCy libraries
Introduction and use cases
Graph analytics tasks, tools and algorithms: path analysis, connectivity analysis, centrality analysis, community analysis
Use of Neo4j and application to case studies
In addition to providing insights into the theory and techniques presented, the lessons include a series of practical and design activities that give students hands-on experience with the main technological solutions seen in class. At the end of the course, the student will have a complete view of how best to design, structure and implement data-centric applications that involve data analytics. Due to the COVID-19 health situation, the lessons will be held remotely.
The course includes a series of activities, to be carried out individually, on the topics covered during the course. These activities are practical: they involve an in-depth analysis of the techniques seen in class and their application to a real data-centric application. This makes it possible to evaluate the student's ability to meet specific information management requirements with effective and efficient solutions. Activity reports and code must be submitted before the exam, which consists of the presentation and oral discussion of the activities carried out. Depending on how the COVID-19 situation evolves, the oral discussion may take place in person or remotely.
Knowledge and understanding: Through the lessons, the student will have solid knowledge and understanding in the field of big data and data analytics.
Applying knowledge and understanding: Through practical activities, in-depth analysis of existing platforms and technologies, and individual project work, the student will be able to apply the acquired knowledge to the modeling, design and implementation of data analytics projects, including in Big Data contexts.
Making judgments: By carrying out the proposed activities, the student will be able to evaluate, present and critically discuss the design choices adopted in a real data analytics project.
Communication skills: Writing the reports on the proposed activities and presenting them will enable the student to organize and present the results of their work clearly and concisely, using appropriate technical language. Moreover, carrying out the activities in practice will allow the student to improve their command of technical terminology in English.
Learning skills: The activities described will give the student the methodological tools needed to keep their knowledge up to date independently. This is particularly important in a field such as information management, where technologies evolve continuously.
Lecture notes in English, prepared by the instructors, are available on the course website.
The course material will also include a list of books and scientific articles for each topic covered, recommended for further individual study.