TUTORIAL 1: RDF graph summarization: principles, techniques and applications

Organizers: Haridimos Kondylakis (FORTH), Dimitris Kotzinos (ETIS Lab, UMR 8051, University of Cergy-Pontoise), Ioana Manolescu (Inria and LIX, UMR 7161, CNRS and Ecole polytechnique)
Duration: 3 hours
Abstract: The explosion in the amount of RDF data on the Web has led to the need to explore, query and understand such data sources. The task is challenging due to the complex and heterogeneous structure of RDF graphs which, unlike relational databases, do not come with a structure-dictating schema. Summarization has been applied to RDF data to facilitate these tasks. Its purpose is to extract concise and meaningful information from RDF knowledge bases, representing their content as faithfully as possible. There is no single concept of an RDF summary, and there are many approaches to building such summaries; the summarization goal and the main computational tools employed for summarizing graphs are the main factors behind this diversity.
This tutorial presents a structured analysis and comparison of existing works in the area of RDF summarization; it is based upon a recent survey which we co-authored with colleagues. We present the concepts at the core of each approach and outline their main technical aspects and implementation. We conclude by identifying the most pertinent summarization method for different usage scenarios, and by discussing areas where future effort is needed.

TUTORIAL 2: Schemas And Types For JSON Data

Organizers: Mohamed-Amine Baazizi (Sorbonne Université, LIP6 UMR 7), Dario Colazzo (Université Paris-Dauphine, PSL Research University), Giorgio Ghelli (Università di Pisa), Carlo Sartiani (DIMIE Università della Basilicata)
Duration: 1.5 hours
Abstract: The last few years have seen the fast and ubiquitous diffusion of JSON as one of the most widely used formats for publishing and interchanging data, as it combines the flexibility of semistructured data models with well-known data structures like records and arrays. Users who want to manage JSON data collections effectively can rely on several schema languages, like JSON Schema, JSound, and Joi, or on the type abstractions offered by modern programming languages like Swift or TypeScript. The main aim of this tutorial is to provide the audience with the basic notions needed to enjoy all the benefits that schemas and types can offer while processing and manipulating JSON data. This tutorial focuses on four main aspects of the relation between JSON and schemas: (1) we survey existing schema language proposals and discuss their prominent features; (2) we review how modern programming languages support JSON data as first-class citizens; (3) we analyze tools that can infer schemas from data, or that exploit schema information for improving data parsing and management; and (4) we discuss some open research challenges and opportunities related to JSON data.

TUTORIAL 3: Influence Maximization Revisited: The State of the Art and the Gaps that Remain

Organizers: Akhil Arora (EPFL Lausanne), Sainyam Galhotra (UMass Amherst), Sayan Ranu (IIT Delhi)
Duration: 1.5 hours
Abstract: The steady growth of graph data from social networks has resulted in widespread research into solutions to the influence maximization (IM) problem. As a result, the state of the art is extended almost every year. With the recent explosion in the application of IM to real-world problems, it is no longer a theoretical exercise. Today, IM is used in a plethora of real-world scenarios, with the OnePlus series of mobile phones, Hokey Pokey ice creams, and the galleri5 influencer marketplace being the most prominent industrial use cases. Given this scenario, navigating the maze of IM techniques to gain an in-depth understanding of their utility is of prime importance. In this tutorial, we address this issue and answer two questions: “Which IM technique to use and under what scenarios?” and “What does it really mean to claim to be the state of the art?”
This tutorial builds upon our benchmarking study (a recipient of the SIGMOD Reproducibility Award) and will provide a concise and intuitive overview of the most important IM techniques, an overview that is usually lost in the technical literature. More fundamentally, we will unearth a series of incorrect claims made by prominent IM papers, highlight the inherent deficiencies of existing approaches, and surface the challenges that remain open in IM even after a decade of research.