MLTA

Danish Lohani, SAU

Probability theory for machine learning

The tutorial will cover different approaches to probability, conditional probability, distribution functions and their properties, joint density functions, mathematical expectations and their properties, Bayesian minimum-error-rate classification, discriminant functions and decision surfaces, together with parameter estimation and supervised learning, all of which are useful in machine learning.

Jay R Bhatnagar, NIT Goa

Information and Learning Theory

Twin Aspects of Shannon Engineering

Founded by mathematician-engineer Claude E. Shannon, Information Theory (IT) is widely seen as the breakthrough working theory of the science of information. It laid down the major algorithmic and architectural principles for digital computing and communications, besides contributing to decision sciences, investment theory and even biology. Sometime later, Shannon also contributed to the rise of Artificial Intelligence and thereby of learning theory. Drawing upon elements of both mathematics and statistics, IT treats information as a basic unit for measuring the objective physical world of data and decisions. A majestic opera interplaying logic, reasoning, architecture and analysis, IT has remained a cornerstone for about 70 years, with towering results in the compression, reliability and secrecy of information. More recently, the theory of learning and prediction from the information present in data samples has become vital to AI. In this tutorial talk, we will draw out some of the connections from the fabric of IT and relate them to various aspects of ML, eliciting some interesting parallels, differences and applications. As a personal tribute to Robert Gallager, Thomas Cover and Sergio Verdú, the attempt is to view the big picture rather than to advocate a specific result. Topics bridging elements of IT are: entropy (Shannon), k-NN (Cover-Hart), method of types (Csiszar-Renyi), information divergence (Verdú, Barron), large deviations (Chernoff, Cover, Breiman, Schapire), information capacity (Shannon), recognition capacity (Sullivan, Bhatnagar), error exponents (Gallager, Ziv, Weissman), classification exponent (Bhatnagar), and learning theory and quantization (Bhatnagar).

Manoj Singh, BHU

Linear Algebra for Machine Learning and IR

Information retrieval is concerned with representing content in a form that can be easily accessed by users with information needs. Information retrieval, as a field, works primarily with highly unstructured content, such as text documents written in natural language; it deals with information needs that are generally not formulated according to precise specifications; and its criteria for success are based in large part on the demands of a diverse set of human users. Most of the models for IR and text analytics are linear. The most basic and most frequently used model is the vector space model, in which data or information is represented in vector or matrix form. Most methods for preprocessing and information retrieval rely on the similarity of the data, which is mostly in vector or matrix form. Numerical similarity metrics on documents suggest natural approaches for similarity-based indexing, by representing textual queries as vectors and searching for their nearest neighbors in a collection of documents, as well as for clustering. But with a large number of underlying terms, these vector operations are carried out in a huge number of dimensions. Very high dimensionality can be a problem not only from the point of view of computational efficiency, but also because the large number of terms leads to sets of vectors with very sparse patterns of non-zeroes, in which relationships among terms can be difficult to detect or exploit. Many methods based on machine learning approaches, such as support vector machines (SVM), artificial neural networks (ANN) and matrix decomposition, have been proposed in the literature. In this talk, the underlying theory of linear algebra for machine learning methods and matrix decomposition based methods will be presented, along with its application to machine learning and matrix methods for dimensionality reduction, clustering and classification.
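
The vector space idea described above can be illustrated with a minimal sketch. It is not taken from the talk: the whitespace tokeniser, the raw term-frequency weighting, the function names and the three sample documents are all assumptions chosen for brevity. It represents a query and each document as a sparse term vector and returns the document with the highest cosine similarity.

```python
import math
from collections import Counter


def term_vector(text):
    """Sparse bag-of-words vector: term -> raw frequency (assumed weighting)."""
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_document(query, documents):
    """Return (index, score) of the document most similar to the query."""
    q = term_vector(query)
    scores = [cosine_similarity(q, term_vector(d)) for d in documents]
    best = max(range(len(documents)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical toy collection, for illustration only.
docs = [
    "support vector machines for text classification",
    "matrix decomposition for dimensionality reduction",
    "finite state morphology of words",
]
best_index, best_score = nearest_document("matrix decomposition methods", docs)
print(best_index, round(best_score, 4))
```

Even in this toy case each vector has only a handful of non-zero entries out of the whole vocabulary, which is the sparsity problem that dimensionality reduction methods aim to address.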

David Barber, University College London, UK

Keynote: Machine Learning and Engineering Intelligence

Machine Learning broadly aims to find underlying structure in typically large-scale data. In contrast to traditional statistics, the underlying motivations of the Machine Learning community tend to focus on the development of methods that facilitate human-machine interaction and the replication of the information processing skills of natural organisms. For example, typical areas of interest are speech recognition, natural language processing, visual object recognition, robotic control and artificial intelligence. Machine Learning also has other motivations in more general statistical modelling and prediction scenarios, often with the emphasis being on massive databases and limited prior understanding of appropriate underlying mechanisms generating the data. I'll discuss some of the conceptual landscape of Machine Learning, its history and motivations.

David Barber

Tutorial 1: 16th: Machine Learning and Probabilistic Modelling

Probability is a useful framework for modelling underlying processes generating data. This framework is consistent with many scientific approaches in which an underlying physical law is postulated, with discrepancies between the observations and the model accounted for by stochastic effects. Probability can also be seen as an extension of classical logical reasoning systems. Whilst the framework of probability has long been considered advantageous for Machine Learning and Artificial Intelligence, working with high-dimensional probability distributions (which are typically required in Machine Learning) is highly non-trivial. For this reason, imposing strong structural constraints on the underlying models is vital, and these constraints can often be conveniently expressed in the language of Graph Theory. The marriage of Graph and Probability Theory is the field of Graphical Models, which has become a popular framework in Machine Learning, Statistics and Engineering to represent large-scale probabilistic models. I'll discuss applications of this framework in natural language modelling and engineering, including tracking and signal decoding.

Pushpak Bhattacharyya, IITB

Machine Translation: a perspective

In this talk we will present our long-standing work on three predominant approaches to machine translation (MT): interlingua, transfer and statistical. The languages involved are English, Hindi and Marathi. At the outset there will be a description of the foundations of MT. Then we will touch upon language divergences, which pose the main challenges for MT. Different SMT approaches, such as phrase-based, tree-based, hierarchical and factor-based SMT, will be discussed. Factor-based SMT is effective for morphologically rich languages, and we will present results on this. The talk will end with remarks on the evaluation of MT systems.

Rakesh Agrawal, Microsoft Research

Computational Education: A New Frontier for Data Researchers?

The three basic questions related to education are: i) what is taught, ii) how is the educational material delivered, and iii) how is the education funded? Recent technological advances are resulting in dramatically new answers to these questions. We examine some of these trends and present concrete results from our research which suggest that data scientists have a central role to play in bringing computational thinking to education.

Dr Radhika Mamidi, IIITH

Tutorial on Computational Morphology

Computational morphology deals with processing of words and word forms. For this, it is necessary to have the knowledge of the internal structure of words and word formation rules.  Tools built on the basis of this knowledge include morphological analysers and generators. These tools are an important part of NLP applications like Machine Translation, Information Retrieval and Text-to-Speech systems. They also help in bridging the gap between lexical resources such as corpora and lexicons, and the overall field of natural language processing. The first half of the tutorial will deal with WORDS with the emphasis being on how different languages encode information in words. The second half of the tutorial will be about decoding this information using techniques that are paradigm based and finite state based.

David Barber

Tutorial 2: 18th: Bayesian Machine Learning

Parameter estimation is a central issue in Machine Learning and Statistical Modelling. The Bayesian approach assumes that no single parameter is optimal, rather that one should ideally consider a distribution over possible parameters. I'll explain how to tackle parameter inference problems the Bayesian way in both small and large-scale Machine Learning problems, including a discussion of Bayesian Classification and Regression. We will also discuss how some of the computational issues which often arise in Bayesian Machine Learning may be addressed using deterministic and stochastic approximation methods.

Deepayan Sarkar, ISI Delhi

Tutorial on R

This tutorial will give a broad overview of R. It is aimed at beginners as well as those who already have some experience with R. It will cover an introduction to the R language, basic constructs, matrix manipulation and package organization. Some basics of statistical inference will also be introduced.

Bing Liu, UIC

Detection of Fake or Deceptive Opinions

Opinions from social media are increasingly used by individuals and businesses for making purchase decisions, for making choices at elections, and for marketing and product design. Positive opinions often mean profit and fame for businesses and individuals, which, unfortunately, gives people strong incentives to game the system by posting fake or deceptive opinions to promote or discredit target products, services, businesses, individuals, and even ideas, without disclosing their true intentions or the person or organization that they are secretly working for. Such individuals are called opinion spammers and their activities are called opinion spamming. Opinion spamming is now widely used as a very cheap way of marketing. Spamming cases are also frequently reported in the international press. Opinion spamming not only hurts consumers and damages businesses, but can also be frightening when it concerns opinions on social and political issues, as it can warp opinions and mobilize masses into positions counter to legal or ethical mores. To ensure that social media is a trusted source of public opinion, rather than being full of fakes, lies and deceptions, deceptive or fake opinions must be detected. In this talk, I will first introduce this research topic and its challenges and then discuss some state-of-the-art detection algorithms.

Madhu Kumari, NIT Hamirpur

Text Mining and Social Media

Social media on the web, such as blogs, Facebook, Twitter, YouTube and Flickr, have given people today the freedom to express their social, political and business interests and to connect with each other. This enormous growth of social media opens up numerous possibilities for studying human interactions and collective intelligence. As a result, social network analysis is evolving into a paradigm of distinct structural theories and associated relational methods. Text plays a vital role in communication and is almost impossible to separate from social media, so it is natural to analyse and exploit the text available on web-based social media. The focus of this session is to expound on the complex text data that exists abundantly in social media and to make this data usable for network analysis tasks, e.g. quantifying crowd behaviour, link analysis, and the longitudinal impact of comments, posts and tags.

Alexander Gelbukh, NPI, Mexico

Natural Language Processing: Applications and Current Trends

Natural language processing is a research discipline that enables computers to meaningfully process the language we humans use for communication. While a seemingly trivial task for humans, it proves to be quite difficult for a computer. We will discuss why it is so difficult for computers and how such difficulties are overcome, what research directions exist, and how the effectiveness of the proposed solutions is evaluated. An overview of applications of natural language processing to practical problems will be given. Finally, current research directions of the Natural Language Laboratory of the CIC-IPN will be presented.

Vivek Singh, SAU

Sentiment Analysis

Sentiment analysis is a language processing task that uses an algorithmic formulation to identify opinionated content and categorize it as having ‘positive’, ‘negative’ or ‘neutral’ polarity. It has been formally defined as an approach that works on a quintuple <Oi, Fij, Skijl, Hk, Tl>, where Oi is the target object, Fij is a feature of the object Oi, Skijl is the sentiment polarity (+ve, -ve or neutral) of the opinion of holder k on the jth feature of object i at time l, Hk is the opinion holder, and Tl is the time when the opinion is expressed [1]. It can be clearly inferred from this definition that sentiment analysis involves a number of tasks, ranging from identifying whether the target carries an opinion to, if it does, classifying the opinion as having ‘positive’ or ‘negative’ polarity. The sentiment analysis task may be done at different levels: document level, sentence level or aspect level. There are broadly two kinds of approaches to sentiment analysis: those based on machine learning classifiers and those based on a lexicon. The machine learning classifiers for sentiment analysis usually follow a supervised learning paradigm, in which they are trained on labelled data before they can be applied to the actual sentiment classification task. Lexicon-based methods, on the other hand, extract some selected features, use a dictionary look-up to compute their sentiment polarities, and aggregate them in some way to find the overall polarity. The tutorial aims to introduce the sentiment analysis problem and characterize various approaches. Some standard datasets and application areas will also be discussed.
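
The lexicon-based approach mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the tiny lexicon, its scores, the punctuation-stripping tokeniser and the plain summation used for aggregation are all hypothetical, and real systems typically use a published sentiment lexicon and handle negation and intensifiers.

```python
# Hypothetical toy lexicon: word -> polarity score (assumed values).
LEXICON = {
    "good": 1, "great": 2, "excellent": 2,
    "bad": -1, "poor": -1, "terrible": -2,
}


def lexicon_polarity(text, lexicon=LEXICON):
    """Look up each token in the lexicon and aggregate by summing scores."""
    tokens = [tok.strip(".,!?;:").lower() for tok in text.split()]
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score


print(lexicon_polarity("The camera is great, but the battery is poor."))
print(lexicon_polarity("Terrible screen!"))
print(lexicon_polarity("It arrived on Monday."))
```

The first sentence shows why aspect-level analysis matters: summing over the whole sentence hides the fact that the opinion on the camera is positive while the opinion on the battery is negative.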

Dr. Jayadeva, IIT Delhi

A Quick Tutorial on SVMs

Support Vector Machines (SVMs) have become the machine learning paradigm of choice in the last decade. This is a whirlwind tutorial of SVMs, with a little discussion on some recent advances that have emerged from IIT Delhi - and a sneak preview of some new results.

Indrajit Bhattacharya, IBM Research India

Probabilistic Topic Models

As is evident from the name, probabilistic topic models were originally proposed to discover topics of discourse from collections of text documents. However, over the last decade, they have emerged as one of the most popular techniques for unsupervised probabilistic analysis, and have been applied to the analysis of images, videos, music, social and other networks, biological data, and other data types. In this tutorial, we will start with the fundamentals of probabilistic generative models and their application in textual data analysis, starting from the simple unigram mixture model and moving on to Latent Dirichlet Allocation (LDA). We will consider the problem of learning and inference using topic models, investigate why these are intractable, and look at approximate inference strategies based on Gibbs sampling and variational techniques. Finally, we will look at some of the shortcomings of LDA, and briefly discuss more advanced topic models, such as syntactic, correlated, dynamic, relational and supervised topic models.

Tanveer Jahan Siddiqui, Allahabad University

Multi-Document Summarization

This tutorial will focus on automatic text summarization, in particular on multi-document summarization. The tutorial is in two parts. In the first part, I will introduce the basic concepts involved in creating a single-document summary. After a quick overview of what a multi-document summary is and how it differs from a single-document summary, I will give an overview of the automatic evaluation of summarization systems. In the second part of the tutorial, I will discuss issues and challenges specific to multi-document summarization and its applications. During the course of the talk, I will attempt to survey existing statistical and shallow semantic approaches to multi-document summarization. Finally, future challenges will be discussed.

Niladri Chatterjee, IIT Delhi

Statistical Machine Translation

Machine Translation, or the automated translation of text from one natural language into another, is a challenging task both technically and linguistically. Although machine translation was traditionally considered an Artificial Intelligence problem, Statistical Machine Translation (SMT) has gained popularity over the last decade or so. With the availability of huge parallel corpora, SMT computes different probabilities from those corpora (e.g. unigram, bigram and trigram probabilities for language modeling, and translation probabilities using alignment functions), which are then used to generate the most probable translation of a given input sentence. The whole idea started with the 5 IBM models in 1993, which have since been extended by different researchers. In this tutorial we first look at machine translation as a subject and examine its difficulties, with a focus on English to Hindi (and some other Indian languages) machine translation. The tutorial also pays a short visit to the history of MT and different MT paradigms. It then develops the technique of statistical MT in a systematic, step-by-step way, starting from the IBM models. The tutorial also deals with some SMT software, such as Moses and GIZA++.
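
As a small illustration of the language-modeling probabilities mentioned above, the sketch below estimates bigram probabilities by maximum likelihood from a toy corpus. It is an assumption-laden example, not the tutorial's method: the three sentences, the `<s>` and `</s>` boundary markers and the absence of smoothing are all choices made for brevity, and it covers only the language model, not the translation or alignment models.

```python
from collections import Counter


def bigram_probabilities(sentences):
    """Unsmoothed maximum-likelihood estimate of P(next word | previous word)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        context_counts.update(tokens[:-1])            # every token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return {
        pair: count / context_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }


# Hypothetical toy corpus, for illustration only.
corpus = ["the house is small", "the house is big", "the car is small"]
probs = bigram_probabilities(corpus)
print(probs[("the", "house")])  # relative frequency of "house" after "the"
print(probs[("is", "small")])   # relative frequency of "small" after "is"
```

Unseen bigrams receive no entry here, which corresponds to probability zero; practical language models apply smoothing to avoid that.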

Asif Ekbal, IITP

NERC and Indian languages

The talk will begin with a brief introduction to the problems of Named Entity Recognition and Classification (NERC) and quick references to work carried out in different languages, and will then focus on evolutionary and machine learning approaches for NERC in Indian languages. A brief introduction to bio-text mining, along with biomedical and/or chemical entity extraction, will be covered at the end.

P K Singh, IIITM, Gwalior

Feature extraction in text clustering

Feature reduction is an active area of research that pre-processes the original data to make it suitable for mining, especially in the context of text mining. Feature reduction is mainly categorized into feature selection and feature extraction. Feature selection is a method for choosing the most relevant subset of the original features. In other words, it is a process of rejecting, according to certain criteria, irrelevant and/or redundant features that are unrelated to the mining task. It reduces the complexity of the problem, increases the efficiency of processing and simplifies the design of the classifier. Feature extraction, in contrast, is a process of deriving a new, reduced set of features from the original features through some transformation of attributes; that is, a set of new features is created. Feature selection methods are further categorized as filter, wrapper and hybrid on the basis of the evaluation function. Wrapper and hybrid approaches are often implemented effectively with nature-inspired algorithms to enhance the quality of solutions. This presentation introduces the basic concepts and algorithms of feature selection and extraction, and presents a brief overview of the different models of feature selection and a brief survey of existing algorithms associated with these models, most importantly the nature-inspired algorithms applied in wrapper and hybrid approaches. Multi-objective approaches to feature selection are then discussed. Finally, the presentation concludes by focusing on challenges and guidelines for future research on feature selection and extraction.

Ganesh Ramakrishnan, IIT Bombay

Statistical Relational Learning

The tutorial will cover the following:
Part A: Simple output spaces, relational input spaces
1] Introduction to Kernel machines for Relational output spaces.
2] Kernels for relational learning: String kernels, tree kernels, graph kernels, first order kernels
3] Logic primer: Propositional and first order
4] Inference: In propositional and first order logic.
Part B: Relational output spaces, relational input spaces
5] Struct SVM: SVM on sequence and tree structured output spaces.
6] Kernelised Struct SVM
7] Max Margin Markov Logic Network: SVM on (first order) logically structured output spaces
8] Kernelised Max Margin Markov Network