Transduction (machine learning)

In logic, statistical inference, and machine learning, transduction or transductive inference is reasoning from observed, specific cases to other specific cases. In contrast, induction is reasoning from observed, specific cases to general principles or causes.

There are at least two specific interpretations of this mode of reasoning. In machine learning, support vector machines and Gaussian processes are said to implement transductive inference, since outputs for new cases are computed without constructing an explicit model. In contrast, supervised learning is an example of inductive reasoning. Supervised learning methods such as neural networks and classification trees construct an explicit model from observed examples, and then outputs for new cases are computed from the model.

The use of support vector machines for transductive inference was originated by Vladimir Vapnik. According to Vapnik, as a motivating principle for machine learning, transduction is preferable to induction since induction requires the solution of a more general problem (inferring an unobserved model) before solving a more specific problem (computing outputs for new cases).

Transductive inference is especially useful in problems for which there are many examples, but few examples have labels. For example, in web page categorization problems, there are many web pages, but few web pages have known categories (as assigned by a human expert).

Bayesian inference yields another interpretation of transduction. In Bayesian inference, transduction is the computation the posterior probability of new cases given previous, observed cases. The dependence on a predictive model is removed by averaging (integrating) over all models considered possible, weighting each model by its posterior probability given the observed cases.

Suppose the set of possible models is parametrized by θ. It is typically assumed that new cases y are independent of the previous cases x given the parameter θ. Then the probability of y given x can be computed as

p(y|x) = \frac{p(y,x)}{p(x)} =   \frac   {\int p(x|\theta)\, p(y|\theta)\, p(\theta)\, d\theta}   {\int p(x|\theta)\, p(\theta)\, d\theta}

In the Bayesian formulation, it is not quite true that the inference is entirely model-free. It is true that no particular model is assumed, but the integration only brings into play any models for which the prior probability p(θ) is positive. In practical terms, the integration is only over models considered a priori to be plausible to at least some degree. This set of models is necessarily smaller than the set of all conceivable models.

References

See also: Transduction (machine learning), Bayesian inference, Classification tree, Gaussian process, Induction (philosophy), Logic, Machine learning, Neural network, Posterior probability, Reasoning