
Introduction to Statistical Learning Theory

Chapter in: Advanced Lectures on Machine Learning (ML 2003)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3176)


Abstract

The goal of statistical learning theory is to study, in a statistical framework, the properties of learning algorithms. In particular, most results take the form of so-called error bounds. This tutorial introduces the techniques that are used to obtain such results.
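To give the flavor of such results, here is a standard finite-class generalization bound (a textbook example obtained from Hoeffding's inequality plus a union bound, not necessarily the specific bound derived in this chapter). With \(R(f)\) the true risk, \(R_n(f)\) the empirical risk on \(n\) i.i.d. samples, \(\mathcal{F}\) a finite function class, and \(\delta \in (0,1)\) a confidence parameter:

```latex
% With probability at least 1 - \delta over the draw of the sample,
% simultaneously for every function in the class:
\forall f \in \mathcal{F}:\quad
R(f) \;\le\; R_n(f) \;+\; \sqrt{\frac{\log\lvert\mathcal{F}\rvert + \log(1/\delta)}{2n}}
```

The bound quantifies the trade-off at the heart of the theory: the gap between true and empirical risk shrinks as the sample size \(n\) grows, but widens with the complexity of the class (here measured crudely by \(\log\lvert\mathcal{F}\rvert\); the chapter develops finer measures such as VC dimension and Rademacher averages).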







Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bousquet, O., Boucheron, S., Lugosi, G. (2004). Introduction to Statistical Learning Theory. In: Bousquet, O., von Luxburg, U., Rätsch, G. (eds.) Advanced Lectures on Machine Learning. ML 2003. Lecture Notes in Computer Science, vol 3176. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28650-9_8


  • DOI: https://doi.org/10.1007/978-3-540-28650-9_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23122-6

  • Online ISBN: 978-3-540-28650-9

