
Introduction to Statistical Learning Theory

Chapter in: Advanced Lectures on Machine Learning (ML 2003)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3176)


Abstract

The goal of statistical learning theory is to study, in a statistical framework, the properties of learning algorithms. In particular, most results take the form of so-called error bounds. This tutorial introduces the techniques that are used to obtain such results.
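To give the flavor of such results, here is a standard finite-class generalization bound (a textbook example obtained from Hoeffding's inequality plus a union bound, not necessarily the specific bound derived in this chapter). With \(R(f)\) the true risk, \(R_n(f)\) the empirical risk on \(n\) i.i.d. samples, \(\mathcal{F}\) a finite function class, and \(\delta \in (0,1)\) a confidence parameter:

```latex
% With probability at least 1 - \delta over the draw of the sample,
% simultaneously for every function in the class:
\forall f \in \mathcal{F}:\quad
R(f) \;\le\; R_n(f) \;+\; \sqrt{\frac{\log\lvert\mathcal{F}\rvert + \log(1/\delta)}{2n}}
```

The bound quantifies the trade-off at the heart of the theory: the gap between true and empirical risk shrinks as the sample size \(n\) grows, but widens with the complexity of the class (here measured crudely by \(\log\lvert\mathcal{F}\rvert\); the chapter develops finer measures such as VC dimension and Rademacher averages).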







Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bousquet, O., Boucheron, S., Lugosi, G. (2004). Introduction to Statistical Learning Theory. In: Bousquet, O., von Luxburg, U., Rätsch, G. (eds.) Advanced Lectures on Machine Learning. ML 2003. Lecture Notes in Computer Science, vol 3176. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28650-9_8


  • DOI: https://doi.org/10.1007/978-3-540-28650-9_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23122-6

  • Online ISBN: 978-3-540-28650-9

