research-article
Open Access

ImageNet classification with deep convolutional neural networks

Published: 24 May 2017

Abstract

We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, respectively, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully connected layers we employed a recently developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
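The top-1 and top-5 error rates quoted above count an example as correct when its true label is the single highest-scoring class (top-1) or among the five highest-scoring classes (top-5). The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code. The function name `top_k_error` and the toy scores are hypothetical, and the toy data uses 6 classes with k = 1 and k = 3 rather than the 1000 classes and k = 5 of the contest.

```python
def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is NOT among the k highest-scoring classes.

    scores: list of per-example lists of class scores (hypothetical model outputs).
    labels: list of true class indices, one per example.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy data (assumed for illustration): 4 examples, 6 classes.
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],  # true class 1 is ranked first
    [0.30, 0.10, 0.40, 0.12, 0.05, 0.03],  # true class 0 is ranked second
    [0.05, 0.04, 0.11, 0.20, 0.25, 0.35],  # true class 2 is ranked fourth
    [0.60, 0.10, 0.12, 0.08, 0.06, 0.04],  # true class 0 is ranked first
]
labels = [1, 0, 2, 0]

top1 = top_k_error(scores, labels, k=1)
top3 = top_k_error(scores, labels, k=3)
print(f"top-1 error: {top1:.2f}, top-3 error: {top3:.2f}")
```

Because the top-k set only grows with k, the top-5 error can never exceed the top-1 error, which is consistent with the 37.5% and 17.0% figures reported in the abstract.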



        • Published in

          Communications of the ACM, Volume 60, Issue 6
          June 2017
          93 pages
          ISSN: 0001-0782
          EISSN: 1557-7317
          DOI: 10.1145/3098997
          Editor: Moshe Y. Vardi

          Copyright © 2017 Owner/Author

          Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Qualifiers

          • research-article
          • Research
          • Refereed
