
How to Build an Artificial Intelligence Laboratory: Practical Experience from a University Deployment

2020-04-18

To meet the artificial intelligence industry's urgent need for talent, the state has issued a number of policies and guidelines encouraging colleges and universities to set up AI-related majors as soon as possible and to train more AI specialists. Nanjing University of Science and Technology responded to the national policy by setting up a new AI major in its School of Computer Science, with a planned 2019 intake of 60 to 100 students. At the same time, the school planned to finish building an artificial intelligence laboratory platform before the autumn 2019 term, to support teachers in curriculum design and experiments and to improve classroom efficiency and learning outcomes for students in the AI major.
Construction planning of the Artificial Intelligence Laboratory
The school set a clear schedule and clear objectives for the laboratory: a first release at the end of May, so that some students could start using it early, followed in September by the full platform, which also had to let operation and maintenance staff manage it efficiently. Because this was the first artificial intelligence laboratory the school had built, it drew up detailed plans and explicit requirements for the platform from several perspectives:
1. Operation and maintenance management
Support per-user GPU quota management
Support resource template distribution, multiple configurations, and one-click deployment
Support AD domain integration (LDAP)
Support GPU virtual machine life cycle management
Achieve "zero operation and maintenance": once research teachers have applied for computing, storage, network, GPU, and other resources, they do not need to look after the underlying infrastructure
2. Reliability
Uninterrupted 7 * 24 hour availability
Scale: 150 concurrent online users
3. Compatibility
Reuse the school's existing servers and graphics cards (about 19 of the existing servers can be reused)
Support different types of graphics cards: P40, K80, Titan V, RTX 2080 Ti
Difficulties in the construction of the Artificial Intelligence Laboratory
Given the school's existing IT architecture, building the artificial intelligence laboratory platform faced the following problems:
1. Complex operation and maintenance management
The school used a traditional architecture of servers plus external storage. With many devices and fragmented logs, operation and maintenance were difficult, and teachers had to spend a great deal of time and energy on them.
2. Inflexible resource allocation
Students in different research directions have different resource requirements. Some may need only half the computing performance of a single GPU, but the complex experimental environment made resource allocation inflexible and wasted resources.
3. No multi-tenant management
Research teachers want root access to manage their computing and storage resources, but operation and maintenance staff cannot grant root access because of compliance requirements. Multi-tenant management is therefore needed to meet the needs of different teachers.
4. Reusing old servers to reduce costs
The school has 17 original servers. Building an entirely new experimental platform would cost too much, so the school hoped to reuse the old servers to save money.
Advantages of Shenxin hyper-converged infrastructure
The underlying research cloud platform, built on Shenxin hyper-converged infrastructure, has a simple structure, is convenient to manage centrally, and is easy to maintain. It can also plan and allocate resources according to the school's actual IT needs. To support the school's enrollment, in addition to reusing the school's 19 existing servers, the plan includes purchasing 31 new servers to meet future research and teaching needs.
1. Minimal architecture, fast rollout
The Shenxin hyper-converged appliance can replace a variety of devices in the traditional architecture. It integrates computing, storage, network, and other resources and provides standardized delivery and template-based deployment, so the school's AI laboratory platform can go online quickly.
2. Matching requirements, high performance
Shenxin hyper-converged infrastructure supports different types of graphics cards: 48 * P40, 12 * K80, 53 * Titan V, and 8 * RTX 2080 Ti. With these graphics card resources pooled, the platform reaches 5,600 TFLOPS of floating-point computing capacity, 1,740,800 GPU stream processors, and 4,400 GB of video memory, meeting the needs of high-performance computing.
3. Unified operation and maintenance management, flexible configuration
Through the ACMP cloud management platform, services can be comprehensively managed, configured, scheduled, and orchestrated. The platform supports per-user GPU quota management, resource template distribution, multiple configurations, and one-click deployment, which greatly simplifies management, operation, and maintenance.
4. Stable and reliable, flexible expansion
The hyper-converged HA (high availability) mechanism keeps services running without interruption and keeps the research platform stable around the clock. The platform uses Vlad, DRX, and DRS to meet the concurrent usage demand of 150 users. The hyper-converged architecture also scales well: hardware resources can be added or replaced at any time as demand changes, while the platform continues to run safely and stably.
The value of the Artificial Intelligence Laboratory platform based on Shenxin hyper-converged infrastructure
The artificial intelligence laboratory platform based on the Shenxin hyper-converged architecture delivers the following value:
1. Efficient management of student accounts
The platform integrates with the AD domain to import users in batches, manage permissions, and automatically synchronize new users.
2. Support for GPU resource quota applications
School users can apply for GPU virtual machines on their own (administrators can define the virtual machine templates).
3. Flexible resource scheduling and allocation
The platform supports virtual machine life cycle management. When a virtual machine expires, it is shut down automatically; the virtual machine is not deleted, but its GPU resources are released. After shutdown, the GPU can be used by other users. The original user can re-apply for GPU resources and start the virtual machine again to enter the next life cycle, without administrator approval.
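The life cycle described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the platform's implementation: the names `GpuVm`, `GpuPool`, `expire`, and `renew` are hypothetical, each virtual machine is assumed to hold exactly one GPU, and the lease length is a made-up parameter.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class VmState(Enum):
    RUNNING = "running"
    SHUT_DOWN = "shut_down"


@dataclass
class GpuVm:
    """Hypothetical GPU virtual machine that holds one GPU while running."""
    owner: str
    lease_expires: datetime
    state: VmState = VmState.RUNNING


class GpuPool:
    """Hypothetical pool that only tracks how many GPUs are free."""

    def __init__(self, free_gpus: int) -> None:
        self.free_gpus = free_gpus

    def expire(self, vm: GpuVm, now: datetime) -> None:
        """Shut down an expired VM and release its GPU; the VM is not deleted."""
        if vm.state is VmState.RUNNING and now >= vm.lease_expires:
            vm.state = VmState.SHUT_DOWN
            self.free_gpus += 1

    def renew(self, vm: GpuVm, now: datetime, lease: timedelta) -> bool:
        """Restart a shut-down VM if a GPU is free; no approval step is involved."""
        if vm.state is not VmState.SHUT_DOWN or self.free_gpus == 0:
            return False
        self.free_gpus -= 1
        vm.state = VmState.RUNNING
        vm.lease_expires = now + lease
        return True
```

In this model a released GPU simply returns to the shared count, so whichever user asks first after the shutdown gets it, and the original owner's `renew` call fails until a GPU is free again.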
4. Tailored customization, saving resources
Based on how Nanjing University of Science and Technology uses its resources, a custom matching algorithm was developed that places virtual machines created from the same template on the same host wherever possible. This avoids the waste that occurs when the resources left over on each host are too fragmented to satisfy a high-configuration virtual machine.
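One way to picture such a placement rule is the sketch below. It is not the vendor's algorithm, whose details the article does not give; it is a simple heuristic written under assumptions: the `Host` type, the `place` function, and the use of free GPU count as the only resource are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Host:
    """Hypothetical host: free GPU count plus a count of VMs per template."""
    name: str
    free_gpus: int
    templates: Dict[str, int] = field(default_factory=dict)


def place(hosts: List[Host], template: str, gpus_needed: int) -> Optional[Host]:
    """Pick a host for a new VM, preferring hosts that already run its template."""
    candidates = [h for h in hosts if h.free_gpus >= gpus_needed]
    if not candidates:
        return None
    # Hosts already running this template sort first; ties go to the tightest
    # fit, so large blocks of free GPUs stay intact for high-configuration VMs.
    best = min(
        candidates,
        key=lambda h: (template not in h.templates, h.free_gpus - gpus_needed),
    )
    best.free_gpus -= gpus_needed
    best.templates[template] = best.templates.get(template, 0) + 1
    return best
```

With two hosts, one empty with 4 free GPUs and one already running a small-template VM with 3 free GPUs, repeated 1-GPU requests for the small template keep landing on the second host, which leaves the first host whole for a later 4-GPU request.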
Colleges and universities have made great contributions to training the country's high-end AI talent and to strengthening China's capacity for continuous AI innovation. In the future, we will continue to provide teachers and students with innovative products and solutions through ongoing innovation in information technology, help colleges and universities build laboratories quickly, and actively promote academic research and talent training in the field of artificial intelligence.
Nanjing University of Science and Technology
Nanjing University of Science and Technology is a national key university under the Ministry of Industry and Information Technology, located in the ancient capital of Nanjing. Its predecessor, the Military Engineering College of the People's Liberation Army, was founded in 1953 as the highest institution of military science and technology in the People's Republic of China. The school went through successive stages as the Artillery Engineering College, East China Engineering College, and East China Institute of Technology, and was renamed Nanjing University of Science and Technology in 1993. In 1995, the university became one of the first group of national "211 Project" key construction universities; in 2000, it was approved to establish a graduate school; in 2011, it was approved to build a "985 Project" Advantage Discipline Innovation Platform; in 2017, it was selected as a "Double First-Class" construction university, with "weapon science and technology" selected as a "Double First-Class" construction discipline; and in December 2018, it became a university jointly supported by the Ministry of Industry and Information Technology, the Ministry of Education, and Jiangsu Province. The school adheres to the educational philosophy of "people-oriented, moral and erudite", the school motto of "pursuing morality and learning, and striving for innovation", and the school ethos of "unity, dedication, truth-seeking, and innovation". It takes serving national strategic needs and promoting social progress as its mission, and is committed to building a high-level research university that is first-class in China and internationally renowned, with distinctive characteristics.
