Google and IBM have joined together on an initiative to promote research on “cloud computing,” a technology that uses remote servers, instead of users’ PCs, to run programs and services. The two technology giants said they each will contribute up to $25 million to build data centers that can be used by university researchers and students in the United States.
“We’re aiming to train tomorrow’s programmers to write software that can support a tidal wave of global web growth and trillions of secure transactions every day,” said Sam Palmisano, IBM’s chief executive officer.
The initiative’s goal is to improve computer-science students’ knowledge of highly parallel computing practices to better address the emerging paradigm of large- and internet-scale distributed computing.
“To most effectively serve the long-term interests of our users, it is imperative that students are adequately equipped to harness the potential of modern computing systems and for researchers to be able to innovate ways to address emerging problems,” said Eric Schmidt, Google’s chief executive.
The two companies have dedicated a large cluster of several hundred computers (a combination of Google machines and IBM BladeCenter and System x servers) that is planned to grow to more than 1,600 processors. Students will access the cluster via the internet to test their parallel-programming course projects.
These clusters support what the companies call “cloud computing.” A cloud is made up of machines that host a variety of applications that users access remotely via the internet. Cloud computing supports a wider range of applications, as they can be hosted virtually and distributed across the cluster of computers, the “cloud.”
The servers will run open-source software, including the Linux operating system, Xen systems virtualization, and Apache’s Hadoop project, an open-source implementation of Google’s published computing infrastructure.
This form of parallel computing is relatively new and has not fully caught on in universities, said an IBM spokeswoman.
The University of Washington was the first to join the initiative. A small number of universities will pilot the program, including Carnegie Mellon, Stanford, the University of Maryland, the University of California, Berkeley, and the Massachusetts Institute of Technology. In the future, the program will be expanded to include additional researchers, educators, and scientists.
Fundamental changes in computing architecture and increases in network capacity are encouraging software developers to take new approaches to computer-science problem solving, organizers of the initiative say. For web-based software that controls functions such as search, social networking, and mobile commerce to run quickly, computational tasks often need to be broken into hundreds or thousands of smaller pieces to run across many servers simultaneously. Parallel programming techniques also are used for complex scientific analysis such as gene sequencing and climate modeling.
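The idea of breaking one task into many smaller pieces can be illustrated with a minimal Python sketch. It is not code from the initiative: the function names (`split_into_chunks`, `count_matches`, `parallel_count`) are hypothetical, and a local thread pool stands in for the many servers a real cluster would use.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_count):
    """Divide a list into roughly equal contiguous pieces."""
    chunk_size = max(1, -(-len(items) // chunk_count))  # ceiling division
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def count_matches(chunk, keyword):
    """Work done on one piece: count the documents that contain the keyword."""
    return sum(1 for doc in chunk if keyword in doc.lower())


def parallel_count(documents, keyword, workers=4):
    """Split the job, process the pieces concurrently, then combine the results."""
    chunks = split_into_chunks(documents, workers)
    # Assumption: threads on one machine play the role of separate servers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(count_matches, chunks, [keyword] * len(chunks))
        return sum(partial_counts)
```

Each worker sees only its own piece, and the final answer is simply the sum of the partial results. Tasks that can be combined this way are the ones that spread most easily across a large cluster.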
At the University of Washington, students have been able to harness the power of distributed computing to produce complicated programs, such as software that scans voluminous Wikipedia edits to identify spam and software that organizes global news articles by geographic location.
“In 2006, our goal was to understand the challenges that universities face in teaching … large-scale computing and develop methods to address this issue,” said Ed Lazowska, Bill and Melinda Gates Chair of Computer Science and Engineering at the University of Washington. “A year later, we’ve seen how our students have mastered many of the techniques that are critical for large-scale internet computing, benefiting our department and students.”
“Carnegie Mellon applauds Google and IBM for helping to provide the resources that will help professors better prepare our students for the challenges presented by highly parallel computing,” said Randal Bryant, dean of the School of Computer Science at Carnegie Mellon University. “We are quite pleased to be among the first universities participating in this program this fall.”
To simplify the development of massively parallel programs, Google and IBM have created the following resources for universities:
- A cluster of processors running an open-source implementation of Google’s published computing infrastructure (MapReduce and GFS from Apache’s Hadoop project);
- A Creative Commons licensed university curriculum developed by Google and the University of Washington focusing on massively parallel computing techniques;
- Open-source software designed by IBM to help students develop programs for clusters running Hadoop. The software works with Eclipse, an open-source development platform;
- Management, monitoring, and dynamic resource provisioning of the cluster by IBM, using IBM Tivoli systems management software; and
- A web site to encourage collaboration among universities in the program, which will be built on Web 2.0 technologies from IBM’s Innovation Factory.
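For readers unfamiliar with MapReduce, the programming model named in the first item above, the following single-process Python sketch shows its three phases using a word count. It is an illustration only and does not use Hadoop's actual API; the function names are hypothetical, and a real cluster would run the map and reduce steps on many machines at once.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence on one machine."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because each call to `map_phase` looks at one document and each call to `reduce_phase` looks at one key, both steps can be spread across separate machines without the pieces needing to coordinate with one another.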