Code generation systems such as DeepMind’s AlphaCode, Amazon’s CodeWhisperer, and OpenAI’s Codex, which powers GitHub’s Copilot service, offer a fascinating glimpse of what AI can do today in the world of computer programming. But so far, only a handful of these AI systems have been made freely available to the public and open sourced, reflecting the commercial incentives of the companies that build them.
In an effort to change that, Hugging Face and ServiceNow Research, ServiceNow’s R&D division, today launched BigCode, a new project that aims to develop state-of-the-art AI systems for code in an “open and responsible” manner. The goal is to release a data set large enough to train a code generation system, which will then be used to build a prototype using ServiceNow’s in-house GPU cluster: a model with 15 billion parameters, larger than Codex (12 billion parameters) but smaller than AlphaCode (~41.4 billion parameters). In machine learning, parameters are the parts of an AI system that are learned from historical training data, and they essentially define the system’s skill on a problem such as generating code.
Inspired by Hugging Face’s BigScience effort to open source highly sophisticated text generation systems, BigCode will be open to anyone with a professional research background in artificial intelligence who can dedicate time to the project, the organizers say. The application form went live this afternoon.
“In general, we expect applicants to be affiliated with a research organization (either in academia or industry) and work on the technical/ethical/legal aspects of [large language models] for coding applications,” ServiceNow wrote in a blog post. “Once the [code-generating system] is trained, we will evaluate its capabilities … We will strive to make the evaluation easier and broader so that we can learn more about the [system’s] capabilities.”
By developing its code generation system collaboratively, and releasing it as open source under a license that allows developers to reuse it subject to certain terms and conditions, BigCode seeks to address some of the controversies that have arisen around AI-powered code generation, particularly with regard to fair use. The nonprofit Software Freedom Conservancy, among others, has criticized GitHub and OpenAI for using public source code, not all of which is under a permissive license, to train and monetize Codex. Codex is available through a paid OpenAI API, while GitHub recently started charging for access to Copilot. For their part, GitHub and OpenAI continue to maintain that Codex and Copilot do not violate any license terms.
The BigCode organizers say they will make an effort to ensure that only files from repositories with permissive licenses enter the aforementioned training data set. Along the way, they say, they will work to establish “responsible” AI practices for training and sharing code generation systems of all kinds, and will solicit feedback from relevant stakeholders before making policy statements.
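To make the idea of license filtering concrete, here is a minimal Python sketch of how files might be screened by license before entering a training set. It is an illustration only, not BigCode’s actual pipeline: the allowlist of license identifiers, the record layout (`path` and `license` keys), and the function names are all assumptions made for this example.

```python
# Hypothetical allowlist of permissive licenses, written as SPDX-style
# identifiers. The real project may use a different list entirely.
PERMISSIVE_LICENSES = {"mit", "apache-2.0", "bsd-2-clause", "bsd-3-clause", "isc"}


def is_permissive(license_id):
    """Return True if the license identifier is on the allowlist.

    Files with a missing or unknown license are rejected.
    """
    if not license_id:
        return False
    return license_id.strip().lower() in PERMISSIVE_LICENSES


def filter_permissive(records):
    """Keep only records whose repository license is on the allowlist.

    Each record is assumed to be a dict with 'path' and 'license' keys.
    """
    return [r for r in records if is_permissive(r.get("license"))]


if __name__ == "__main__":
    # Made-up sample records for demonstration.
    sample = [
        {"path": "a/util.py", "license": "MIT"},
        {"path": "b/core.c", "license": "GPL-3.0-only"},
        {"path": "c/app.js", "license": None},
        {"path": "d/lib.rs", "license": "Apache-2.0"},
    ]
    for record in filter_permissive(sample):
        print(record["path"])
```

In this sketch, anything without a recognized permissive license is dropped by default, which errs on the side of excluding files rather than including them.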
ServiceNow and Hugging Face have not provided a timetable for when the project will be completed. But they expect it to explore several forms of code generation over the next few months, including systems that auto-complete and synthesize code from code snippets and natural language descriptions, and that work across a wide range of domains, tasks, and programming languages.
Assuming that the ethical, technical, and legal issues are one day resolved, AI-powered coding tools could drastically lower development costs while allowing programmers to focus on more creative tasks. According to a study from the University of Cambridge, at least half of developers’ efforts are spent on debugging rather than active programming, costing the software industry an estimated $312 billion annually.