GPT-4 plus an agent easily catches up with Devin! Princeton's open-source project earned 1.6k stars on its first day

Last updated: 2024-04-03

Bai Jiao, reporting from Aofei Temple
Quantum Bit | Public Account QbitAI

An AI programmer built on GPT-4 has easily caught up with Devin!

The open-source SWE-agent built by Princeton can be used right out of the box to fix real bugs in GitHub repositories.

On a 25% subset of the SWE-bench test set, it achieves accuracy similar to Devin's, solving 12.29% of the problems.

On its first day on GitHub, it garnered 1.6K stars. Many commenters marveled that simply redesigning the command-line tools given to GPT-4 could significantly improve what the model can do.

This also confirms the point made by Andrew Ng in his speech a few days ago:

I believe AI agents will drive huge advances in AI this year, perhaps even surpassing the next generation of foundational models.

That is because, in the results Andrew Ng presented, a GPT-3.5-based agent outperformed GPT-4.

Now that an open-source counterpart to Devin is available, some people have already started posting "RIP Devin".

Let's take a look at what this open source version of Devin looks like.

Open source version of Devin is here

Simply put, SWE-agent turns a language model (such as GPT-4) into a software engineering agent that can fix bugs and issues in real GitHub repositories.

They designed an LM-centric command and feedback format to achieve these results, making it easier for language models to navigate repositories and view, edit, and execute code files.

In their parlance, this is called the Agent-Computer Interface (ACI), and they built the SWE-agent repository to make it easy to iterate on ACI designs.

For SWE-agent, the team designed the following main features.

A file viewer and editor that can open, scroll through, and edit files.

The team found that the agent performed best when the viewer displayed only 100 lines per turn.

Line-specific editing with automatic syntax checking.

If the edited code contains a syntax error, the edit command is rejected. If a command produces no output, the agent receives the message: "Your command ran successfully, but no output was produced."

Writing and executing tests.
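To make these interface ideas concrete, here is a minimal Python sketch of a windowed viewer, a syntax-checked edit, and the empty-output message. It is not SWE-agent's actual code: the function names `view`, `edit`, and `format_output` are hypothetical, and `ast.parse` stands in for whatever syntax checker the real tool uses, so it only covers Python files.

```python
import ast

WINDOW = 100  # lines shown per turn, the value the article reports as working best


def view(lines, start, window=WINDOW):
    """Return one window of the file, with 1-based line numbers (start is 0-based)."""
    start = max(0, min(start, max(len(lines) - 1, 0)))
    end = min(start + window, len(lines))
    header = f"[lines {start + 1}-{end} of {len(lines)}]"
    body = "\n".join(f"{i + 1}: {lines[i]}" for i in range(start, end))
    return header + "\n" + body


def edit(lines, first, last, replacement):
    """Replace 1-based lines first..last; reject the edit if the result is not valid Python."""
    candidate = lines[: first - 1] + replacement + lines[last:]
    try:
        # Assumption: ast.parse is a stand-in for the real syntax check.
        ast.parse("\n".join(candidate))
    except SyntaxError as err:
        # The edit is discarded and the original lines are returned unchanged.
        return lines, f"Edit rejected: {err.msg} (line {err.lineno})"
    return candidate, "Edit applied."


def format_output(output):
    """Give the model explicit feedback when a command prints nothing."""
    if output.strip():
        return output
    return "Your command ran successfully, but no output was produced."
```

The point of all three helpers is the same: every action returns short, explicit feedback, so the model never has to guess whether an edit landed or a command ran.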

There are currently two main steps to using SWE-agent.

The first is inference: SWE-agent takes a GitHub issue as input and returns a pull request that attempts to fix it.

The second is evaluation: the pull request is checked to verify that it actually fixes the issue (currently this only works for issues in the SWE-bench benchmark).
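The article does not spell out how the evaluation step decides that a fix works. SWE-bench judges a patch by its tests: the tests that failed before the fix must now pass, and the tests that passed before must still pass. The sketch below illustrates that check under this assumption. The function names and the shape of `test_results` are hypothetical and are not taken from the SWE-agent or SWE-bench code.

```python
def is_resolved(test_results, fail_to_pass, pass_to_pass):
    """Decide whether a patch resolves an issue.

    test_results maps a test name to True (passed) or False (failed) after
    the patch is applied. A test missing from the results counts as failed.
    """
    fixed = all(test_results.get(name, False) for name in fail_to_pass)
    unbroken = all(test_results.get(name, False) for name in pass_to_pass)
    return fixed and unbroken


def resolve_rate(outcomes):
    """Percentage of issues resolved, rounded to two decimals."""
    if not outcomes:
        return 0.0
    return round(100 * sum(outcomes) / len(outcomes), 2)
```

A headline figure such as the 12.29% above would then be the resolve rate over all evaluated issues.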

Nvidia scientist Jim Fan praised the work ("Great Work!") and pointed out the simple essence behind the hype.

Simply by better hand-crafting the command-line tools for GPT-4, you can get a score of 12.3 on SWE-bench. Nothing magical, no model breakthrough, and no reason to hype it up.

When GPT-5 arrives, none of this “Prompt Engineering 2.0” will matter anymore.

This inevitably brings to mind Devin, which made a sudden splash not long ago but is now rarely discussed.

Some people asked why the team did not use Claude 3 as the agent. The authors replied that they had tried it, but the results were not very good.

On SWE-bench Lite (a 10% subset of the test set), it scored nearly 6% lower than GPT-4. It was also much slower (GPT-4's response time was 93 seconds).

A team member also said that the team put a lot of effort into the logo design:

Spent a few hours designing it using DALL-E 3. (Doge)

Made in Princeton

This is a software engineering agent created by the Princeton NLP group.

John Yang and Carlos E. Jimenez are reported to be co-authors.

The team also includes Yao Shunyu, currently a doctoral student at Princeton and an alumnus of Tsinghua University's Yao Class (2015 cohort).

Their shared advisor is Karthik Narasimhan, currently co-director of Princeton NLP and a colleague of Chen Danqi.

The team said they will publish a paper on April 10.

Still, one pointed question remains:

Uh... but GPT-4 isn't open source...

Well, what do you think of this open source AI programmer?

Reference links:
[1] https://swe-agent.com/
[2] https://twitter.com/DrJimFan/status/1775173542470111475
