
“In preparing to release Claude Capybara, we want to act with extra caution and understand the risks it poses—even beyond what we learn in our own testing. In particular, we want to understand the model’s potential near-term risks in the realm of cybersecurity—and share the results to help cyber defenders prepare,” the document said. Anthropic appears to be especially worried about the model’s cybersecurity implications, noting that the system is “currently far ahead of any other AI model in cyber capabilities,” and that “it presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders.” In other words, Anthropic is concerned that hackers could use the model to run large-scale cyberattacks. The company said in the draft blog that because of this risk, its plan for the model’s release would focus on cyber defenders: “We’re releasing it in early access to organizations, giving them a head start in improving the robustness of their codebases against the impending wave of AI-driven exploits.”