The Hidden Challenge of Enterprise AI: Integrating with Large Codebases

Enterprises everywhere are racing to experiment with AI. The hype is undeniable. Let’s face it: prototyping something in a few hours used to be out of reach for a whole crew of engineers, let alone one person. Now it’s a one-prompt job. We don’t recommend that, by the way; check out our prompting framework.

Why Large Codebases Are a CodeBlock

Innuendo aside, the reality is that unless you have a proper system architecture and living documentation, you’re probably going to struggle to integrate or iterate on anything new in your enterprise with AI. If your codebase is 5GB of raw data, forget about it: it would take days for AI to read it, let alone understand it, and heck, it won’t even remember your codebase.

Yes, for fun we researched theoretically how long it would take to read a 7GB codebase:

Even under optimistic assumptions and very fast parallel processing, reading 7 GB of raw source as a single LLM job takes weeks at best, and anywhere up to years depending on the assumptions. A practical single-worker Copilot-like session cannot do this in minutes or hours.

Don’t ask us why we chose 7GB; it’s just oddly specific. So in conclusion, don’t expect that to finish before your birthday comes around next year.
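If you want to play with the numbers yourself, here is a minimal back-of-the-envelope sketch. Everything in it is an assumption of ours for illustration: roughly 4 bytes of source per token, and a sustained read rate somewhere between 50 and 1,000 tokens per second per worker. These are not measurements of any particular model or tool, and changing them changes the answer a lot.

```python
def estimate_read_days(size_bytes, bytes_per_token=4.0,
                       tokens_per_second=1000.0, workers=1):
    """Rough time, in days, for a model to read `size_bytes` of source once.

    All defaults are illustrative assumptions, not benchmarks.
    """
    tokens = size_bytes / bytes_per_token
    seconds = tokens / (tokens_per_second * workers)
    return seconds / 86_400  # seconds in a day


if __name__ == "__main__":
    seven_gb = 7 * 10**9  # the oddly specific 7GB from above
    for rate in (50, 200, 1_000):  # assumed tokens per second
        days = estimate_read_days(seven_gb, tokens_per_second=rate)
        print(f"{rate:>5} tokens/s -> about {days:,.0f} days")
```

Note that this only covers reading the code once. It says nothing about the model remembering any of it afterwards, which is the bigger problem.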

Can’t I just Prompt Harder?

No.

Well, What Can I Do?

Fine. We’ll tell you the secret.

The truth is, systems have to be designed from the ground up for AI, not with AI. Reality hurts, but there are ways… Of course we have a solution for you! And yes, it’s going to be painful.

  1. Auditing
    • Map where AI can add value and, at a high level, what it should do; you don’t want AI to touch every part of your code. This is extremely important. Find the business value of AI first, before doing anything else.
  2. Clear Use Case
    • Tie use cases to business KPIs. Do not try to do everything.
  3. Modularise and Clean the Codebase
    • Yes, you need to refactor if you don’t have manageable modules. A system design approach works best when it is applied at the beginning of an enterprise project, but it’s never too late to refactor.
      • The idea is to clean up only what you need – APIs, services, etc…
  4. Living Documentation
    • Time to adopt a living documentation approach. This is for both your AI and your humans. How many times have you opened an enterprise codebase and tried to work out how it fits together by literally looking at the structure? This is not good practice.
  5. Build an AI Data Layer
    • Centralise access to knowledge: code embeddings, vector databases, structured logs.
    • Normalise data formats so AI can read them.
    • Consider governance – what AI should and should not access.
  6. Introduce Prompt & Workflow Standards
    • You need documentation on prompting, and a framework. We’ll be making one purely for enterprise. You need this for consistent results.
    • Establish a prompt log policy: track prompts and keep them under version control.
    • Train teams on prompt engineering best practice.
  7. Embedding AI into the Development Workflow
    • Integrate AI into your CI/CD pipelines.
    • Implement AI code review copilots alongside human reviewers.
  8. Security, Compliance and Governance
    • Define rules for what AI can access.
    • Set up monitoring for hallucinations, insecure code or compliance risks
    • Always include humans!
  9. Continuous Learning and Feedback Loops
    • Capture feedback on AI outputs – success/failures
    • Treat AI like a junior engineer that learns.

It’s that simple!

Tips and Tricks

These ones are on us and probably deserve a subscribe or share.

  • Create a hierarchy (links within links) of living documentation so AI can easily read through it without reading your codebase. You can even get AI to update this documentation if you’ve taken the above steps.
  • Consider an API-first approach for your enterprise solution. Because it abstracts away complexity, you can build upon it easily.
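The documentation hierarchy from the first tip can be sketched as follows. We assume, purely for illustration, that the docs are Markdown files linking to each other and that there is a top-level `index.md`; the file names and contents below are made up. The function returns the order in which an AI (or a human) would read the docs, starting at the top and following links down to a chosen depth.

```python
import re
from collections import deque

# Matches Markdown links that point at another .md file, e.g. [Orders](orders.md)
LINK = re.compile(r"\[[^\]]*\]\(([^)]+\.md)\)")

# Hypothetical documentation set, kept in memory so the sketch is self-contained.
DOCS = {
    "index.md": "# System overview\n- [Orders](orders.md)\n- [Billing](billing.md)\n",
    "orders.md": "# Orders service\nSee [Orders API](orders_api.md).\n",
    "billing.md": "# Billing service\n",
    "orders_api.md": "# Orders API\nBack to [overview](index.md).\n",
}


def reading_order(docs, root="index.md", max_depth=2):
    """Breadth-first walk of the doc links, from `root` down to `max_depth`."""
    seen = {root}
    queue = deque([(root, 0)])
    order = []
    while queue:
        name, depth = queue.popleft()
        order.append(name)
        if depth == max_depth:
            continue  # do not follow links any deeper
        for target in LINK.findall(docs.get(name, "")):
            if target in docs and target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order
```

With a shallow `max_depth`, the AI gets the high-level picture first and only descends into detailed docs when the task calls for it, which is the whole point of reading documentation instead of the codebase.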

If you have any questions or concerns about our approaches or advice, feel free to contact us! We’re very open to criticism and feedback.
