Artificial Intelligence (AI) is reshaping how we work with large datasets in real estate—whether it’s financial models, rent rolls, valuations, or lease contracts. But as useful as tools like ChatGPT and Claude are, there’s a big challenge: how do we safely use AI with confidential or sensitive data?
This question was at the heart of a recent webinar presented by Jason, who walked us through the strengths, weaknesses, and practical applications of Retrieval Augmented Generation (RAG) for real estate professionals.
The Problem: AI Meets Confidential Data
Large Language Models (LLMs) are powerful for tasks such as:
- Summarizing reports and contracts
- Extracting clauses (e.g., rent escalations in leases; a minimal prompt sketch follows this list)
- Drafting investment memos and due diligence notes
- Answering “what-if” scenario questions
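To make the clause-extraction use case concrete, here is a minimal sketch of what such a request might look like in Python. The model name, the prompt wording, and the placeholder lease text are illustrative assumptions, not something shown in the webinar.

```python
# Minimal sketch: asking an LLM to pull escalation clauses out of a lease.
# Assumes the official openai Python package (>= 1.0) and an OPENAI_API_KEY
# environment variable; model name and lease text are placeholders.
from openai import OpenAI

client = OpenAI()

lease_text = "..."  # plain text of a single (non-confidential) lease

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system", "content": "You are a commercial lease analyst."},
        {"role": "user", "content": (
            "List every rent escalation clause in the lease below, "
            "quoting the clause reference and the escalation mechanism.\n\n"
            + lease_text
        )},
    ],
)

print(response.choices[0].message.content)
```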
But the catch? Many popular AI platforms are public, meaning any sensitive information fed into them risks being stored or leaked. For industries like real estate finance, where compliance and confidentiality are critical, that’s not an acceptable risk.
Three Approaches to Using AI Securely
Jason outlined three main ways organizations can use AI while managing security risks:
1. Public LLMs (ChatGPT, Claude, etc.)
The simplest approach: upload your data, ask questions, and get quick answers. Great for non-confidential tasks, but not suitable for sensitive client information.
2. Hybrid RAG Approach
Here, you use an AI model via its API but augment it with your own dataset stored in a secure vector database. Instead of uploading all your documents, you only feed in the relevant pieces of data at query time.
- Example: Extracting escalation clauses from a local lease database without uploading the whole dataset to ChatGPT.
- Advantage: More control and confidentiality.
3. Running Your Own LLM Locally
For maximum security, companies can host their own AI models on local servers or private cloud infrastructure. Tools like Ollama allow organizations to download and run models themselves, combining them with RAG for context-aware answers.
- Advantage: Complete control, no reliance on third-party providers.
- Trade-off: Performance and cost. Running a high-quality LLM locally requires significant computing power and can be expensive compared to using cloud-based services.
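As a rough illustration of the local option, the snippet below queries a model served by Ollama on your own machine through its Python client. The model name and the question are assumptions made for the sketch; any model you have already pulled with Ollama would work.

```python
# Minimal sketch: querying a locally hosted model via Ollama's Python client.
# Assumes Ollama is running on this machine and a model (here "llama3") has
# been pulled; nothing in this call leaves your own infrastructure.
import ollama

response = ollama.chat(
    model="llama3",  # illustrative; use whichever model you have pulled
    messages=[
        {"role": "user", "content": "Summarise the key risks in a 10-year FRI lease."},
    ],
)

print(response["message"]["content"])
```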
What Exactly is RAG?
RAG (Retrieval Augmented Generation) is a technique where the AI model is “augmented” with your proprietary data before generating a response.
Here’s how it works:
- Indexing: Break confidential data (leases, valuations, reports) into chunks and convert them into numerical “embeddings.”
- Storage: Store those embeddings in a vector database, a lightweight store optimized for fast similarity search.
- Querying: When you ask a question, the system retrieves only the most relevant pieces of your data.
- Prompt Construction: The retrieved context is combined with your question and sent to the LLM.
- Response: The AI generates an answer grounded in both its general training and your specific data.
This means you don’t have to expose entire datasets to the model—just the parts needed to answer the query.
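Below is a deliberately small end-to-end sketch of that flow, using OpenAI's embedding and chat endpoints with a plain in-memory similarity search standing in for a real vector database. The chunk contents, model names, and question are assumptions made purely for illustration.

```python
# Minimal RAG sketch: index a few chunks, retrieve the best matches for a
# question, and send only those chunks to the model alongside the question.
# Assumes the openai package (>= 1.0), numpy, and an OPENAI_API_KEY env var.
import numpy as np
from openai import OpenAI

client = OpenAI()

# 1. Indexing: chunks of (already anonymized) documents. Placeholder text.
chunks = [
    "Unit 4: rent reviews every 5 years, upward-only, linked to open market value.",
    "Unit 7: annual rent escalation of CPI + 1%, capped at 4% per annum.",
    "Unit 9: stepped rent, increasing by a fixed amount on each lease anniversary.",
]

def embed(texts):
    """Convert texts to embedding vectors via the OpenAI embeddings endpoint."""
    result = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in result.data])

# 2. Storage: here just an in-memory array; a real setup would use a vector DB.
index = embed(chunks)

# 3. Querying: embed the question and rank chunks by cosine similarity.
question = "Which units have inflation-linked rent escalations?"
q_vec = embed([question])[0]
scores = index @ q_vec / (np.linalg.norm(index, axis=1) * np.linalg.norm(q_vec))
top_chunks = [chunks[i] for i in np.argsort(scores)[::-1][:2]]

# 4. Prompt construction + 5. Response: only the retrieved chunks are sent.
prompt = (
    "Answer using only this context:\n"
    + "\n".join(top_chunks)
    + f"\n\nQuestion: {question}"
)
answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(answer.choices[0].message.content)
```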
Local vs Cloud: Performance Trade-Offs
Jason demonstrated two RAG setups:
- Using OpenAI’s API: Fast (2–4 seconds per query), cloud-based, relatively cheap ($20/month).
- Using Ollama Locally: Ultra-secure, but slower (20+ seconds per query) and requires strong hardware. Hosting at scale could cost thousands of pounds monthly in cloud compute.
So the choice depends on your needs:
- Speed & affordability? Go hybrid with OpenAI/Claude APIs.
- Maximum confidentiality? Consider running your own local LLM.
Key Security Considerations
Before choosing an approach, organizations must evaluate:
- Data leakage risks: Even chunks of documents may contain sensitive details; anonymization can help (a minimal redaction sketch follows this list).
- Trust in providers: API providers claim they don’t use your augmented data for training—but policies can change.
- Compliance rules: Highly regulated industries must consider legal restrictions on sending data outside secure environments.
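As one illustration of the anonymization point above, the sketch below strips a few obviously sensitive patterns from a chunk before it is ever sent to an external API. The regex patterns and placeholder labels are assumptions for the example; production-grade anonymization would need a far more thorough approach.

```python
# Minimal sketch: redacting obvious identifiers from a chunk before it leaves
# your environment. The regexes and placeholder labels are illustrative only.
import re

REDACTIONS = [
    (re.compile(r"£\s?[\d,]+(?:\.\d+)?"), "[AMOUNT]"),               # monetary amounts
    (re.compile(r"\b[A-Z][a-z]+ (?:Ltd|LLP|PLC)\b"), "[TENANT]"),    # company names
    (re.compile(r"\b\d{1,4} [A-Z][a-z]+ (?:Street|Road|Lane)\b"), "[ADDRESS]"),
]

def redact(chunk: str) -> str:
    """Replace sensitive patterns with neutral placeholders."""
    for pattern, placeholder in REDACTIONS:
        chunk = pattern.sub(placeholder, chunk)
    return chunk

print(redact("Acme Ltd at 12 High Street pays £45,000 per annum."))
# -> "[TENANT] at [ADDRESS] pays [AMOUNT] per annum."
```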
Final Thoughts
RAG isn’t a magic bullet, but it’s a practical framework for safely combining the power of AI with the confidentiality demands of real estate finance.
- For everyday use: RAG with ChatGPT or Claude APIs works well if data is anonymized.
- For sensitive data: Running your own LLM with RAG is the most secure option, but comes at a performance and cost trade-off.
As Jason concluded:
👉 Start small. Pilot with limited datasets.
👉 Refine your prompts.
👉 Scale once you’ve proven the approach.
AI is not about replacing expertise—it’s about amplifying human decision-making while protecting what matters most: your data.