Step-by-Step Guide to Build a Competition Analysis AI

Sarthak Arora
4 min read · Apr 2, 2024

Traditional competition analysis, while valuable, has limitations. A business might consider a competition analysis AI tool for speed and efficiency, for real-time insights, for better data-driven decisions, or for predictive capabilities that manual research can't match.

In this blog, I'll walk you through building a Competition Analysis AI using a powerful combination of tools: Mistral 7B, Python, Qdrant, and LangChain.


Laying the Groundwork

To begin, we start with the backbone of any good analysis: data. Using Python, I scripted a process that compiles a comprehensive list of competitor websites and their URLs, ensuring a broad and representative sample of the competitive space. The first step is importing the necessary libraries.

Python Script for Importing Necessary Libraries and Data Collection

from dotenv import load_dotenv
load_dotenv()

import os
import pandas as pd

from langchain.embeddings import HuggingFaceBgeEmbeddings
from langchain.vectorstores import Qdrant
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import create_retrieval_chain

# Conversation imports
from langchain_core.messages import HumanMessage, AIMessage
from langchain.chains.history_aware_retriever import create_history_aware_retriever


# Reading the web pages of our competitors
def get_documents_from_web(df):
    all_docs = []
    for i in range(len(df)):
        url = df["URL's"][i]
        loader = WebBaseLoader(url)
        docs = loader.load()
        all_docs.extend(docs)  # collect documents from every URL, not just the last one

    # Split the pages into small, overlapping chunks for embedding
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=400,
        chunk_overlap=20
    )
    splitDocs = splitter.split_documents(all_docs)
    return splitDocs

df = pd.read_csv('Others.csv')
docs = get_documents_from_web(df)
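
A quick note on the input file: the script expects a CSV whose URL column is literally named "URL's" (apostrophe included), since that is the key used inside get_documents_from_web. A minimal Others.csv might look like this (the company names and URLs below are placeholders):

Company,URL's
Acme Corp,https://www.example.com
Globex,https://www.example.org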

The Tech Marvels: Qdrant and LangChain

Next, it's onto the powerhouse duo: Qdrant and LangChain. With Python at the helm, these tools work in tandem to sift through the compiled URLs. The scraped content is embedded and indexed into Qdrant, a potent vector database, readying it for sophisticated natural language processing. LangChain's loaders, splitters, and embedding wrappers are invaluable here, transforming raw website content into structured, analyzable chunks.

Indexing with Qdrant and LangChain

def create_db(docs):
    # BGE embeddings from Hugging Face, computed locally on CPU
    model_name = "BAAI/bge-large-en"
    model_kwargs = {'device': 'cpu'}
    encode_kwargs = {'normalize_embeddings': False}
    embeddings = HuggingFaceBgeEmbeddings(
        model_name=model_name,
        model_kwargs=model_kwargs,
        encode_kwargs=encode_kwargs
    )

    # Index the chunked documents into a local Qdrant instance
    url = "http://localhost:6333"
    vectorStore = Qdrant.from_documents(
        docs,
        embedding=embeddings,
        url=url,
        prefer_grpc=False,
        collection_name="vector_db"
    )
    print("Vector DB created successfully")
    return vectorStore


vectorStore = create_db(docs)
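
The code above assumes a Qdrant server is already listening on http://localhost:6333. If you don't have one running, the official Docker image is the quickest way to start a local instance:

docker run -p 6333:6333 qdrant/qdrant

Any reachable Qdrant deployment will work; only the url argument needs to change.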

Building the RAG Pipeline

The heart of our AI is the RAG (Retrieval-Augmented Generation) pipeline, fueled by Mistral 7B. This pipeline is our query-processing unit, designed to handle questions like "What does XYZ company do?" It retrieves the most relevant information, shaped by LangChain's contextual understanding and Qdrant's indexing prowess.

I have used LM Studio here.

LM Studio is a desktop application designed to make experimenting with large language models (LLMs) accessible. LM Studio caters to users who want to run LLMs directly on their own computers, rather than relying on cloud-based services. This can be advantageous for privacy reasons or for working with open-source models that may not be available as cloud services.

LM Studio also simplifies the process of downloading, running, and interacting with LLMs. It provides a graphical user interface (GUI) that doesn't require extensive coding knowledge, and it works on both Windows and Mac, making it accessible to a wider range of users.
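
Because LM Studio's local server speaks the OpenAI API, the standard langchain_openai client can talk to it directly. Here is a minimal connectivity check, assuming Mistral 7B is loaded in LM Studio and its server is running on the default port 1234:

from langchain_openai import ChatOpenAI

# LM Studio exposes an OpenAI-compatible endpoint, so the API key is ignored
llm = ChatOpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")
print(llm.invoke("In one sentence, what is competition analysis?").content)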

def create_chain(vectorStore):
    # ChatOpenAI still expects an API key, but LM Studio's local server ignores it
    os.environ["OPENAI_API_KEY"] = "not-needed"
    model = ChatOpenAI(base_url="http://localhost:1234/v1")

    # LM Studio was used here to download Mistral 7B and serve it locally;
    # it's a go-to app for playing with open-source LLMs

    prompt = ChatPromptTemplate.from_messages([
        ("system", "Answer the user's questions based on the context: {context}"),
        MessagesPlaceholder(variable_name="chat_history"),
        ("user", "{input}")
    ])

    # Stuff the retrieved documents into the prompt's {context} slot
    chain = create_stuff_documents_chain(
        llm=model,
        prompt=prompt
    )

    retriever = vectorStore.as_retriever(search_kwargs={"k": 3})

    # Make the retriever history-aware: the LLM first rewrites the user's
    # question into a standalone search query using the conversation so far
    retriever_prompt = ChatPromptTemplate.from_messages([
        MessagesPlaceholder(variable_name="chat_history"),
        ("user", "{input}"),
        ("user", "Given the above conversation, generate a search query to look up in order to get information relevant to the conversation")
    ])
    history_aware_retriever = create_history_aware_retriever(
        llm=model,
        retriever=retriever,
        prompt=retriever_prompt
    )

    retrieval_chain = create_retrieval_chain(
        history_aware_retriever,
        chain
    )

    return retrieval_chain




def process_chat(chain, question, chat_history):
    response = chain.invoke({
        "chat_history": chat_history,
        "input": question,
    })
    # Besides "answer", the response dict also carries the retrieved
    # "context" documents, which is handy for debugging
    return response["answer"]


chain = create_chain(vectorStore)

# Initialize chat history
chat_history = []

# Simple chat loop: type 'exit' to quit
while True:
    user_input = input("You: ")
    if user_input.lower() == 'exit':
        break
    response = process_chat(chain, user_input, chat_history)
    chat_history.append(HumanMessage(content=user_input))
    chat_history.append(AIMessage(content=response))
    print("Assistant:", response)

Gradio UI: Your Analysis, Simplified

No AI is complete without an interface. Here’s a peek at how we used Gradio to create a user-friendly front-end:

import gradio as gr

chat_history = []


# Define a function that takes input and returns output
def greet(query):
    global chat_history  # Ensure chat_history is accessible and modified globally

    user_input = query
    response = process_chat(chain, user_input, chat_history)

    # Update chat history with the current interaction before returning the response
    chat_history.append(HumanMessage(content=user_input))
    chat_history.append(AIMessage(content=response))

    return response


# Create an interface
iface = gr.Interface(
    fn=greet,
    inputs="text",
    outputs="text",
    title="Company Info Application",
    description="Write your queries about the company"
)

# Launch the interface
iface.launch()
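
By default, iface.launch() serves the app locally at http://127.0.0.1:7860. Passing share=True to launch() generates a temporary public link, which is handy for demoing the tool to colleagues.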

Conclusion

This Competition Analysis AI stands as a testament to the transformative power of AI in business strategy.

It encapsulates the ability to turn vast arrays of data into actionable insights, offering businesses a competitive edge in understanding their market position.

If you're here, do give me a follow on Medium and connect with me on LinkedIn to chat more about Data Science!
