Revolutionizing Interviews: Real-Time AI Agents Powered by Gemini Live and LiveKit

Imagine a world where AI-powered agents can conduct preliminary interviews, assessing candidates against specific job requirements in a natural, conversational manner. This POC brings that vision closer to reality by integrating the Gemini Live API, Google's realtime model, with the robust real-time communication capabilities of LiveKit.

The Core Idea: AI-Driven Interviews in Real-Time

The central concept was to create an AI Voice Agent capable of conducting a structured yet conversational interview. Here’s how it works:

  1. User Input: The process begins with a user providing the candidate's name and the job description through a simple Next.js application.
  2. LiveKit Connection: This information is then passed to a Python-based LiveKit agent. LiveKit provides the real-time audio and video infrastructure for the interaction.
  3. Gemini Live Power: The LiveKit agent leverages the Gemini Live API's realtime model to power the AI interviewer, with low-latency audio processing and generation enabling a fluid, back-and-forth conversation.
  4. Intelligent Interviewing: The Gemini Live API is instructed with the candidate's name and the job description. The AI agent then greets the candidate, asks relevant questions based on the job requirements, and follows up on their responses, maintaining a conversational flow.
  5. Time-Bound Interviews: To keep things efficient, the interview duration is pre-set: in this POC, 2 minutes for the main questioning, with a graceful wrap-up by 5 minutes total. The AI agent is programmed to politely conclude the interview as the time limit approaches.
  6. Seamless Integration: A Next.js application provides the user interface to initiate the interview and connect to the LiveKit agent.
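The handoff in steps 1 and 2 can be sketched as a small helper on the agent side that parses whatever metadata the frontend attaches when it creates the room. The JSON shape and the helper name below are assumptions for illustration, not the POC's actual code:

```python
import json


def parse_interview_metadata(raw: str) -> tuple[str, str]:
    """Parse the candidate name and job description the frontend sends.

    Assumes a JSON payload of the form
    {"userName": "...", "jobDescription": "..."}; adapt the keys to
    whatever your frontend actually attaches as room metadata.
    """
    data = json.loads(raw)
    user_name = data.get("userName", "").strip()
    job_description = data.get("jobDescription", "").strip()
    if not user_name or not job_description:
        raise ValueError("metadata must include userName and jobDescription")
    return user_name, job_description
```

Inside the agent's entrypoint, this string would be read from the room or participant metadata after connecting, and the parsed values passed into the interviewer's system instructions.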

The Technology Stack in Action

To bring this concept to life, I utilized the following key technologies:

  • Gemini Live API: Google's realtime model API, which enables low-latency audio processing and generation, crucial for a natural-sounding voice agent.
  • LiveKit: An open-source platform for building real-time audio and video applications. LiveKit's agent framework simplifies the creation and management of AI-powered participants in a video/audio room.
  • Next.js: A React framework for building performant and scalable web applications, used here to create the user interface for inputting candidate details and connecting to the LiveKit agent.
  • Python: The chosen language for developing the LiveKit agent, leveraging LiveKit's Python SDK and its Google plugin.

A Glimpse into the Code

Here’s a snippet of the Python code showing how the LiveKit agent is configured with the Gemini Live API:


from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import google


class Assistant(Agent):
    def __init__(self, user_name: str, job_description: str) -> None:
        # System instructions templated with the candidate's details.
        instructions = f"""
        You are a professional job interviewer.
        The candidate's name is {user_name}.
        The job description is:

        {job_description}

        Begin the interview by greeting {user_name}, then ask one question at a time relevant to the job.
        Ask follow-up questions based on the candidate's answers. Keep it conversational and focused on evaluating fit for the role.
        You have 2 minutes for this interview. As time approaches, politely wrap up and thank the candidate.
        """
        super().__init__(instructions=instructions)


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    # ... (code to get user name and job description from metadata) ...

    session = AgentSession(
        llm=google.beta.realtime.RealtimeModel(
            model="gemini-2.0-flash-exp",
            voice="Puck",
            temperature=0.8,
            instructions="You are a professional interviewer. Greet the candidate and begin the interview based on the job description.",
        ),
    )

    # ... (code to start the session, run the interview, and conclude) ...
    
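The elided conclusion logic can be sketched as a plain asyncio timer: after a soft limit the agent is nudged to wrap up politely, and at the hard limit the session is closed regardless. The 2- and 5-minute values mirror the POC's description, but `run_timed_interview` and the two callbacks are hypothetical helpers, not LiveKit API:

```python
import asyncio
from typing import Awaitable, Callable


async def run_timed_interview(
    wrap_up: Callable[[], Awaitable[None]],
    close: Callable[[], Awaitable[None]],
    soft_limit: float = 120.0,  # 2 minutes of questioning
    hard_limit: float = 300.0,  # 5 minutes total, then force-close
) -> None:
    """Nudge the agent to conclude at soft_limit, force-close at hard_limit.

    In the real agent, wrap_up might ask the session to generate a
    "please conclude now" reply and close might end the AgentSession;
    both are passed in as callbacks so the timer logic stays testable.
    """
    await asyncio.sleep(soft_limit)
    await wrap_up()  # polite wrap-up prompt
    await asyncio.sleep(hard_limit - soft_limit)
    await close()  # hard stop
```

Running this as a background task alongside the session keeps the interview time-bound without blocking the conversation itself.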

And here's a snippet from the Next.js page showing the user input and connection initiation:


"use client";

// ... (imports) ...

function SimpleVoiceAssistant(props: { onConnectButtonClicked: (name: string, job: string) => void }) {
  const { state: agentState } = useVoiceAssistant();
  const [userName, setUserName] = useState("");
  const [jobDescription, setJobDescription] = useState("");

  return (
    <>
      <AnimatePresence mode="wait">
        {agentState === "disconnected" ? (
          <motion.div
            key="disconnected"
            /* ... animation props ... */
            className="grid items-center justify-center h-full"
          >
            <div className="flex flex-col items-center space-y-4">
              <input
                type="text"
                value={userName}
                onChange={(e) => setUserName(e.target.value)}
                placeholder="Enter your name"
                className="px-4 py-2 rounded-md border border-gray-300 w-[300px]"
              />

              <textarea
                value={jobDescription}
                onChange={(e) => setJobDescription(e.target.value)}
                placeholder="Paste the job description here..."
                className="px-4 py-2 rounded-md border border-gray-300 w-[300px] h-[150px]"
              />

              <motion.button
                /* ... animation props ... */
                className="uppercase px-4 py-2 bg-white text-black rounded-md"
                onClick={() => props.onConnectButtonClicked(userName, jobDescription)}
              >
                Start a conversation
              </motion.button>
            </div>
          </motion.div>
        ) : (
          // ... (UI for the active interview) ...
        )}
      </AnimatePresence>
    </>
  );
}


// ... (other component definitions) ...
    

Key Learnings and Future Directions

This POC demonstrated the feasibility of using real-time AI to conduct initial job interviews. The Gemini Live API provided a surprisingly natural voice and the ability to understand and respond contextually based on the provided instructions. LiveKit's agent framework streamlined the integration of this AI agent into a real-time communication environment.

Building upon this foundation, future explorations could focus on:

  • Enhanced Engagement with Avatars: Integrating the AI voice agent with visual avatars from services like Beyond Presence, bitHuman, and Tavus to create a more immersive and engaging interview experience.
  • More Sophisticated Questioning: Implementing more complex interview strategies, including behavioral questions and deeper dives into candidate experience.
  • Multimodal Interaction: Incorporating video input and analysis to assess non-verbal cues.
  • Integration with HR Systems: Connecting the AI agent with existing applicant tracking systems (ATS) for seamless data flow.
  • Bias Detection and Mitigation: Implementing mechanisms to ensure fairness and reduce potential biases in the AI's questioning and evaluation.
  • Candidate Feedback: Providing candidates with feedback on their interview performance.
  • Customizable Interview Flows: Allowing users to define and customize interview questions and timelines.

Conclusion

This POC offers a compelling glimpse into the future of recruitment. By leveraging the power of real-time AI communication and the potential for integration with engaging avatar technologies, we can create more efficient, engaging, and data-driven interview processes. The combination of the Gemini Live API and LiveKit provides a powerful foundation for building innovative solutions that can transform how organizations connect with and evaluate talent.
