ParakeetAI — Real-Time Interview Assistant

A desktop Electron app that listens to your live interview through the microphone, transcribes speech in real time with Deepgram, and streams AI-generated answers from Groq, OpenAI, Anthropic, or Google Gemini. It also includes screenshot analysis for coding and whiteboard screens.


What It Does

| Feature | Description |
| --- | --- |
| Live Transcription | Captures microphone audio, streams it to Deepgram via WebSocket, and displays real-time & final transcripts |
| AI Answer Generation | Automatically sends the last few transcript sentences to an LLM and displays generated answers in real time |
| Screenshot Analysis | One-click screenshot capture + AI vision analysis for coding questions, shared documents, or video call screens |
| Multi-Provider LLM | Supports Groq, OpenAI, Anthropic (Claude), and Google Gemini; switch in Settings |
| Custom Context | Upload your resume, job description, and extra instructions so answers are tailored to you |
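
The Screenshot Analysis row amounts to a capture step followed by a vision request. Below is a minimal sketch of the second half, assuming an OpenAI-style chat payload in which images travel as base64 data URLs. The helper name `buildVisionMessages` and the prompt text are hypothetical, not taken from this repository's code.

```typescript
// Hypothetical helper: wraps a base64-encoded PNG in an OpenAI-style chat
// message. The capture itself (for example via screenshot-desktop in the
// main process) is assumed to have happened already.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type VisionMessage = { role: "user"; content: Array<TextPart | ImagePart> };

function buildVisionMessages(pngBase64: string, question: string): VisionMessage[] {
  if (pngBase64.length === 0) {
    throw new Error("empty screenshot");
  }
  return [
    {
      role: "user",
      content: [
        { type: "text", text: question },
        { type: "image_url", image_url: { url: `data:image/png;base64,${pngBase64}` } },
      ],
    },
  ];
}

const visionMessages = buildVisionMessages(
  "iVBORw0KGgo=",
  "Solve the coding problem shown on screen."
);
```

Other providers wrap images differently, so a real implementation would likely branch per provider before sending.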

Screenshots / Layout

┌─────────────────────────────────────────────┐
│  [Start Transcription] [AI] [Screen] [End]  │
├─────────────────────────────────────────────┤
│  Live Transcript (top, compact, rolling)    │
│  "What is the time complexity of..."        │
│  "Can you explain how React hooks work?"    │
├─────────────────────────────────────────────┤
│  AI Answers (bottom, main area)             │
│  Q: "What is the time complexity..."        │
│  A: "O(n log n) because..."                 │
├─────────────────────────────────────────────┤
│  Debug Log (amber, shows pipeline status)   │
└─────────────────────────────────────────────┘

Tech Stack

| Layer | Technology |
| --- | --- |
| Framework | Electron 35 + React 18 + TypeScript |
| Bundler | Webpack (via ERB, Electron React Boilerplate) |
| Styling | Tailwind CSS |
| Transcription | Deepgram Streaming API (WebSocket, nova-2 model) |
| AI LLM | Groq / OpenAI / Anthropic / Google Gemini (streaming) |
| Screenshot | screenshot-desktop (native module) |
| WebSocket | ws library (Node.js, main process) |
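
To illustrate the "(streaming)" part of the LLM row: OpenAI-compatible APIs (OpenAI, and Groq's compatible endpoint) stream server-sent events in which each `data:` line carries a JSON chunk and the stream ends with `data: [DONE]`. The sketch below shows how such text could be reduced to answer tokens. `extractDeltas` is a hypothetical name, and the sketch assumes it only ever sees complete lines.

```typescript
// Hypothetical parser for OpenAI-compatible streaming responses.
// Assumes `sseText` contains only complete lines; a real client must
// buffer partial lines across network reads.
interface StreamChunk {
  choices?: Array<{ delta?: { content?: string } }>;
}

function extractDeltas(sseText: string): { text: string; done: boolean } {
  let text = "";
  let done = false;
  for (const rawLine of sseText.split("\n")) {
    const line = rawLine.trim();
    if (line.indexOf("data:") !== 0) continue; // skip blank lines and other fields
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") {
      done = true;
      break;
    }
    const chunk = JSON.parse(payload) as StreamChunk;
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return { text, done };
}

const sampleStream = [
  'data: {"choices":[{"delta":{"content":"O(n log n)"}}]}',
  "",
  'data: {"choices":[{"delta":{"content":" because..."}}]}',
  "",
  "data: [DONE]",
].join("\n");
const parsedStream = extractDeltas(sampleStream);
```

Anthropic and Gemini use different event shapes, so each provider would need its own small parser behind a shared interface.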

Prerequisites

  • macOS (11+ / Big Sur or newer)
  • Node.js 20+ and npm
  • API keys: a Deepgram key for transcription, plus a key for at least one LLM provider (Groq, OpenAI, Anthropic, or Google Gemini)

Setup

1. Clone & Install

git clone https://github.com/yaxit24/pm.git
cd pm
npm install

This installs all dependencies and rebuilds native modules for Electron.

2. Configure API Keys

Launch the app in dev mode:

npm start

Click Settings (gear icon) and enter:

  • Deepgram API Key — required for transcription
  • LLM API Key — required for AI answers (pick one provider)
    • Groq (recommended, fastest): llama-3.3-70b-versatile
    • OpenAI: gpt-4o-mini
    • Anthropic: claude-3-5-sonnet-20241022
    • Gemini: gemini-2.0-flash
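
The key requirements and default models above can be captured in a small settings check. This is a sketch only: the `Settings` shape and function names are assumptions for illustration, not the app's real settings schema. The model names are copied from the list above.

```typescript
// Default models copied from the list above. The `Settings` shape is an
// assumption for illustration, not the app's actual schema.
type Provider = "groq" | "openai" | "anthropic" | "gemini";

const DEFAULT_MODELS: Record<Provider, string> = {
  groq: "llama-3.3-70b-versatile",
  openai: "gpt-4o-mini",
  anthropic: "claude-3-5-sonnet-20241022",
  gemini: "gemini-2.0-flash",
};

interface Settings {
  deepgramKey: string;
  provider: Provider;
  llmKey: string;
  model?: string; // optional override of the provider default
}

// Returns human-readable problems; an empty list means the app can start.
function validateSettings(s: Settings): string[] {
  const problems: string[] = [];
  if (s.deepgramKey.trim() === "") {
    problems.push("Deepgram API key is required for transcription");
  }
  if (s.llmKey.trim() === "") {
    problems.push("LLM API key is required for AI answers");
  }
  return problems;
}

function resolveModel(s: Settings): string {
  return s.model ?? DEFAULT_MODELS[s.provider];
}
```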

3. Add Context (Optional but Recommended)

In Settings, fill in:

  • Job Description — so AI knows the role
  • Resume — so AI tailors answers to your background
  • Extra Instructions — any custom rules (e.g., "Keep answers under 50 words")
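
One way these three fields could feed into a request is sketched below. The prompt wording, type names, and `buildMessages` are illustrative assumptions. Only the three context fields and the idea of sending the most recent sentences (five, per the Troubleshooting notes) come from this README.

```typescript
// Hypothetical prompt builder; wording and names are assumptions.
interface InterviewContext {
  jobDescription?: string;
  resume?: string;
  extraInstructions?: string;
}

type ChatMessage = { role: "system" | "user"; content: string };

function buildMessages(
  ctx: InterviewContext,
  sentences: string[],
  maxSentences = 5
): ChatMessage[] {
  const sections = ["You are helping a candidate answer interview questions."];
  if (ctx.jobDescription) sections.push(`Job description:\n${ctx.jobDescription}`);
  if (ctx.resume) sections.push(`Resume:\n${ctx.resume}`);
  if (ctx.extraInstructions) sections.push(`Extra instructions:\n${ctx.extraInstructions}`);

  // Keep only the most recent finalized sentences to bound prompt size.
  const recent = maxSentences > 0 ? sentences.slice(-maxSentences) : [];
  return [
    { role: "system", content: sections.join("\n\n") },
    { role: "user", content: recent.join(" ") },
  ];
}

const exampleMessages = buildMessages(
  { extraInstructions: "Keep answers under 50 words" },
  ["a", "b", "c", "d", "e", "f"]
);
```

Empty fields are skipped rather than sent as blank sections, which keeps the system prompt short when only some context is provided.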

4. Build for Distribution

npm run package

Output:

  • release/build/mac-arm64/pmodule.app — Apple Silicon (M1/M2/M3)
  • release/build/mac/pmodule.app — Intel Mac
  • release/build/pmodule-3.2.28-arm64.dmg — DMG installer

How to Use During an Interview

  1. Open the app before your interview starts
  2. Click Start Transcription — the mic icon turns green when connected
  3. Speak normally — transcripts appear live in the top panel
  4. AI answers appear automatically in the bottom panel as new sentences finalize
  5. Click Screen to capture & analyze any coding screen or document
  6. Click End when finished
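
Steps 3 and 4 depend on telling interim text apart from finalized sentences. The sketch below assumes Deepgram's streaming result shape, in which the text sits at `channel.alternatives[0].transcript` and `is_final` marks a segment that will not be revised. `TranscriptState` and `applyDeepgramMessage` are hypothetical names, not the app's own.

```typescript
// Minimal subset of a Deepgram streaming result; only the fields used here.
interface DeepgramResult {
  is_final?: boolean;
  channel?: { alternatives?: Array<{ transcript?: string }> };
}

interface TranscriptState {
  finalized: string[]; // completed sentences, oldest first
  interim: string; // text that may still be revised
}

function applyDeepgramMessage(state: TranscriptState, raw: string): TranscriptState {
  const msg = JSON.parse(raw) as DeepgramResult;
  const text = (msg.channel?.alternatives?.[0]?.transcript ?? "").trim();
  if (text === "") return state; // metadata message or silence
  if (msg.is_final) {
    // A finalized segment replaces whatever interim text was showing.
    return { finalized: [...state.finalized, text], interim: "" };
  }
  return { finalized: state.finalized, interim: text };
}

let transcript: TranscriptState = { finalized: [], interim: "" };
transcript = applyDeepgramMessage(
  transcript,
  '{"is_final":false,"channel":{"alternatives":[{"transcript":"Can you explain"}]}}'
);
const afterInterim = transcript;
transcript = applyDeepgramMessage(
  transcript,
  '{"is_final":true,"channel":{"alternatives":[{"transcript":"Can you explain how React hooks work?"}]}}'
);
```

In this arrangement, answer generation would be triggered only when `finalized` grows, which matches the behavior described in step 4.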

Architecture Overview

┌──────────────────┐     ┌────────────────┐     ┌──────────────────────┐
│ Renderer         │────▶│ Main Process   │────▶│ Deepgram WebSocket   │
│ (React)          │     │ (Node.js)      │     │ wss://api.deepgram   │
│                  │     │                │     │ /v1/listen           │
│ - Mic capture    │     │ - ws library   │     │                      │
│ - MediaRecorder  │     │ - Auth header  │     │ Audio → Transcripts  │
│ - IPC send       │◀────│ - IPC relay    │◀────│                      │
│                  │     │                │     │                      │
│ - Transcript     │     │                │     │ LLM Providers        │
│   display        │     │                │     │ (fetch, streaming)   │
│ - AI answers     │◀────│                │◀────│ Groq/OpenAI/etc      │
└──────────────────┘     └────────────────┘     └──────────────────────┘

Why main process for WebSocket? The browser WebSocket API cannot set custom request headers such as Authorization: Token <key>. The app therefore opens the connection with the Node.js ws library in the main process, then relays transcripts back to the renderer via IPC.
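
A rough sketch of what the main process needs in order to open that connection is shown below. `buildDeepgramConnection` is a hypothetical helper, the full host name `api.deepgram.com` is assumed from Deepgram's public endpoint, and any query parameter other than `model` is an assumption rather than something this README specifies.

```typescript
// Hypothetical helper that prepares the URL and auth header for the
// Deepgram socket. Only `model=nova-2` is stated in this README;
// `interim_results` is an assumed extra parameter.
interface DeepgramConnection {
  url: string;
  headers: { Authorization: string };
}

function buildDeepgramConnection(
  apiKey: string,
  extraParams: Record<string, string> = {}
): DeepgramConnection {
  const params: Record<string, string> = {
    model: "nova-2",
    interim_results: "true",
    ...extraParams,
  };
  const query = Object.keys(params)
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(params[key])}`)
    .join("&");
  return {
    url: `wss://api.deepgram.com/v1/listen?${query}`,
    headers: { Authorization: `Token ${apiKey}` },
  };
}

// In the main process this would be handed to the `ws` library, roughly:
//   const socket = new WebSocket(conn.url, { headers: conn.headers });
const conn = buildDeepgramConnection("dg_example_key");
```

Keeping the key in the main process also means it never has to be exposed to renderer code.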


Troubleshooting

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| "Transcript 0 lines" but connection shows true | Mic returning silence (macOS hardened runtime blocks hardware) | Rebuild with mic entitlements (already included). Reset permission: tccutil reset Microphone com.electron.pmodule |
| "WebSocket was closed before connection" | Race condition on reconnect | Fixed: uses terminate() + generation counter |
| No AI answers appearing | autoGen disabled or no LLM key | Enable "Auto-generate answers" in Settings and add LLM key |
| Gatekeeper warning on other Macs | App is signed but not notarized | User runs xattr -cr /Applications/pmodule.app once |
| Audio chunks are ~70 bytes | Mic stream is silent (permission or hardware) | Check System Settings → Sound → Input level is not at zero |
| Slow AI answers | Wrong provider or large context | Use Groq, keep answers brief, context limited to last 5 sentences |
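
The reconnect fix mentioned above (terminate() plus a generation counter) is not shown in this README. The sketch below is one common way such a guard is structured, not necessarily how the app does it. `SocketLike` stands in for a `ws` socket, and `ReconnectGuard` is a hypothetical name.

```typescript
// Sketch of a generation counter that guards against stale socket events.
// `SocketLike` stands in for a `ws` WebSocket, whose terminate() closes
// the connection immediately without a close handshake.
interface SocketLike {
  terminate(): void;
}

class ReconnectGuard<S extends SocketLike> {
  private generation = 0;
  private current: S | null = null;

  // Tear down any old socket, bump the generation, and adopt the new one.
  connect(create: () => S): number {
    if (this.current) this.current.terminate();
    this.generation += 1;
    this.current = create();
    return this.generation;
  }

  // Event handlers capture their generation and check it before acting,
  // so late events from a replaced socket are ignored.
  isCurrent(generation: number): boolean {
    return generation === this.generation;
  }
}

// Usage with fake sockets that record when they are terminated.
const terminatedIds: number[] = [];
const makeFakeSocket = (id: number) => (): SocketLike => ({
  terminate: () => {
    terminatedIds.push(id);
  },
});
const guard = new ReconnectGuard<SocketLike>();
const firstGeneration = guard.connect(makeFakeSocket(1));
const secondGeneration = guard.connect(makeFakeSocket(2));
```

Each message or close handler would be registered with the generation returned by `connect` and would return early when `isCurrent` is false.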

Distribution Notes

  • The app is signed with a Mac Developer Distribution certificate
  • It is not notarized — users on other Macs will see a Gatekeeper warning
  • To fully distribute without warnings, enable notarization in package.json and set Apple ID credentials before building
  • Supports both arm64 (Apple Silicon) and x64 (Intel) architectures

License

MIT — based on Electron React Boilerplate.
