Browser Automation Agent

An AI-powered browser automation tool built with Next.js and Gemini 2.0 Vision AI. It turns natural-language instructions into browser automation steps, using screenshot analysis to understand the page.

Features

Convert natural language to browser automation steps
Advanced screenshot analysis and visual understanding
Intelligent element detection with fallback strategies
Automatic screenshot capture with metadata
Real-time progress tracking
Clean, minimalist interface
Secure API key management
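
The "fallback strategies" feature above can be pictured as trying several ways of locating an element in order and keeping the first one that succeeds. The sketch below is illustrative only: `Strategy` and `findWithFallback` are hypothetical names, not taken from the project source, and the real implementation may work differently.

```typescript
// Hypothetical sketch: none of these names come from the project source.
interface Strategy<T> {
  name: string;                     // label for logging, e.g. "css" or "text"
  locate: () => Promise<T | null>;  // resolves to the element, or null if not found
}

// Try each strategy in order and return the first one that finds something.
async function findWithFallback<T>(
  strategies: Strategy<T>[],
): Promise<{ strategy: string; element: T } | null> {
  for (const { name, locate } of strategies) {
    try {
      const element = await locate();
      if (element !== null) return { strategy: name, element };
    } catch {
      // A strategy that throws counts as a miss; move on to the next one.
    }
  }
  return null; // every strategy failed
}
```

A caller would pass the strategies from most to least specific, for instance a CSS selector first, then visible text, then an ARIA role, so that a cheaper, more precise lookup is preferred when it works.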

Prerequisites

Gemini API Key from Google AI Studio
Node.js 18 or higher

Quick Start

Clone the repository:
```
git clone https://github.com/razee4315/browser-automation-agent.git
cd browser-automation-agent
```

Install dependencies:
```
npm install
```
(Optional) Create .env.local with your API key:
```
GEMINI_API_KEY=your_gemini_api_key_here
```
Note: The app will prompt for your API key on first use if not set in environment.
Start the development server:
```
npm run dev
```
Open http://localhost:3000 to view the app.
For production build:
```
npm run build
npm start
```

Configuration

API Key Setup

Get your API key from Google AI Studio
Enter it in the app when prompted

Environment Variables

| Variable | Description | Required |
|---|---|---|
| `GEMINI_API_KEY` | Your Gemini API key | Optional* |

*API key can be entered in the app interface
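
Since the key may come either from the environment or from the app interface, the lookup order can be sketched as follows. This is a minimal illustration that assumes the environment variable takes precedence, as the Quick Start note implies; `resolveApiKey` and `Env` are hypothetical names and not part of the project.

```typescript
// Hypothetical helper; the project's actual lookup logic may differ.
type Env = Record<string, string | undefined>;

// Prefer GEMINI_API_KEY from the environment, then a key entered in the UI.
// Returns null when neither source provides a non-empty key.
function resolveApiKey(env: Env, keyFromUi?: string): string | null {
  const fromEnv = env["GEMINI_API_KEY"]?.trim();
  if (fromEnv) return fromEnv;
  const fromUi = keyFromUi?.trim();
  return fromUi ? fromUi : null;
}
```

On the server this could be called with `process.env` as the first argument. A `null` result is the case where the app would need to prompt for a key.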

Architecture

```
src/
├── app/                      # Next.js App Router
│   ├── api/                  # API routes
│   └── page.tsx              # Main page
├── components/               # React components
│   ├── ApiKeySetup.tsx       # API key auth
│   ├── AutomationForm.tsx    # User input
│   ├── AutomationStatus.tsx  # Progress
│   └── AutomationResults.tsx # Results
└── lib/                      # Utilities
    ├── browser.ts            # Playwright
    ├── gemini.ts             # Gemini AI
    └── debug-helpers.ts      # Debugging
```
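
The split between `lib/gemini.ts` and `lib/browser.ts` suggests that model output is turned into discrete steps before the browser layer runs them. The README does not show that step format, so the `Step` type and `parseSteps` function below are assumptions made purely for illustration: a sketch of how a JSON response could be validated before execution.

```typescript
// Hypothetical step shape; the real schema used by the project is not shown here.
type Step =
  | { action: "goto"; url: string }
  | { action: "click"; target: string }
  | { action: "type"; target: string; text: string }
  | { action: "screenshot" };

function isNonEmptyString(v: unknown): v is string {
  return typeof v === "string" && v.length > 0;
}

// Parse a JSON array of steps and reject anything that does not match the shape above.
function parseSteps(raw: string): Step[] {
  const data: unknown = JSON.parse(raw);
  if (!Array.isArray(data)) throw new Error("expected a JSON array of steps");
  return data.map((item: unknown, i: number): Step => {
    if (typeof item !== "object" || item === null) {
      throw new Error(`step ${i}: not an object`);
    }
    const { action, url, target, text } = item as Record<string, unknown>;
    switch (action) {
      case "goto":
        if (isNonEmptyString(url)) return { action: "goto", url };
        break;
      case "click":
        if (isNonEmptyString(target)) return { action: "click", target };
        break;
      case "type":
        if (isNonEmptyString(target) && typeof text === "string") {
          return { action: "type", target, text };
        }
        break;
      case "screenshot":
        return { action: "screenshot" };
    }
    throw new Error(`step ${i}: unsupported or malformed step`);
  });
}
```

Validating before execution means a malformed model response fails early with a clear message instead of partway through a browser session.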

Security & Privacy

API keys stored in browser localStorage
No data collection
HTTPS required
Client-side processing
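
Storing the key in `localStorage` could look roughly like the sketch below. The storage key name `gemini_api_key` and the three helper functions are hypothetical. The helpers accept any object with the relevant subset of the Web Storage API so they can be exercised outside a browser.

```typescript
// Minimal subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const STORAGE_KEY = "gemini_api_key"; // hypothetical name, not from the project

// Persist a trimmed, non-empty key.
function saveApiKey(store: KeyValueStore, apiKey: string): void {
  const trimmed = apiKey.trim();
  if (!trimmed) throw new Error("API key must not be empty");
  store.setItem(STORAGE_KEY, trimmed);
}

// Return the stored key, or null if none has been saved.
function loadApiKey(store: KeyValueStore): string | null {
  return store.getItem(STORAGE_KEY);
}

// Remove the stored key, for example when the user signs out.
function clearApiKey(store: KeyValueStore): void {
  store.removeItem(STORAGE_KEY);
}
```

In the browser these would be called with `window.localStorage` as the store. Values kept there are readable by any script running on the same origin, which is worth keeping in mind when deciding where to host the app.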

Browser Support

Chrome/Chromium
Firefox
Safari
Edge

Contributing

Contributions are welcome! Please read our Contributing Guidelines.

Fork the repo
Create a feature branch
Commit your changes
Push and open a Pull Request

Contributors

Development Team

Saqlain Abbas
Developer
GitHub | Email

Aleena Tahir
Developer
GitHub | Email

License

This project is licensed under the MIT License - see the LICENSE file for details.

For support, email us at saqlainrazee@gmail.com

⭐ Star this repo if you find it useful!
