@jcottam/html-metadata
@jcottam/html-metadata is a lightweight, TypeScript-first JavaScript library for extracting HTML meta tags, Open Graph tags, and other metadata from HTML content or URLs. It is well suited to social media sharing, SEO analysis, and web scraping applications.
Compatibility: Works seamlessly with Node.js (CommonJS) and modern browsers (ES6+).
- 🚀 Fast & Lightweight - Built on Cheerio for optimal performance
- 📱 Open Graph Support - Extract all Open Graph meta tags for social media
- 🎯 TypeScript Ready - Full type definitions and IntelliSense support
- 🌐 URL & HTML Support - Extract from URLs or HTML strings directly
- 🔧 Configurable - Customizable extraction with filtering and timeout options
- 🛡️ Error Resilient - Graceful handling of malformed HTML and network errors
- 📦 Minimal Dependencies - Cheerio is the only runtime dependency
npm install @jcottam/html-metadata
import { extractFromUrl, extractFromHTML } from "@jcottam/html-metadata"
const { extractFromUrl, extractFromHTML } = require("@jcottam/html-metadata")
import { extractFromUrl } from "@jcottam/html-metadata"
// Basic usage
const metadata = await extractFromUrl("https://www.retool.com")
console.log(metadata)
// Output: { lang: "en", title: "Retool", "og:title": "...", "og:description": "...", ... }
// With options
const options = {
  timeout: 5000, // 5-second timeout
  metaTags: ["og:title", "og:description", "og:image"], // Only extract specific tags
}
const filteredMetadata = await extractFromUrl("https://example.com", options)
import { extractFromHTML } from "@jcottam/html-metadata"
const html = `
<html lang="en">
<head>
<title>My Website</title>
<meta property="og:title" content="My Amazing Website" />
<meta property="og:description" content="This is a brief description" />
<meta property="og:image" content="https://example.com/image.jpg" />
<link rel="icon" href="/favicon.ico" />
</head>
</html>
`
const metadata = extractFromHTML(html)
console.log(metadata)
// Output: {
// lang: "en",
// title: "My Website",
// "og:title": "My Amazing Website",
// "og:description": "This is a brief description",
// "og:image": "https://example.com/image.jpg",
// favicon: "/favicon.ico"
// }
const html = '<html><head><link rel="icon" href="/favicon.ico" /></head></html>'
const options = { baseUrl: "https://example.com" }
const metadata = extractFromHTML(html, options)
console.log(metadata.favicon) // "https://example.com/favicon.ico"
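The resolved favicon above matches standard URL resolution. As a minimal sketch of that behavior (an assumption for illustration, not the library's actual implementation), the WHATWG URL constructor available in Node.js and browsers resolves a relative href against a base. The helper name resolveAgainstBase is hypothetical.

```typescript
// Hypothetical helper: resolve a possibly relative href against a base URL.
// Sketch only; the library's internal logic may differ.
function resolveAgainstBase(href: string, baseUrl?: string): string {
  if (!baseUrl) return href // without a base, leave the href untouched
  try {
    return new URL(href, baseUrl).href
  } catch {
    return href // invalid base or href: fall back to the raw value
  }
}

console.log(resolveAgainstBase("/favicon.ico", "https://example.com"))
// "https://example.com/favicon.ico"
console.log(resolveAgainstBase("https://cdn.example.com/icon.png", "https://example.com"))
// "https://cdn.example.com/icon.png" (absolute hrefs ignore the base)
console.log(resolveAgainstBase("/favicon.ico"))
// "/favicon.ico"
```

Falling back to the raw href on failure keeps the sketch from throwing on unusual input, which seems in keeping with the error-resilient behavior described in this README.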
extractFromHTML(html, options) extracts metadata from an HTML string.
Parameters:
- html (string): The HTML content to parse
- options (Options, optional): Configuration options
Returns: ExtractedData - Object containing extracted metadata
extractFromUrl(url, options) extracts metadata from a URL by fetching the HTML content.
Parameters:
- url (string): The URL to fetch and extract metadata from
- options (Options, optional): Configuration options
Returns: Promise<ExtractedData | null> - Promise that resolves to extracted metadata or null if extraction fails
type Options = {
  /** Base URL for resolving relative links (e.g., favicon, apple-touch-icon) */
  baseUrl?: string
  /** Fetch timeout in milliseconds for URL extraction */
  timeout?: number
  /** Specific meta tags to extract. If not provided, all meta tags will be extracted */
  metaTags?: string[]
}
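To illustrate what filtering by metaTags means, the sketch below narrows an already extracted object down to a requested set of keys. The pickTags helper and the local ExtractedData alias are hypothetical names used only for this example; the library applies its own filtering when metaTags is set, and how it treats fields such as lang or title under a filter is not specified here.

```typescript
// Local stand-in for the library's ExtractedData type (illustration only).
type ExtractedData = { [key: string]: string | undefined }

// Hypothetical helper: keep only the requested tags that are actually present.
function pickTags(data: ExtractedData, metaTags?: string[]): ExtractedData {
  if (!metaTags) return { ...data } // no filter: keep everything
  const picked: ExtractedData = {}
  for (const tag of metaTags) {
    if (data[tag] !== undefined) picked[tag] = data[tag]
  }
  return picked
}

const sample: ExtractedData = {
  title: "My Website",
  "og:title": "My Amazing Website",
  "og:image": "https://example.com/image.jpg",
}
console.log(pickTags(sample, ["og:title", "og:description"]))
// { "og:title": "My Amazing Website" }
```

Requested tags that are absent from the page are simply omitted here rather than set to undefined, which keeps the result easy to serialize.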
type ExtractedData = {
  /** Language attribute from the HTML tag */
  lang?: string
  /** Page title from the title tag */
  title?: string
  /** Favicon URL */
  favicon?: string
  /** Apple touch icon URL */
  "apple-touch-icon"?: string
  /** Open Graph and other meta tag properties */
  [key: string]: string | undefined
}
{
  "lang": "en",
  "title": "Retool | The fastest way to build internal software.",
  "og:type": "website",
  "og:url": "https://retool.com/",
  "og:title": "Retool | The fastest way to build internal software.",
  "og:description": "Retool is the fastest way to build internal software. Use Retool's building blocks to build apps and workflow automations that connect to your databases and APIs, instantly.",
  "og:image": "https://d3399nw8s4ngfo.cloudfront.net/og-image-default.webp",
  "favicon": "/favicon.png",
  "apple-touch-icon": "/apple-touch-icon.png"
}
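Given output like the sample above, a consumer will usually pick fields with fallbacks, since any key may be missing. The sketch below builds a small link-preview object that prefers Open Graph values and falls back to the plain title. The toPreview function, the LinkPreview shape, the fallback order, and the assumption that a description meta tag is stored under the key "description" are all illustrative and not part of the library.

```typescript
// Local stand-in for the library's ExtractedData type (illustration only).
type ExtractedData = { [key: string]: string | undefined }

// Hypothetical shape for a link preview card.
interface LinkPreview {
  title: string
  description: string
  image?: string
}

// Prefer Open Graph values, then fall back to generic fields, then to "".
function toPreview(data: ExtractedData): LinkPreview {
  return {
    title: data["og:title"] ?? data["title"] ?? "",
    description: data["og:description"] ?? data["description"] ?? "",
    image: data["og:image"],
  }
}

const preview = toPreview({
  lang: "en",
  title: "Retool | The fastest way to build internal software.",
  "og:image": "https://d3399nw8s4ngfo.cloudfront.net/og-image-default.webp",
})
console.log(preview.title)
// "Retool | The fastest way to build internal software."
console.log(preview.description === "") // true
```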
When using extractFromUrl in browsers, you may encounter CORS restrictions. To bypass CORS:
- Server-side usage: Run extractFromUrl on a server
- Proxy services: Use a CORS proxy like AllOrigins
- Browser extensions: Use CORS-disabling browser extensions for development
The library handles errors gracefully:
// Network errors return null
const result = await extractFromUrl("https://invalid-url.com")
if (result === null) {
  console.log("Failed to fetch or parse the URL")
}
// Malformed HTML is handled gracefully
const metadata = extractFromHTML(
  "<html><head><meta property='og:title' content='Test'"
)
console.log(metadata["og:title"]) // "Test"
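Because extractFromUrl can resolve to null and any individual key can be undefined, callers need to narrow the result before reading fields. Here is a minimal sketch of one way to do that, using a hypothetical titleOrFallback helper and a local stand-in for the ExtractedData type:

```typescript
// Local stand-in for the library's ExtractedData type (illustration only).
type ExtractedData = { [key: string]: string | undefined }

// Hypothetical helper: safely read a display title from a possibly-null result.
function titleOrFallback(result: ExtractedData | null, fallback: string): string {
  if (result === null) return fallback // fetch or parse failed
  return result["og:title"] ?? result["title"] ?? fallback
}

console.log(titleOrFallback(null, "Untitled")) // "Untitled"
console.log(titleOrFallback({ title: "My Website" }, "Untitled")) // "My Website"
console.log(titleOrFallback({}, "Untitled")) // "Untitled"
```

In real code the first argument would be the awaited result of extractFromUrl(url).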
The library extracts the following types of metadata:
- HTML attributes: lang from <html> tag
- Title: Content from <title> tag
- Favicon: href from <link rel="icon"> tags
- Apple Touch Icon: href from <link rel="apple-touch-icon"> tags
- Meta tags: All <meta> tags with name or property attributes
- Open Graph: All og:* properties
- Twitter Cards: All twitter:* properties
- Custom meta tags: Any custom meta tags you define
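Since all of these categories end up as keys in one flat object, it can be convenient to split them by prefix after extraction. The following is a rough sketch under that assumption; groupByPrefix and the Grouped type are hypothetical names, not library exports.

```typescript
// Local stand-in for the library's ExtractedData type (illustration only).
type ExtractedData = { [key: string]: string | undefined }
type Grouped = { openGraph: ExtractedData; twitter: ExtractedData; other: ExtractedData }

// Hypothetical helper: bucket flat metadata keys by their og: / twitter: prefix.
function groupByPrefix(data: ExtractedData): Grouped {
  const result: Grouped = { openGraph: {}, twitter: {}, other: {} }
  for (const [key, value] of Object.entries(data)) {
    if (key.startsWith("og:")) result.openGraph[key] = value
    else if (key.startsWith("twitter:")) result.twitter[key] = value
    else result.other[key] = value
  }
  return result
}

const grouped = groupByPrefix({
  title: "My Website",
  "og:title": "My Amazing Website",
  "twitter:card": "summary",
})
console.log(Object.keys(grouped.openGraph)) // one key: "og:title"
console.log(Object.keys(grouped.twitter)) // one key: "twitter:card"
console.log(Object.keys(grouped.other)) // one key: "title"
```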
- Node.js 18+
- npm
git clone https://github.com/jcottam/html-metadata.git
cd html-metadata
npm install
npm run build # Build the library
npm test # Run tests
npm run release # Release new version (manual)
This project uses automated dependency management and releases:
- Renovate Bot: Automatically updates dependencies and creates pull requests
- GitHub Actions: Automatically releases new versions when changes are pushed to main
- Manual Release: Use npm run release for immediate releases or specific version bumps
The project uses Vitest for testing. Run tests with:
npm test
- Cheerio: Fast, flexible HTML parsing
- Vitest: Next-generation testing framework
- Rollup: Module bundler for multiple formats
We welcome contributions! Please follow these guidelines:
- Fork the repository and create a feature branch
- Make changes and ensure tests pass (npm test)
- Add tests for new functionality
- Update documentation if needed
- Submit a pull request with a clear description
- Follow TypeScript best practices
- Add JSDoc comments for new functions
- Ensure all tests pass
- Update README for new features
- Use conventional commit messages
MIT License - see LICENSE.md for details.