implement url cleaning based on ClearURLs rules and logic

This commit is contained in:
m5r
2025-09-20 11:48:38 +02:00
parent ca369dc516
commit 44ebc2b487
15 changed files with 8063 additions and 6424 deletions

115
CLAUDE.md Normal file
View File

@@ -0,0 +1,115 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
This is a Cloudflare Workers project called "url-cleaner" built with TypeScript. It's a serverless URL cleaning service that removes tracking parameters from URLs, similar to the ClearURLs browser extension. The worker runs on Cloudflare's edge network for fast global performance.
## Architecture
- **Runtime**: Cloudflare Workers (serverless edge computing)
- **Language**: TypeScript with ES2021 target
- **Entry Point**: `src/index.ts` - main worker handler accepting `?url=` parameter
- **URL Cleaner**: `src/cleaner.ts` - core cleaning engine with redirect following and fragment cleaning
- **Rules Cache**: `src/rules-cache.ts` - Durable Object for caching ClearURLs rules with SHA256 validation
- **Types**: `src/types.ts` - TypeScript interfaces for ClearURLs rule structure
- **Configuration**: `wrangler.jsonc` - Cloudflare Workers deployment configuration with Durable Objects
- **Testing**: Vitest with Cloudflare Workers testing pool
## API Usage
**Endpoint**: `GET /?url=<encoded-url>`
**Response**: Plain text containing the cleaned URL
**Examples**:
- `/?url=https://example.com?utm_source=test``https://example.com`
- `/?url=https://youtube.com/watch?v=abc&feature=share``https://youtube.com/watch?v=abc`
## Common Commands
### Development
- `npm run dev` or `npm run start` - Start local development server with Wrangler
- `npm run deploy` - Deploy worker to Cloudflare
- `npm test` - Run tests with Vitest
- `npm run format` - Format code with Prettier
- `npm run format:check` - Check code formatting
- `npm run cf-typegen` - Generate TypeScript types for Cloudflare bindings
### Development Server
The development server runs on `http://localhost:8787/` by default.
## Project Structure
```
src/
├── index.ts # Main worker entry point with fetch handler
├── cleaner.ts # Core URL cleaning engine with redirect following and fragment cleaning
├── rules-cache.ts # Durable Object for caching ClearURLs rules with validation
└── types.ts # TypeScript interfaces for ClearURLs rule structure
test/
├── index.spec.ts # Unit and integration tests with ClearURLs rule mocking
├── tsconfig.json # Test-specific TypeScript config
└── env.d.ts # Test environment types
```
## URL Cleaning Features
### Core Functionality
- **Official ClearURLs Rules**: Uses the same rule database as ClearURLs browser extension (250+ providers)
- **Redirect Following**: Follows up to 5 redirects to unwrap shortened URLs and tracking redirects
- **Provider-Specific Rules**: Domain-specific rules prevent false positives and preserve functionality
- **Query Parameter Cleaning**: Removes tracking from URL query parameters (`?utm_source=test`)
- **Fragment Cleaning**: Removes tracking from URL fragments/hash parameters (`#utm_campaign=test`)
- **Raw Rules Support**: Handles complex regex-based cleaning for advanced tracking patterns
- **Loop Prevention**: Tracks visited URLs to prevent infinite redirect loops
- **Response Caching**: 1-hour cache for improved performance
### Rule System
- **Live Updates**: Fetches rules from ClearURLs official API (`https://rules2.clearurls.xyz/`)
- **SHA256 Validation**: Cryptographically verifies rule integrity
- **Durable Object Caching**: 7-day cache with automatic refresh
- **Fallback Support**: Uses cached rules if fresh fetch fails
- **Provider Priority**: Domain-specific providers take precedence over global rules
### Supported Providers (250+ total)
- **Google**: Search, Analytics, Ads tracking removal
- **YouTube**: Preserves video/playlist IDs, removes tracking
- **Amazon**: E-commerce functionality preserved, affiliate tracking removed
- **Facebook/Meta**: Social features preserved, tracking removed
- **Twitter/X**: Post/user functionality preserved, tracking removed
- **TikTok**: Video functionality preserved, tracking removed
- **And 240+ more providers** maintained by the ClearURLs community
### Advanced Features
- **Exception Handling**: Respects provider-specific exception rules
- **Redirection Unwrapping**: Follows ClearURLs redirection patterns
- **Regex Pattern Matching**: Exact ClearURLs compatibility with `^rule$` pattern matching
- **Case-Insensitive Matching**: Handles mixed-case tracking parameters
- **Fragment Parameter Support**: Cleans tracking from URL hash fragments
## Key Files
- `wrangler.jsonc` - Worker configuration including compatibility date and observability
- `vitest.config.mts` - Test configuration using Cloudflare Workers pool
- `worker-configuration.d.ts` - Generated TypeScript definitions for Cloudflare bindings
- `tsconfig.json` - TypeScript configuration with strict mode enabled
## Testing
The project uses Vitest with the Cloudflare Workers testing pool (`@cloudflare/vitest-pool-workers`). Tests support both unit testing (with mocked context) and integration testing (using `SELF.fetch()`).
## TypeScript Configuration
- Target: ES2021
- Module: ES2022
- Strict mode enabled
- JSX support configured for React
- Excludes test files from main compilation