Find it on GitHub: /edgee-cloud/spam-classifier-component

The Spam Classifier component is a high-performance machine learning edge function that provides real-time spam detection using a Naive Bayes classifier. This Wasm-based component runs at the edge, offering fast content classification without requiring external services or additional backend infrastructure.

What is the Spam Classifier Component?

The Spam Classifier component is a security-focused edge function that:
  • Uses Naive Bayes algorithms with Finite State Transducers (FST) for optimal performance
  • Provides real-time text classification with confidence scores
  • Supports multi-language content analysis
  • Runs entirely at the edge as a Wasm component
  • Returns detailed classification results with spam probabilities
  • Requires no external API calls or backend dependencies

Getting Started

1. Access Component Library

Open the Edgee console and navigate to your project’s Components section.

2. Add Spam Classifier Component

Select “Add a component” and choose edgee/spam-classifier from the list of available components.

3. Configure Component Settings

Set up the component configuration:
  • Endpoint path: Configure the URL path (e.g., /classify or /spam-check)
  • Spam classification threshold (optional): Set the probability threshold (default: 0.80 works well for most cases)
  • Laplace smoothing factor (optional): Configure Laplace smoothing (default: 1.0 provides good balance)

4. Deploy Component

Click Save to deploy the component to your edge infrastructure.
The component will be available at your configured endpoint within minutes.

Configuration

When adding the Spam Classifier component to your project through the Edgee console, you can customize its behavior with these settings:
  • path (string, required, default: "/classify"): The URL path where the spam classifier will be accessible. This endpoint receives POST requests with text content for classification.
  • spam_threshold (number, optional, default: 0.80): Probability threshold at or above which content is classified as spam. Range: 0.0-1.0. The default works well for most use cases. Higher values flag less content as spam, which reduces false positives but may miss subtle spam. Lower values catch more spam but increase false positives.
  • laplace_smoothing_factor (number, optional, default: 1.0): Smoothing parameter for the Naive Bayes classifier that handles unseen tokens. Range: 0.0 and above. The default provides a good balance for most content types. Higher values provide more conservative classifications.
Configuration changes take effect immediately without requiring component redeployment. You can adjust these values based on your content patterns and false positive tolerance.
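As a rough illustration of what the Laplace smoothing factor does, the sketch below applies the standard additive-smoothing formula to a per-token likelihood. The function name, the toy counts, and the vocabulary size are assumptions made for this example; the component's internal implementation may differ.

```javascript
// Additive (Laplace) smoothing for a token likelihood P(token | class).
// All names and numbers here are illustrative, not the component's internals.
function smoothedLikelihood(tokenCount, totalTokensInClass, vocabularySize, alpha = 1.0) {
  return (tokenCount + alpha) / (totalTokensInClass + alpha * vocabularySize);
}

// A token never seen in the spam class still gets a small non-zero probability.
const unseen = smoothedLikelihood(0, 1000, 500, 1.0); // 1 / 1500
const seen = smoothedLikelihood(30, 1000, 500, 1.0);  // 31 / 1500

// A larger alpha pulls both values toward the uniform 1 / vocabularySize.
const unseenHighAlpha = smoothedLikelihood(0, 1000, 500, 10.0); // 10 / 6000

console.log({ unseen, seen, unseenHighAlpha });
```

With a factor of 0.0 an unseen token would get a likelihood of exactly zero, which is why a positive value is the usual choice.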

API Reference

Request Parameters

  • input (string, required): The text content to classify for spam detection.

Example Request

curl -X POST https://yourdomain.com/classify \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello, how are you today? I hope you are having a great day!"
  }'

Response Fields

  • spam_probability (float): Probability that the content is spam, ranging from 0.0 to 1.0. Values closer to 1.0 indicate a higher likelihood of spam.
  • ham_probability (float): Probability that the content is legitimate (ham), ranging from 0.0 to 1.0. Always equals 1.0 - spam_probability.
  • is_spam (boolean): Whether the content meets or exceeds the configured spam threshold. True if spam_probability >= spam_threshold.
  • confidence (float): Classification confidence, calculated as the absolute difference between the spam and ham probabilities.
  • text (string): The original input text that was classified, echoed back in the response.

Example Response

{
  "text": "Hello, how are you today? I hope you are having a great day!",
  "spam_probability": 0.23,
  "ham_probability": 0.77,
  "is_spam": false,
  "confidence": 0.54
}
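
The relationships among the response fields can be checked with a few lines of JavaScript. This is a minimal sketch based only on the field descriptions above; `deriveFields` is a hypothetical helper, not part of the component's API, and it assumes the default threshold of 0.80.

```javascript
// Recompute the derived response fields from spam_probability.
// `deriveFields` is a hypothetical helper used only for illustration.
function deriveFields(spamProbability, spamThreshold = 0.80) {
  const hamProbability = 1.0 - spamProbability;
  return {
    spam_probability: spamProbability,
    ham_probability: hamProbability,
    is_spam: spamProbability >= spamThreshold,
    confidence: Math.abs(spamProbability - hamProbability),
  };
}

const fields = deriveFields(0.23);
console.log(fields.is_spam);               // false
console.log(fields.confidence.toFixed(2)); // "0.54"
```

Note that a spam probability near 0.5 yields a confidence near 0, regardless of which side of the threshold it falls on.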

Usage Examples

const classifyText = async (text) => {
  try {
    const response = await fetch('/classify', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ input: text })
    });
    
    if (!response.ok) {
      throw new Error(`HTTP error! status: ${response.status}`);
    }
    
    const result = await response.json();
    
    if (result.is_spam) {
      console.log(`Spam detected (spam probability: ${(result.spam_probability * 100).toFixed(1)}%)`);
    } else {
      console.log(`Content is legitimate with ${(result.confidence * 100).toFixed(1)}% confidence`);
    }
    
    return result;
  } catch (error) {
    console.error('Classification error:', error);
    throw error;
  }
};

// Example usage with different content types
// (top-level await requires an ES module or an enclosing async function)
await classifyText("Hello, how are you today?");
await classifyText("Buy now! Limited time offer! Click here for amazing deals!");

Form Validation Integration

// Simple form validation with spam detection
document.getElementById('messageForm').addEventListener('submit', async (e) => {
  e.preventDefault();
  
  const messageInput = document.getElementById('messageInput');
  const message = messageInput.value;
  
  try {
    // Check for spam
    const response = await fetch('/classify', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ input: message })
    });
    
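    // Treat non-2xx responses as a failed spam check
    if (!response.ok) {
      throw new Error(`HTTP error! status: ${response.status}`);
    }
    
    const result = await response.json();
    
    if (result.is_spam) {
      alert(`Message appears to be spam (spam probability: ${(result.spam_probability * 100).toFixed(1)}%)`);
      return;
    }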
    
    // Submit the form if content is legitimate
    console.log('Message validated, submitting form');
    // Your form submission logic here
    
  } catch (error) {
    console.error('Validation error:', error);
    // Proceed with submission if spam check fails
  }
});

Performance Characteristics

The Spam Classifier component delivers exceptional performance at the edge:

Benchmark Results (x86, native)

  • Short text (~5 words): ~28 µs processing time (72K tokens/sec)
  • Medium text (~15 words): ~66 µs processing time (227K tokens/sec)
  • Long text (~62 words): ~128 µs processing time (484K tokens/sec)

Key Performance Features

  • O(log n) token lookup using Finite State Transducers
  • 64-bit packed counters for memory efficiency
  • Log-space calculations to prevent numerical underflow
  • Static model embedding eliminates file I/O overhead
  • Sub-millisecond response times for most content
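
To show why log-space calculations matter for a Naive Bayes classifier, the sketch below multiplies many small per-token likelihoods directly and then sums their logarithms instead. The likelihood values and the `posteriorFromLogScores` helper are assumptions made for illustration, not the component's actual code.

```javascript
// Multiplying many small probabilities collapses to 0 in floating point;
// summing their logarithms keeps the scores comparable.
const likelihoods = new Array(400).fill(1e-3); // toy per-token likelihoods

const rawProduct = likelihoods.reduce((acc, p) => acc * p, 1.0);
const logSum = likelihoods.reduce((acc, p) => acc + Math.log(p), 0.0);

console.log(rawProduct); // 0
console.log(logSum);     // about -2763.1

// Turn two log scores into a spam probability without leaving log space
// until the values are safe to exponentiate (log-sum-exp trick).
function posteriorFromLogScores(logSpam, logHam) {
  const max = Math.max(logSpam, logHam);
  const expSpam = Math.exp(logSpam - max);
  const expHam = Math.exp(logHam - max);
  return expSpam / (expSpam + expHam);
}

console.log(posteriorFromLogScores(-2763.1, -2765.1).toFixed(3)); // "0.881"
```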

Technical Details

Machine Learning Architecture

The component uses a sophisticated Naive Bayes implementation optimized for edge computing:
  • Text Processing Pipeline: Advanced Unicode tokenization with multi-language support
  • Feature Engineering: AlphaNumeric token filtering and text normalization
  • Classification Algorithm: Naive Bayes with configurable Laplace smoothing
  • Performance Optimization: FST-based token lookup and log-space probability calculations

Text Processing Features

Advanced Unicode-aware tokenization that handles:
  • Multi-language text processing
  • Special character normalization
  • Stemming and lowercase conversion
  • Stop word filtering
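
The preprocessing steps listed above can be approximated in a few lines. This is only a sketch under stated assumptions: the `tokenize` function and the stop word list are hypothetical, stemming is omitted, and the component's real tokenizer may behave differently.

```javascript
// Hypothetical preprocessing sketch: Unicode normalization, lowercasing,
// alphanumeric tokenization, and stop word filtering. Stemming is omitted,
// and the stop word list is a tiny illustrative sample.
const STOP_WORDS = new Set(['a', 'an', 'the', 'is', 'are', 'for', 'to', 'of', 'and']);

function tokenize(text) {
  const lowered = text.normalize('NFKC').toLowerCase();
  // \p{L} matches letters and \p{N} matches digits in any script.
  const tokens = lowered.match(/[\p{L}\p{N}]+/gu) ?? [];
  return tokens.filter((token) => !STOP_WORDS.has(token));
}

console.log(tokenize('Click HERE for the BEST deals!'));
// [ 'click', 'here', 'best', 'deals' ]
console.log(tokenize('Offre limitée: 50% de réduction'));
// [ 'offre', 'limitée', '50', 'de', 'réduction' ]
```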

Use Cases

The Spam Classifier component is ideal for:

Content Moderation

  • Comment Systems: Filter spam comments on blogs and forums
  • User-Generated Content: Moderate posts, reviews, and submissions
  • Social Media: Detect spam in messages and posts

Security Applications

  • Form Protection: Prevent spam submissions in contact forms
  • API Security: Filter malicious content in API requests
  • Email Systems: Pre-filter messages before processing

Quality Control

  • Content Quality: Ensure high-quality user contributions
  • Automated Triage: Route suspicious content for human review
  • Compliance: Meet anti-spam regulatory requirements
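
For the automated triage case, one possible approach is to map the classifier's response to a routing decision. The `triage` function and its `reviewFloor` parameter are hypothetical, and the 0.5 boundary is an arbitrary value chosen for this sketch; tune it to your own content.

```javascript
// Hypothetical triage policy built on the classifier's response fields.
// Content flagged as spam is rejected, borderline content goes to a human,
// and everything else is accepted.
function triage(result, reviewFloor = 0.5) {
  if (result.is_spam) return 'reject';
  if (result.spam_probability >= reviewFloor) return 'review';
  return 'accept';
}

console.log(triage({ spam_probability: 0.91, is_spam: true }));  // reject
console.log(triage({ spam_probability: 0.65, is_spam: false })); // review
console.log(triage({ spam_probability: 0.23, is_spam: false })); // accept
```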

Limitations and Considerations

This edge-optimized classifier prioritizes speed and simplicity over maximum accuracy. For applications requiring the highest precision spam detection, consider dedicated spam filtering services or more sophisticated machine learning solutions.

Compared to Enterprise Solutions

  • Simpler feature set: Uses basic Naive Bayes with token-based analysis
  • No behavioral analysis: Lacks sender reputation, link analysis, or pattern recognition
  • Limited training data: Smaller model size optimized for edge deployment
  • No real-time updates: Model updates require component redeployment

Best Fit Use Cases

This classifier works well for:
  • Basic content filtering where speed is prioritized over perfect accuracy
  • First-line defense in multi-layer spam protection strategies
  • Edge computing scenarios where low latency is critical
  • Privacy-focused applications that avoid external API calls
Consider combining this component with other security measures like rate limiting, CAPTCHA, or human moderation for comprehensive protection.

Error Handling

The component implements robust error handling with proper HTTP status codes:

Status Codes

  • 200 OK: Successful classification with valid JSON input
  • 400 Bad Request: Invalid request body or malformed JSON
  • 500 Internal Server Error: Component processing error

Error Response Format

When errors occur, the component returns a structured JSON error response:
{
  "error": "Error message describing what went wrong"
}
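
On the client side, the status codes and error format above could be handled along these lines. `interpretResponse` is a hypothetical helper, and treating 500 responses as retryable is an assumption of this sketch rather than documented component behavior.

```javascript
// Hypothetical helper: map an HTTP status and parsed JSON body to an outcome.
function interpretResponse(status, body) {
  if (status === 200) {
    return { ok: true, result: body };
  }
  // Error responses are expected to carry an "error" string.
  const message = body && typeof body.error === 'string' ? body.error : 'Unknown error';
  if (status === 400) {
    return { ok: false, retryable: false, message }; // fix the request first
  }
  if (status === 500) {
    return { ok: false, retryable: true, message };  // assumed safe to retry
  }
  return { ok: false, retryable: false, message: `Unexpected status ${status}: ${message}` };
}

console.log(interpretResponse(400, { error: 'Invalid JSON' }));
// { ok: false, retryable: false, message: 'Invalid JSON' }
```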

Best Practices

Start with a spam threshold of 0.80 and adjust based on your content patterns. Higher thresholds reduce false positives but may miss some spam.
The component processes text content only. Binary data, HTML tags, and special formatting are normalized during tokenization.
For optimal performance, consider batching multiple short texts into single requests when processing large volumes of content.

Model Information

The embedded classification model is trained on diverse, multilingual datasets including:
  • Email spam detection datasets
  • Comment spam collections
  • Social media spam samples
  • Multilingual content examples
The model supports incremental updates and can be retrained with domain-specific data for improved accuracy in specialized use cases.