Find it on GitHub: /edgee-cloud/spam-classifier-component
The Spam Classifier component is a high-performance machine learning edge function that provides real-time spam detection using a Naive Bayesian classifier.
This Wasm-based component runs at the edge, offering fast and accurate content classification without requiring external services or additional backend infrastructure.
What is the Spam Classifier Component?
The Spam Classifier component is a security-focused edge function that:
Uses Naive Bayes algorithms with Finite State Transducers (FST) for optimal performance
Provides real-time text classification with confidence scores
Supports multi-language content analysis
Runs entirely at the edge as a Wasm component
Returns detailed classification results with spam probabilities
Requires no external API calls or backend dependencies
Getting Started
Access Component Library
Open the Edgee console and navigate to your project’s Components section.
Add Spam Classifier Component
Select “Add a component” and choose edgee/spam-classifier from the list of available components.
Configure Component Settings
Set up the component configuration:
Endpoint path: Configure the URL path (e.g., /classify or /spam-check)
Spam classification threshold (optional): Set the probability threshold (default: 0.80 works well for most cases)
Laplace smoothing factor (optional): Configure Laplace smoothing (default: 1.0 provides good balance)
Deploy Component
Click Save to deploy the component to your edge infrastructure. The component will be available at your configured endpoint within minutes.
Configuration
When adding the Spam Classifier component to your project through the Edgee console, you can customize its behavior with these settings:
path (string, required; default: "/classify"): The URL path where the spam classifier will be accessible. This endpoint will receive POST requests with text content for classification.
Spam classification threshold (optional). Probability threshold at or above which content is classified as spam. Range: 0.0-1.0; the default of 0.80 works well for most use cases. Higher values flag less content, which reduces false positives but may miss subtle spam. Lower values catch more spam but increase false positives.
Laplace smoothing factor (optional). Smoothing parameter for the Naive Bayes classifier that handles tokens not seen in training. Range: 0.0 or greater; the default of 1.0 provides a good balance for most content types. Higher values produce more conservative (less extreme) classifications.
Configuration changes take effect immediately without requiring component redeployment. You can adjust these values based on your content patterns and false positive tolerance.
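To make the smoothing factor concrete, here is a minimal sketch of standard additive (Laplace) smoothing for a per-class token probability. The function name, the counts, and the vocabulary size are illustrative assumptions; the component's internal formula is not documented on this page and may differ.

```python
def smoothed_token_probability(token_count: int, class_total: int,
                               vocab_size: int, alpha: float = 1.0) -> float:
    """Additive smoothing: (count + alpha) / (class_total + alpha * vocab_size)."""
    return (token_count + alpha) / (class_total + alpha * vocab_size)


# Hypothetical numbers: a token never seen in the spam class,
# 1,000 spam tokens observed in training, 500 distinct tokens overall.
for alpha in (0.1, 1.0, 5.0):
    p = smoothed_token_probability(0, 1000, 500, alpha)
    print(f"alpha={alpha}: P(unseen token | spam) = {p:.6f}")
```

In this textbook form, a larger factor pulls every token probability toward a uniform distribution, so a single rare token shifts the final score less.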
API Reference
Request Parameters
input (string): The text content to classify for spam detection.
Example Request
curl -X POST https://yourdomain.com/classify \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello, how are you today? I hope you are having a great day!"
  }'
Response Fields
spam_probability (number): Probability that the content is spam, ranging from 0.0 to 1.0. Values closer to 1.0 indicate a higher likelihood of spam.
ham_probability (number): Probability that the content is legitimate (ham), ranging from 0.0 to 1.0. Always equals 1.0 - spam_probability.
is_spam (boolean): Whether the content meets or exceeds the configured spam threshold. True if spam_probability >= spam_threshold.
confidence (number): Classification confidence level, calculated as the absolute difference between the spam and ham probabilities.
text (string): The original input text that was classified, echoed back in the response.
Example Response
{
  "text": "Hello, how are you today? I hope you are having a great day!",
  "spam_probability": 0.23,
  "ham_probability": 0.77,
  "is_spam": false,
  "confidence": 0.54
}
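The relationships between the response fields can be checked with a few lines of Python. This is a sketch of the documented formulas only, not the component's code; `derive_fields` is a hypothetical helper name, and the default threshold of 0.80 is taken from the configuration section.

```python
def derive_fields(spam_probability: float, spam_threshold: float = 0.80) -> dict:
    """Recompute the derived response fields from the spam probability."""
    ham_probability = 1.0 - spam_probability
    return {
        "spam_probability": spam_probability,
        "ham_probability": ham_probability,
        # Spam when the probability meets or exceeds the threshold.
        "is_spam": spam_probability >= spam_threshold,
        # Confidence is the absolute gap between the two probabilities.
        "confidence": abs(spam_probability - ham_probability),
    }


fields = derive_fields(0.23)
print({k: round(v, 2) if isinstance(v, float) else v for k, v in fields.items()})
```

With a spam probability of 0.23 this reproduces the values in the example response: a ham probability of 0.77, a confidence of 0.54, and is_spam set to false.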
Usage Examples
JavaScript/Node.js
// In Node.js, fetch() needs an absolute URL (e.g., https://yourdomain.com/classify).
const classifyText = async (text) => {
  try {
    const response = await fetch('/classify', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ input: text })
    });
    if (!response.ok) {
      throw new Error(`HTTP error! status: ${response.status}`);
    }
    const result = await response.json();
    if (result.is_spam) {
      console.log(`Spam detected (spam probability ${(result.spam_probability * 100).toFixed(1)}%)`);
    } else {
      console.log(`Content is legitimate with ${(result.confidence * 100).toFixed(1)}% confidence`);
    }
    return result;
  } catch (error) {
    console.error('Classification error:', error);
    throw error;
  }
};
// Example usage with different content types
// (top-level await requires an ES module; otherwise wrap these calls in an async function)
await classifyText("Hello, how are you today?");
await classifyText("Buy now! Limited time offer! Click here for amazing deals!");
// Simple form validation with spam detection
document.getElementById('messageForm').addEventListener('submit', async (e) => {
  e.preventDefault();
  const messageInput = document.getElementById('messageInput');
  const message = messageInput.value;
  try {
    // Check for spam
    const response = await fetch('/classify', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ input: message })
    });
    if (!response.ok) {
      throw new Error(`HTTP error! status: ${response.status}`);
    }
    const result = await response.json();
    if (result.is_spam) {
      alert(`Message appears to be spam (spam probability ${(result.spam_probability * 100).toFixed(1)}%)`);
      return;
    }
    // Submit the form if content is legitimate
    console.log('Message validated, submitting form');
    // Your form submission logic here
  } catch (error) {
    console.error('Validation error:', error);
    // The spam check itself failed: add your submission logic here
    // if you want to proceed with submission anyway
  }
});
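For Python callers, a comparable client can be sketched with the standard library alone. The URL is the placeholder domain from the example request, and `build_request` and `classify_text` are hypothetical helper names rather than part of any Edgee SDK.

```python
import json
import urllib.request

# Assumption: the component is mounted at /classify on your own domain.
CLASSIFY_URL = "https://yourdomain.com/classify"


def build_request(text: str, url: str = CLASSIFY_URL) -> urllib.request.Request:
    """Build the POST request with the JSON body shown in the API reference."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_text(text: str, url: str = CLASSIFY_URL, timeout: float = 5.0) -> dict:
    """Send the text for classification and return the parsed JSON result.

    urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (performs a real network call, so it is left commented out):
# result = classify_text("Buy now! Limited time offer!")
# print("spam" if result["is_spam"] else "ham", result["spam_probability"])
```

Keeping request construction separate from the network call makes the payload easy to inspect or unit test without reaching the endpoint.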
The Spam Classifier component classifies typical inputs in well under a millisecond in native benchmarks:
Benchmark Results (x86, native)
Short text (~5 words): ~28 µs processing time (72K tokens/sec)
Medium text (~15 words): ~66 µs processing time (227K tokens/sec)
Long text (~62 words): ~128 µs processing time (484K tokens/sec)
O(log n) token lookup using Finite State Transducers
64-bit packed counters for memory efficiency
Log-space calculations to prevent numerical underflow
Static model embedding eliminates file I/O overhead
Sub-millisecond response times for most content
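The throughput figures in the benchmark list can be sanity-checked with simple arithmetic. The sketch below assumes one token per word, which the benchmark does not state, and `tokens_per_second` is a hypothetical helper.

```python
def tokens_per_second(token_count: int, microseconds: float) -> float:
    """Throughput implied by processing token_count tokens in the given time."""
    return token_count / (microseconds * 1e-6)


# (approximate word count, processing time in microseconds) from the benchmark list
benchmarks = {"short": (5, 28), "medium": (15, 66), "long": (62, 128)}
for name, (words, micros) in benchmarks.items():
    print(f"{name}: ~{tokens_per_second(words, micros) / 1000:.0f}K tokens/sec")
```

Under that assumption the medium and long rows reproduce the listed 227K and 484K tokens/sec. The short row works out to roughly 179K rather than 72K, so its word count or throughput is best read as approximate, or as counting tokens differently (for example, after stop word filtering).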
Technical Details
Machine Learning Architecture
The component uses a sophisticated Naive Bayes implementation optimized for edge computing:
Text Processing Pipeline: Advanced Unicode tokenization with multi-language support
Feature Engineering: AlphaNumeric token filtering and text normalization
Classification Algorithm: Naive Bayes with configurable Laplace smoothing
Performance Optimization: FST-based token lookup and log-space probability calculations
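The reason for log-space probability calculations can be illustrated with a short sketch. The per-token likelihoods are made-up numbers, class priors are omitted for brevity, and `posterior_from_log_scores` is a hypothetical helper that shows the general technique rather than the component's implementation.

```python
import math


def posterior_from_log_scores(log_spam: float, log_ham: float) -> float:
    """Convert two unnormalized log scores into P(spam).

    Uses P(spam) = 1 / (1 + exp(log_ham - log_spam)), evaluated so that
    exp() is only ever called on a non-positive argument.
    """
    diff = log_ham - log_spam
    if diff > 0:
        e = math.exp(-diff)  # at most 1, so it cannot overflow
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(diff))


# Hypothetical per-token likelihoods for a 100-token message.
spam_likelihoods = [1e-5] * 100
ham_likelihoods = [2e-5] * 100

# Multiplying directly underflows to 0.0 for both classes...
print(math.prod(spam_likelihoods), math.prod(ham_likelihoods))

# ...while summing logs keeps the scores finite and comparable.
log_spam = sum(math.log(p) for p in spam_likelihoods)
log_ham = sum(math.log(p) for p in ham_likelihoods)
print(log_spam, log_ham, posterior_from_log_scores(log_spam, log_ham))
```

Because both direct products collapse to zero, the two classes would be indistinguishable without logs; the summed log scores still show that ham is far more likely here.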
Text Processing Features
Tokenization
Advanced Unicode-aware tokenization that handles:
Multi-language text processing
Special character normalization
Stemming and lowercase conversion
Stop word filtering
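A simplified version of the tokenization steps listed above might look like the following. It is an illustrative sketch only: the stop word list is invented, stemming is left out, and the component's actual tokenizer is not shown on this page.

```python
import re
import unicodedata
from typing import List

# Hypothetical stop word list; the component's real list is not documented here.
STOP_WORDS = {"a", "an", "and", "are", "how", "i", "is", "the", "to", "you"}

# For str patterns in Python 3, \w matches Unicode letters, digits, and underscore.
_WORD_RE = re.compile(r"\w+")


def tokenize(text: str) -> List[str]:
    """Normalize, lowercase, split into words, and drop stop words."""
    # NFKC folds compatibility characters (e.g. full-width letters);
    # casefold() is a Unicode-aware lowercase conversion.
    normalized = unicodedata.normalize("NFKC", text).casefold()
    tokens = _WORD_RE.findall(normalized)
    # isalnum() keeps purely alphanumeric tokens (drops any containing "_").
    return [t for t in tokens if t.isalnum() and t not in STOP_WORDS]


print(tokenize("Hello, how ARE you today?"))
print(tokenize("¡Oferta LIMITADA! Visite http://example.com"))
```

The first call keeps only "hello" and "today", since the other words are in the sample stop word list.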
Use Cases
The Spam Classifier component is ideal for:
Content Moderation
Comment Systems: Filter spam comments on blogs and forums
User-Generated Content: Moderate posts, reviews, and submissions
Social Media: Detect spam in messages and posts
Security Applications
Form Protection: Prevent spam submissions in contact forms
API Security: Filter malicious content in API requests
Email Systems: Pre-filter messages before processing
Quality Control
Content Quality: Ensure high-quality user contributions
Automated Triage: Route suspicious content for human review
Compliance: Meet anti-spam regulatory requirements
Limitations and Considerations
This edge-optimized classifier prioritizes speed and simplicity over maximum accuracy. For applications requiring the highest precision spam detection, consider dedicated spam filtering services or more sophisticated machine learning solutions.
Compared to Enterprise Solutions
Simpler feature set: Uses basic Naive Bayes with token-based analysis
No behavioral analysis: Lacks sender reputation, link analysis, or pattern recognition
Limited training data: Smaller model size optimized for edge deployment
No real-time updates: Model updates require component redeployment
Best Fit Use Cases
This classifier works well for:
Basic content filtering where speed is prioritized over perfect accuracy
First-line defense in multi-layer spam protection strategies
Edge computing scenarios where low latency is critical
Privacy-focused applications that avoid external API calls
Consider combining this component with other security measures like rate limiting, CAPTCHA, or human moderation for comprehensive protection.
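One way to use the classifier as a first line of defense is to send borderline content to human review instead of making a binary decision. The sketch below assumes a hypothetical `review_floor` cutoff and three made-up action names; only `is_spam` and `spam_probability` come from the documented response.

```python
def triage(result: dict, review_floor: float = 0.50) -> str:
    """Map a classifier response to one of three hypothetical actions."""
    if result["is_spam"]:
        return "reject"  # at or above the configured spam threshold
    if result["spam_probability"] >= review_floor:
        return "review"  # suspicious but below the threshold
    return "accept"


print(triage({"is_spam": False, "spam_probability": 0.23}))
print(triage({"is_spam": False, "spam_probability": 0.65}))
print(triage({"is_spam": True, "spam_probability": 0.91}))
```

The review band is a policy choice: widening it sends more content to moderators and lets less borderline spam through unreviewed.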
Error Handling
The component implements robust error handling with proper HTTP status codes:
Status Codes
200 OK: Successful classification with valid JSON input
400 Bad Request: Invalid request body or malformed JSON
500 Internal Server Error: Component processing error
When errors occur, the component returns a structured JSON error response:
{
  "error": "Error message describing what went wrong"
}
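On the client side, the status codes and error body described above can be handled in one place. This is a sketch under the assumption that every non-200 response carries the documented "error" field; `ClassificationError` and `parse_response` are hypothetical names.

```python
import json


class ClassificationError(Exception):
    """Raised when the classifier returns a non-200 response."""

    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def parse_response(status: int, body: str) -> dict:
    """Return the classification on 200, otherwise raise with the error text."""
    if status == 200:
        return json.loads(body)
    try:
        message = json.loads(body)["error"]
    except (json.JSONDecodeError, KeyError, TypeError):
        # Body was not JSON, or did not have the documented shape.
        message = "unknown error"
    raise ClassificationError(status, message)


try:
    parse_response(400, '{"error": "Invalid request body"}')
except ClassificationError as exc:
    print(exc.status, exc.message)
```

Treating 400 as a caller bug and 500 as a possibly transient failure is a reasonable starting point, though retry policy is left to the application.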
Best Practices
Start with a spam threshold of 0.80 and adjust based on your content patterns. Higher thresholds reduce false positives but may miss some spam.
The component processes text content only. Binary data, HTML tags, and special formatting are normalized during tokenization.
For optimal performance, consider batching multiple short texts into single requests when processing large volumes of content.
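Adjusting the threshold is easier with a small labeled sample of your own content. The sketch below uses invented (spam_probability, is_actually_spam) pairs and a hypothetical `error_rates` helper to show how the false positive and false negative rates move as the threshold changes; it assumes the sample contains both spam and ham.

```python
from typing import List, Tuple


def error_rates(samples: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate) at the given threshold."""
    ham = [p for p, is_spam in samples if not is_spam]
    spam = [p for p, is_spam in samples if is_spam]
    # Ham scored at or above the threshold is wrongly flagged.
    false_positive_rate = sum(p >= threshold for p in ham) / len(ham)
    # Spam scored below the threshold is missed.
    false_negative_rate = sum(p < threshold for p in spam) / len(spam)
    return false_positive_rate, false_negative_rate


# Hypothetical labeled sample: classifier score and the true label.
samples = [(0.95, True), (0.85, True), (0.70, True),
           (0.82, False), (0.60, False), (0.10, False)]

for threshold in (0.65, 0.80, 0.90):
    fpr, fnr = error_rates(samples, threshold)
    print(f"threshold={threshold:.2f}: false positives {fpr:.0%}, false negatives {fnr:.0%}")
```

In this toy sample, raising the threshold from 0.80 to 0.90 removes the one false positive but doubles the missed spam, which is the trade-off to weigh against your own tolerance.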
The embedded classification model is trained on diverse, multilingual datasets including:
Email spam detection datasets
Comment spam collections
Social media spam samples
Multilingual content examples
The model supports incremental updates and can be retrained with domain-specific data for improved accuracy in specialized use cases.