9.2. mollom.checkContent
| Required | Name | Type | Description |
| --- | --- | --- | --- |
| required | public_key | string | Site public key |
| required | time | string | Site server time in this format: yyyy-MM-dd'T'HH:mm:ss-.SSSZ |
| required | hash | string | HMAC-SHA1 digest |
| required | nonce | string | One-time nonce |
| optional | session_id | string | Current session ID |
| optional | post_title | string | Title of submitted post |
| optional | post_body | string | Body of submitted post |
| optional | author_name | string | Submitting user's name or nick |
| optional | author_url | string | Submitting user's URL |
| optional | author_mail | string | Submitting user's email address |
| optional | author_openid | string | Submitting user's OpenID |
| optional | author_ip | string | Submitting user's current IP |
| optional | author_id | string | Submitting user's unique ID (on the site) |
| optional | checks | string | A comma-separated list of checks. Available checks include 'spam', 'quality', 'profanity', 'sentiment' and 'language'. |
| optional | strictness | string | Adjusts the content classifier results, i.e., the probability of a spam result. Possible values: 'strict', 'normal', 'relaxed'. Defaults to 'normal'. |
| optional | classifier | string | Use a custom classifier chain. The value is a comma-separated list of classifiers. |
| returns | spam | integer | Returns 1 if ham, 2 if spam, 3 if unsure |
| returns | quality | double | An assessment of the content's quality, between 0 and 1; 0 being very low, 1 being high quality |
| returns | profanity | double | An assessment of the content's profanity level, between 0 and 1; 0 being non-profane, 1 being very profane |
| returns | sentiment | double | An assessment of the content's sentiment, between 0 and 1; 0 being a very negative sentiment, 1 being a very positive sentiment |
| returns | language | list of structs | A list of structs containing pairs of language and confidence values. |
| returns | session_id | string | Session ID |

The mollom.checkContent call is probably the most frequently used Mollom call. It can be used to check whether a comment is spam, to detect the language of the comment, and to get an assessment of its quality. Several checks can be run in a single call by providing multiple values in the checks parameter. If no value is set for the checks parameter, only the spam check is executed.
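As an illustration of how the request struct might be assembled, here is a minimal Python sketch. The function name `build_check_content_params` is hypothetical, and the sketch assumes the four authentication values have already been computed elsewhere; it only shows how optional fields can be omitted and how several checks are combined into one comma-separated string.

```python
from typing import Iterable, Mapping

# Check names listed in the parameter table above.
AVAILABLE_CHECKS = ("spam", "quality", "profanity", "sentiment", "language")


def build_check_content_params(
    auth: Mapping[str, str],
    checks: Iterable[str] = (),
    **fields: str,
) -> dict:
    """Assemble the struct passed to mollom.checkContent (hypothetical helper).

    `auth` must already hold public_key, time, hash and nonce; computing
    them is outside the scope of this sketch.
    """
    required = ("public_key", "time", "hash", "nonce")
    missing = [name for name in required if not auth.get(name)]
    if missing:
        raise ValueError(f"missing authentication fields: {missing}")

    checks = list(checks)
    unknown = [c for c in checks if c not in AVAILABLE_CHECKS]
    if unknown:
        raise ValueError(f"unknown checks: {unknown}")

    params = {name: auth[name] for name in required}
    # Optional fields may be left out entirely, so empty values are dropped.
    params.update({k: v for k, v in fields.items() if v})
    if checks:
        # Several checks are requested as one comma-separated string.
        params["checks"] = ",".join(checks)
    return params
```

With Python's standard `xmlrpc.client`, the resulting dict could then be passed as the single argument of `ServerProxy(url).mollom.checkContent(params)`, where `url` stands for the Mollom server address, which is not specified in this section.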
Spam check
With the spam-check, the mollom.checkContent call will return 'ham', 'spam' or 'unsure' (encoded as 1, 2 and 3, respectively) together with a session ID. If Mollom returns 'ham' or 'spam', the content can be safely accepted or rejected, as the case may be. But if Mollom returns 'unsure', an additional check is needed to decide if the content can be accepted or not. Mollom provides CAPTCHA challenges for this check, but other mechanisms could be used. Mollom is designed so that only a small fraction of human-submitted content will be flagged as unsure.
Note that if Mollom returns 'spam', no CAPTCHA should be shown to the user. Mollom will only return 'spam' if it is 100% sure that the content is spam. It is essential that these attempts are blocked without presenting any CAPTCHA. This allows Mollom to block both spambots trying to hack the CAPTCHAs and human users sending spam.
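The decision logic described above can be sketched as follows. The `SpamResult` enum and the `decide` function, along with the action names "accept", "reject" and "challenge", are hypothetical; only the integer codes 1, 2 and 3 come from this document.

```python
from enum import IntEnum


class SpamResult(IntEnum):
    """Values of the 'spam' return field."""
    HAM = 1
    SPAM = 2
    UNSURE = 3


def decide(spam_code: int) -> str:
    """Map the 'spam' return value to a hypothetical site action."""
    result = SpamResult(spam_code)  # raises ValueError on unexpected codes
    if result is SpamResult.HAM:
        return "accept"
    if result is SpamResult.SPAM:
        # Reject outright; no CAPTCHA is offered for content classified as spam.
        return "reject"
    # UNSURE: an additional check (for example a CAPTCHA) is needed.
    return "challenge"
```

Treating an unexpected code as an error, rather than silently accepting the content, is a design choice of this sketch and not something the API prescribes.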
The behaviour of the spam-check can be influenced by supplying a value for the reputation or the classifier parameters. The classifier parameter, however, does not have any effect at the moment.
The reputation parameter accepts the following values:
- captcha-blocking-normal
- captcha-blocking-relax
- captcha-blocking-strict
- nocaptcha-blocking-normal
- nocaptcha-blocking-relax
- nocaptcha-blocking-strict
- captcha-nonblocking-normal
- captcha-nonblocking-relax
- captcha-nonblocking-strict
- captcha-blocking-repeated
Quality and profanity checks
The quality and profanity scores returned by mollom.checkContent are real numbers between 0 and 1. For quality, 0 denotes very low quality and 1 very high quality; for profanity, 0 denotes non-profane content and 1 highly profane content. Mollom only returns a score; clients must define for themselves the quality or profanity cutoff between content acceptance and rejection. The scores could also be used to present the content sorted in a way that makes moderation easier.
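Because the cutoffs are left to the client, a site might apply them roughly as in the sketch below. The threshold values and the names `MIN_QUALITY`, `MAX_PROFANITY`, `is_acceptable` and `moderation_order` are assumptions chosen for illustration, not values recommended by Mollom.

```python
from typing import List, Mapping

# Hypothetical cutoffs; Mollom returns scores only, so each site picks its own.
MIN_QUALITY = 0.3
MAX_PROFANITY = 0.7


def is_acceptable(result: Mapping) -> bool:
    """Apply site-defined cutoffs to a decoded mollom.checkContent result."""
    quality = result.get("quality")
    profanity = result.get("profanity")
    # A score is only present if the matching check was requested.
    if quality is not None and quality < MIN_QUALITY:
        return False
    if profanity is not None and profanity > MAX_PROFANITY:
        return False
    return True


def moderation_order(results: List[Mapping]) -> List[Mapping]:
    """Sort results so the lowest-quality content is reviewed first."""
    return sorted(results, key=lambda r: r.get("quality", 1.0))
```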
Language check
The language-check replaces the older mollom.detectLanguage call, which is now deprecated. Given a very limited amount of text (minimum of 15 characters), Mollom can detect its probable language (out of approximately 75 languages) with a high degree of accuracy. This feature can be used to prevent the use of foreign languages on your site, or to automatically segment the content of users based on their posting language.
Each value in the returned result is a struct (see example) that contains two named values: language and confidence. "language" is a string holding a two-character ISO 639-1 code or, if no ISO 639-1 code is available, a three-letter ISO 639-3 code, while "confidence" is a double representing Mollom's confidence in the accuracy of its assessment. Multiple pairs of language and confidence elements may be returned; if so, the elements are arranged in descending order of confidence.
If the language cannot be determined, the "zxx" code is returned as the value of the language element; it is defined as "no linguistic content, not applicable". If the text is determined to be too random to be a known language, the "und" code is returned as the value of the language element; it is defined as "undetermined".
Results of the language check resemble the following snippet.
<value>
<array>
<data>
<value>
<struct>
<member><name>language</name><value><string>nl</string></value></member>
<member><name>confidence</name><value><double>0.558</double></value></member>
</struct>
</value>
</data>
</array>
</value>
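Once an XML-RPC library has decoded a result like the one above into a list of dictionaries, picking a usable language could look like this sketch. The function name `best_language` and the `min_confidence` parameter are hypothetical; the "zxx" and "und" codes are the special values described earlier.

```python
from typing import List, Mapping, Optional

# Special codes described above: neither names a usable language.
NO_LANGUAGE_CODES = {"zxx", "und"}


def best_language(results: List[Mapping], min_confidence: float = 0.0) -> Optional[str]:
    """Return the most confident language code, or None if there is none."""
    # Results are documented as sorted by descending confidence; sorting again
    # keeps the helper correct even if the caller reordered them.
    ranked = sorted(results, key=lambda r: r["confidence"], reverse=True)
    for entry in ranked:
        if entry["language"] in NO_LANGUAGE_CODES:
            continue
        if entry["confidence"] >= min_confidence:
            return entry["language"]
    return None
```

For the example result above, `best_language([{"language": "nl", "confidence": 0.558}])` yields `"nl"`, while a stricter call with `min_confidence=0.9` yields `None`.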
Remarks
- Apart from the authentication fields, which are compulsory, all other fields are optional. This means that they can either be left out altogether or be empty strings. However, the more information Mollom receives, the more accurate its classification will be.
- If multiple OpenIDs are given for a user, they can all be passed in the author_openid field, separated by whitespace (spaces, tabs or new lines).
- If a site has content types that do not map well onto the specified fields (for example, a 'survey' content type), content type fields or data can be concatenated and passed into the post body field.
- A unique user ID (user name or numeric ID) can be passed to Mollom. If no user ID is known (for an anonymous user, for example) no value should be passed.
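The remarks above could translate into small helpers such as the following. All function names here are hypothetical; the sketch only shows joining several OpenIDs with whitespace, concatenating the fields of an unusual content type into the post body, and leaving out the user ID for anonymous users.

```python
from typing import Iterable, Mapping, Optional


def join_openids(openids: Iterable[str]) -> str:
    """Combine several OpenIDs into one whitespace-separated string."""
    return " ".join(o.strip() for o in openids if o.strip())


def flatten_to_post_body(content: Mapping[str, str]) -> str:
    """Concatenate arbitrary content-type fields into one post_body string."""
    return "\n".join(str(v) for v in content.values() if v)


def author_fields(user_id: Optional[str], openids: Iterable[str] = ()) -> dict:
    """Build the author-related fields, omitting whatever is unknown."""
    fields = {}
    if user_id:
        # Anonymous users have no ID, so author_id is not sent at all.
        fields["author_id"] = str(user_id)
    joined = join_openids(openids)
    if joined:
        fields["author_openid"] = joined
    return fields
```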
