peRspective provides access to the Perspective API (http://www.perspectiveapi.com/) from R. Perspective is an API that uses machine learning models to score the perceived impact a comment might have on a conversation. For excellent documentation of the Perspective API, see here.

Get API Key

Follow the steps outlined by the Perspective API to get an API key.

Suggested Usage of API Key

peRspective functions read the API key from the environment variable perspective_api_key. You can set it like this at the start of your script:

Sys.setenv(perspective_api_key = "**********")

To start every R session with the environment variable already set, create an .Renviron file in your R home directory with a line like this:

perspective_api_key = "**********"

To check where your R home is, try normalizePath("~").
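After restarting R, you can confirm that the key was picked up. Sys.getenv() is base R and returns an empty string if the variable is not set:

Sys.getenv("perspective_api_key")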

Quota and character length limits

You can check your quota limits by going to your Google Cloud project's Perspective API page, and check your project's quota usage at the Cloud Console quota usage page.

The maximum text size per request is 3000 bytes.
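As a rough pre-check before sending a request (this uses only base R and is not part of peRspective), you can measure the byte length of a comment:

comment <- "You wrote one of the best posts I have read in a long time."
nchar(comment, type = "bytes")  # must be <= 3000 bytes per request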

Models in Production

The following production-ready models are recommended for use. They have been tested across multiple domains and trained on hundreds of thousands of comments tagged by thousands of human moderators. They are available in English (en), Spanish (es), French (fr), German (de), Portuguese (pt), Italian (it), and Russian (ru). A minimal scoring call is sketched after the list.

  • TOXICITY: rude, disrespectful, or unreasonable comment that is likely to make people leave a discussion. This model is a Convolutional Neural Network (CNN) trained with word-vector inputs.

  • SEVERE_TOXICITY: This model uses the same deep-CNN algorithm as the TOXICITY model, but is trained to recognize examples that were considered to be 'very toxic' by crowdworkers. This makes it much less sensitive to comments that include positive uses of curse words, for example. A labelled dataset and details of the methodology can be found in the same toxicity dataset that is available for the toxicity model.
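Here is a minimal sketch of scoring a single comment against both production models, assuming the API key is set as above and assuming the package's prsp_score() function with text, languages, and score_model arguments; the exact shape of the returned tibble may differ between package versions.

library(peRspective)

my_text <- "Hello, I think your argument could be made more clearly."

# Score one comment with the two production models
# (assumes the API key is set in the environment as described above)
text_scores <- prsp_score(
  text = my_text,
  languages = "en",
  score_model = c("TOXICITY", "SEVERE_TOXICITY")
)

text_scores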

Experimental models

The following experimental models give more fine-grained classifications than overall toxicity. They were trained on less data than the primary toxicity models above and have not been tested as thoroughly. An example call using some of these attributes follows the list.

  • IDENTITY_ATTACK: negative or hateful comments targeting someone because of their identity.

  • INSULT: insulting, inflammatory, or negative comment towards a person or a group of people.

  • PROFANITY: swear words, curse words, or other obscene or profane language.

  • THREAT: describes an intention to inflict pain, injury, or violence against an individual or group.

  • SEXUALLY_EXPLICIT: contains references to sexual acts, body parts, or other lewd content.

  • FLIRTATION: pickup lines, complimenting appearance, subtle sexual innuendos, etc.

For more details on how these were trained, see the Toxicity and sub-attribute annotation guidelines.
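Experimental attributes can be requested the same way by passing their names to score_model (again a sketch, assuming the prsp_score() interface used above):

# Request several experimental attributes for one comment
exp_scores <- prsp_score(
  text = "Nobody cares what you think, just leave.",
  languages = "en",
  score_model = c("INSULT", "THREAT", "IDENTITY_ATTACK")
)

exp_scores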

New York Times moderation models

The following experimental models were trained on New York Times data tagged by their moderation team (see the example after the list).

  • ATTACK_ON_AUTHOR: Attack on the author of an article or post.

  • ATTACK_ON_COMMENTER: Attack on fellow commenter.

  • INCOHERENT: Difficult to understand, nonsensical.

  • INFLAMMATORY: Intending to provoke or inflame.

  • LIKELY_TO_REJECT: Overall measure of the likelihood that the comment will be rejected according to the NYT's moderation.

  • OBSCENE: Obscene or vulgar language such as cursing.

  • SPAM: Irrelevant and unsolicited commercial content.

  • UNSUBSTANTIAL: Trivial or short comments.
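To score several comments at once against the NYT models, a data frame can be streamed through the API. This sketch assumes the package's prsp_stream() function with text, text_id, and score_model arguments; column names in the output may differ.

library(peRspective)
library(tibble)

comments <- tibble(
  id   = c("a1", "a2"),
  text = c("This article is complete nonsense and so is its author.",
           "Buy cheap watches at my site!!!")
)

# Stream a data frame of comments through two NYT moderation models
nyt_scores <- prsp_stream(
  comments,
  text = text,
  text_id = id,
  score_model = c("INFLAMMATORY", "SPAM")
)

nyt_scores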

Don't forget to regain your spirits

Analyzing toxic comments can be disheartening sometimes. Feel free to look at this picture of cute kittens whenever you need to:

Kittens