Part-of-Speech Proportions

The pos_proportions component adds one attribute to a Doc or Span:

  • Doc._.pos_proportions
    • Dict of {pos_prop_POSTAG: proportion of all tokens tagged with POSTAG}. By default, a key is created for each possible POS tag. This behaviour can be turned off by setting add_all_tags=False in the component's initialization.

  • Span._.pos_proportions

    • Dict of {pos_prop_POSTAG: proportion of all tokens tagged with POSTAG}.
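
The shape of this dictionary can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the function name pos_proportions_sketch, the UPOS_TAGS list, and the hand-written tag sequence are assumptions made for illustration, and filling absent tags with 0 is inferred from the example output rather than taken from the library's source.

```python
from collections import Counter

# Universal POS tags (assumed here to be the "possible POS tags").
UPOS_TAGS = ["ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
             "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"]


def pos_proportions_sketch(tags, add_all_tags=True):
    """Return {pos_prop_TAG: share of tokens tagged TAG} for a list of tags."""
    counts = Counter(tags)
    total = len(tags)
    # With add_all_tags, every known tag gets a key, even if it never occurs.
    keys = UPOS_TAGS if add_all_tags else sorted(counts)
    if total == 0:
        return {f"pos_prop_{tag}": 0.0 for tag in keys}
    return {f"pos_prop_{tag}": counts[tag] / total for tag in keys}


# Hand-written tags for "This is a test sentence." (hypothetical, not tagger output).
example_tags = ["PRON", "AUX", "DET", "NOUN", "NOUN", "PUNCT"]
props = pos_proportions_sketch(example_tags, add_all_tags=False)
print(props)
```

With add_all_tags=False the sketch returns only the five tags that occur; with the default it returns one key per entry in UPOS_TAGS.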

Usage

import spacy
import textdescriptives as td
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textdescriptives/pos_proportions")
doc = nlp("The world is changed. I feel it in the water. I feel it in the earth. I smell it in the air. Much that once was is lost, for none now live who remember it.")

# all attributes are stored as a dict in the ._.pos_proportions attribute
doc._.pos_proportions

# extract to dataframe
td.extract_df(doc)

The resulting dataframe has a single row (index 0), shown here with one column per line:

column            row 0
text              The world is changed(…)
pos_prop_ADJ      0.0243902
pos_prop_ADP      0.097561
pos_prop_ADV      0.0487805
pos_prop_AUX      0.0731707
pos_prop_CCONJ    0
pos_prop_DET      0.097561
pos_prop_INTJ     0
pos_prop_NOUN     0.121951
pos_prop_NUM      0
pos_prop_PART     0
pos_prop_PRON     0.195122
pos_prop_PROPN    0
pos_prop_PUNCT    0.146341
pos_prop_SCONJ    0.0243902
pos_prop_SYM      0
pos_prop_VERB     0.170732
pos_prop_X        0
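
As a quick sanity check on the example output, the non-zero proportions can be converted back into token counts. The sketch below assumes the example text is split into 41 tokens (the smallest value, 0.0243902, equals 1/41); the counts are back-derived from the rounded proportions, not read from the library.

```python
# Proportions copied from the example row; tags with a value of 0 are omitted.
row = {
    "ADJ": 0.0243902, "ADP": 0.097561, "ADV": 0.0487805, "AUX": 0.0731707,
    "DET": 0.097561, "NOUN": 0.121951, "PRON": 0.195122, "PUNCT": 0.146341,
    "SCONJ": 0.0243902, "VERB": 0.170732,
}

# Assumption: the document has 41 tokens.
n_tokens = 41

# Back-derive the token count behind each proportion.
implied_counts = {tag: round(p * n_tokens) for tag, p in row.items()}
print(implied_counts)
print(sum(implied_counts.values()))  # equals n_tokens if the assumption holds

# Each displayed proportion should match count / n_tokens to display precision.
for tag, count in implied_counts.items():
    assert abs(count / n_tokens - row[tag]) < 1e-6
```

Under this assumption the implied counts add up to the full token count, so the proportions in a row sum to 1.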


Component

textdescriptives.components.pos_proportions.create_pos_proportions_component(nlp: Language, name: str, use_pos: bool, add_all_tags: bool) → Callable[[Doc], Doc]

Allows POSProportions to be added to a spaCy pipeline using nlp.add_pipe("textdescriptives/pos_proportions").

Adding this component to a pipeline sets the following attributes:
  • doc._.pos_proportions

  • span._.pos_proportions

Parameters:
  • nlp (Language) – spaCy language object, does not need to be specified in the nlp.add_pipe call.

  • name (str) – name of the component. Can be optionally specified in the nlp.add_pipe call, using the name argument.

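  • use_pos (bool) – If True, uses the simple token.pos attribute. If False, uses the detailed token.tag attribute.

  • add_all_tags (bool) – If True, a key is created for each possible POS tag. Set to False to turn this behaviour off.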

Returns:

The POSProportions component to be added to the pipe.

Return type:

Callable[[Doc], Doc]

Example

>>> import spacy
>>> import textdescriptives as td
>>> nlp = spacy.load("en_core_web_sm")
>>> nlp.add_pipe("textdescriptives/pos_proportions")
>>> # apply the component to a document
>>> doc = nlp("This is a test sentence.")
>>> doc._.pos_proportions
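
The two options in the component signature can be passed through the config argument of nlp.add_pipe. The following is a hedged sketch, not an official example: the helper names POS_CONFIG, build_pipeline and demo are hypothetical, and it assumes that spaCy, textdescriptives and the en_core_web_sm model are installed and that the installed version accepts both options.

```python
# Settings passed to the factory; both names come from the component signature.
POS_CONFIG = {"use_pos": False, "add_all_tags": False}


def build_pipeline(model="en_core_web_sm"):
    """Load a spaCy model and add pos_proportions with the config above."""
    # Imported lazily so the sketch can be read without spaCy installed.
    import spacy
    import textdescriptives as td  # noqa: F401  (importing registers the component)

    nlp = spacy.load(model)
    nlp.add_pipe("textdescriptives/pos_proportions", config=POS_CONFIG)
    return nlp


def demo():
    """Print the proportions for a whole Doc and for a Span of it."""
    nlp = build_pipeline()
    doc = nlp("This is a test sentence.")
    print(doc._.pos_proportions)       # proportions over all tokens in the Doc
    print(doc[0:3]._.pos_proportions)  # proportions over the first three tokens
```

Calling demo() should print one dictionary for the Doc and one for the Span. With use_pos=False the keys are expected to be based on the detailed token.tag values instead of the simple token.pos values.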