Part-of-Speech Proportions
The pos_proportions component adds one attribute to both Doc and Span objects:
Doc._.pos_proportions: Dict of {pos_prop_POSTAG: proportion of all tokens tagged with POSTAG}. By default, a key is created for every possible POS tag, including tags that do not occur in the text. This behaviour can be turned off by setting add_all_tags=False when initializing the component.
Span._.pos_proportions: Dict of {pos_prop_POSTAG: proportion of all tokens tagged with POSTAG}.
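To make the dictionary layout concrete, here is a minimal pure-Python sketch of how such proportions could be computed from a list of tag strings. It is not the library's implementation: the helper pos_proportions_sketch, the UPOS_TAGS list (taken from the column names in the example output on this page), and the hand-written tags are all assumptions made for illustration.

```python
from collections import Counter

# Universal POS tags, matching the pos_prop_* columns shown in this page's example output.
UPOS_TAGS = ["ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
             "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"]


def pos_proportions_sketch(tags, add_all_tags=True):
    """Hypothetical helper: map a list of POS tag strings to {pos_prop_TAG: share}."""
    counts = Counter(tags)
    total = len(tags)
    # With add_all_tags=True every known tag gets a key, even when its count is zero.
    keys = list(UPOS_TAGS) if add_all_tags else []
    keys += [t for t in counts if t not in keys]  # keep any tag outside the fixed list
    if total == 0:
        return {f"pos_prop_{t}": 0.0 for t in keys}
    return {f"pos_prop_{t}": counts[t] / total for t in keys}


# Hand-written tags for illustration only (not produced by a tagger).
tags = ["DET", "NOUN", "AUX", "VERB", "PUNCT"]
full = pos_proportions_sketch(tags)
sparse = pos_proportions_sketch(tags, add_all_tags=False)
print(full["pos_prop_NOUN"])   # 0.2
print(len(full), len(sparse))  # 17 5
```

With add_all_tags=True the result always has the same keys, which is convenient when stacking many documents into one table; with add_all_tags=False only the tags that actually occur are present.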
Usage
import spacy
import textdescriptives as td
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textdescriptives/pos_proportions")
doc = nlp("The world is changed. I feel it in the water. I feel it in the earth. I smell it in the air. Much that once was is lost, for none now live who remember it.")
# all attributes are stored as a dict in the ._.pos_proportions attribute
doc._.pos_proportions
# extract to dataframe
td.extract_df(doc)
| | text | pos_prop_ADJ | pos_prop_ADP | pos_prop_ADV | pos_prop_AUX | pos_prop_CCONJ | pos_prop_DET | pos_prop_INTJ | pos_prop_NOUN | pos_prop_NUM | pos_prop_PART | pos_prop_PRON | pos_prop_PROPN | pos_prop_PUNCT | pos_prop_SCONJ | pos_prop_SYM | pos_prop_VERB | pos_prop_X |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The world is changed(…) | 0.0243902 | 0.097561 | 0.0487805 | 0.0731707 | 0 | 0.097561 | 0 | 0.121951 | 0 | 0 | 0.195122 | 0 | 0.146341 | 0.0243902 | 0 | 0.170732 | 0 |
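As a quick sanity check on the example row, the following sketch copies its non-zero values into a plain dictionary, confirms that they sum to roughly 1, and recovers approximate token counts per tag. The token total is inferred, not reported by the library: it rests on the assumption that the smallest non-zero proportion corresponds to exactly one token.

```python
# Non-zero values copied from the example row (zero-valued tags omitted).
row = {
    "pos_prop_ADJ": 0.0243902,
    "pos_prop_ADP": 0.097561,
    "pos_prop_ADV": 0.0487805,
    "pos_prop_AUX": 0.0731707,
    "pos_prop_DET": 0.097561,
    "pos_prop_NOUN": 0.121951,
    "pos_prop_PRON": 0.195122,
    "pos_prop_PUNCT": 0.146341,
    "pos_prop_SCONJ": 0.0243902,
    "pos_prop_VERB": 0.170732,
}

# The proportions should add up to 1, up to the rounding in the printed table.
total = sum(row.values())

# Assumption: the smallest proportion corresponds to exactly one token.
n_tokens = round(1 / min(row.values()))

# Recover approximate per-tag token counts, most frequent first.
counts = {k.replace("pos_prop_", "", 1): round(v * n_tokens) for k, v in row.items()}
ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

print(round(total, 4))  # 1.0
print(n_tokens)         # 41
print(ranked[:3])       # [('PRON', 8), ('VERB', 7), ('PUNCT', 6)]
```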
Component
- textdescriptives.components.pos_proportions.create_pos_proportions_component(nlp: Language, name: str, use_pos: bool, add_all_tags: bool) -> Callable[[Doc], Doc]
Allows POSProportions to be added to a spaCy pipe using nlp.add_pipe("textdescriptives/pos_proportions").
- Adding this component to a pipeline sets the following attributes:
doc._.pos_proportions
span._.pos_proportions
- Parameters:
nlp (Language) – spaCy language object, does not need to be specified in the nlp.add_pipe call.
name (str) – name of the component. Can be optionally specified in the nlp.add_pipe call, using the name argument.
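use_pos (bool) – If True, uses the simple token.pos attribute. If False, uses the detailed token.tag attribute.
add_all_tags (bool) – If True, creates a key for each possible POS tag, even those that do not occur in the text. If False, only tags that occur are included.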
- Returns:
The POSProportions component to be added to the pipe.
- Return type:
Callable[[Doc], Doc]
Example
>>> import spacy
>>> import textdescriptives as td
>>> nlp = spacy.load("en_core_web_sm")
>>> nlp.add_pipe("textdescriptives/pos_proportions")
>>> # apply the component to a document
>>> doc = nlp("This is a test sentence.")
>>> doc._.pos_proportions