Dependency Distance

The dependency_distance component adds measures of dependency distance to Doc, Span, and Token objects under the ._.dependency_distance attribute. Dependency distance can be used as a measure of syntactic complexity: the greater the distance, the more complex the text (Liu 2008, Oya 2011).

The implementation in textdescriptives follows Oya, 2011. We calculate the distance from each token to its dependent and take the mean of these distances as the mean dependency distance of a span. The Doc-level dependency distance is then the average of the sentence-level mean dependency distances.
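The averaging described above can be sketched in plain Python, without spaCy. This is a minimal illustration and not the library's code: the function names are hypothetical, each sentence is represented as a list of sentence-local head indices, the two parses are made up for the example, and the root token is assumed to point to itself and therefore contribute a distance of 0.

```python
def token_distances(heads):
    """Return the distance between each token and the token it is attached to.

    heads[i] is the sentence-local index of the head of token i.
    Assumption: the root token points to itself, so its distance is 0.
    """
    return [abs(head - i) for i, head in enumerate(heads)]


def sentence_mean_distance(heads):
    """Mean dependency distance over all tokens of one sentence (or span)."""
    distances = token_distances(heads)
    return sum(distances) / len(distances)


def doc_mean_distance(sentences):
    """Average of the sentence-level means, not a pooled mean over all tokens."""
    means = [sentence_mean_distance(heads) for heads in sentences]
    return sum(means) / len(means)


# Hypothetical parses, invented for illustration only.
sentence_a = [1, 3, 3, 3, 3]  # five tokens, root at index 3
sentence_b = [3, 3, 3, 3]     # four tokens, root at index 3

print(token_distances(sentence_a))                   # [1, 2, 1, 0, 1]
print(sentence_mean_distance(sentence_a))            # 1.0
print(sentence_mean_distance(sentence_b))            # 1.5
print(doc_mean_distance([sentence_a, sentence_b]))   # 1.25
```

Note that averaging sentence-level means weights every sentence equally, regardless of its length. Pooling all eleven tokens of the two hypothetical sentences would instead give 11 / 11 = 1.0.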

Please see this issue for how to calculate the dependency distance metric proposed by Liu, 2008 with TextDescriptives.

For Doc objects, the mean and standard deviation of the sentence-level dependency distance are returned, along with the mean and standard deviation of the sentence-level proportion of adjacent dependency relations.

For Span objects, the mean dependency distance and the proportion of adjacent dependency relations in the span are returned.

For Token objects, the dependency distance and whether the dependency relation is to an adjacent token are returned.
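To make the three levels concrete, the following sketch builds token-, span-, and Doc-level results from lists of head indices. It is an illustration under stated assumptions rather than the library's implementation: the helper names are hypothetical, the token-level keys (dependency_distance, adjacent_dependency) and the span-level keys are assumed, the root is assumed to have distance 0, and the population standard deviation is used. Only the four Doc-level key names are taken from the extract_df output shown on this page.

```python
from statistics import fmean, pstdev


def token_level(heads):
    """One dict per token: distance to its head and whether that head is adjacent.

    heads[i] is the sentence-local index of the head of token i.
    Assumption: the root points to itself (distance 0, not adjacent).
    """
    results = []
    for i, head in enumerate(heads):
        distance = abs(head - i)
        results.append(
            {"dependency_distance": distance, "adjacent_dependency": distance == 1}
        )
    return results


def span_level(heads):
    """Mean distance and proportion of adjacent relations within one span."""
    tokens = token_level(heads)
    distances = [t["dependency_distance"] for t in tokens]
    adjacent = [int(t["adjacent_dependency"]) for t in tokens]
    return {
        "dependency_distance_mean": fmean(distances),
        "prop_adjacent_dependency_relation_mean": fmean(adjacent),
    }


def doc_level(sentences):
    """Mean and standard deviation of the sentence-level values."""
    spans = [span_level(heads) for heads in sentences]
    distances = [s["dependency_distance_mean"] for s in spans]
    proportions = [s["prop_adjacent_dependency_relation_mean"] for s in spans]
    return {
        "dependency_distance_mean": fmean(distances),
        # Assumption: population standard deviation (divide by n).
        "dependency_distance_std": pstdev(distances),
        "prop_adjacent_dependency_relation_mean": fmean(proportions),
        "prop_adjacent_dependency_relation_std": pstdev(proportions),
    }


# Hypothetical parses, invented for illustration only.
sentences = [[1, 3, 3, 3, 3], [3, 3, 3, 3]]
print(token_level(sentences[0])[0])
print(span_level(sentences[0]))
print(doc_level(sentences))
```

With these made-up parses, the first sentence has three adjacent relations out of five tokens and the second has one out of four, so the Doc-level proportion is the average of 0.6 and 0.25.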

Usage

import spacy
import textdescriptives as td
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textdescriptives/dependency_distance")
doc = nlp("The world is changed. I feel it in the water. I feel it in the earth. I smell it in the air. Much that once was is lost, for none now live who remember it.")

# all attributes are stored as a dict in the ._.dependency_distance attribute
doc._.dependency_distance

# access span and token level dependency distance in the same way
doc[:3]._.dependency_distance
doc[1]._.dependency_distance

# extract to dataframe
td.extract_df(doc)

   text                     dependency_distance_mean  dependency_distance_std  prop_adjacent_dependency_relation_mean  prop_adjacent_dependency_relation_std
0  The world is changed(…)  1.77524                   0.553188                 0.457143                                0.0722806


Component

textdescriptives.components.dependency_distance.create_dependency_distance_component(nlp: Language, name: str) → Callable[[Doc], Doc]

Create a spaCy language factory that allows DependencyDistance attributes to be added to a pipe using nlp.add_pipe("textdescriptives/dependency_distance").

Adding this component to a pipeline sets the following attributes:
  • token._.dependency_distance

  • span._.dependency_distance

  • doc._.dependency_distance

Parameters:
  • nlp (Language) – spaCy language object. It does not need to be specified in the nlp.add_pipe call.

  • name (str) – name of the component. It can optionally be specified in the nlp.add_pipe call using the name argument.

Returns:

The DependencyDistance component

Return type:

Callable[[Doc], Doc]

Example

>>> import spacy
>>> import textdescriptives as td
>>> nlp = spacy.load("en_core_web_sm")
>>> nlp.add_pipe("textdescriptives/dependency_distance")
>>> # apply the pipeline to a text
>>> doc = nlp("This is a sentence.")
>>> # access the dependency distance attributes
>>> doc._.dependency_distance