Dependency Distance
The dependency_distance component adds measures of dependency distance to Doc, Span, and Token objects under the ._.dependency_distance attribute.
Dependency distance can be used as a measure of syntactic complexity: the greater the distance, the more complex the sentence (Liu 2008, Oya 2011).
The implementation in textdescriptives follows Oya, 2011. We calculate the distance from each token to its dependent and take the mean of these distances to obtain the mean dependency distance for a span. We then calculate the Doc-level dependency distance by averaging over the sentence-level mean dependency distances.
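The averaging described above can be illustrated with a minimal pure-Python sketch. It is not the library's code: it assumes that a token's distance is the absolute difference between its position and the position of the token it is attached to, that the root points to itself and therefore contributes a distance of 0, and that every token counts towards the mean. The function names and the two toy sentences are hypothetical and hand-written, not parser output.

```python
from statistics import mean


def token_distances(heads):
    """Return one dependency distance per token.

    `heads[i]` is the sentence-internal index of the token that token i
    is attached to. The root points to itself, so its distance is 0
    (an assumption of this sketch).
    """
    return [abs(head - i) for i, head in enumerate(heads)]


def span_mean_distance(heads):
    """Mean dependency distance over all tokens of one span or sentence."""
    return mean(token_distances(heads))


def doc_mean_distance(sentences):
    """Doc-level value: the average of the sentence-level means."""
    return mean(span_mean_distance(heads) for heads in sentences)


# Toy sentence A, 5 tokens, root at index 3: distances are 1, 2, 1, 0, 1.
sent_a = [1, 3, 3, 3, 3]
# Toy sentence B, 4 tokens, root at index 3: distances are 3, 2, 1, 0.
sent_b = [3, 3, 3, 3]

print(token_distances(sent_a))
print(span_mean_distance(sent_a), span_mean_distance(sent_b))
print(doc_mean_distance([sent_a, sent_b]))
```

Note that the Doc-level value is a mean of means, so a short sentence weighs as much as a long one.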
Please see this issue for how to calculate the dependency distance metric proposed by Liu, 2008 with TextDescriptives.
For Doc objects, the mean and standard deviation of the sentence-level dependency distance are returned, along with the mean and standard deviation of the sentence-level proportion of adjacent dependency relations.
For Span objects, the mean dependency distance and the mean proportion of adjacent dependency relations in the span are returned.
For Token objects, the dependency distance and whether the dependency relation is with an adjacent token are returned.
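To make the three levels concrete, here is a rough pure-Python sketch of how token-level values could be aggregated into span-level and Doc-level summaries. Several things are assumptions rather than confirmed library behaviour: a relation counts as adjacent when its distance is exactly 1, the standard deviation is the population standard deviation, and the helper names are hypothetical. The Doc-level dictionary keys mirror the column names produced by td.extract_df; the span-level keys are assumed to follow the same naming.

```python
from statistics import mean, pstdev


def token_level(heads):
    """Per token: (distance, is_adjacent). The root points to itself."""
    distances = [abs(head - i) for i, head in enumerate(heads)]
    # Assumption: "adjacent" means the two linked tokens are neighbours.
    return [(d, d == 1) for d in distances]


def span_level(heads):
    """Mean distance and proportion of adjacent relations in one span."""
    tokens = token_level(heads)
    n_adjacent = sum(1 for _, adjacent in tokens if adjacent)
    return {
        "dependency_distance_mean": sum(d for d, _ in tokens) / len(tokens),
        "prop_adjacent_dependency_relation_mean": n_adjacent / len(tokens),
    }


def doc_level(sentences):
    """Mean and (population) std of the sentence-level values."""
    spans = [span_level(heads) for heads in sentences]
    dist = [s["dependency_distance_mean"] for s in spans]
    adj = [s["prop_adjacent_dependency_relation_mean"] for s in spans]
    return {
        "dependency_distance_mean": mean(dist),
        "dependency_distance_std": pstdev(dist),
        "prop_adjacent_dependency_relation_mean": mean(adj),
        "prop_adjacent_dependency_relation_std": pstdev(adj),
    }


# Hand-written toy head indices for two sentences (not parser output).
toy_sentences = [[1, 3, 3, 3, 3], [3, 3, 3, 3]]
summary = doc_level(toy_sentences)
for key, value in summary.items():
    print(key, round(value, 4))
```

Whether the library uses the population or the sample standard deviation is not stated here, so the std values from this sketch should not be expected to match its output exactly.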
Usage
import spacy
import textdescriptives as td
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textdescriptives/dependency_distance")
doc = nlp("The world is changed. I feel it in the water. I feel it in the earth. I smell it in the air. Much that once was is lost, for none now live who remember it.")
# all attributes are stored as a dict in the ._.dependency_distance attribute
doc._.dependency_distance
# access span and token level dependency distance in the same way
doc[:3]._.dependency_distance
doc[1]._.dependency_distance
# extract to dataframe
td.extract_df(doc)
| | text | dependency_distance_mean | dependency_distance_std | prop_adjacent_dependency_relation_mean | prop_adjacent_dependency_relation_std |
|---|---|---|---|---|---|
| 0 | The world is changed(…) | 1.77524 | 0.553188 | 0.457143 | 0.0722806 |
Component
- textdescriptives.components.dependency_distance.create_dependency_distance_component(nlp: Language, name: str) -> Callable[[Doc], Doc]
Create a spaCy language factory that allows DependencyDistance attributes to be added to a pipe using nlp.add_pipe("textdescriptives/dependency_distance").
- Adding this component to a pipeline sets the following attributes:
token._.dependency_distance
span._.dependency_distance
doc._.dependency_distance
- Parameters:
nlp (Language) – spaCy language object. Does not need to be specified in the nlp.add_pipe call.
name (str) – name of the component. Can be optionally specified in the nlp.add_pipe call, using the name argument.
- Returns:
The DependencyDistance component
- Return type:
Callable[[Doc], Doc]
Example
>>> import spacy
>>> import textdescriptives as td
>>> nlp = spacy.load("en_core_web_sm")
>>> nlp.add_pipe("textdescriptives/dependency_distance")
>>> # apply the pipeline to a text
>>> doc = nlp("This is a sentence.")
>>> # access the dependency distance attributes
>>> doc._.dependency_distance