hdl.handle.net/1773/46827

Preview meta tags from the hdl.handle.net website.

Linked Hostnames

4

Search Engine Appearance

Google

https://hdl.handle.net/1773/46827

An Analysis of Gender Bias in K-12 Assigned Literature Through Comparison of Non-Contextual Word Embedding Models

Word embeddings are mathematical representations of words computed from a group of texts that a machine learning model is trained on. Generally, words that are semantically similar to each other will be closer together in the vector space created by the embedding model. The distance between words can be analyzed to understand which words tend to be used in the same contexts in a given group of texts. In this thesis, I use three different non-contextual methods of training word embedding models, Word2Vec (Mikolov et al., 2013), FastText (Bojanowski et al., 2017), and GloVe (Pennington et al., 2014), on a corpus of literature assigned to students in grades K-12 in the United States to answer three questions:

- It has been shown that children are particularly prone to internalizing biases in the content they read and watch (Railsback, 1993; Jacobs, 2003; Slater, 2003). What biases are present in literature assigned to children in grades K-12 in the United States?
- Are different kinds of non-contextual word embeddings sensitive to bias in different ways?
- Is the text from one book enough to detect bias using non-contextual word embedding models?

I find that GloVe embeddings are more sensitive to biases in smaller corpora, while Word2Vec and FastText are more sensitive to biases in large corpora. When looking at the word embeddings from a single book, I see variation in the strength of the words that are the “most gendered”: a book that had stronger gender biases (determined through literary critique) had words that were more strongly gendered than a book that subverted gender biases (also determined through literary critique).
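The kind of bias measurement the abstract describes can be sketched as projecting each word vector onto a "gender direction" and ranking words by the result. The toy 3-dimensional vectors below are invented for illustration only (real Word2Vec, FastText, or GloVe embeddings have hundreds of dimensions, and the thesis's actual scoring method may differ); the gender direction here is simply the difference between the hypothetical vectors for "he" and "she".

```python
import math

# Toy 3-dimensional "embeddings" -- invented for illustration only.
vectors = {
    "he":     [ 1.0, 0.2, 0.1],
    "she":    [-1.0, 0.2, 0.1],
    "doctor": [ 0.6, 0.5, 0.3],
    "nurse":  [-0.7, 0.4, 0.2],
    "book":   [ 0.0, 0.9, 0.1],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Gender direction: v("he") - v("she"). A positive cosine with this
# axis means a word sits closer to "he"; negative, closer to "she".
axis = [a - b for a, b in zip(vectors["he"], vectors["she"])]

bias = {w: cosine(v, axis) for w, v in vectors.items()
        if w not in ("he", "she")}

for word, score in sorted(bias.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:8s} {score:+.3f}")
```

With these toy vectors, "doctor" scores positive, "nurse" negative, and "book" near zero; comparing such scores for the "most gendered" words of two books is one way the relative strength of gendered associations could be contrasted.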



Bing

An Analysis of Gender Bias in K-12 Assigned Literature Through Comparison of Non-Contextual Word Embedding Models

https://hdl.handle.net/1773/46827




DuckDuckGo

https://hdl.handle.net/1773/46827

An Analysis of Gender Bias in K-12 Assigned Literature Through Comparison of Non-Contextual Word Embedding Models


  • General Meta Tags

    15
    • title
      An Analysis of Gender Bias in K-12 Assigned Literature Through Comparison of Non-Contextual Word Embedding Models
    • charset
      UTF-8
    • viewport
      width=device-width,minimum-scale=1
    • cache-control
      no-store
    • Generator
      DSpace 7.6.3
  • Link Tags

    10
    • cite-as
      http://hdl.handle.net/1773/46827
    • describedby
      https://digital.lib.washington.edu/researchworks/signposting/describedby/c9526da8-85f2-49d9-a6ce-8834ce24473f
    • icon
      assets/uwlib/images/favicons/favicon.ico
    • item
      https://digital.lib.washington.edu/researchworks/bitstreams/e4e220f4-a8c3-4562-8b21-388ea7fe21d9/download
    • linkset
      https://digital.lib.washington.edu/researchworks/signposting/linksets/c9526da8-85f2-49d9-a6ce-8834ce24473f

Links

24