gender - Difference between female and male usage
What explains one writer using vowels more frequently than another? In the statistics I examined, a vowel was more probable in the text of a Swedish female author than in the text of a Russian male author. Specifically, the probability that the next sound is a vowel was much higher for the Swedish female author. The probability of the next sound being a vowel, and the probability of it being a consonant, could vary by style, by book, by author, by language and/or by gender (male/female).
Compiling statistics on material written by either women or men, I hypothesize that there are more vowels when the writer is female and more consonants when the writer is male. Is there any evidence for or against this notion? Has anybody done a study like that? Does it have any purpose besides being a "fact"? One purpose I can think of is detecting forgery: when a man pretends to be a woman in a text, or a woman writing to you pretends to be a man, the patterns could give you an indication.
Edit: I changed this to a real hypothesis about how sounds alternate, since in a real study we may wish to compare texts phonetically, which could indicate, for instance, whether the next message is from a man or a woman.
Edit: The statistics show a statistical difference between 2 books, expressed as the Markov matrix that gives the probability of the next sound being a vowel or a consonant, given that the current sound is a vowel or a consonant.
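For readers who want to reproduce that kind of matrix, here is a minimal Python sketch of a 2x2 vowel/consonant transition matrix. It is not the code behind the statistics in the question. It assumes letters are an acceptable stand-in for sounds, that a, e, i, o, u are the vowels (a Swedish or Russian text would need its own set), and that pairs broken by spaces or punctuation are skipped. The names `classify` and `transition_matrix` are made up for illustration.

```python
from collections import Counter

# Assumption: English-style vowel letters; other languages need their own set.
VOWELS = set("aeiou")

def classify(ch):
    """Return 'V' for a vowel letter, 'C' for any other letter, None for non-letters."""
    if not ch.isalpha():
        return None
    return "V" if ch.lower() in VOWELS else "C"

def transition_matrix(text):
    """Estimate P(next class | current class) from adjacent letter pairs."""
    classes = [classify(ch) for ch in text]
    # Count only pairs where both characters are letters.
    counts = Counter(
        (a, b) for a, b in zip(classes, classes[1:])
        if a is not None and b is not None
    )
    matrix = {}
    for cur in "VC":
        total = counts[(cur, "V")] + counts[(cur, "C")]
        matrix[cur] = {
            nxt: (counts[(cur, nxt)] / total if total else 0.0)
            for nxt in "VC"
        }
    return matrix
```

Calling `transition_matrix(book_text)["C"]["V"]` then gives the estimated probability that a consonant is followed by a vowel in that text.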
Answer
Just out of curiosity I have done some quick statistics.
I downloaded the following books from Project Gutenberg:
Men writers
- Alice's Adventures in Wonderland by Lewis Carroll
- Adventures of Huckleberry Finn by Mark Twain
- Moby Dick, or, the whale by Herman Melville
- The Adventures of Sherlock Holmes by Sir Arthur Conan Doyle
- The Picture of Dorian Gray by Oscar Wilde
- Paradise Lost by John Milton
- The Works of Edgar Allan Poe — Volume 1 by Edgar Allan Poe
- War and Peace by graf Leo Tolstoy
- Dracula by Bram Stoker
- Treasure Island by Robert Louis Stevenson
Women writers
- Secret Adversary by Agatha Christie
- Jane Eyre by Charlotte Brontë
- Frankenstein by Mary Wollstonecraft Shelley
- Pride and Prejudice by Jane Austen
- Sarah Orne Jewett
- Ramona by Helen Hunt Jackson
- Home Influence by Grace Aguilar
- Middlemarch by George Eliot
- A Season at Harrogate by Mrs. Hofland
- Wuthering Heights by Emily Brontë
After removing the common Project Gutenberg header, I read the files into R, split them into characters and counted the vowels and consonants.
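The counting step can be sketched roughly as follows. This is a Python illustration rather than the R code actually used, and it assumes the Gutenberg header has already been stripped, that only the letters a to z are counted (so accented letters such as ë are ignored), and that a, e, i, o, u are the vowels. The function name `consonant_vowel_ratio` is hypothetical.

```python
def consonant_vowel_ratio(text, vowels="aeiou"):
    """Count vowels and consonants in text and return (vowels, consonants, ratio)."""
    vowel_count = 0
    consonant_count = 0
    for ch in text.lower():
        if ch in vowels:
            vowel_count += 1
        elif "a" <= ch <= "z":
            # Any other plain ASCII letter counts as a consonant;
            # digits, spaces, punctuation and accented letters are skipped.
            consonant_count += 1
    ratio = consonant_count / vowel_count if vowel_count else float("nan")
    return vowel_count, consonant_count, ratio
```

Passing `vowels="aeiouy"` is one way to check how much the result depends on counting y as a vowel.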
I had a total of 8725700 characters for men and 11468186 for women.
Here's a graph of the consonant/vowel ratios [1] calculated per book (showing mean +/- standard deviation).
There is no statistically significant difference between the two groups (p=0.89, t-test).
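As a rough illustration of the comparison, the sketch below computes a two-sample t statistic from two lists of per-book ratios using only the Python standard library. The answer does not say which t-test variant was run, so Welch's unequal-variance form is an assumption here, and `welch_t` is a hypothetical helper name. Turning the statistic into a p-value needs the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Toy numbers only, NOT the ratios measured from the books.
t_stat, dof = welch_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

With the real data, each list would hold one consonant/vowel ratio per book for one group of authors.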
EDIT
I played some more with the data and got this bar graph of the usage of individual letters.
Again, you can see no major differences between men and women writers.
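The per-letter usage behind a bar graph like that could be computed along these lines. Again this is a Python sketch, not the original R code, and it assumes only the 26 unaccented letters are counted, case-insensitively. The name `letter_frequencies` is made up for illustration.

```python
from collections import Counter
import string

def letter_frequencies(text):
    """Return the relative frequency of each letter a-z in text."""
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    total = sum(counts.values())
    # Letters that never occur get a frequency of 0.0.
    return {
        letter: (counts[letter] / total if total else 0.0)
        for letter in string.ascii_lowercase
    }
```

Running it once on the concatenated text of each group gives two 26-value profiles that can be plotted side by side.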
EDIT2: I repeated the analysis with 10 books per group. I would say that there is definitely no difference.
[1] I considered a, e, i, o and u as vowels; the result does not change grossly when y is included.