This shouldn’t annoy me…

…but it does.

There’s a web site, Gender Analyzer, that looks at a blog and decides if it was written by a man or a woman. And by “decides” I mean “uses an algorithm.”

On a whim I checked 360. It said it was written by a man. OK, that was a freebie because we have multiple authors so either answer could be viewed as correct. Then I started checking other math sites. Teaching College Math Technology? Written by a man. (Surprise, Maria!) Let’s Play Math? Written by a man. (Surprise, Denise!) Continuities? Written by a man. (Surprise, Jackie!) Math Trek? Written by a man. (Surprise, Julie!) Every math blog I tried was claimed to be written by a man, no matter who it was written by. And no, it doesn’t only pick men – non-math blogs are claimed to be written by either women or men.

OK, it’s just an algorithm. And it was just written for fun, not for any scientific claim, so I can respect that. And it doesn’t claim to be super-accurate; indeed, they publish a little poll at the side asking if they were correct, and it’s only accurate 56% of the time. But still, the whole math=man connection in the algorithm is bugging me, probably more than it ought. I’d feel better with equal inaccuracy by gender.

10 Responses to “This shouldn’t annoy me…”

  1. mathmom Says:

    It got my math blog wrong too.

    For what it’s worth, this isn’t an intentional human-built-in bias. It’s using statistical methods and a 2000-blog training sample. Of course I don’t know how they constructed their sample, but it stands to reason that they didn’t intentionally bias it by giving it math sites written only by men. It’s probably reasonable to conclude from most random samples that more men than women talk about math.

    And with an accuracy of 56% it’s barely better than a coin toss, in any case!

  2. Barry Leiba Says:

    “Math is hard!”
    — Barbie

  3. Ξ Says:

    And yet Barbie wanted to be a doctor too! [I remember calling them and complaining about her math — the person at the other end sounded really bored, like they’d been getting a lot of those calls.]

    mathmom: I was thinking that another possibility is that it has to do with writing formally versus informally. Maybe the algorithm associates that with men? I wonder if it would perform better if there was a random element to it [so if it was 80% sure it was a man, 1 out of 5 times it would still say it was a woman].
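[A rough sketch of the "random element" idea in the comment above, not anything Gender Analyzer actually does. The function names and the 0.8 figure are illustrative assumptions; the sketch only compares always picking the more likely answer with answering "man" in proportion to the classifier's confidence.]

```python
import random

def decide_by_threshold(p_male):
    """Always report the more likely class (hypothetical helper)."""
    return "man" if p_male >= 0.5 else "woman"

def decide_by_sampling(p_male, rng):
    """Report 'man' with probability p_male, 'woman' otherwise (hypothetical helper)."""
    return "man" if rng.random() < p_male else "woman"

rng = random.Random(0)   # fixed seed so the run is repeatable
trials = 10_000
p_male = 0.8             # assumed confidence, taken from the "80% sure" example

# Share of 'man' verdicts under each rule.
threshold_share = sum(decide_by_threshold(p_male) == "man" for _ in range(trials)) / trials
sampled_share = sum(decide_by_sampling(p_male, rng) == "man" for _ in range(trials)) / trials

print(f"threshold rule says 'man' {threshold_share:.0%} of the time")
print(f"sampling rule says 'man' about {sampled_share:.0%} of the time")
```

One caveat on "perform better": if the 80% figure really were trustworthy, the threshold rule would be right 80% of the time on such blogs, while the sampling rule would be right only 0.8 × 0.8 + 0.2 × 0.2 = 68% of the time. Sampling spreads the errors across both genders, but it does not make the guesses more accurate.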

  4. Jason Dyer Says:

    I don’t think it has anything to do with math. I’ve tried a *lot* of blogs (including some you might think of as stereotypically female, and none having to do with math) written by women and only one came out as written by a woman.

    Oh, and it thought one of them wasn’t written in English.

  5. TwoPi Says:

    I disagree, Ξ. I think it should bother us. [Disclaimer: I haven’t visited that site, I haven’t read their documentation, I don’t know their intentions. But why should ignorance stop me?🙂 ]

    It bothers me, not specifically because of the implicit (potential) math=male link, but on a much more general issue. The creation of such an algorithm involves an explicit assumption that men and women write differently, and that gender is tied to ability to write in a particular style or genre.

    That premise seems a short step away from claiming that women can’t write technical manuals, or men can’t write poetry. Or [fill in your favorite inflammatory absurdity].

    THAT bothers me.

  6. Jackie Ballarini Says:

    I am surprised. And bothered. Off to do a bit of investigating… thanks for pointing this out.

  7. Jackie Says:

    I sent an email to the contact on the site. We’ll see how they respond.

  8. Batman Says:

    Language Log has discussed the (lack of) differences in male and female communication (one post here, but there are plenty more if you dig around). I’ll email them and see if they have any insight.

  9. Genderanalyzer / Jon Says:

    Hi,

    We received an e-mail from Jackie that pointed me to this blog post, so I might just answer your questions here.

    All blogs were collected automatically from blogspot using what I thought of as common keywords (that, and, what etc). The two samples are about the same size. After we had collected them we stripped them of HTML and trained a classifier at uclassify.com. We had no intention whatsoever to make math blogs manly. A classifier works by counting words in each class (male and female); from the word frequencies it’s possible to calculate the probability that a previously unseen document belongs to the male or female class.

    So, we have not explicitly written any rules such as if (math) then male; all comes from the inference over the 2000 blogs.

    Now, I think our training data is biased in the way you describe and we are thinking of collecting new training data that is not only from blogspot (collecting from more sources such as wordpad, typepad may give better results or at least be more representative). Actually our result is in line with those of Moshe Koppel et al., who also collected their data from blogspot.

    If you are interested – I have published the classifier and you can test it on individual words and sentences here: http://www.uclassify.com/Browse.aspx I think you will find it very stereotypical.

    Just remember that we just did this for fun – we don’t claim it to be very accurate (57% it seems). If you are interested in reading some real research on the subject I recommend Prof. Susan Herring’s “Gender and genre variation in weblogs” (she was kind enough to point out that “blogs on serious topics external to the author tend to be identified as male, and blogs on personal topics such as the author’s life tend to be identified as female”): http://ella.slis.indiana.edu/~herring/jslx.pdf

    Also Moshe Koppel’s article:
    http://www.cs.biu.ac.il/~koppel/papers/springsymp-blogs-07.10.05-final.pdf

    Let me know if you have more questions, and thanks for your interest =)

    /Jon
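[The word-counting approach Jon describes in comment 9 resembles a naive Bayes text classifier. The sketch below is a minimal illustration under that assumption, not uClassify's actual code: the function names, the add-one smoothing, the equal priors (Jon says the two samples were about the same size), and the tiny training set are all made up for the example. The toy data is deliberately skewed to show how a slant in the training blogs carries straight through to the verdict with no explicit "if (math) then male" rule.]

```python
import math
from collections import Counter

def train(docs_by_class):
    """Count how often each word appears in each class's documents."""
    counts = {label: Counter() for label in docs_by_class}
    for label, docs in docs_by_class.items():
        for doc in docs:
            counts[label].update(doc.lower().split())
    return counts

def class_probabilities(text, counts):
    """Estimate P(class | text) with naive Bayes, equal priors, add-one smoothing."""
    vocab = set()
    for word_counts in counts.values():
        vocab.update(word_counts)
    log_scores = {}
    for label, word_counts in counts.items():
        total = sum(word_counts.values())
        score = 0.0
        for word in text.lower().split():
            # Smoothed per-class word frequency; unseen words still get a small probability.
            score += math.log((word_counts[word] + 1) / (total + len(vocab)))
        log_scores[label] = score
    # Turn log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}

# Made-up, deliberately lopsided training data (NOT the real 2000-blog sample).
toy_training = {
    "male": ["the proof of the theorem", "the algorithm and the data"],
    "female": ["my day with my family", "my garden and my kids"],
}
counts = train(toy_training)
probs = class_probabilities("the theorem", counts)
print(probs)
```

With this toy data the phrase "the theorem" comes out strongly "male" simply because those words only appeared in the "male" samples, which is the kind of training-data effect the thread is discussing.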

  10. Ξ Says:

    Thanks Jon! The articles you pointed to are interesting. My sense in reading them is that there are a number of apparently confounding factors, including age of the author and genre of the blog. For example, if I’m understanding this correctly, Moshe Koppel’s article mentioned the frequent use of “the” as being characteristic of male blogs, but I’m willing to bet that “the” appears rather frequently in math blogs.

    I think what is bothering me is my sense that if [more than 50%] of blogs of a certain type are written by men, then any blog like that is going to be identified 100% of the time as being written by a man. And even though I know that “of a certain type” isn’t directly related to mathematics, but to the frequency of certain words, at the moment it reveals a bias that aligns with certain stereotypes. [I had assumed that the bias was in the algorithm, but it sounds from what you wrote that it might (also) be related to the training data.]

    On the other hand, TwoPi, I’m not sure I agree with your statement, “The creation of such an algorithm involves an explicit assumption that men and women write differently, and that gender is tied to ability to write in a particular style or genre.” I think that men and women might well write differently *on average*, although I don’t tie that to ability but to personal preference. The trap to me is that even if there is a difference that is statistically significant, that doesn’t mean that the differences are large or that you can draw many conclusions about an individual.
