UMW Blogs has Cancer of the MS Word Strain

Anyone who is running a pretty big WPMu installation will sooner or later come to hate Microsoft Word in the unlikely event they don’t already. I can’t count the number of times I have seen it break themes over the last two years. And, truth be told, at least 90% of the issues people are having with their blogs are related to the malignant code in Word they unknowingly copy into the text editor. I’ve learned to live with this, and I make sure to try and educate as many people as I can about the evil ways of this software and the power of the MS Word Stripper feature in the WordPress Visual Text Editor.

But today I found a far more insidious and potentially devastating strain of the MS Word text encoding cancer: the RSS feed breaker strain. I got a note from a student that her posts weren’t republishing in the class aggregator blog. I checked the tag she was using to feed the posts, and she was doing everything correctly. Then I checked the sitewide feed for her tag, and I realized it wasn’t working with FeedWordPress. I immediately thought, “God damn it, there goes FeedWordPress!” It had been nothing but awesome all semester, and my hopes were crushed. So I tried a million other things, all to no avail, and then I tried validating the feed for that one sitewide tag, and it came back as broken. Seems like there was a problem on line 292:

[Screenshot: Feed validator finds MS Word Strain of cancer]

A closer look showed me it was the telltale sign of malignant MS Word code cells eating away at the healthy XML:

[Screenshot: Feed validator finds MS Word Strain of cancer 2]

Once I removed the above code from the student’s post, the feed worked fine again. But MS Word has stepped over the line; it has gone too far now. It’s messing with my feeds! It’s fucking with my livelihood!! It’s as if Bill Gates himself showed up at my door and stole food from my kids’ plates. I’ve had enough. I’m done! I declare class war on MS Word. I will seek out and comment on every single blog that copies a MS Word code into a post. I will make sure they know they are potentially destroying not only UMW Blogs, but my family as well. I’ll also inform them that they will be held individually accountable come the day of reckoning. You want a fight, MS Word? You got it. I’m taking you down. I kicked BlackBoard’s ass, now you’re next.
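
For anyone who would rather catch this kind of breakage before a student has to write in about it, here is a minimal Python sketch of the check the feed validator was doing: parse the feed and report the line of the first well-formedness error. The sample feed and the stray control character in it are made up for illustration (one guess at the sort of thing a Word paste can leave behind, not necessarily what was in the student’s post), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written feed used only for this illustration.
FEED_TEMPLATE = (
    '<?xml version="1.0"?>\n'
    '<rss version="2.0">\n'
    '  <channel>\n'
    '    <title>Class aggregator</title>\n'
    '    <item>\n'
    '      <description>{body}</description>\n'
    '    </item>\n'
    '  </channel>\n'
    '</rss>\n'
)

CLEAN_FEED = FEED_TEMPLATE.format(body="Typed straight into the editor")
# Assumption: a pasted control character (here a vertical tab, \x0b),
# which XML 1.0 does not allow anywhere in a document.
BROKEN_FEED = FEED_TEMPLATE.format(body="Pasted from a word processor\x0b")


def first_feed_error(feed_xml):
    """Return (line, column, message) for the first XML error, or None if the feed parses."""
    try:
        ET.fromstring(feed_xml)
    except ET.ParseError as err:
        line, column = err.position  # position of the error as reported by the parser
        return line, column, str(err)
    return None


if __name__ == "__main__":
    for name, feed in (("clean", CLEAN_FEED), ("broken", BROKEN_FEED)):
        print(name, first_feed_error(feed))
```

This only tells you the feed is not well-formed and where the parser gave up. It does not clean anything; the offending characters still have to be removed from the post itself.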


35 Responses to UMW Blogs has Cancer of the MS Word Strain

  1. James says:

    I honestly think people have to get out of the mindset of working in Word and then doing the copy/paste thing. It doesn’t break the instance of Moodle I’m using right now – it just stays, warts and all.

    So I’ve got all these co-learners getting their posts just right, with several of them pasting the Word XML/style gunk, instead of just saving their edit as a draft right on the site. They can come back any time they want, edit as much as they want and use the RTE if they don’t like plain text.

    I agree about the crap that is Word for Win, but I also think it helps to understand the medium and not do things more laboriously than needed.

    James

  2. RE: “will seek out and comment on every single blog that copies a MS Word code into a post.”

    Dude, you are going to be B-U-S-Y.

    Cheers,

    Bill

  3. Martha says:

    The code that infects MS Word documents is repulsive. Period. Forget anti-internet porn legislation, we should be working on laws to outlaw this garbage.

  4. zach whalen says:

    So this is the result of copying and pasting rich text into WP’s WYSIWYG editors? Or is this part of Word’s “Post to Blog” function?

    Drupal doesn’t do rich text automatically, but you can add in various WYSIWYGs. I never have, though, because they were a pain to install or configure. Pedagogically, I like encouraging students to use HTML — that way you are more inclined to focus on content and less on, say, formatting/typography/etc.

    Now, I guess I can add “It messes up RSS” to my reasons for keeping things simpler.

  5. glen says:

    War on Word? I’m in.

    Trying to finish up an academic paper originally started on a wiki.

    Coming near the end and transferring the lot to Word. Forever fighting with margins and footnotes and everything but institutions insist that it isn’t a dissertation unless it is composed in a word processor.

    As if the format communicates the content. I suppose there is an element of that but I should be able to write it on the back of a cigarette pack (or 120 cigarette packs) and the ideas should still be there and work.

    Also fine tuning the formatting for APA format and wondering how many of the APA’s arcane rules have been implemented, not because they enhance understanding or readability but because of the idiosyncrasies of Word.

    And another thing, to those who think they are using ICT because they use Word and email…. naw, you don’t need my rant on that.

  6. Joe says:

    Oh, are you ever singing my song. I think people (not just students) are attached to Word’s spell check and grammar check (try telling them that the spell check in Firefox works fine, and the grammar check in Word is not worth using anyway).

    Or (and this is something I’ve heard plenty) they have had the experience too often of trying to write a discussion board post for Blackboard, only to lose it all when they hit “submit” and Blackboard times out or otherwise screws up.

    So it’s not just Word you need to hate on, it’s Blackboard, too. Twice the fun.

  7. Andrea_R says:

    OH MY GOD I am with you, mounted up on the horse and with a load of ammo.

    (says the peace loving Canadian)

    there’s even a “paste from Word” button and that still horks things. Had this SAME ISSUE today, breaking my feed.

    My dear sweet husband was going to write me a plugin to help with this, I must remind him.

  8. Steven Egan says:

    Notepad, it’s your copy and paste best friend. Not only does it not add in strange code, it will also remove it. And for those who want to go a step farther, there’s Notepad++. It even has a code language option that can help with doing HTML.
    URL: http://notepad-plus.sourceforge.net/uk/site.htm

    As for the other stuff, I use Open Office. It’s free, works with most of MS Office and even has an option to make PDFs.

  9. Brad K says:

    I have this same problem with blogs@psu. It is just an unfortunate fact of life right now.

    I have talked to some people that use word to do all kinds of fancy formatting then want that to actually carry over to the blog. For some it is a feature.

    Can’t please all the people all the time.

  10. Now that I think about it, I’ve actually run into this myself the first time I tried to code something. I moved to Mozilla’s old platform for a while.

    Now, visual text box editors seem ubiquitous. How well do they work in WP?

  11. Brad K says:

    In my perfect world, everyone would author blog posts using markdown.

  12. Does the same problem occur if you paste from Open Office documents?
    John

  13. Steven Egan says:

    The point is that if you aren’t using Word for anything it is nice to have a replacement for the other uses, which is where Open Office comes in. I write my blog posts in Notepad.

  14. **puts on flame retardant suit**
    **sets blocks of C4 around the room**
    **attaches primer cable**

    GOOGLE DOCS

    **runs away**

  15. I much prefer Office Live:

    http://office.microsoft.com/en-us/office_live/FX101754491033.aspx

    Because Microsoft totally gets the web.

    Cheers,

    Bill

  16. James says:

    The reason Microsoft ‘gets’ the web so well is that they’ve been leading the charge for open standards for soooooo long.

  17. tee-hee!

    seriously though…..have you tested for differences between versions of Word? I’ve noticed that the .docx format from Office 2007 is not compatible with Google Docs.

    Students were not able to upload their work in that format. When they were saved in the 97-2003 .doc format, everything was fine.

  18. Reverend says:

    @Ed,
    How did I know you’d be game?

    @James,
    I agree, understanding the medium is important, and I think the first lesson is Word is evil 🙂 It seems like MS Word’s code is so heinous as to be Satan’s spawn. It doesn’t only look bad, it kills ideas. I tried pushing people to write in text editors or on the blog, but it is a non-starter, people glaze over. Word has become synonymous with writing, it has colonized the space of word processing to such a degree as to be oxygen.

    @Martha
    100% agreed!

    @Zach
    From WP’s WYSIWYG, and it is ugly.
    On another point, the fact that Drupal’s visual editor is so hard to install is a major flaw in that application. And while having students write posts in HTML may have some merits, I think 90% of UMW faculty and students would rightly laugh at me if I suggested it. Writing online should be far more seamless than it currently is, and that has everything to do with Word’s marketshare.

    When I think about it, MediaWiki’s lack of a visual text editor is a crime, who wants to do mark up? It’s so 1995!

    @Glen
    I love the rants, keep em coming. This post was cathartic for me, I actually feel like dancing now!

    @Joe Throwing BlackBoard into the logic of this post just gets me giddy.

    @Andrea_r Ron couldn’t write that code soon enough 🙂

    @Brad,
    What is Markdown? 🙂

    @Steven,
    I’m with you, though I even use a text editor I paid 60 bucks for in TextMate, it is one of the few programs I would, and have, paid for. The others are Transmit for FTPing, BioShock and L4D 🙂

    @Bill,
    Our new CIO just sent our students to Microsoft Mail, and I guess Office Live may be an unfortunate result of that marriage. Though, I also wonder how many students will prefer Groupwise to MS Live Mail? Perhaps more than a few, but they will ultimately use whatever they want, and we will play the game of “provider.”

    @John,
    I’m not sure, I don’t use Open Office, so I couldn’t say. Anyone? Bueller?

    @Peter,
    Yeah, that is still the case, the whole docx scandal is yet another pet peeve, but I actually do all my word processing in either TextMate or Google Docs, so I am relatively free until I get an email attachment with a Word docx because someone really hates me 😉

    Finally, wow, I really did not expect this post to generate such a groundswell of comments, I may be changing the focus of this blog to Microsoft Word bashing and monetize the son of a bitch 😉

  19. zach whalen says:

    And while having students write posts in HTML may have some merits, I think 90% of UMW faculty and students would rightly laugh at me if I suggested it. Writing online should be far more seamless than it currently is, and that has everything to do with Word’s marketshare.

    Nothing wrong with requiring them to learn a little markup. We’re just talking links, bold, emphasis, maybe a list — nothing structural. HTML is easy. WYSIWYG mystifies the process of influencing how your document looks online, and it makes markup decisions for you that you may not agree with. That’s the problem with Word, too. The effect is that content creators don’t have any path toward actually designing their content and taking meaningful ownership of their online discourse. Digital rhetoric isn’t a matter of filling in the blanks of your favorite template — whether that’s Word, WordPress, or Drupal. That attitude keeps design firmly in the realm of “computer stuff” or “stuff that computer people know but that I don’t have to learn because I’m not a computer person.” (This is a sentiment I hear expressed in my classes, and it’s false. We’re all computer people.) So when it comes time to try and influence that design or do something interesting with it, the WYSIWYG-dependent doesn’t have a clue. So they call up a “computer person” to do it for them — again centralizing the power of design under a controlling authority.

    (Also, the WYSIWYG-dependent won’t get that).

  20. zach whalen says:

    Aah, it didn’t come through. I meant to conclude that last post with an “end tag” joke, but apparently it parsed as HTML. Let me try again:

    </rant>

  21. Reverend says:

    UNcle!

    That attitude keeps design firmly in the realm of “computer stuff” or “stuff that computer people know but that I don’t have to learn because I’m not a computer person.” (This is a sentiment I hear expressed in my classes, and it’s false. We’re all computer people.)

    Well said, I really like the way you frame this and I have to agree, we’re all computer people now whether we want to acknowledge that fact or not. I’m stealing that line 😉
    </capitulation>

  22. Alex Ragone says:

    Jim,

    I use Windows Live Writer on my PC and love it. It works great with my WordPress installs. Microsoft absolutely does have some great parts of their company.

    I understand your pain and have blogged about it as well. Word processing in Word and pasting to the web is a nightmare.

    Best, Alex

  23. Luke says:

    Following up on this last point, and tying back to Cog Dog’s Drupal v WP post that I never got around to commenting upon… one of my subterranean arguments for WordPress in the university is that it supports and encourages baby steps through the demystification of many things web. While I wouldn’t necessarily agree with Zach’s contention that users of systems such as UMW Blogs or Blogs@Baruch should be required to learn markup — it is very hard for lots of people to get their head around the concept of markup, and I still want their voices to be heard rather than excluded — for those who want to take some initial steps towards learning how to work with code, WP is the perfect, uh, gateway drug.

    I’m an example of this… I knew basic HTML and CSS a few years ago when I began playing with WP, but, just like I never easily picked up the rules of French or Spanish when I studied them, I had trouble mastering syntax. Still do. But through hacking WP, which has such elegant code– “code is poetry” and all that– I’ve learned enough to be able to work more comfortably with PHP, CSS, and MySQL. I don’t think I’ll ever be writing from scratch… but I can now quickly determine the logic of a site or a page and alter it to my needs. WP combined with a set of immediate challenges helped me do this. That knowledge is portable to other systems.

    I agree, wholeheartedly, that Word’s hold on our writing is negative and at times debilitating, and I appreciate Zach’s statement about the larger implications of WYSIWYG– there’s a lot to think about there. That said, I also think we need to keep the barriers to entry low and deal with the implications; this is especially necessary at my college.

    Go ahead. Call me a hippie.

  24. James says:

    I’ve been guiding ppl through building sites and managing content for years and, watching the folks at the post-sec I work with now, I can tell you that ‘most’ non-web folks aren’t the slightest bit interested in learning markup.

    “Oh my, no, that’s far too technical,” is the kind of response it often elicits. If the person is old enough to remember, you can liken it to reveal codes in old word processing programs.

    You really want their eyes to glaze over? Just tell ’em that HTML is a direct descendant of SGML.

  25. @james add to that list ZFORMAT, UOF and LaTeX….then tell them they might be available via XML, a restricted form of SGML.

    That should keep them from posting anything ever.

  26. James says:

    @peter tee hee hee … when I’m discussing with communications folks ‘how’ all this social web stuff works, I have to be very careful not to veer into how the semantic, social web actually works. Any talk of XML or, for example, microformats and the discussion is pretty much over 🙂

    As an aside, you wouldn’t believe all the funkification of twitter, blogging and vid sharing we’re putting together for them in student recruitment, but at least it IS about trying something. I feel a post about post-sec social media strategy coming on … have for a while.

  27. Steven Egan says:

    On forums it’s interesting to note that people learn a markup language to format their content. Some only use the basic tools that generate the BBcode for them, while others like myself just type it out. If you do the same with basic HTML tags, it would make sense. While you can just put in the text, you can also add in formatting. Color, size, underline, bold, italics and more are just extra, but people will commonly learn to do it to give emphasis. I actually learned the code for some of the emoticons for one board and will just type those out as well. I think it’s a good way to introduce the concept without requiring people to learn it. It’s their choice, but they are still likely to learn some of it if the editors are like those of BBforums.

  28. Ed Webb says:

    @Steven – excellent points, and very much in the spirit of Edupunk – learn the tools you need to do what you want to do. I like Luke’s idea of WPMU as gateway drug, but there are others, as you (Steven) point out. Of course, Zach is completely right about digital rhetoric – I’m barely literate in this stuff, but at least I know how little I know and why it matters that I learn more.

    @Luke – let me know when you write that post about post-sec social media strategy.

  29. Gardo says:

    I agree with Zach that if we focus too narrowly on ease of use, we lose control of our own authorship and we lose opportunities for depth of understanding. We are certainly all computer people.

    But at the same time, contradicting myself, I agree that the tools must be easy to use if we’re to get significant adoption. It’s scary and bewildering enough for many faculty (and not a few students) to try to get their heads around blogging (trackbacks, comment moderation, spam, not to mention “what am I writing and who is it for?” etc. etc.) without having to teach them even basic HTML at the same time.

    So there’s a contradiction in me that’s getting fiercer, not more resolved. Hearing Alan Kay complain that the browser broke WYSIWYG and that no one knew enough to talk in an informed and broad way about that choice–and he said this with Doug Engelbart sitting right there in the front row, Doug Engelbart who thinks “ease of use” is Public Enemy No. 1 when it comes to computers and people–made me even more unhappily unresolved in my mind about all of this.

    My next big unresolution has to do with why we need folks to adopt simple tools in the first place. Shouldn’t we just get them to suck it up and go for a little beneficial trip up the learning curve? And the only answer I have so far is this: network effects. If we get more people to adopt, we get better network effects, and better emergent phenomena as a result. The trick is to get people to adopt *real* blogging and not the fake kind walled off in an LMS. An LMS is almost designed as an anti-network-effect machine.

    Maybe not even *almost*.

    Without the network effects, we don’t get YouTube, the Blogosphere, Wikipedia, Twitter, etc. etc. None of those require markup, nor should they in my view. I’m daily impressed by how ridiculously easy it is to upload video to YouTube. But it’s plugged into the network, so it gets the network effects, and that inspires the truly curious and driven to make cool things, which then drive more adoption and more network effects.

  30. Steven Egan says:

    @ Ed Webb: Thanks.

    @ Gardo: That’s part of what I was talking about. By not requiring people to learn HTML or some other markup we make the tool accessible to non-techs. By having the buttons to inject the markup into their text, we make it possible to be expressive, add links, add images and so on while exposing them to the markup in a simple way.

    That makes the tool simple to use and increases adoption, as the techs and non-techs can all get into using it. With the adoption in place the network effects come into play on people wanting to learn to do the really cool stuff they see others do. Once they can see the code, and are familiar with the code, it’s a matter of informal learning. The community and tool help people learn.

  31. Joss Winn says:

    I’m joining this brigade. I’ve spent much of my day ‘de-bugging’ CommentPress and it turns out that the problem was all the M$ Word junk that had been pasted into content. Having cleaned it out, I can see there’s nothing wrong with the theme.

    Makes me wonder what happens when someone posts via Atom or XML-RPC from Word 2007 or OpenOffice. I’m keen to promote Atom and XML-RPC remote publishing but not if it’s going to give me headaches like this. Has anyone tested any Atom/XML-RPC clients extensively? I’m about to…
