The Grumpy Editor mangles some web pages
[Posted June 29, 2005 by corbet]
As long as there have been web pages, there have been web page annoyances.
Back in the early days, it was
<blink> tags. Blinking text
seems awfully archaic and old-fashioned in these days of flash and
javascript atrocities, but we had to manage to get annoyed at the
technology that was available at the time; you youngsters won't understand.
Back in those days, the technology for annoyance mitigation was also
limited; we had to rely upon special-purpose web proxy
processes and other unwieldy hacks.
LWN looked at greasemonkey
back in March. Greasemonkey is a powerful tool, but it requires that the
user write scripts to perform the edits; it's also a heavyweight tool for
one-time page tweaks. So your editor decided to look at some of the other
tools which are available. Thanks to the Firefox extension architecture,
there is a wealth of tools out there for would-be page manglers.
Your editor's first stop was aardvark, an extension which,
unlike most others, is not found on the mozdev.org site. Aardvark is a
tool optimized for examination of web pages, and the deletion of items from
those pages.
Aardvark lurks during normal browsing, only making itself visible when the
"start aardvark" item is chosen from the right-button context menu.
Thereafter, the HTML element containing the pointer will be highlighted;
picking the interesting portion of the page is simply a matter of moving
the pointer there and, possibly, using "w" to "widen" the scope to
larger, containing elements. Once the element of interest is chosen, it is
a matter of a keystroke to remove it from the page, blank it out, perform
some simple formatting changes, or view the HTML source. The source viewer
is a nice touch; it enables easy examination of a specific part of a page
which might otherwise be hard to find among the kilobytes of junk that
modern editors and content management systems dump into pages.
What aardvark lacks, first of all, is any sort of help facility. The user
must simply memorize a dozen or so keystrokes, or keep a pointer to the
help information available. There is also no way to make changes
permanent. So aardvark can be useful for one-time tweaks (handy, for
example, to print a page without wasting sheets of paper on unrelated
junk), and as a nicer sort of "view source" function. It is not helpful
for making permanent changes, however.
Platypus is an on-the-fly editor
which is very similar to aardvark, but which appears to be somewhat more
advanced in some areas. For starters, platypus has a help screen for
people who cannot remember the keyboard shortcuts. The selection of HTML
elements is very similar to aardvark's, except that the arrow keys are used:
Platypus explicitly recognizes the tree structure of web pages, and uses
arrows to move up and down the tree, or to "sibling" elements (stepping
across columns in a table, for example).
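The tree-walking selection described above can be illustrated with a small sketch. This is not platypus's code: the PageNode type, the makeNode helper, and the assignment of arrows to directions are all assumptions made for the example, and a toy tree stands in for the browser's real document.

```typescript
// A toy element tree; a real extension would walk the browser's DOM instead.
interface PageNode {
  tag: string;
  parent: PageNode | null;
  children: PageNode[];
}

// Hypothetical helper: build a node and wire up its children's parent links.
function makeNode(tag: string, children: PageNode[] = []): PageNode {
  const node: PageNode = { tag, parent: null, children };
  children.forEach((child) => { child.parent = node; });
  return node;
}

type Arrow = "up" | "down" | "left" | "right";

// Assumed key mapping: up = parent, down = first child,
// left/right = previous/next sibling. The selection stays put
// when there is no node in the requested direction.
function move(current: PageNode, arrow: Arrow): PageNode {
  if (arrow === "up") {
    return current.parent !== null ? current.parent : current;
  }
  if (arrow === "down") {
    return current.children.length > 0 ? current.children[0] : current;
  }
  if (current.parent === null) {
    return current;
  }
  const siblings = current.parent.children;
  const index = siblings.indexOf(current);
  const target = arrow === "right" ? index + 1 : index - 1;
  return target >= 0 && target < siblings.length ? siblings[target] : current;
}

// Example: a one-row table with three cells.
const cells = [makeNode("td"), makeNode("td"), makeNode("td")];
const row = makeNode("tr", cells);
const table = makeNode("table", [row]);

let selection = cells[0];
selection = move(selection, "right"); // step across to the second cell
selection = move(selection, "up");    // widen to the enclosing row
```

Aardvark's "widen" operation corresponds to the "up" case alone; the sibling moves are what the explicit tree view adds.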
Platypus can do a number of things which aardvark can't. It can relocate
elements on the page, should you like things organized in a different way.
So it can be used to rearrange navigation links, or put seldom-useful stuff
at the bottom of the page. There is a simple CSS editor which can be used
to reformat things or change their colors. And, for advanced users, there
is a regular expression-based HTML editor which can make no end of
changes.
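For readers wondering what a regular expression-based edit might look like, here is a minimal sketch. The RegexEdit type and applyEdits function are invented for illustration and say nothing about how platypus implements the feature; regular expressions are also a blunt instrument for HTML, so this only suits simple, predictable markup.

```typescript
// One regular-expression edit: a pattern plus its replacement text.
interface RegexEdit {
  pattern: RegExp;
  replacement: string;
}

// Apply each edit, in order, to an element's HTML source.
function applyEdits(html: string, edits: RegexEdit[]): string {
  return edits.reduce(
    (text, edit) => text.replace(edit.pattern, edit.replacement),
    html
  );
}

const edits: RegexEdit[] = [
  // Drop <blink> wrappers while keeping the text inside them.
  { pattern: /<\/?blink>/gi, replacement: "" },
  // Turn <b> markup into <strong>, for opening and closing tags alike.
  { pattern: /<(\/?)b>/gi, replacement: "<$1strong>" },
];

const cleaned = applyEdits("<p><blink>Buy <b>now</b>!</blink></p>", edits);
```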
Perhaps the key feature behind platypus, however, is used at the end: once
you have mangled a web page to your satisfaction, a keystroke turns all of
the edits into a greasemonkey script. Install that script, and the changes
become permanent.
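The idea of turning recorded edits into a script can be sketched as follows. The RecordedEdit type and toUserScript function are hypothetical, only element removal is modelled, and the generated text is not what platypus actually emits. The ==UserScript== metadata block and the document.evaluate() call in the generated script are the standard greasemonkey header and DOM XPath interface.

```typescript
// A recorded edit; removal is the only kind modelled in this sketch.
interface RecordedEdit {
  xpath: string; // location of the element within the page
}

// Build the text of a user script that replays the removals on every
// page matching `include`.
function toUserScript(name: string, include: string, edits: RecordedEdit[]): string {
  const header = [
    "// ==UserScript==",
    "// @name " + name,
    "// @include " + include,
    "// ==/UserScript==",
  ];
  // Helper placed in the generated script: find one node by XPath, detach it.
  const helper = [
    "function remove(xpath) {",
    "  var hit = document.evaluate(xpath, document, null,",
    "      XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;",
    "  if (hit && hit.parentNode) hit.parentNode.removeChild(hit);",
    "}",
  ];
  // One call per recorded edit; JSON.stringify quotes the XPath safely.
  const body = edits.map((edit) => "remove(" + JSON.stringify(edit.xpath) + ");");
  return header.concat(helper, body).join("\n") + "\n";
}

const script = toUserScript("Hide sidebar", "http://example.com/*", [
  { xpath: "//div[@id='sidebar']" },
]);
```

The resulting string is what would be saved and installed; this code only produces the text and never touches a page itself.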
The biggest downside to platypus, perhaps, is that its source viewer is
nearly unusable. Instead of aardvark's nice, hierarchical display,
platypus gave your editor a window with everything in one long line of
text.
The final stop on this tour is rip,
which stands for "remove it permanently." As its name would suggest, rip has a
very specific mission: allow the user to select web page elements, rip them
out of the page, and never see them again. It cannot perform all of the
functions of either aardvark or platypus, but it is effective at what it
does do.
Rip's core interface is simple: put the pointer over an undesired web
element, press the right button, and select "remove it permanently" from
the resulting context menu. The affected area will be briefly highlighted
when the pointer reaches the menu item, before it is actually selected. Rip could benefit
from the more developed mechanisms for selecting elements seen in aardvark
and platypus; it can be hard to communicate to rip exactly what you want to
get rid of.
First-time users may be surprised to learn that rip, when installed,
includes "rips" for several popular sites, including Slashdot, BoingBoing,
and Wired. There is a wiki
page available to host rips created by other users; it probably would
be best to put all of them there, and not mess with specific pages
without the user's acknowledgment. That said, rip seems like a useful
tool for quick simplification of web pages.
Which tool would a grumpy editor, made even grumpier by the user-hostile
features of certain web sites, use? Rip is a lightweight tool for quick
removal of unwanted web cruft, but it lacks flexibility and ease of use.
The future in this space almost certainly belongs to the combination of a
powerful script-based facility (like greasemonkey) and a nicer
front end - platypus, for now. With tools like these, control of the web
is moving closer to where it belongs: with the people actually trying to
read all that content.