I need a RegExp parser

Dr_Grip

Hi guys, I was just sent an XML-based document that I'm trying to get ready for import into another program. In order to do so, I want to remove all elements whose Style attribute contains "Strikeout". For example:
Code:
<Text AdornmentStyle="0" Background="#FFFFFFFFFFFF" Color="#000000000000" Font="Courier Final Draft" RevisionID="0" Size="12" Style="AllCaps+Strikeout">HINTERM</Text>
The whole thing, the tags as well as the text between them, has to go. As has anything else that has the Strikeout attribute. Please help me, gods of RegExp (narf?)!
 
If you need a program to quickly run regex-based replacements easily and graphically, take a look at Notepad++.

If you're sane and do that on the command line, here's an sed call that should do it:

sed 's/<[^>]*Strikeout[^>]*>[^<]*<[^>]*>//g' filename

That will print the result on screen rather than change the file in place. Either append "> outputfile" to save the result to a new file, or set the -i switch to edit the file directly.

As for the expression, it looks for the following:
- an opening tag bracket
- anything that's not a closing bracket followed by Strikeout (case-sensitive!) followed by anything that's not a closing bracket
- a closing tag bracket
- anything that's not an opening bracket
- an opening bracket
- anything that's not a closing bracket
- a closing bracket
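
If you'd rather try the expression on a sample before touching the real file, here is a minimal Python sketch of the same pattern. The sample string is made up for illustration (only the HINTERM element is modelled on the original post, with fewer attributes), and it assumes Python's re module treats this particular pattern the same way sed does, which should hold since it only uses bracket classes and *.

```python
import re

# Same pattern as the sed call, written as a Python raw string.
STRIKEOUT = re.compile(r"<[^>]*Strikeout[^>]*>[^<]*<[^>]*>")

# Made-up sample: one struck-out element between two that must survive.
sample = (
    '<Text Style="AllCaps">KEEP</Text>'
    '<Text AdornmentStyle="0" Size="12" Style="AllCaps+Strikeout">HINTERM</Text>'
    '<Text Style="Bold">ALSO KEEP</Text>'
)

# Replace every match with nothing, like s/.../ /g with an empty replacement.
cleaned = STRIKEOUT.sub("", sample)
print(cleaned)
```

Only the middle element should disappear, because [^>]* can never run past the end of a tag and so "Strikeout" has to sit inside the opening tag itself.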

I've made some assumptions:
- any tag containing "Strikeout" between < and > will go
- the Strikeout-tags do not contain nested tags or CDATA-sections
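
To see what happens when those two assumptions don't hold, here is a small Python sketch using the same pattern. Both inputs are hypothetical (the Note element and the nested b tag are invented for the demonstration, not taken from the actual document).

```python
import re

pattern = re.compile(r"<[^>]*Strikeout[^>]*>[^<]*<[^>]*>")

# Assumption 1: "Strikeout" anywhere between < and > triggers removal,
# even when it is not part of a Style value (hypothetical Note element).
false_positive = '<Note Title="Strikeout rules">keep me</Note>'
after_false_positive = pattern.sub("", false_positive)

# Assumption 2: a nested tag ends the match early, so only the opening
# tag, the text before the nested tag, and the nested opening tag go.
nested = '<Text Style="Strikeout">a<b>c</b>d</Text>'
after_nested = pattern.sub("", nested)

print(repr(after_false_positive))  # the whole Note element is gone
print(repr(after_nested))          # leftover debris, no longer well-formed
```

In the nested case the output still contains the orphaned closing tags, so the result would no longer be well-formed XML.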

Depending on your console you may have to escape or quote stuff :dunno: also, not tested, no warranty, yada yada :tease:
 
Once again, narf saves the day! Thank you so much!
 
Just out of curiosity, why wouldn't you use an XML parser for this?

Regexes will work of course, at least most of the time, but they seem like the wrong tool for a job that is essentially a core function of an XML parser.
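
For comparison, a rough sketch of the parser route using Python's standard xml.etree.ElementTree. It assumes Style values are joined with "+" (as in the AllCaps+Strikeout example in the first post); the Paragraph wrapper and the strip_strikeout helper are made-up names for illustration, not part of the real file format.

```python
import xml.etree.ElementTree as ET

def strip_strikeout(root):
    """Remove every element whose Style attribute includes 'Strikeout'."""
    # Walk a snapshot of all elements, since the tree is modified in the loop.
    for parent in list(root.iter()):
        for child in list(parent):
            # Assumption: multiple styles are separated by "+".
            styles = child.get("Style", "").split("+")
            if "Strikeout" in styles:
                # Note: remove() also drops the child's tail text, i.e. any
                # text directly after its closing tag.
                parent.remove(child)
    return root

# Hypothetical input; only the Text element shape comes from the thread.
doc = (
    '<Paragraph>'
    '<Text Style="AllCaps">KEEP</Text>'
    '<Text Style="AllCaps+Strikeout">HINTERM</Text>'
    '<Text Style="Bold">ALSO KEEP</Text>'
    '</Paragraph>'
)

root = strip_strikeout(ET.fromstring(doc))
result = ET.tostring(root, encoding="unicode")
print(result)
```

Unlike the regex, this looks at the Style attribute only and removes nested children along with their parent. The root element itself is never removed.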
 