I need a RegExp parser

Dr_Grip · Jul 12, 2013

Hi guys, I just got sent an XML-based document that I'm trying to get ready for import into another piece of software. In order to do so, I want to remove all elements whose Style attribute includes Strikeout. For example:

Code:

<Text AdornmentStyle="0" Background="#FFFFFFFFFFFF" Color="#000000000000" Font="Courier Final Draft" RevisionID="0" Size="12" Style="AllCaps+Strikeout">HINTERM</Text>

The whole thing, the tags as well as the text between them, has to go. As has anything else that has the Strikeout attribute. Please help me, gods of RegExp (narf?)!

narf · Jul 12, 2013

If you need a program to quickly run regex-based replacements easily and graphically, take a look at Notepad++.

If you're sane and do that on a command line, here's a sed call that should do it:

sed 's/<[^>]*Strikeout[^>]*>[^<]*<[^>]*>//g' filename

That will print out the result on screen rather than replace inline. Either append "> outputfile" or set the -i switch.

As for the expression, it looks for the following:
- an opening tag bracket
- anything that's not a closing bracket followed by Strikeout (case-sensitive!) followed by anything that's not a closing bracket
- a closing tag bracket
- anything that's not an opening bracket
- an opening bracket
- anything that's not a closing bracket
- a closing bracket
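
For anyone who wants to try the pattern before running it over the real file, here is a rough Python sketch of the same expression, split up along the steps listed above. The sample string is made up for illustration and only modeled on the snippet from the first post. One difference to keep in mind: sed works line by line, while this runs over the whole string, so [^<]* can also span line breaks here.

```python
import re

# Same expression as in the sed call, one piece per step described above.
STRIKEOUT = re.compile(
    r"<[^>]*Strikeout[^>]*>"  # a tag that mentions Strikeout (case-sensitive)
    r"[^<]*"                  # the text up to the next opening bracket
    r"<[^>]*>"                # the next tag, assumed to be the closing tag
)


def strip_strikeout(text: str) -> str:
    """Remove every element matched by the Strikeout pattern."""
    return STRIKEOUT.sub("", text)


# Hypothetical sample, modeled on the snippet in the question.
sample = (
    '<Text Style="AllCaps">KEEP</Text>'
    '<Text Size="12" Style="AllCaps+Strikeout">HINTERM</Text>'
    '<Text Style="">ALSO KEEP</Text>'
)

print(strip_strikeout(sample))
# <Text Style="AllCaps">KEEP</Text><Text Style="">ALSO KEEP</Text>
```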

I've made some assumptions:
- any tag containing "Strikeout" between < and > will go
- the Strikeout-tags do not contain nested tags or CDATA-sections
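
To show what happens when those two assumptions do not hold, here is a small Python sketch using the same expression. Both inputs are invented for illustration; nothing in the thread says the real document contains anything like them.

```python
import re

PATTERN = re.compile(r"<[^>]*Strikeout[^>]*>[^<]*<[^>]*>")

# Hypothetical case 1: "Strikeout" appears in an unrelated attribute.
# The element is removed anyway, because the pattern does not look at
# which attribute the word sits in.
unrelated = '<Note Title="Strikeout list">keep me</Note>'
print(repr(PATTERN.sub("", unrelated)))
# ''

# Hypothetical case 2: the Strikeout element contains a nested tag.
# The match stops at the first tag after the text, so the rest of the
# element is left behind as broken markup.
nested = '<Text Style="Strikeout">old <b>bold</b> text</Text><Text>keep</Text>'
print(PATTERN.sub("", nested))
# bold</b> text</Text><Text>keep</Text>
```

If the file might contain either case, it is probably worth checking the output by hand or using a real parser.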

Depending on your console you may have to escape or quote stuff :dunno:

also, not tested, no warranty, yada yada :tease:

Dr_Grip · Jul 12, 2013

Once again, narf saves the day! Thank you so much!

Hatmouse · Aug 18, 2013

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags Relevant.

PacketCollision · Aug 19, 2013

Just out of curiosity, why wouldn't you use an XML parser for this?

Regexes will work of course, at least most of the time, but they seem like the wrong tool for a job that is essentially a core function of an XML parser.
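
As a rough idea of what the parser route could look like, here is a minimal sketch using xml.etree.ElementTree from Python's standard library. It assumes the Style attribute is a "+"-separated list, as the AllCaps+Strikeout example in the first post suggests. The Paragraph wrapper and the sample content are hypothetical, since the thread only shows a single Text element.

```python
import xml.etree.ElementTree as ET


def remove_strikeout(root: ET.Element) -> None:
    """Delete every element whose Style attribute lists Strikeout."""
    # Copy both lists first so removing children does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            # Assumption: style names are joined with "+".
            styles = child.get("Style", "").split("+")
            if "Strikeout" in styles:
                # Note: this also drops any text directly following the
                # element (its "tail" in ElementTree terms).
                parent.remove(child)


# Hypothetical sample; the real document structure is not shown in the thread.
sample = (
    "<Paragraph>"
    '<Text Style="AllCaps">KEEP</Text>'
    '<Text Style="AllCaps+Strikeout">HINTERM</Text>'
    '<Text Style="">ALSO KEEP</Text>'
    "</Paragraph>"
)

root = ET.fromstring(sample)
remove_strikeout(root)
print([t.text for t in root])
# ['KEEP', 'ALSO KEEP']
```

Unlike the regex, this only looks at the Style attribute and removes the whole element including anything nested inside it. For a real file, ET.parse(path) and tree.write(path) would replace the fromstring call.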
