Jay Taylor's notes
Regular expression to match line that doesn't contain a word?
I know it's possible to match a word and then reverse the matches using other tools (e.g. …)
Input:
Code:
Desired output:
The notion that regex doesn't support inverse matching is not entirely true. You can mimic this behavior by using negative look-arounds:
The regex above will match any string, or line without a line break, not containing the (sub)string 'hede'. As mentioned, this is not something regex is "good" at (or should do), but it is still possible. And if you need to match line-break characters as well, use the DOT-ALL modifier (the trailing …
or use it inline:
(where the … If the DOT-ALL modifier is not available, you can mimic the same behavior with the character class …
Explanation
A string is just a list of …
where the … So, in my example, every empty string is first validated to see if there's no … As you can see, the input …
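This copy of the answer is missing its code. The sketch below assumes the pattern is written as `^((?!hede).)*$` and uses Python's `re` module. The negative lookahead `(?!hede)` is checked at every position before `.` consumes one character. The DOT-ALL flag (inline `(?s)`) and the `[\s\S]` class are the two ways mentioned above to let the match cross line breaks. The names `NO_HEDE*` and `lines_without_hede` are illustrative only.

```python
import re

# (?!hede) is tested at every position before "." consumes one character.
NO_HEDE = re.compile(r"^((?!hede).)*$")

# DOT-ALL written inline: "." now also matches "\n".
NO_HEDE_DOTALL = re.compile(r"(?s)^((?!hede).)*$")

# Without DOT-ALL: [\s\S] matches any character, line breaks included.
NO_HEDE_CLASS = re.compile(r"^((?!hede)[\s\S])*$")


def lines_without_hede(text: str) -> list[str]:
    """Return the lines of `text` that do not contain 'hede'."""
    return [line for line in text.splitlines() if NO_HEDE.match(line)]


print(lines_without_hede("hoho\nhihi\nhaha\nhede\nfoo hede bar"))
# ['hoho', 'hihi', 'haha']
```

Without either modifier, `NO_HEDE` stops at the first `\n`, which is why it is applied line by line here.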
Note that the solution to “does not start with hede”:
is generally much more efficient than the solution to “does not contain hede”:
The former checks for “hede” only at the input string’s first position, rather than at every position.
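Here is a small Python comparison of the two problems. It assumes the usual patterns, `^(?!hede).*$` for "does not start with" and `^((?!hede).)*$` for "does not contain", since the answer's own code is missing here. The first runs its lookahead once. The second runs it before every character.

```python
import re

# Lookahead evaluated once, at the start of the line only.
NOT_STARTING_WITH_HEDE = re.compile(r"^(?!hede).*$")

# Lookahead evaluated before every character of the line.
NOT_CONTAINING_HEDE = re.compile(r"^((?!hede).)*$")

for line in ["hede foo", "foo hede", "foo bar"]:
    starts_ok = NOT_STARTING_WITH_HEDE.match(line) is not None
    contains_ok = NOT_CONTAINING_HEDE.match(line) is not None
    print(f"{line!r}: does not start with = {starts_ok}, does not contain = {contains_ok}")
```

The two patterns disagree only on lines like `"foo hede"`, where "hede" appears after the first position.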
If you're just using it for grep, you can use …
ETA: Oh, rereading the question, …
The given answers are perfectly fine; this is just an academic point: regular expressions in the sense of theoretical computer science ARE NOT ABLE to do it like this. For them, it would have to look something like this:
This only does a FULL match. Doing it for sub-matches would be even more awkward.
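The answer's own expression is missing from this copy. As an illustration, below is a lookaround-free expression for "does not contain hede", derived by hand. It uses only concatenation, alternation, star and character classes. It comes from the four-state automaton that tracks how much of "hede" (`h`, `he`, `hed`) the input currently ends with. Non-capturing groups are just grouping. The helper brute-forces the comparison with a plain substring test as a sanity check. It is a sketch, not the answer's exact expression.

```python
import itertools
import re

# Outer loop: each iteration consumes a chunk that returns the automaton to
# its start state ([^h], or an "h" followed by moves that never finish "hede").
# Optional tail: the string may end partway through "hede" ("h", "he", "hed").
NO_HEDE_PURE = re.compile(
    r"(?:[^h]|h(?:h|eh|edh)*(?:[^he]|e[^dh]|ed[^eh]))*"
    r"(?:h(?:h|eh|edh)*(?:ed?)?)?"
)


def agrees_with_substring_test(max_len: int = 7) -> bool:
    """Compare the pure regex (full match) with `'hede' not in s`."""
    for n in range(max_len + 1):
        for chars in itertools.product("hedx", repeat=n):
            s = "".join(chars)
            if bool(NO_HEDE_PURE.fullmatch(s)) != ("hede" not in s):
                return False
    return True


print(agrees_with_substring_test())  # True
```

Like the answer says, this works only as a FULL match, hence `fullmatch`.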
Explanation:
Here's a good explanation of why it's not easy to negate an arbitrary regex. I have to agree with the other answers, though: if this is anything other than a hypothetical question, then a regex is not the right choice here.
If you want the regex test to fail only if the entire string matches, the following will work:
e.g. If you want to allow all values except "foo" (i.e. "foofoo", "barfoo", and "foobar" will pass, but "foo" will fail), use:
Of course, if you're checking for exact equality, a better general solution in this case is to check for string equality, i.e.
You could even put the negation outside the test if you need any regex features (here, case insensitivity and range matching):
The regex solution at the top may be helpful, however, in situations where a positive regex test is required (perhaps by an API).
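A Python sketch of the three options above. It assumes a lookahead of the form `^(?!foo\Z)` for the regex. In Python, `\Z` is used instead of `$` because `$` also matches before a trailing newline. The case-insensitive `fo{2}` in the last helper is only a stand-in for "regex features", not the answer's original pattern. All function names are illustrative.

```python
import re

# Positive regex test that fails only when the whole string is exactly "foo".
NOT_EXACTLY_FOO = re.compile(r"^(?!foo\Z)")


def allowed_by_regex(value: str) -> bool:
    return NOT_EXACTLY_FOO.match(value) is not None


def allowed_by_equality(value: str) -> bool:
    # The simpler check when a regex is not actually required.
    return value != "foo"


def allowed_by_negated_test(value: str) -> bool:
    # Negation outside the regex keeps regex features available;
    # case-insensitive "fo{2}" is an illustrative stand-in pattern.
    return re.fullmatch(r"fo{2}", value, re.IGNORECASE) is None
```

For inputs such as `"foo"`, `"foofoo"`, `"barfoo"` and `"foobar"`, the first two helpers agree. The third additionally rejects `"FOO"`.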
Not regex, but I've found it logical and useful to use serial greps with a pipe to eliminate noise, e.g. searching an Apache config file without all the comments:
and
The logic of serial greps is (not a comment) and (matches dir).
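The same "filter, then filter again" idea can be sketched in Python with chained generators. The config text is made up for illustration. The two stages mirror `grep -v` (invert match) and `grep -i` (ignore case). They are not the answer's exact commands, which are missing from this copy.

```python
import re

# Hypothetical config text standing in for an Apache config file.
CONFIG = """\
# DocumentRoot "/var/www/old"
DocumentRoot "/var/www/html"
<Directory "/var/www/html">
# ServerName example.com
Listen 80
"""


def not_comment(lines):
    # Stage 1 (like grep -v): drop any line containing '#'.
    return (line for line in lines if "#" not in line)


def matches(pattern, lines):
    # Stage 2 (like grep -i): keep case-insensitive matches.
    rx = re.compile(pattern, re.IGNORECASE)
    return (line for line in lines if rx.search(line))


result = list(matches("dir", not_comment(CONFIG.splitlines())))
print(result)  # ['<Directory "/var/www/html">']
```

Each stage stays a trivial test, and the "and" between them is just function composition.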
Benchmarks
I decided to evaluate some of the presented options and compare their performance, as well as use some new features. Benchmarking on .NET Regex Engine: http://regexhero.net/tester/
Benchmark text: The first 7 lines should not match, since they contain the searched expression, while the lower 7 lines should match!
Results: Results are iterations per second as the median of 3 runs (bigger number = better).
Since .NET doesn't support action verbs (*FAIL, etc.), I couldn't test the solutions P1 and P2.
Summary: I tried to test most proposed solutions; some optimizations are possible for certain words. For example, if the first two letters of the search string are not the same, answer 03 can be expanded to:
But the overall most readable and performance-wise fastest solution seems to be 05 using a conditional statement, or 04 with the possessive quantifier. I think the Perl solutions should be even faster and more easily readable.
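For anyone wanting to repeat a comparison like this outside .NET, here is a rough Python `timeit` harness. The pattern list and sample lines are illustrative assumptions, not the benchmark's original options or text. Timings from Python's `re` engine are not comparable to the .NET figures, so no expected numbers are shown.

```python
import re
import timeit

# Illustrative candidates (not the benchmark's original numbered options).
PATTERNS = {
    "per-char lookahead": r"^((?!hede).)*$",
    "single lookahead": r"^(?!.*hede).*$",
    "unrolled loop": r"^[^h]*(?:h(?!ede)[^h]*)*$",
}

# Hypothetical benchmark text: the first three lines contain "hede".
LINES = [
    "the hede is here",
    "hede at the start",
    "at the end: hede",
    "nothing to see here",
    "hhh ede hed e",
    "a fairly long line without the word " * 4,
]


def run(pattern: str, repeats: int = 200) -> float:
    """Total seconds to match every line `repeats` times."""
    rx = re.compile(pattern)

    def once():
        for line in LINES:
            rx.match(line)

    return timeit.timeit(once, number=repeats)


def match_results(pattern: str) -> list[bool]:
    rx = re.compile(pattern)
    return [rx.match(line) is not None for line in LINES]


for name, pattern in PATTERNS.items():
    print(f"{name:20} {run(pattern):.4f}s {match_results(pattern)}")
```

Checking `match_results` first guards against benchmarking a fast pattern that is simply wrong.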
With negative lookahead, a regular expression can match something that does not contain a specific pattern. This is answered and explained by Bart Kiers. Great explanation! However, with Bart Kiers' answer, the lookahead part tests 1 to 4 characters ahead while matching any single character. We can avoid this and let the lookahead part check the whole text, ensure there is no 'hede', and then the normal part (.*) can eat the whole text all at once. Here is the improved regex:
Note that the (*?) lazy quantifier in the negative lookahead part is optional; you can use the (*) greedy quantifier instead, depending on your data: if 'hede' is present and in the first half of the text, the lazy quantifier can be faster; otherwise, the greedy quantifier can be faster. However, if 'hede' is not present, both are equally slow. Here is the demo code. For more information about lookahead, please check out the great article: Mastering Lookahead and Lookbehind. Also, please check out RegexGen.js, a JavaScript Regular Expression Generator that helps to construct complex regular expressions. With RegexGen.js, you can construct the regex in a more readable way:
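A quick Python check that the single-lookahead idea accepts exactly the same strings as the per-character version. The improved regex is assumed here to be `^(?!.*?hede).*$`, with `.*hede` as its greedy variant. The brute-force helper is illustrative only.

```python
import itertools
import re

PER_CHAR = re.compile(r"^((?!hede).)*$")      # lookahead at every position
LAZY_ONCE = re.compile(r"^(?!.*?hede).*$")    # one lookahead, lazy scan
GREEDY_ONCE = re.compile(r"^(?!.*hede).*$")   # one lookahead, greedy scan


def all_agree(max_len: int = 6) -> bool:
    """True if all three patterns agree with `'hede' not in s` on short strings."""
    for n in range(max_len + 1):
        for chars in itertools.product("hedx", repeat=n):
            s = "".join(chars)
            results = {p.match(s) is not None for p in (PER_CHAR, LAZY_ONCE, GREEDY_ONCE)}
            if len(results) != 1 or results.pop() != ("hede" not in s):
                return False
    return True


print(all_agree())  # True
```

The lazy and greedy versions differ only in the order in which the lookahead scans for "hede", not in what they accept.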
FWIW, since regular languages (aka rational languages) are closed under complementation, it's always possible to find a regular expression (aka rational expression) that negates another expression. But not many tools implement this. Vcsn supports this operator (which it denotes … You first define the type of your expressions: labels are letters (… In Python:
then you enter your expression:
convert this expression to an automaton:
finally, convert this automaton back to a simple expression.
where …
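Closure under complementation is easy to see on automata. Build a DFA for "contains the word", then swap accepting and non-accepting states. The pure-Python sketch below does this by hand with a KMP-style construction. It does not use Vcsn's API, and the function names are hypothetical.

```python
def build_contains_dfa(word: str) -> list[dict[str, int]]:
    """DFA for "contains `word`": state i means the input currently ends
    with word[:i]; state len(word) is absorbing ("word already seen")."""
    n = len(word)
    delta = []
    for state in range(n + 1):
        row = {}
        for ch in set(word):
            if state == n:
                row[ch] = n
                continue
            # Longest prefix of `word` that is a suffix of word[:state] + ch.
            s = word[:state] + ch
            k = len(s)
            while k > 0 and s[-k:] != word[:k]:
                k -= 1
            row[ch] = k
        delta.append(row)
    return delta


def accepts_complement(delta: list[dict[str, int]], text: str) -> bool:
    """Run the DFA with its accepting set complemented: accept every string
    that does NOT end in the absorbing "word seen" state."""
    seen = len(delta) - 1
    state = 0
    for ch in text:
        if state != seen:
            # Characters outside the word's alphabet reset to state 0.
            state = delta[state].get(ch, 0)
    return state != seen


dfa = build_contains_dfa("hede")
print([s for s in ["hoho", "hede", "hehede", "hedhed"] if accepts_complement(dfa, s)])
# ['hoho', 'hedhed']
```

Turning such an automaton back into an expression (state elimination) is the last step described above.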
With this, you avoid testing a lookahead at each position:
Equivalent (for .NET):
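The answer's patterns are missing here. The sketch below shows the same idea under an assumption: the lookahead only needs to run where an `h` appears. It is written without possessive quantifiers or atomic groups, which Python's `re` supports only from 3.11. It consumes one character per iteration, so there are no nested quantifiers. Each character can be consumed in only one way, which rules out exponential backtracking.

```python
import re

# Only an "h" triggers the lookahead; every other character is consumed by
# [^h] with no extra test.
NO_HEDE_H_ONLY = re.compile(r"^(?:[^h]|h(?!ede))*$")

print(NO_HEDE_H_ONLY.match("hhh ede") is not None)  # True
print(NO_HEDE_H_ONLY.match("xx hede") is not None)  # False
```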
Old answer:
Here's how I'd do it:
Accurate and more efficient than the other answers. It implements Friedl's "unrolling-the-loop" efficiency technique and requires much less backtracking.
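A sketch of Friedl's unrolled-loop shape, `normal* (special normal*)*`, applied to this problem. It assumes `normal` is `[^h]` and `special` is an `h` not followed by `ede`. The answer's exact pattern is not shown in this copy and may differ.

```python
import re

# normal* (special normal*)*
#   normal  = [^h]        a character that cannot start "hede"
#   special = h(?!ede)    an "h" that does not begin "hede"
NO_HEDE_UNROLLED = re.compile(r"^[^h]*(?:h(?!ede)[^h]*)*$")

for s in ["hoho", "the hede", "hhh ede"]:
    print(repr(s), NO_HEDE_UNROLLED.match(s) is not None)
```

Every `h` must be consumed by `special`, and the runs between them by `normal*`. There is only one way to split the input, which is where the saving in backtracking comes from.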
If you want to match characters while negating a word, similar to a negated character class: For example, a string:
Do not use:
Use:
Notice …
The OP did not specify or tag the post to indicate the context (programming language, editor, tool) the regex will be used within. For me, I sometimes need to do this while editing a file using …
If I am looking to retain all lines that do NOT contain the string …
Now you have the original text with all lines containing the string …
If I am looking to do something else to only lines that do NOT contain the string …
Through PCRE verbs:
This would completely skip the line that contains the exact string …
Execution of the parts: Let us consider the above regex by splitting it into two parts.
PART 1
The regex engine starts its execution from the first part.
Explanation:
So the line that contains the string …
PART 2
Explanation:
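Python's `re` has no PCRE verbs such as `(*SKIP)(*FAIL)`. A common substitute is a capture-group alternation. It is a different technique from the verbs, named here plainly. The left branch consumes lines containing "hede" without capturing. The right branch captures every other line. Only group 1 is kept. The names below are illustrative.

```python
import re

# Left branch: consume whole lines containing "hede" (no capture).
# Right branch: capture any other whole line in group 1.
LINE_FILTER = re.compile(r"^.*hede.*$|^(.*)$", re.MULTILINE)


def keep_lines(text: str) -> list[str]:
    """Return lines without 'hede', discarding the left-branch matches."""
    return [m.group(1) for m in LINE_FILTER.finditer(text) if m.group(1) is not None]


print(keep_lines("keep me\nhede here\nalso keep"))  # ['keep me', 'also keep']
```

If the text ends with a newline, the right branch also yields an empty final "line", so strip it first if that matters.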
It may be more maintainable to use two regexes in your code: one to do the first match, and then, if it matches, a second regex to check for outlier cases you wish to block (for example ^.*(hede).*), and then have the appropriate logic in your code. OK, I admit this is not really an answer to the question as posted, and it may also use slightly more processing than a single regex. But for developers who came here looking for a fast emergency fix for an outlier case, this solution should not be overlooked.
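A minimal sketch of the two-regex approach. `MAIN` and `BLOCKED` are hypothetical patterns: the first does the normal match, and the second catches the outlier lines to reject.

```python
import re

# Hypothetical patterns: MAIN is the normal match, BLOCKED catches the
# outlier lines that should be rejected.
MAIN = re.compile(r"^\w+ \w+$")
BLOCKED = re.compile(r"^.*(hede).*")


def accept(line: str) -> bool:
    """Accept lines that match MAIN unless BLOCKED also matches them."""
    if not MAIN.match(line):
        return False
    return BLOCKED.match(line) is None


print([accept(s) for s in ["hello world", "hede world", "one two three"]])
# [True, False, False]
```

Each regex stays simple, and the "except" logic lives in ordinary code where it is easy to read and change.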
The TXR Language supports regex negation.
A more complicated example: match all lines that start with …
Regex negation is not particularly useful on its own but when you also have intersection, things get interesting, since you have a full set of boolean set operations: you can express "the set which matches this, except for things which match that". |
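Python's `re` has neither TXR's negation nor an intersection operator. For whole-string matching, both can be emulated with lookaheads. The helpers below are a hypothetical sketch, not TXR syntax. Use them with `fullmatch()`.

```python
import re


def difference(this: str, that: str) -> re.Pattern:
    """Strings fully matched by `this` but not fully matched by `that`.

    The negative lookahead is anchored at the end of the string with \\Z.
    """
    return re.compile(rf"(?!(?:{that})\Z)(?:{this})")


def intersection(this: str, that: str) -> re.Pattern:
    """Strings fully matched by both `this` and `that`."""
    return re.compile(rf"(?=(?:{that})\Z)(?:{this})")


starts_with_a_no_hede = difference(r"a\w*", r"\w*hede\w*")
print([w for w in ["apple", "ahedez", "banana"] if starts_with_a_no_hede.fullmatch(w)])
# ['apple']
```

This emulation only works for full-string tests. A real complement operator, as in TXR, composes freely inside larger expressions.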
Asked 7 years ago, viewed 1714078 times.