Here's a challenge for the übergeek (regexes are sooooo the Holy Grail).
I'm building a shoutbox (with ZF, nice). Nothing to it. BUT! I want the submitted text to be scanned for URIs, which should then be formatted as HTML A tags.
I've made a really nice regex so far that works for a lot of cases, but it still has a few teething problems.
It works for a lot of addresses (and I could still add IP addresses and port numbers). The unfortunate part is...
* the domains of email addresses get formatted... (I don't want that)
* if you don't type a space after a period, you get a link. Well, that's not much of a problem. People will just have to learn to type.
Adding a caret doesn't solve anything, because then URLs in the middle of the text are no longer formatted.
Try the following one (no, I didn't write it):
{(?<=\b)((?:https?|ftp)://|www\.)[\w.]+[;#&/~=\w+()?.,:%-]*[;#&/~=\w+(-]}i
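To get a feel for how the pattern above behaves, here is a minimal sketch in Python. The thread targets PHP's preg functions; Python's `re` is used here only because it is easy to run. The helper name `linkify` and the sample text are made up for illustration, `(?<=\b)` is written as the equivalent `\b`, and prepending `http://` to bare `www.` links is an assumption about what the shoutbox should do.

```python
import re
from html import escape

# Same pattern as above, without the {...} delimiters; the trailing i becomes re.IGNORECASE.
URL_RE = re.compile(
    r"\b((?:https?|ftp)://|www\.)[\w.]+[;#&/~=\w+()?.,:%-]*[;#&/~=\w+(-]",
    re.IGNORECASE,
)

def linkify(text):
    """Escape the text as HTML and wrap every URL match in an A tag."""
    parts = []
    last = 0
    for m in URL_RE.finditer(text):
        parts.append(escape(text[last:m.start()]))  # plain text before the URL
        url = m.group(0)
        # Assumption: a bare "www." link should point at http://
        href = "http://" + url if m.group(1).lower() == "www." else url
        parts.append('<a href="{}">{}</a>'.format(escape(href), escape(url)))
        last = m.end()
    parts.append(escape(text[last:]))  # plain text after the last URL
    return "".join(parts)

print(linkify("see www.example.com. bye"))
# see <a href="http://www.example.com">www.example.com</a>. bye
```

Because the last character class leaves out the period and comma, trailing punctuation stays outside the link. An address like alias@domein.extensie is left alone here simply because it has no scheme and no `www.` prefix.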
Next, about your 'whopper'.
((http|https|ftp)://)* means http://https://ftp://http://www.example.com will also match. Try a question mark instead; a plus sign would still accept the repeated schemes.
You can also make a non-capturing subpattern by putting "?:" right after the opening parenthesis; its contents then no longer end up in a $n.
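A quick way to check both remarks is to compare the quantifiers side by side. This is a small Python sketch, not the poster's code; the variable names and the literal `www\.example\.com` tail are only there to keep the test short.

```python
import re

text = "http://https://ftp://http://www.example.com"

# Star and plus both allow the scheme to repeat.
starred = re.compile(r"((http|https|ftp)://)*www\.example\.com")
plus = re.compile(r"((http|https|ftp)://)+www\.example\.com")
# A question mark allows the scheme zero or one time; "?:" makes the groups non-capturing.
optional = re.compile(r"(?:(?:http|https|ftp)://)?www\.example\.com")

print(starred.fullmatch(text) is not None)   # True
print(plus.fullmatch(text) is not None)      # True
print(optional.fullmatch(text) is not None)  # False
print(optional.fullmatch("http://www.example.com") is not None)  # True

# Number of capturing groups: the "?:" version creates none.
print(starred.groups, optional.groups)       # 2 0
```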
([\w\d]+\.)*
\d is, I believe, already included in \w.
([\w\d]+\.\w{2,4})
See above.
(/[\w\d\?&=\+\*\.-]*)*
See above.
?, +, * and . don't need to be backslashed inside square brackets.
With a few improvements I get the following regex:
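Both simplifications (dropping `\d` next to `\w`, and dropping the backslashes inside square brackets) can be verified mechanically. The sketch below uses Python's `re`, which treats these character classes the same way as PCRE for this purpose; the variable names are made up.

```python
import re
import string

verbose_class = re.compile(r"[\w\d\?&=\+\*\.-]")  # as originally written
compact_class = re.compile(r"[\w?&=+*.-]")        # without \d and without backslashes

# Both classes accept exactly the same printable ASCII characters.
same = all(
    bool(verbose_class.fullmatch(ch)) == bool(compact_class.fullmatch(ch))
    for ch in string.printable
)
print(same)  # True

# Every digit is already a word character, so \d adds nothing next to \w.
digits_in_w = all(re.fullmatch(r"\w", d) for d in string.digits)
print(digits_in_w)  # True
```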
#((?:(?:http|https|ftp)://)?(?:[\w]+\.)*(?:[\w]+\.\w{2,4})(?:/[\w?&=+*.-]*)*)#si
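For anyone who wants to try the improved regex without a PHP setup, here is a rough Python equivalent. The `#` delimiters are dropped and the `si` modifiers become `re.S | re.I`; the sample sentence and the name `IMPROVED` are invented for the test.

```python
import re

IMPROVED = re.compile(
    r"((?:(?:http|https|ftp)://)?(?:[\w]+\.)*(?:[\w]+\.\w{2,4})(?:/[\w?&=+*.-]*)*)",
    re.S | re.I,
)

sample = "mail alias@example.com or visit www.example.com/page"
matches = [m.group(1) for m in IMPROVED.finditer(sample)]
print(matches)  # ['example.com', 'www.example.com/page']
```

The URL is picked up as intended, but nothing in the pattern looks at what comes before the match, so the domain of the email address is matched as well.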
Here's what happens now: with an email address, the first character of the domain is skipped, but the rest still gets formatted, so in alias@domein.extensie everything after the first character of the domain still turns into a link.
That's not really the intention.
Can anyone clarify for me:
* what the i at the end does? The tilde is obviously the delimiter.
* how that trick (with the @) is actually supposed to work?
I'm no regex hero; JeXuS is, judging by his description... All help is welcome though.
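The regex with the @ trick is not shown in the thread, so the following is only a guess at what went wrong: if the trick was a leading `[^@]`, that character class consumes one real character, which would explain the skipped first letter. The Python sketch below compares that hypothetical version with a negative lookbehind, which only inspects the preceding character. The domain part is a simplified stand-in, and the example also shows what the trailing `i` does.

```python
import re

# Hypothetical reconstruction of the "@ trick": demand a non-@ character first.
# [^@] is a real match, so it eats the first letter of the domain.
consuming = re.compile(r"[^@]((?:\w+\.)*\w+\.\w{2,4})", re.IGNORECASE)

# A lookbehind checks the previous character without consuming it:
# the match may not start right after "@", a word character, or a dot.
guarded = re.compile(r"(?<![@\w.])((?:\w+\.)*\w+\.\w{2,4})", re.IGNORECASE)

email = "alias@domein.ext"
skipped = consuming.search(email).group(1)
print(skipped)                                          # omein.ext
print(guarded.search(email))                            # None
print(guarded.search("go to domein.ext now").group(1))  # domein.ext

# The trailing i modifier makes the match case-insensitive.
print(bool(re.search(r"http://", "HTTP://EXAMPLE.COM")))        # False
print(bool(re.search(r"http://", "HTTP://EXAMPLE.COM", re.I)))  # True
```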
(?<= # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
# Match either the regular expression below (attempting the next alternative only if this one fails)
^ # Assert position at the beginning of a line (at beginning of the string or after a line break character)
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
\s # Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
)
( # Match the regular expression below and capture its match into backreference number 1
(?: # Match the regular expression below
# Match either the regular expression below (attempting the next alternative only if this one fails)
(?: # Match the regular expression below
# Match either the regular expression below (attempting the next alternative only if this one fails)
ht # Match the characters “ht” literally
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
f # Match the character “f” literally
)
tp # Match the characters “tp” literally
s # Match the character “s” literally
? # Between zero and one times, as many times as possible, giving back as needed (greedy)
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
ftp # Match the characters “ftp” literally
)
:// # Match the characters “://” literally
)? # Between zero and one times, as many times as possible, giving back as needed (greedy)
( # Match the regular expression below and capture its match into backreference number 2
\w # Match a single character that is a “word character” (letters, digits, etc.)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\. # Match the character “.” literally
)* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
(?: # Match the regular expression below
\w # Match a single character that is a “word character” (letters, digits, etc.)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\. # Match the character “.” literally
[a-z] # Match a single character in the range between “a” and “z”
{2,6} # Between 2 and 6 times, as many times as possible, giving back as needed (greedy)
)
(?: # Match the regular expression below
( # Match the regular expression below and capture its match into backreference number 3
/ # Match the character “/” literally
[\w?&=+*.-] # Match a single character present in the list below
# A word character (letters, digits, etc.)
# One of the characters “?&=+*”
# The character “.”
# The character “-”
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)*+ # Between zero and unlimited times, as many times as possible, without giving back (possessive)
)
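Put back together, the breakdown above describes a single pattern. Below is a sketch of it in Python's verbose mode, with two deliberate substitutions because Python's `re` differs from PCRE: the lookbehind `(?<=^|\s)` is written as the equivalent fixed-width `(?<!\S)`, and the possessive `*+` is written as a plain `*` so the code also runs on Python versions before 3.11. The name `URL_RE` and the sample text are made up.

```python
import re

URL_RE = re.compile(r"""
    (?<!\S)                       # at the start of the text or right after whitespace
    ((?:(?:ht|f)tps?|ftp)://)?    # group 1: optional scheme
    (\w+\.)*                      # group 2: leading labels such as "www."
    (?:\w+\.[a-z]{2,6})           # domain name plus a 2 to 6 letter extension
    (?:(/[\w?&=+*.-]*)*)          # group 3: path segments
""", re.VERBOSE | re.IGNORECASE)

sample = "alias@domein.extensie and www.example.com/page"
found = [m.group(0) for m in URL_RE.finditer(sample)]
print(found)  # ['www.example.com/page']
```

Requiring whitespace or the start of the text in front of the match is what keeps the email domain out: the character before "domein" is an @, so the lookbehind fails there.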