Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I know this has been asked a thousand times before (apologies), but searching SO/Google etc I am yet to get a conclusive answer.

Basically, I need a JS function which when passed a string, identifies & extracts all URLs based on a regex, returning an array of all found. e.g:

function findUrls(searchText){
    var regex=???
    result= searchText.match(regex);
    if(result){return result;}else{return false;}
}

The function should be able to detect and return any potential urls. I am aware of the inherant difficulties/isses with this (closing parentheses etc), so I have a feeling the process needs to be:

Split the string (searchText) into distinct sections starting/ending) with either nothing, a space or carriage return either side of it, resulting in distinct content chunks, e.g. do a split.

For each content chunk that results from the split, see whether it fits the logic for a URL of any construction, namely, does it contain a period immediately followed the text (the one constant rule for qualifying a potential URL).

The regex should see whether the period is immediately followed by other text, of the type allowable for a tld, directory structure & query string, and preceded by text of the allowable type for a URL.

I am aware false positives may result, however any returned values will then be checked with a call to the URL itself, so this can be ignored. The other functions I have found often dont return the URLs query string too, if present.

From a block of text, the function should thus be able to return any type of URL, even if it means identifying will.i.am as a valid one!

eg. http://www.google.com, google.com, www.google.com, http://google.com, ftp.google.com, https:// etc...and any derivation thereof with a query string should be returned...

Many thanks, apologies again if this exists elsewhere on SO but my searches havent returned it..

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
643 views
Welcome To Ask or Share your Answers For Others

1 Answer

I just use URI.js -- makes it easy.

var source = "Hello www.example.com,
"
    + "http://google.com is a search engine, like http://www.bing.com
"
    + "http://ex?mple.org/foo.html?baz=la#bumm is an IDN URL,
"
    + "http://123.123.123.123/foo.html is IPv4 and "
    + "http://fe80:0000:0000:0000:0204:61ff:fe9d:f156/foobar.html is IPv6.
"
    + "links can also be in parens (http://example.org) "
    + "or quotes ?http://example.org?.";

var result = URI.withinString(source, function(url) {
    return "<a>" + url + "</a>";
});

/* result is:
Hello <a>www.example.com</a>,
<a>http://google.com</a> is a search engine, like <a>http://www.bing.com</a>
<a>http://ex?mple.org/foo.html?baz=la#bumm</a> is an IDN URL,
<a>http://123.123.123.123/foo.html</a> is IPv4 and <a>http://fe80:0000:0000:0000:0204:61ff:fe9d:f156/foobar.html</a> is IPv6.
links can also be in parens (<a>http://example.org</a>) or quotes ?<a>http://example.org</a>?.
*/

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...