If you save an HTML page to a text file, or keep it as a string in memory, any IMG src or HREF attributes will typically still hold the relative paths the page was written with, for example:
<H1>Hello from Test Page</H1>
<IMG src="images/TestImage1.jpg">
When you then load your saved HTML page, you will not see the image, because the HTML is looking for TestImage1.jpg in an images folder next to the saved file, and that folder doesn't exist. All that exists is your saved HTML text file.
So we need to parse the HTML and prefix the missing server path to each relative src (and href) attribute.
A concise way to achieve this is with a regular expression. I'm no regex expert, though, so rather than read a book ;), I trawled the Internet for a suitable example and finally found a working expression at code.nontalk.com.
The pattern: "<(.*?)(src|href)=\"(?!http)(.*?)\"(.*?)>"
The replacement pattern (absoluteUrl is the server path to prefix): "<$1$2=\"" + absoluteUrl + "$3\"$4>"
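To see what the pattern actually captures, here is a minimal sketch that runs it against a single tag and returns the four groups. The replacement pattern puts $1, $2 and $4 back unchanged and inserts absoluteUrl in front of $3, while the (?!http) lookahead skips URLs that are already absolute (including https ones). The class and method names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// Sketch: what each capture group of the post's pattern matches in one tag.
public static class PatternGroupsSketch
{
    const string Pattern = "<(.*?)(src|href)=\"(?!http)(.*?)\"(.*?)>";

    // Returns $1..$4 for the first match, or an empty array when nothing
    // matches (for example, when the URL already starts with "http").
    public static string[] Groups(string html)
    {
        Match m = Regex.Match(html, Pattern, RegexOptions.IgnoreCase);
        if (!m.Success) return Array.Empty<string>();
        return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value, m.Groups[4].Value };
    }
}
```

For `<IMG src="images/TestImage1.jpg" alt="test">` this yields `"IMG "`, `"src"`, `"images/TestImage1.jpg"` and `" alt=\"test\""`, which is exactly what the replacement reassembles around the new absolute path.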
The example method:
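using System.Text.RegularExpressions;
public static class HtmlPathHelper // hypothetical wrapper class; any static class will do
{
    // Assumes absoluteUrl ends with "/" (e.g. "http://www.example.com/"), otherwise the clean-up below breaks the URL.
    public static string ConvertRelativePathsToAbsolute(string text, string absoluteUrl)
    {
        string value = Regex.Replace(text, "<(.*?)(src|href)=\"(?!http)(.*?)\"(.*?)>", "<$1$2=\"" + absoluteUrl + "$3\"$4>",
            RegexOptions.IgnoreCase | RegexOptions.Multiline);
        // Now just make sure that there isn't a // because if
        // the original relative path started with a / then the
        // replacement above would create a //.
        return value.Replace(absoluteUrl + "/", absoluteUrl);
    }
}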
Using the method:
<H1>Hello from Test Page</H1>
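A minimal usage sketch, assuming a hypothetical saved page and the hypothetical base URL http://www.example.com/. The method body is repeated from the example method so the block compiles on its own; `RelativePathDemo` and `Demo` are illustrative names.

```csharp
using System.Text.RegularExpressions;

// Usage sketch: the post's method (repeated so this block compiles on its own)
// called on a hypothetical saved page. Class and method names are illustrative.
public static class RelativePathDemo
{
    public static string ConvertRelativePathsToAbsolute(string text, string absoluteUrl)
    {
        string value = Regex.Replace(text, "<(.*?)(src|href)=\"(?!http)(.*?)\"(.*?)>",
            "<$1$2=\"" + absoluteUrl + "$3\"$4>", RegexOptions.IgnoreCase | RegexOptions.Multiline);
        // Collapse the "//" created when the relative path already started with "/".
        return value.Replace(absoluteUrl + "/", absoluteUrl);
    }

    public static string Demo()
    {
        // Hypothetical saved page: one relative image and one already-absolute link.
        string saved = "<H1>Hello from Test Page</H1>\n" +
                       "<IMG src=\"images/TestImage1.jpg\">\n" +
                       "<a href=\"http://www.example.com/\">home</a>";
        // Hypothetical base URL; it must end with "/" for the clean-up above to work.
        return ConvertRelativePathsToAbsolute(saved, "http://www.example.com/");
    }
}
```

Demo() returns the page with the IMG src rewritten to http://www.example.com/images/TestImage1.jpg and the already-absolute link left alone. Passing the base without its trailing slash would instead produce http://www.example.comimages/TestImage1.jpg, which is why the slash matters.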
Works great for me, so thanks, nontalk.com!
code.nontalk.com - Convert Relative Paths to Absolute Using Regular Expressions
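If you'd rather not depend on the trailing slash in absoluteUrl, one option (my own variant, not from code.nontalk.com) is to pass a real MatchEvaluator delegate to Regex.Replace and let System.Uri resolve each captured path against the base URL. A sketch, with `EvaluatorVariant` as an illustrative name:

```csharp
using System;
using System.Text.RegularExpressions;

// Variant sketch (not from the original post): same pattern, but a MatchEvaluator
// delegate builds each replacement and System.Uri resolves the path.
public static class EvaluatorVariant
{
    static readonly Regex TagPattern = new Regex(
        "<(.*?)(src|href)=\"(?!http)(.*?)\"(.*?)>", RegexOptions.IgnoreCase);

    public static string ConvertRelativePathsToAbsolute(string text, string absoluteUrl)
    {
        var baseUri = new Uri(absoluteUrl);
        MatchEvaluator evaluator = m =>
        {
            // Standard relative-URL resolution: "images/x.jpg" and "/images/x.jpg"
            // both resolve without producing "//". May throw UriFormatException
            // for paths that cannot be resolved.
            string resolved = new Uri(baseUri, m.Groups[3].Value).AbsoluteUri;
            return "<" + m.Groups[1].Value + m.Groups[2].Value + "=\"" + resolved + "\"" + m.Groups[4].Value + ">";
        };
        return TagPattern.Replace(text, evaluator);
    }
}
```

Because Uri follows standard resolution, a base with a path behaves the way a browser would: http://www.example.com/site/ plus images/x.jpg gives http://www.example.com/site/images/x.jpg, but without the final slash the site segment is replaced. AbsoluteUri also percent-encodes characters such as spaces.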