Some notes on software development...
# Tuesday, 27 May 2008

If you save an HTML page to a text file, or hold it as a string in memory, the src and href attributes in its tags are likely to still contain relative paths.

Example snippet:

<H1>Hello from Test Page</H1>
<IMG src="images/TestImage1.jpg">

When you then load your saved HTML page you will not see the image, because the HTML is looking for TestImage1.jpg in an images folder which doesn't exist next to it. All that exists is your saved HTML text file.
So we need to parse the HTML and prefix the missing server path to the value of each src (and href) attribute.

The quickest way I know to achieve this is to use the power of Regular Expressions, but I'm no expert with RegEx so, after trawling the Internet looking for a suitable example rather than reading a book ;), I finally found an expression that works. The source is listed under Links at the end of this post.

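The pattern: "<(.*?)(src|href)=\"(?!http)(.*?)\"(.*?)>"
The replacement string: "<$1$2=\"" + absoluteUrl + "$3\"$4>"

The (?!http) lookahead skips any value that already starts with http, so absolute URLs are left alone. Note that this is a plain replacement string passed to Regex.Replace, not a MatchEvaluator delegate.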
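
To make the $1 to $4 references easier to follow, here is a small sketch that runs the same pattern against a single tag and returns what each group captured. The class name PatternDemo and the method name Groups are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternDemo
{
    // Same pattern as in the post.
    private const string Pattern = "<(.*?)(src|href)=\"(?!http)(.*?)\"(.*?)>";

    // Returns the four captured groups for the first match in the tag,
    // or an empty array when the pattern does not match (for example
    // when the attribute value already starts with "http").
    public static string[] Groups(string tag)
    {
        Match m = Regex.Match(tag, Pattern, RegexOptions.IgnoreCase);
        if (!m.Success)
        {
            return new string[0];
        }

        return new[]
        {
            m.Groups[1].Value, // everything between "<" and the attribute name
            m.Groups[2].Value, // "src" or "href"
            m.Groups[3].Value, // the relative path
            m.Groups[4].Value  // anything after the closing quote, before ">"
        };
    }
}
```

For the IMG tag in the example snippet, Groups returns "IMG " (with the trailing space), "src", "images/TestImage1.jpg" and an empty string, which is why the replacement string can rebuild the tag with only the path changed.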

The example method:

// Requires: using System.Text.RegularExpressions;
public static String ConvertRelativePathsToAbsolute(String text, String absoluteUrl)
{
    // absoluteUrl is expected to end with a "/", e.g. "http://localhost/".
    String value = Regex.Replace(text, "<(.*?)(src|href)=\"(?!http)(.*?)\"(.*?)>", "<$1$2=\"" + absoluteUrl + "$3\"$4>",
                                 RegexOptions.IgnoreCase | RegexOptions.Multiline);

    // Now just make sure that there isn't a // because if
    // the original relative path started with a / then the
    // replacement above would create a //.

    return value.Replace(absoluteUrl + "/", absoluteUrl);
}

Using the method:

ConvertRelativePathsToAbsolute(myHTML, "http://localhost/")

Will return:

<H1>Hello from Test Page</H1>
<IMG src="http://localhost/images/TestImage1.jpg">
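
The method above joins the paths by string concatenation, so it depends on absoluteUrl ending with a slash. As a rough alternative sketch (not the approach from the linked article), the same pattern can be combined with a MatchEvaluator that lets System.Uri do the joining. The names UriResolver and Resolve are hypothetical, and I am assuming the base URL is a plain site root such as "http://localhost/".

```csharp
using System;
using System.Text.RegularExpressions;

public static class UriResolver
{
    // Hypothetical helper: resolves each relative src/href value against
    // baseUrl with System.Uri instead of concatenating strings.
    public static string Resolve(string html, string baseUrl)
    {
        Uri baseUri = new Uri(baseUrl);

        return Regex.Replace(
            html,
            "<(.*?)(src|href)=\"(?!http)(.*?)\"(.*?)>",
            m =>
            {
                Uri absolute;

                // Leave the tag unchanged if the value cannot be combined
                // with the base URL.
                if (!Uri.TryCreate(baseUri, m.Groups[3].Value, out absolute))
                {
                    return m.Value;
                }

                // Rebuild the tag the same way the replacement string does,
                // but with the resolved URL in place of the relative path.
                return "<" + m.Groups[1].Value + m.Groups[2].Value + "=\""
                     + absolute.AbsoluteUri + "\"" + m.Groups[4].Value + ">";
            },
            RegexOptions.IgnoreCase);
    }
}
```

With this variant a path that starts with a / needs no follow-up Replace call, because Uri resolves it against the site root. Be aware that AbsoluteUri returns the escaped form of the URL, so characters such as spaces in a path will come back percent-encoded.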

Works great for me so thanks!

Links - Convert Relative Paths to Absolute Using Regular Expressions

Tuesday, 27 May 2008 18:27:08 (GMT Standard Time, UTC+00:00)
C# | Regular Expressions


© Copyright 2019
Hadrian Phillips
