Saving Web Page Issue

David

Flightless Bird
When saving a web page, IE 6.0.2900.5512.xpsp_sp3 (and also Firefox 3.6)
creates a "_files" directory and stores the images, CSS, and JS files in
that directory.

However, more often than not, the .htm or .html source code is NOT
modified to reflect the "_files" directory. It appears to keep
the original server directory for the images, CSS, and JS files.

Questions:

1) Is this a function of IE (and/or Firefox)?
2) If so, can it be corrected, and how?
3) If not, can you please explain why the links in
the HTML source are not being changed?
4) Is there any setting in XP (Pro in this case) that
will correct this?
5) Is there any other solution?

Thanks
David
 
Nathan Sokalski

Flightless Bird
There are several reasons why the browser does not modify the *.htm, *.html,
or whatever extension the page gets saved with:

1. The purpose of the _files directory is primarily for cache, so in the
browser's mind, there is no need to modify the source code.
2. The save function simply copies the file from the location where the
browser put it when you viewed the page to whatever location you now
specify; no parsing is done during the save process, so the browser
does not look at the text in the file.
3. Many images, CSS, and JS files are dynamically generated (the URL in the
*.htm, *.html, *.aspx, etc. file doesn't actually end with *.gif, *.jpg,
*.css, *.js, etc.), so looking at the HTML code might not tell the
browser what to do.

If you really want a copy of a page that is completely local to your
machine, I would suggest that you edit the source code yourself. It's not
that hard, especially if you have any basic HTML editor, and it is manageable
even in Notepad if you've ever touched HTML before.
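
To make point 3 concrete, here is a minimal Python sketch of why a dynamically generated URL is awkward to map to a saved file. Everything in it is an assumption for illustration: the helper name local_name, the small EXT_BY_TYPE table, and the example.com URLs are hypothetical, and this is not how IE or Firefox actually choose file names.

```python
import posixpath
from urllib.parse import urlsplit

# Illustrative mapping only; a real tool would need many more types.
EXT_BY_TYPE = {
    "image/gif": ".gif",
    "image/jpeg": ".jpg",
    "text/css": ".css",
    "application/javascript": ".js",
}

def local_name(url, content_type):
    """Pick a file name for a saved resource (hypothetical helper)."""
    # Use the last path segment and ignore the query string.
    name = posixpath.basename(urlsplit(url).path) or "resource"
    ext = posixpath.splitext(name)[1].lower()
    if ext in EXT_BY_TYPE.values():
        return name  # the URL already ends in a recognizable extension
    # Otherwise fall back on the Content-Type reported by the server.
    return name + EXT_BY_TYPE.get(content_type, "")

print(local_name("http://example.com/css/site.css", "text/css"))
print(local_name("http://example.com/image.aspx?id=42", "image/gif"))
```

The second URL has no image extension at all, so the saved name has to come from somewhere other than the HTML text, and a plain search-and-replace on file names would miss it.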
--
Nathan Sokalski
njsokalski@hotmail.com
http://www.nathansokalski.com/
 
David

Flightless Bird
Mr. Sokalski:

Thanks for the explanation and your time. Some follow-up questions, if I
may.

1)
If the browser (IE or Firefox in this case) is creating the "_files"
directory as a cache directory (which makes perfect sense), and the HTML
source does not include this reference, how does the browser know where to
find the cached files so they can be displayed once the page has been
downloaded onto the client machine?

2)
Why would the source on some web pages contain the "_files" directory
reference and others not, unless the HTML downloaded to the client was
hard-coded this way?

3)
I'm curious where you found this info, as I searched everything I could
think of on the net and came up with zippo, even on MSDN. Any link or
reference that explains in detail how the browser handles this would be
appreciated.

4)
------------------
> If you really want to have a copy of a page that is completely local to
> your machine, I would suggest that you edit the source code yourself
-----------------

This is what I've been doing. Sometimes it's an easy fix, other times not.
I will write a parsing routine to automate it if possible, but I need to
understand how the browser is handling this first.
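
A parsing routine of the kind mentioned above could start from something like the following Python sketch. It is only a rough illustration under stated assumptions: the function name rewrite_links, the directory name page_files, and the example.com URLs are hypothetical, and the regular expression only handles quoted src and href attributes. It does not reproduce what either browser does internally.

```python
import posixpath
import re
from urllib.parse import urlsplit

# Matches src="..." or href="..." with either quote style.
ATTR_RE = re.compile(r'''\b(src|href)\s*=\s*(["'])(.*?)\2''', re.IGNORECASE)

def rewrite_links(html, files_dir, saved_names):
    """Point src/href values at files_dir when a file of that name was saved."""
    def fix(match):
        attr, quote, url = match.group(1), match.group(2), match.group(3)
        # Compare only the last path segment with the saved file names.
        name = posixpath.basename(urlsplit(url).path)
        if name in saved_names:
            return f"{attr}={quote}{files_dir}/{name}{quote}"
        return match.group(0)  # leave every other link untouched
    return ATTR_RE.sub(fix, html)

page = ('<img src="http://example.com/img/logo.gif">'
        '<a href="http://example.com/about.html">About</a>')
print(rewrite_links(page, "page_files", {"logo.gif"}))
```

In practice saved_names would come from listing the "_files" directory (for example with os.listdir), so only resources that really exist locally are redirected and ordinary hyperlinks keep pointing at the server.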

David
 
Donald Anadell

Flightless Bird
"David" <NoWhere@earthlink.net> wrote in message
news:-OJhqVAtzKHA.4384@TK2MSFTNGP06.phx.gbl...
> 5) Is there any other solution?


WinHTTrack

http://www.httrack.com/page/1/en/index.html

"It allows you to download a World Wide Web site from the Internet to a
local directory, building recursively all directories, getting HTML, images,
and other files from the server to your computer.

HTTrack arranges the original site's relative link-structure. Simply open a
page of the "mirrored" website in your browser, and you can browse the site
from link to link, as if you were viewing it online. HTTrack can also update
an existing mirrored site, and resume interrupted downloads. HTTrack is
fully configurable, and has an integrated help system."

Donald Anadell


 
David

Flightless Bird
Thanks

Didn't know there was anything in GPL I could look at.

David
 
Twayne

Flightless Bird
Not certain, but the one time I looked at that, I think it
came down to whether the original code used relative or
absolute links. If relative, they'll work on your computer. If
absolute, then clicking the links or displaying images
etc. might result in an attempt to retrieve them from the
original web site, or in nothing at all, depending again on coding styles
and what you've set your firewall to block.

HTH,

Twayne
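
The relative-versus-absolute distinction can be checked with a few lines of Python. This is a minimal sketch using the standard urllib.parse module; the helper is_absolute, the local path C:/saved/page.htm, and the example.com URL are made up for illustration.

```python
from urllib.parse import urljoin, urlsplit

def is_absolute(url):
    """True when the URL names its own scheme or host (e.g. http://...)."""
    parts = urlsplit(url)
    return bool(parts.scheme or parts.netloc)

# Hypothetical location of a page saved to disk.
local_page = "file:///C:/saved/page.htm"

# A relative link resolves against wherever the page now lives.
print(urljoin(local_page, "page_files/logo.gif"))

# An absolute link ignores the page location and still points at the server.
print(urljoin(local_page, "http://example.com/img/logo.gif"))
```

Note that a root-relative link such as /img/logo.gif is not absolute by this test, yet it would also break locally, because it resolves against the root of the local drive rather than the saved page's folder.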
 