Help me to understand this URL format please

ahmad_abdulghany · Apr 17, 2007

There is a certain website from which I want to download a large number of files.
I couldn't use the 'download all' function of any download program, because I have to follow a link for each file to get to its own download page.

I noticed that all the download links for the files have a regular form like:

http://audio.mywebsite/getFile.php?fileId=459

--> This is what Firefox displays as the source of the downloaded file. The number 459 increases regularly with a constant step, so I can easily generate a text file containing a list of links to all the other files (in the same format as this one).

The address from which I launch the download of a file takes the form:

http://audio.mywebsite/download.php?fileId=459

--> That's displayed in the URL address bar.

The problem is that the number of files I would like to download is huge, and it's really tiring to open each page and download them one by one.
I already generated (using Matlab) a list of download locations (in the form above), but when I imported it into DAP, it couldn't understand or download it properly!

I hope that someone can help me...

BTW, the files are MP3s, and when I download them they get names that do not appear in the download address shown above!

Thanks in advance,
Ahmad

shg · Apr 18, 2007

Those files are served by a PHP script which checks whether or not you clicked the link on the website. It's a protection against hot-linking the files from another website.
The mechanism probably uses the HTTP 'Referer' header field, which contains the address of the page on which the link was found.

All you have to do is pass a proper referer address with each HTTP request. It's quite easy from a programmer's point of view, once one has already written an HTTP protocol component/library/whatever. Unfortunately, it's a bit more problematic when you need to download such a file using existing software.

One way to do it is to use wget with its --referer option.

--referer=url
Include `Referer: url' header in HTTP request. Useful for retrieving documents with
server-side processing that assume they are always being retrieved by interactive web
browsers and only come out properly when Referer is set to one of the pages that point
to them.

Write a script containing, on each line, the wget command with the name of the file you want to download and a --referer option whose argument is the page that links to that file.
The server-side script may be so 'silly' that it is enough to provide just the host address as the referer.
For example:

Code:

wget --referer="http://audio.mywebsite/" "http://audio.mywebsite/getFile.php?fileId=459"

If it doesn't work, try putting the complete URL of the referring page, and then maybe work out how it depends on the URL of the file you want to download.
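
The script described above could be generated rather than typed by hand. Here is a minimal sketch, assuming the IDs run from 459 to 461 in steps of 1 (a made-up range) and that each file's own download.php page is the referer the server expects, which nothing in this thread confirms. It only prints the commands, so they can be checked before anything is downloaded:

```shell
#!/bin/sh
# Host and script names are taken from the thread; the ID range is hypothetical.
BASE="http://audio.mywebsite"

# Print one wget command for the given file ID, using that file's
# download page as the Referer (a guess at what the server checks).
wget_cmd() {
    printf 'wget --referer="%s/download.php?fileId=%s" "%s/getFile.php?fileId=%s"\n' \
        "$BASE" "$1" "$BASE" "$1"
}

# Print commands for IDs first..last in steps of step.
# Refuses a non-positive step, which would otherwise loop forever.
wget_cmds() {
    first=$1 step=$2 last=$3
    [ "$step" -gt 0 ] || return 1
    id=$first
    while [ "$id" -le "$last" ]; do
        wget_cmd "$id"
        id=$((id + step))
    done
}

wget_cmds 459 1 461
```

Redirect the output to a file and run that file with sh once the first command has been confirmed to work by hand.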

Of course, more extensive methods of detecting mass downloading or hot-linking exist (limiting the number of downloads per session or period of time, cookies, tracking HTTP server requests), but the referer check is the most commonly seen. However, wget gives you a variety of options to choose from, so it's almost impossible to write a hot-linking prevention script that could not be fooled by wget.
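
A few of those wget options can be combined as follows. This is only a sketch and has not been tried against this site. --load-cookies sends cookies exported from the browser, --wait and --random-wait pause between files, --content-disposition asks wget to use the file name suggested by the server, and --input-file reads the URLs from a list. The file names urls.txt and cookies.txt and the 5 second delay are made up for the example. The function prints the command instead of running it:

```shell
#!/bin/sh
# Hypothetical file names: urls.txt (one getFile.php URL per line) and
# cookies.txt (cookies exported from the browser in Netscape format).
URLS="urls.txt"
COOKIES="cookies.txt"

# Print a single wget invocation that walks the whole list.
# $1 is the referer to send, $2 the base delay in seconds.
# --wait/--random-wait only pause between files fetched by the same wget run.
batch_cmd() {
    printf 'wget --referer="%s" --load-cookies="%s" --wait=%s --random-wait --content-disposition --input-file="%s"\n' \
        "$1" "$COOKIES" "$2" "$URLS"
}

batch_cmd "http://audio.mywebsite/" 5
```

Because a single wget run sends the same referer for every file, this form only helps if the server accepts the host address alone as the referer.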
