SEOMining



URL Components

A URL consists of 1) the protocol (http), 2) a delimiter, 3) a subdomain name, 4) a domain name, and 5) a directory.

http  The service or protocol: the type of Internet service the browser uses to access the resource.
://  The delimiter that separates the protocol from the host name.
www  An optional host name for Web resources on HTTP servers.
sports  The subdomain name. It indicates a specific section of the domain's Web site.
yahoo.com  The domain name. On its own, it takes you to the home page of the site.
/nba/  The directory: the location of the resource on the host server.
standings.html  The filename of the specific Web page document.
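
The breakdown above can be sketched in a few lines of Python using the standard library's urllib.parse module. This is a minimal illustration, not a definitive parser: the function name split_url and the sample URL (assembled from the components listed above) are assumptions, and treating the last two host labels as the domain name is a naive shortcut that fails for domains such as example.co.uk.

```python
from urllib.parse import urlsplit
import posixpath


def split_url(url):
    """Split a URL into the components described in the list above.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")

    # Naive assumption: the domain name is the last two labels of the host,
    # and anything in front of it is the subdomain (or host name such as www).
    domain = ".".join(labels[-2:])
    subdomain = ".".join(labels[:-2])

    # Separate the directory path from the trailing filename, if any.
    directory, filename = posixpath.split(parts.path)
    if directory and not directory.endswith("/"):
        directory += "/"

    return {
        "protocol": parts.scheme,
        "subdomain": subdomain,
        "domain": domain,
        "directory": directory,
        "filename": filename,
    }


components = split_url("http://sports.yahoo.com/nba/standings.html")
for name, value in components.items():
    print(f"{name}: {value}")
```

A production system would normally consult a public suffix list rather than counting labels to decide where the domain name begins.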

Key Web Technologies: URLs, HTTP, and HTML

Three key elements enabled the development of the Web: an Internet protocol (HTTP) to specify the client-server interaction, a way to identify the location of a resource (the URL), and a way to create content that does not depend on a proprietary format (HTML).
These Web technologies are explored further in the following course: Internet Protocols.
Initially, Berners-Lee (1999) proposed a Uniform Resource Identifier (URI) as the way to identify a resource, but the IETF standards body changed this to the now familiar URL, the Uniform Resource Locator. URLs have a standard syntax: they begin with a protocol, followed by a colon and a double forward slash, then a Fully Qualified Domain Name (FQDN, i.e., a host at some domain), and finally the directory path to a resource, ending with the full file name.
Figure 3.3 shows the structure of a sample standard URL. Although a standard URL usually begins with HTTP, the syntax also accommodates the other Internet protocols that were prevalent when it was developed; for instance, a URL could designate "gopher" or "TN3270" as the protocol in place of HTTP.
So in addition to "http://hostname", "tn3270://hostname" or "gopher://hostname" could be valid URLs as well.
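
To illustrate that the protocol slot is generic, the short Python sketch below splits each of the three example URLs into its protocol and host. The helper name scheme_and_host is hypothetical, and the code only shows that the syntax parses the same way for each protocol; it says nothing about whether a given client can actually speak gopher or TN3270.

```python
from urllib.parse import urlsplit


def scheme_and_host(url):
    """Return the (protocol, host) pair of a URL written as protocol://host."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname


# The same syntax applies whatever protocol name precedes the "://" delimiter.
for url in ("http://hostname", "gopher://hostname", "tn3270://hostname"):
    print(url, "->", scheme_and_host(url))
```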

Figure 3.3 The format of a standard URL.

A filename is not always required at the end of a URL, because a server looks for several default names when the path ends with a machine or directory name; welcome.html or index.html is sought by the server without being specified in the URL. URLs frequently point to resources on Web servers running on UNIX machines, which makes case sensitivity a potential issue; however, most browsers have a "smart browsing" feature to overcome this problem. Long URLs can be shortened by creating an alias name and a subsequent HTTP redirect; for instance, in the URL shown in Figure 2.2, the CommInfoStudies portion can also be replaced with the abbreviation "CIS".
Alternatively, services such as TinyURL.com can be employed to provide shortened URL aliases and redirects.
However, when URLs appear in HTML, the match must be exact; a link reference within HTML href code that does not use the mixed case in the CommInfoStudies portion of the reference could result in a 404 "page not found" error in some browsers.
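
The three behaviors described above (default filenames, alias redirects, and case-sensitive matching) can be sketched as a toy path resolver in Python. Everything here is an assumption made for illustration: the function name resolve, the file paths, the alias table, and the order in which the default names are tried. Real Web servers make all of these configurable.

```python
# Default names the text mentions; the lookup order is an assumption.
DEFAULT_DOCUMENTS = ("index.html", "welcome.html")


def resolve(path, files, aliases=None):
    """Return a (status, target) pair for a requested path.

    files   -- set of paths on a hypothetical case-sensitive server
    aliases -- optional mapping from a short alias path to the full path
    """
    aliases = aliases or {}

    # An alias answers with a redirect to the longer path.
    if path in aliases:
        return 301, aliases[path]

    # Exact, case-sensitive match on a file.
    if path in files:
        return 200, path

    # The path names a directory: look for a default document inside it.
    base = path if path.endswith("/") else path + "/"
    for name in DEFAULT_DOCUMENTS:
        candidate = base + name
        if candidate in files:
            return 200, candidate

    return 404, None


files = {"/CommInfoStudies/index.html", "/CommInfoStudies/about.html"}
aliases = {"/CIS/": "/CommInfoStudies/"}

print(resolve("/CommInfoStudies/", files, aliases))            # default document
print(resolve("/CIS/", files, aliases))                        # alias redirect
print(resolve("/comminfostudies/about.html", files, aliases))  # wrong case
```

Because the lookup uses plain string comparison, a request that differs only in letter case finds nothing, which mirrors the 404 outcome described for mismatched href values.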