Understanding Website Statistics

Introduction

Websites frequently publish statistics on their use, and this website is no exception. Website statistics help in building and maintaining a website by showing whether it is meeting its objectives. The statistical data can only be understood once the underlying terms are familiar, so this web page explains website terminology and how to interpret the statistics properly. It also explains some website features that are unfamiliar to most Internet users but important to their secure use of the Internet.

Bits, Bytes, Unicode and Packets

Computers and other devices that contain microprocessors work using binary digits called bits. There are only two binary digits: 0 and 1. They can be represented by an electrical current being on (1) or off (0). One can also represent binary digits with a charge being positive or negative. There are other ways to represent binary digits electrically. Regardless of how bits are represented they only have meaning in groups called "strings."

One common string is a group of eight bits, which is called a "byte." All upper and lower case letters in the English alphabet, punctuation marks, numerical digits and special characters (like spaces and new-paragraph characters) can be uniquely represented by a combination of bits in a byte. Data can be represented and transmitted as a series of bytes. The byte concept is adequate when computing is confined to English language applications. However, one may wish to represent letters from other alphabets or characters from languages like Chinese. Therefore, a single character may be represented on the Internet by a string of up to four bytes. The standard behind this enhanced representation of alphabets and other data is known as "Unicode."
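As a small illustration of the byte concept, the following Python sketch prints the eight bits that represent a few English characters. The helper name to_bits is made up for this example.

```python
def to_bits(ch: str) -> str:
    """Return the 8-bit pattern of a character that fits in one ASCII byte."""
    code = ord(ch)  # numeric code of the character
    if code > 127:
        raise ValueError("character does not fit in a single ASCII byte")
    return format(code, "08b")  # zero-padded binary, 8 digits wide


for ch in ["A", "a", " ", "7"]:
    print(repr(ch), to_bits(ch))
```

Each character maps to a different pattern of eight bits, which is why one byte is enough for English text.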

There are various Unicode encoding schemes. The most commonly used scheme is Unicode Transformation Format (UTF)-8. UTF-8 uses one byte (8 bits) per character if only English is used. If a character from another alphabet is required, the first byte is coded to indicate that up to three additional bytes follow to represent the required character. However, only one byte per character is required for much of the information transmitted over the Internet. UTF-8 is the most efficient scheme when only English is used.
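The variable length of UTF-8 can be checked directly in Python. This is only a sketch; the sample characters are chosen for illustration and are not taken from any particular website.

```python
# One sample character for each UTF-8 length, from one to four bytes.
samples = ["A", "é", "中", "😀"]

# Map each character to the number of bytes UTF-8 needs for it.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, size in utf8_lengths.items():
    print(ch, size, "byte(s)")
```

The English letter needs a single byte, while the other characters need two, three and four bytes respectively.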

UTF-16 is a scheme that uses either two bytes (16 bits) or four bytes to represent a character. This scheme can be better than UTF-8 when the primary language in use requires more than one byte to represent a character.

UTF-32 is a scheme that always uses four bytes (32 bits) to represent a character. This scheme is useful for applications that mix many languages or require complex language characters like Chinese.
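To compare the three schemes, the sketch below encodes the same text in each and counts the bytes. The function name encoded_sizes is hypothetical. The "-le" codec variants are used so that Python does not prepend a byte-order mark, which would otherwise inflate the counts.

```python
# Python codec names for the three schemes (little-endian, no byte-order mark).
CODECS = (("UTF-8", "utf-8"), ("UTF-16", "utf-16-le"), ("UTF-32", "utf-32-le"))


def encoded_sizes(text: str) -> dict:
    """Return the number of bytes each scheme needs for the given text."""
    return {name: len(text.encode(codec)) for name, codec in CODECS}


print(encoded_sizes("hello"))  # English text
print(encoded_sizes("中文"))    # two Chinese characters
```

For English text UTF-8 is the smallest, while for these Chinese characters UTF-16 is the smallest and UTF-32 is the largest in both cases.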

All data is transmitted over the Internet as strings of bytes called "packets." A packet consists of a "header," typically 20 to 40 bytes long, and a "payload" that can bring the packet to a maximum size of about 64 kilobytes (64 × 1,024 = 65,536 bytes). The header contains address information, an identification number and linkage information. The payload contains the data being transmitted, be it text, graphics, video or sound. Many packets are usually exchanged during an Internet session.
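The idea of cutting data into packets and putting them back together can be sketched as follows. This is a deliberately simplified model: the header here is a plain Python dictionary with made-up fields (seq, length), not the real Internet Protocol header layout, and the function names are hypothetical.

```python
MAX_PAYLOAD = 64 * 1024  # 65,536 bytes, the 64-kilobyte figure quoted above


def split_into_packets(data: bytes, max_payload: int = MAX_PAYLOAD) -> list:
    """Cut data into (header, payload) pairs of at most max_payload bytes."""
    packets = []
    for seq, start in enumerate(range(0, len(data), max_payload)):
        payload = data[start:start + max_payload]
        header = {"seq": seq, "length": len(payload)}  # simplified header
        packets.append((header, payload))
    return packets


def reassemble(packets: list) -> bytes:
    """Rebuild the original data, even if packets arrive out of order."""
    ordered = sorted(packets, key=lambda packet: packet[0]["seq"])
    return b"".join(payload for _, payload in ordered)


data = bytes(range(256)) * 1000          # 256,000 bytes of sample data
packets = split_into_packets(data)
print(len(packets), "packets")
print(reassemble(list(reversed(packets))) == data)
```

The sequence number in each header is what lets the receiver restore the original order.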

The Server

A "server" is a computer that physically contains websites. Servers are usually maintained and operated by an Internet Service Provider (ISP). The ISP is said to "host" the websites. A webmaster builds a "virtual" website on a personal computer and uploads it to the server. It is possible to change a website directly on the server, but most webmasters prefer to maintain a virtual website that can be tested and modified before the changes are made available on the Internet.

Associated with the server will be a "packet switching computer" that routes packets to and from the server by the best available path.

The Internet Protocol Address

The Internet Protocol Address, usually abbreviated as the "IP Address," is a unique number assigned to a network, device or computer when connected to the Internet. These addresses can be static or dynamic. A static IP address is a number that is permanently assigned to a network, device or computer. This type of IP address would be used for systems that are permanently connected to the Internet and are always active. An example of such a system would be a website server, that is, a computer containing websites.

Dynamic IP addresses are more common. These are temporary addresses that are assigned when the computer, network or other device connects to the Internet and are released when the computer, network or other device disconnects from the Internet.

There are two IP address formats in use: Version 4 (IPv4) and Version 6 (IPv6). IPv4 displays the IP address as four decimal numbers separated by periods, for example, 192.158.2.99. Each number ranges from 0 to 255, so this format permits the unique identification of 4,294,967,296 (2^32) devices. It has proven to be barely adequate for the current demands on the Internet. About 3.4 billion IP addresses are currently in use. IPv6 displays an IP address in eight numerical groups separated by colons, for example, 2011:ab4:0009:2468:0:765:12:45. The numbers between the colons are base 16. In this number system, a is 10, b is 11, c is 12, d is 13, e is 14 and f is 15. This system can represent over 3.4 × 10^38 IP addresses. Clearly, IPv6 will be adequate for the indefinite future. Currently, IPv6 is used primarily by business applications.
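The two formats and the address counts can be checked with Python's standard ipaddress module. This is a minimal sketch using the example addresses from the text, with the IPv4 groups written without leading zeros because recent Python versions reject them.

```python
import ipaddress

# Parse one address of each format; the module detects the version itself.
v4 = ipaddress.ip_address("192.158.2.99")
v6 = ipaddress.ip_address("2011:ab4:0009:2468:0:765:12:45")
print(v4.version, v6.version)

# IPv4 addresses are 32 bits long, IPv6 addresses are 128 bits long.
ipv4_total = 2 ** 32
ipv6_total = 2 ** 128
print(f"{ipv4_total:,}")    # number of possible IPv4 addresses
print(f"{ipv6_total:.1e}")  # number of possible IPv6 addresses

# Each IPv6 group is a base 16 number, for example the group "ab4".
print(int("ab4", 16))
```

The group ab4 works out to 10 × 256 + 11 × 16 + 4 = 2,740 in ordinary decimal notation.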

The header of each packet tells Internet devices which IP format is in use. Each packet also contains the IP address and the information required to assemble packets into useful information.

IP addresses can, in theory, be traced to a unique computer. This trace becomes more difficult if the address is dynamic and the computer is disconnected from the Internet. However, Internet providers keep a log of connections and traffic addresses, which facilitates a trace. Also, a screen name is logged if a user has logged onto the Internet with a screen name, for example when checking email. If you wish to minimize your chance of being uniquely identified when using the Internet, log on anonymously and use the privacy option of your browser.

Server Logs

Servers keep a detailed record of information requested from a website. These logs are normally kept on a monthly basis, that is, data is stored in a file designated for a specific month and year. The stored data includes the IP address of each user, called a visitor, accessing the site. The visitor's screen name is also recorded if it is available. The name of each page or file that the visitor accesses is recorded with the date and time the information was viewed. The data in these logs is used by a program on the server to generate statistical reports. The logs may also be downloaded by the webmaster to determine which pages or files are frequently accessed. The webmaster can then adjust the content of the website to better accomplish the website's objectives. Because one month's log can run to hundreds of pages of small print, logs are seldom downloaded. Instead, the webmaster usually requests the statistical reports on a regular basis.
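The following sketch shows how a program might pull the fields described above out of one log entry. The log format of this website's server is not stated here, so the example assumes a line shaped like the widely used Common Log Format. The sample line, the function name parse_log_line and the field names are all made up for illustration.

```python
import re
from datetime import datetime

# Assumed layout: ip, identity, screen name, [time], "request", status, size.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line: str) -> dict:
    """Return the visitor, time and requested item recorded in one log line."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError("line does not follow the assumed log format")
    fields = match.groupdict()
    return {
        "ip": fields["ip"],
        # A dash means no screen name was available.
        "user": None if fields["user"] == "-" else fields["user"],
        "time": datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "path": fields["path"],
        "status": int(fields["status"]),
        # A dash means no bytes were sent.
        "size": 0 if fields["size"] == "-" else int(fields["size"]),
    }


sample = ('203.0.113.7 - alice [10/Oct/2023:13:55:36 +0000] '
          '"GET /index.html HTTP/1.1" 200 2326')
record = parse_log_line(sample)
print(record["ip"], record["user"], record["path"], record["size"])
```

A report generator would apply such a function to every line of the monthly log file and then total the results.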

Statistical Reports

Statistical reports may vary from one ISP to another but are usually generated on a monthly basis and will include information on the number of hits, number of files downloaded, number of pages read, number of sites making requests, number of visits to the site and the number of kilobytes downloaded during the month. In addition, the statistical report may contain information like the visitor's country of origin and type of visitor (person, web crawler, business etc.). This data may be encoded in Unicode and include detailed IP address information.

A "hit" is any request made to the server, including requests for files, pages, videos, etc. Many websites like to trumpet the hits on the site with a hit counter. However, this number can be misleading because web crawlers (automated computer programs that patrol the web looking for new information) request items in a more or less random manner, unlike people who are looking for specific information. Hits are, in the opinion of the author, a poor indication of the utility of a website.

A "file" is a data transfer from the website in response to a hit.

A "page" is a HyperText Markup Language (HTML) document. Video clips, audio files, images etc. are not pages. This descriptive text is an example of a page.

A "site" is a unique IP address.
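The four terms defined so far can be illustrated by counting them over a handful of requests. This is a sketch under stated assumptions: the records are invented, a status code of 200 is taken to mean that data was actually transferred, and a page is recognized by a file name ending in .html or .htm. A real report generator may use different rules.

```python
# Each invented record is (ip_address, requested_item, status_code).
records = [
    ("203.0.113.7", "/index.html", 200),
    ("203.0.113.7", "/logo.png", 200),
    ("203.0.113.7", "/logo.png", 304),      # nothing sent, item was cached
    ("198.51.100.4", "/stats.html", 200),
    ("198.51.100.4", "/missing.html", 404), # nothing sent, item not found
]

PAGE_SUFFIXES = (".html", ".htm")


def summarize(records: list) -> dict:
    """Count hits, files, pages and sites in a list of request records."""
    transferred = [r for r in records if r[2] == 200]  # assumed: 200 = data sent
    return {
        "hits": len(records),                          # every request counts
        "files": len(transferred),                     # requests that sent data
        "pages": sum(1 for _, path, _ in transferred
                     if path.endswith(PAGE_SUFFIXES)), # HTML documents only
        "sites": len({ip for ip, _, _ in records}),    # unique IP addresses
    }


print(summarize(records))
```

In this sample there are more hits than files and more files than pages, which is the usual relationship between the three figures.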

A "visit" is a request from a specific IP address to access the website. Only one request to access the site from the same IP address is recorded over a specified period of time. For the website you are viewing, this period of time is one-half hour. If a visitor signs off this website and signs in again within half an hour, the server logs only one visit, no matter how many times the visitor signs in and off during that half hour.
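The half-hour rule can be sketched in a few lines of Python. The sketch assumes that the half hour is measured from the visitor's most recent request, which is one common interpretation, and the function name count_visits and the sample requests are hypothetical.

```python
from datetime import datetime, timedelta

VISIT_TIMEOUT = timedelta(minutes=30)  # the half-hour period described above


def count_visits(requests: list) -> int:
    """Count visits in a list of (ip_address, time) requests."""
    last_seen = {}  # most recent request time for each IP address
    visits = 0
    for ip, when in sorted(requests, key=lambda request: request[1]):
        previous = last_seen.get(ip)
        # A new visit starts on the first request from an address, or when
        # more than half an hour has passed since its previous request.
        if previous is None or when - previous > VISIT_TIMEOUT:
            visits += 1
        last_seen[ip] = when
    return visits


day = datetime(2023, 10, 10)
requests = [
    ("203.0.113.7", day.replace(hour=9, minute=0)),
    ("203.0.113.7", day.replace(hour=9, minute=10)),
    ("203.0.113.7", day.replace(hour=9, minute=35)),   # 25 minutes later
    ("203.0.113.7", day.replace(hour=10, minute=30)),  # 55 minutes later
    ("198.51.100.4", day.replace(hour=9, minute=5)),
]
print(count_visits(requests))
```

The first address accounts for two visits because only its last request comes more than half an hour after the one before it; the second address adds a third visit.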

A "kilobyte" is 1,024 bytes. Most often, a byte is one character of information like a letter of the English alphabet, so one kilobyte is roughly 1,024 characters transmitted over the Internet by the website. Statistical reports total the number of kilobytes transmitted by the website over the Internet in one month or another period of time. The number of kilobytes of information transmitted by the server each month is important in two ways. First, this statistic indicates the amount of data from the website that people are requesting. Second, it is an indication of the bandwidth the website requires. Bandwidth is the total number of kilobytes that a website is permitted to transmit in one month. If bandwidth is exceeded, the website becomes inactive for the remainder of the month. The monthly cost of a website is partly driven by the bandwidth required. The contract between the website owner and host ISP specifies the bandwidth. If more bandwidth is needed than the contract provides, the cost of hosting the website will increase.
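The arithmetic behind the bandwidth figure is simple enough to sketch. The function name bandwidth_report and the sample numbers are invented for illustration and do not describe any real hosting contract.

```python
BYTES_PER_KILOBYTE = 1024


def bandwidth_report(bytes_sent: int, allowance_kb: float) -> dict:
    """Compare the kilobytes sent in a month with the contracted bandwidth."""
    used_kb = bytes_sent / BYTES_PER_KILOBYTE
    return {
        "used_kb": used_kb,
        "remaining_kb": max(allowance_kb - used_kb, 0),
        "exceeded": used_kb > allowance_kb,
    }


# Example: 5,242,880 bytes sent against an allowance of 10,240 kilobytes.
report = bandwidth_report(5 * 1024 * 1024, 10 * 1024)
print(report)
```

Here half of the monthly allowance has been used, so the website stays active.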

Summary

The website servers keep detailed information on the use of websites in logs. These logs are used by webmasters to determine the information on a website that is most viewed. In addition, programs on the servers can generate statistical reports on the use of a website. Users of the Internet should be aware that information on their access to websites is being logged and used by webmasters to improve the websites. This data may also be used to identify visitors.
