Load SharePoint 2007 Usage Log Data Using SSIS
30. April 2008 16:38

It seems everyone is interested in gathering usage data from MOSS, and everyone is unsatisfied with the simple reports that can be generated out-of-the-box.  Yet oddly enough, I was unable to find much information at all about ETL and Reporting on this data.  This is probably due to the fact that the log file format is a bit cryptic, and was entirely undocumented until recently.

What I've created here is an SQL Integration Services ETL Package that will enumerate the usage log files from disk, parse them, and load them to SQL.  Then it will archive or delete the files.  From there you can easily create SQL Reporting Services reports, cubes, etc to further analyze the data.

Here is the MSDN documentation on the Usage Event Logs file format:

http://msdn.microsoft.com/en-us/library/bb814929.aspx

Before we get too deep, take a look at my Visual Studio solution so you have some context:

image

You can see I've got the DTSX package, as well as a C# class library and a table creation script.

Because the files are in a binary format, you can't parse them using a simple 'flat file' source in SSIS.  Initially I hacked together a quick script source task which did a reasonable job, but then I found some nice C# code for parsing the files at William's Blog.

I took William's C# code, with very few changes, and created my class library to do the parsing.  I will later reference this library from inside a script source task in SSIS to parse the log files.  (Writing an external dll and referencing it in a script task is *much* faster than writing a custom source adapter for SSIS)

Let's start at the beginning though, so here is what the main package loop looks like:

image

Very simple.  The log files are created in a bit of a nested folder structure, with GUID's and dates.  They all have a .log extension (the same as other MOSS logs).  The best thing to do is go to central admin and tell it to shove all the usage logs in a nice subfolder called 'Usage' so you can easily differentiate them from all the other log files.  Then we just tell the enumerator to traverse sub folders and it will find all our files.

Each file that is found has it's path stored in a variable, and the data flow task is called off.

When the data flow task completes, we run a file system task which archives the log files off elsewhere.  (Or with a small change simply deletes them).   All of the paths used here are stored in variables.

And the data flow task:

image

Once again, very simple.  The script source task has the 'meat and potatoes' in it.  It pushes out all the parsed data where we simply dump it into a SQL table.  I take care of type casting in the script task so we don't have to do it here.  You could of course add some logic to truncate old records out of the table, etc.

So, on to the script task.

The outputs (and thus the columns available in the log files) for the script task are configured as follows:

image

Also, we need to bring in our filename variable from the enumerator so we know which file to parse:

image

Next, we've got to get our custom DLL (from our class library project in the solution) somewhere that we can access it from our script task.  You've got to copy the parser DLL to this folder:

C:\Program Files\Microsoft SQL Server\90\SDK\Assemblies

Doing this makes the DLL available for reference from within your script task.  Putting the DLL in the GAC won't work.

Now you can add it as a reference, you can see "UsageLogParser" added to my script references here:

image

Here is the full code listing for the main subroutine in my script source task.  This uses the parser class to generate a dataset, then enumerates the rows in the dataset and add's rows to the source adapters outputs.  The only thing missing from this code listing is a line in PreExecute to pull my filename variable into a local string (path).  You could expand on this and trap errors, logging them to an error output. 

    Public Overrides Sub CreateNewOutputRows()
If (System.IO.File.Exists(path)) Then

Dim parser As New MOSS2007LogParser.Parser()
Dim result As DataSet = parser.GetLogDataSet(path)

For Each row As System.Data.DataRow In result.Tables(0).Rows
LogDataOutputBuffer.AddRow()
LogDataOutputBuffer.id = New Guid(row("SiteGUID").ToString())
LogDataOutputBuffer.time = DateTime.Parse(row("TimeStamp").ToString())
LogDataOutputBuffer.url = row("Document").ToString()
LogDataOutputBuffer.sitecollection = row("Web").ToString()
LogDataOutputBuffer.user = row("UserName").ToString()
LogDataOutputBuffer.webapp = row("SiteUrl").ToString()
LogDataOutputBuffer.referral = row("Referral").ToString()
LogDataOutputBuffer.command = row("Command").ToString()
LogDataOutputBuffer.useragent = row("UserAgent").ToString()
LogDataOutputBuffer.querystring = row("QueryString").ToString()
Next

result.Dispose()

End If
End Sub


After that, we simply connect the output to our destination adapter.  Here is what my destination table looks like.  I'm just using the GUID as a primary key and bouncing any duplicates (not that there should be any if you archive/delete the logs).  You can add smarter logic if you want to handle that differently.  The sql script for the table is also in the solution.



USE [BGSD]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[UsageLog](
[id] [uniqueidentifier] NULL,
[time] [datetime] NULL,
[webapp] [varchar](50) NULL,
[sitecollection] [varchar](50) NULL,
[url] [varchar](100) NULL,
[user] [varchar](50) NULL,
[referral] [varchar](150) NULL,
[command] [varchar](100) NULL,
[useragent] [varchar](150) NULL,
[querystring] [varchar](150) NULL
) ON [PRIMARY]

GO
SET ANSI_PADDING OFF


A few additional notes:




  • You need to enable both usage logging, and usage log processing in Central Administration.  You should make a new folder to save these logs to so that your package can tell them apart from other .log files.


  • You do not need to enable Usage report generation in your site collections.


  • You should schedule your package to run outside of the time slot for usage processing that you configured in central administration.  This way the log files are not in use while you load them, and you don't have to worry about remembering where you left off, etc.


  • There is some performance impact to usage logging.  If you have previously had this disabled, please pay close attention to any adverse effects.



Download the Visual Studio Solution and DTSX package here.  This was created using Visual Studio 2005, and SQL Server Integration Services 2005.

Tags: Comments (8) | Permalink
Uninstall MOSS 2007
19. April 2008 12:27

It seems to be pretty common for people to get stumped in trying to remove their SharePoint Server 2007 installation. 

This is in no small part due to the fact that the uninstaller simply quits without any message unless you ensure a few things ahead of time.

1.  Remove Server From Farm

In Central Admin, Operations tab, Server in farm -- Remove server.  You'll probably get an error message after you click remove, since the server isn't able to render the page anymore.

2.  Remove Databases

Content DB, Config, Search, SSP, Etc.

3.  Remove Web Applications

Use the IIS Management Console, remove all web applications related to the MOSS site.

Now you can remove MOSS via add/remove programs.

Tags: Comments (11) | Permalink
Moving SharePoint Content Database Files To New Location
3. April 2008 06:26

Often you need to relocate your MOSS database files to a new partition, drive, or SAN, while leaving them attached to the same SQL Database Instance.

If you are trying to figure out how to move the content database to a different SQL instance or server then you are looking for this article.

Today my colleague Paul Curtis sent me the steps he uses to perform this task in a rather efficient way.

1. Prepare SQL Script

Alter the database file names using the scripts listed down below.  Modify the scripts as appropriate for each database.

If the database has dashes in the name you have to put quotes around it.

2. Run the Script

Run the script in SQL Management Studio, or Management Studio Express.  You should get a result indicating that the paths have been altered for the databases.

3. Stop SharePoint and SQL Server

Stop IIS, or the web applications your SharePoint site.  Stop MSSQL Server through Management Studio or via the Windows Service.

4. Move the databases to the new location

Copy the mdf and log files to the new location (new drive, partition, etc).

Please copy and then delete, as opposed to trusting a cut+paste operation.

5. Start the SQL services

Start MSSQL either through Management Studio or via the Windows Service.

 

6. Verify access to each database

Verify the databases are working properly, you can just pull up the properties to the database in Management Studio.

7. Start SharePoint services

Start IIS, web apps, or whatever you disabled in step 3.

Go verify your SharePoint site is working.

8. You are Done.

Go get coffee, pat yourself on the back, etc.

 

The SQL Script:

ALTER DATABASE SP_WSS_SEARCH_DB

MODIFY FILE (NAME = SP_WSS_SEARCH_DB_log ,FILENAME = 'D:\ Data\SP_WSS_SEARCH_DB_log.LDF')

GO



ALTER DATABASE SharePoint_Configuration_DB

MODIFY FILE (NAME = SharePoint_Configuration_DB_log ,FILENAME = 'D: \Data\SharePoint_Configuration_DB_log.LDF')

GO



ALTER DATABASE "SharePoint_AdminContent_ce805cd0-0b6d-4ea1-8eaa-469b1ea0296a"

MODIFY FILE (NAME = 'SharePoint_AdminContent_ce805cd0-0b6d-4ea1-8eaa-469b1ea0296a_log' ,FILENAME = 'D: Data\SharePoint_AdminContent_ce805cd0-0b6d-4ea1-8eaa-469b1ea0296a_log.LDF')

GO



Keep in mind that moving the database may cause SharePoint to perform a full crawl in place of a regularly scheduled incremental crawl at the next crawl interval.

 

Go here to get more information from MSDN.

Tags: Comments (11) | Permalink