Saturday, March 12, 2011

Making an Article Spinner

Let's make an article spinner. If you haven't seen spinner text before, it looks like {john|nikita|mike}. When you "spin" the text, you get one of the three names. It is generally used to automate things like sign-up processes and blog commenting. Since you can provide many options, you don't have to worry about the same content being posted again and again.

I'll build this program in VB.NET, and I'll also give a generalized algorithm so that you can implement it in any programming language. Here is the main idea behind the algorithm:

                Consider the spinner text "{john|mike} is a {good|bad} boy." This text can generate four different sentences (two options for the name times two options for the adjective). So how are we going to parse this spinner text?

First, we will identify the "spin islands", i.e. the segments of the sentence that start with '{' and end with '}'.

                Segments: {john|mike}, {good|bad}
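
As a rough illustration of this first step, here is a minimal Python sketch (Python is used only to show that the idea carries over to other languages; the name find_islands is made up for this example). It assumes a lazy regular expression is an acceptable way to collect each segment from '{' to the nearest '}', and it also counts how many sentences the text can produce.

```python
import re

def find_islands(text):
    """Return every {...} segment, matched lazily up to the nearest '}'."""
    return re.findall(r"\{.*?\}", text, flags=re.DOTALL)

islands = find_islands("{john|mike} is a {good|bad} boy.")
print(islands)  # ['{john|mike}', '{good|bad}']

# Number of possible sentences = product of the option counts per island.
variants = 1
for island in islands:
    variants *= len(island[1:-1].split("|"))
print(variants)  # 4
```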

Then, for each "spin island", we remove the { and } with string replace functions, split the remaining text on '|', and store the resulting words in an array (or any other collection). A random function then returns one of the words stored in the array. We repeat this process for each segment.

                Remove braces: john|mike good|bad

                For john|mike:

                                extract the words, which are separated by |, and save them in an array

                                array[0] = john;

                                array[1] = mike;

                                x = random number between 0 and upper bound of the array

                                return array[x];
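
The steps above can be sketched in a few lines of Python. This is only one possible reading of the algorithm, not the VB.NET program itself: the names pick_word and spin and the optional rng parameter are assumptions made for this example, and it uses a regex substitution with a callback instead of separate "remove braces" and "replace" steps.

```python
import random
import re

def pick_word(segment, rng=random):
    """Split 'john|mike' on '|' and return one option at random."""
    words = segment.split("|")
    return rng.choice(words)

def spin(text, rng=random):
    """Replace each {a|b|c} island with one randomly chosen option."""
    # The capture group holds the island's content without the braces.
    return re.sub(
        r"\{(.*?)\}",
        lambda m: pick_word(m.group(1), rng),
        text,
        flags=re.DOTALL,
    )

print(spin("{john|mike} is a {good|bad} boy."))
```

Because the callback runs once per match, every island gets its own independent pick, even if the same island appears twice in the text.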

That's it! This is the basic idea behind parsing spun text. Now let's implement this idea in VB.NET.

Before coding, I want you to know that there are two phases in parsing. One is accepting the input and detecting the segments; the other is extracting the individual words from each segment. So I'll make one function for each of those phases. OK, let's start coding.
  • Start a new VB.NET project in Microsoft Visual Studio.
  • Drag two textboxes (txtinput and txtresult) and a button (Button1) onto the form.
  • Use the following code:
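Imports System.Text.RegularExpressions

Public Class Form1

    ' One shared generator. On the .NET Framework, creating a new Random on
    ' every call can repeat values when the calls happen in quick succession.
    Private Shared ReadOnly rng As New Random()

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        txtresult.Text = spin(txtinput.Text)
    End Sub

    ' Phase 2: split a segment such as "john|mike" on "|" and return one option at random.
    Private Function extractwords(ByVal part As String) As String
        Dim values As String() = part.Split("|"c)
        If values.Length > 0 Then
            Return values(rng.Next(0, values.Length))
        End If
        Return ""
    End Function

    ' Phase 1: find every {...} segment and replace it with one of its options.
    Private Function spin(ByVal input As String) As String
        Dim output As String = input
        Try
            ' Lazy match from "{" to the nearest "}" (single-level nesting only).
            Dim r As Regex = New Regex("\{.*?\}", RegexOptions.Singleline)
            Dim m As MatchCollection = r.Matches(input)
            For Each match As Match In m
                Dim part As String = extractwords(match.Value.Replace("{", "").Replace("}", ""))
                ' Note: String.Replace swaps every identical copy of this segment at once.
                output = output.Replace(match.Value, part)
            Next
        Catch ex As Exception
            ' On any error, fall through and return the text as spun so far.
        End Try
        Return output
    End Function

End Class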
The spin function takes the text to be spun as input. We use a regular expression to find the segments, or "spin islands", and store them in the match collection 'm'. Each segment in 'm' is then passed to the extractwords function, which picks one word out of it. We then replace the segment, for example {john|mike|hari}, with just that word (say, hari). We keep replacing segments with one of their options until the whole text reads like regular text :)

My program looks something like this.

I hope you learned some stuff about article spinning. You can make your own article rewriter software or something like that :D. In the next article, I'll show you how to build something worth hundreds of dollars, and one of its important features will be spinning text!

Please note that the above method works only for a single level of nesting, i.e. it cannot parse a segment inside another segment.

{john|hari|ram} = Valid, will return either john or hari or ram

{john|{ram|shyam}|mike} = Invalid, will return either john or {ram|shyam} or mike
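
One possible way around this limitation, sketched in Python, is to resolve the innermost groups first and repeat until no group is left. This is not part of the program above; the names INNERMOST and spin_nested are made up for this example, and the approach assumes braces are balanced (stray braces are simply left in the text).

```python
import random
import re

# Matches a group that contains no other braces, i.e. an innermost group.
INNERMOST = re.compile(r"\{([^{}]*)\}")

def spin_nested(text, rng=random):
    """Resolve innermost {a|b} groups repeatedly until none are left."""
    while True:
        # subn returns the new text and how many groups were replaced.
        text, count = INNERMOST.subn(
            lambda m: rng.choice(m.group(1).split("|")), text
        )
        if count == 0:
            return text

print(spin_nested("{john|{ram|shyam}|mike}"))
```

With this sketch, {john|{ram|shyam}|mike} first becomes something like {john|ram|mike}, and the next pass reduces it to a single name.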


VK Pandey said...

Really helpful to me... Thanks ... and keep it up!!!

Jack said...

Hi Jangedoo,

Thanks you for this detailed code...

I am searching for this code, Greatly I found this page.

I am expecting more programs from your blog...


Free SEO Tools said...

It's not working for nested spinning syntax ... any improvement?
