Image-Based Game Automation in Python🎮

git repo: https://github.com/StanLinCat/auto_dino

1. Project Introduction and Motivation

This project explores how to build a Python game automation tool with basic image recognition techniques. The result is a fun tool in its own right, and the same approach can be applied to many other automation scenarios.

2. Core Technology Stack

This project builds on the following open-source tools, which together provide the foundation for image recognition and automated control:

Technology	Brief Description
OpenCV	Open-source computer vision library for image processing and pattern recognition.
Python	Easy-to-learn, high-level programming language with rich libraries, ideal for beginners.
PyAutoGUI	Library for automating mouse/keyboard control and basic screenshot/image matching.
NumPy	Library for efficient numerical computing, used here for image matrix operations.

💡 Key Concept: Image processing rests on mathematics, primarily matrices from linear algebra: an image is stored as a matrix of pixel values. Combined with statistical data analysis, matrix operations can solve basic image recognition problems, and the same foundation carries over to machine learning and deep learning.

Original image of the like button (left), image information matrix (middle, right)
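
To make the matrix view concrete, here is a minimal NumPy sketch. The 4x4 pixel values are made up for illustration and are not taken from the like-button image.

```python
import numpy as np

# A hypothetical 4x4 grayscale image: 0 is black, 255 is white.
image = np.array(
    [
        [0, 0, 0, 0],
        [0, 255, 255, 0],
        [0, 255, 255, 0],
        [0, 0, 0, 0],
    ],
    dtype=np.uint8,
)

print(image.shape)   # (rows, columns), i.e. (height, width)
print(image[1, 2])   # pixel at row 1, column 2
print(image.mean())  # average brightness, a simple statistic

# Slicing rows and columns crops a rectangular region of the image.
center = image[1:3, 1:3]
print(center)
```

Because the image is an ordinary array, cropping, averaging, and comparing regions are all plain matrix operations.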

3. Image Recognition Method: Template Matching

The core image recognition method in this project uses the cv2.matchTemplate function from the OpenCV library.

3.1 Algorithm Principle

The main function of cv2.matchTemplate is to find similar targets in an image.

Input: The algorithm takes two images: image (large image, i.e., screenshot) and template (small target image to search for).
Computation: The program slides the template over the image continuously and calculates a comparison value at each position, representing the similarity of the two images in that region.
Output and Positioning: The output is a result matrix holding the comparison value for every position. The minMaxLoc function then finds the maximum or minimum value in that matrix (which one depends on the comparison method), and its location is the position of the best match.

❗ Positioning Note: The point found by the function is the top-left corner of the target image. If used as a click position in game automation, add half the width and height of the template to the coordinates to ensure clicking the effective area.

Small dog face on top-left is the template matrix, large image on left is the image, right is the result matrix

3.2 Similarity Comparison Functions and Mathematical Formulas

OpenCV provides several methods for computing the comparison value. This report covers the normalized variants. Normalization keeps the score unchanged when the pixel brightness of both the image and the template is multiplied by the same coefficient.

(1) Normalized Squared Difference (cv2.TM_SQDIFF_NORMED)

This method calculates the squared difference and normalizes it. Smaller values indicate higher similarity.

\[R(x, y) = \frac{\sum_{x', y'} \left(T(x', y') - I(x + x', y + y')\right)^2}{\sqrt{\sum_{x', y'} T(x', y')^2 \cdot \sum_{x', y'} I(x + x', y + y')^2}}\]

(2) Normalized Cross-Correlation (cv2.TM_CCORR_NORMED)

This method calculates the cross-correlation between the template and the image window and normalizes it. Larger values indicate higher similarity.

\[R(x, y) = \frac{\sum_{x', y'} \left(T(x', y') \cdot I(x + x', y + y')\right)}{\sqrt{\sum_{x', y'} T(x', y')^2 \cdot \sum_{x', y'} I(x + x', y + y')^2}}\]

(3) Normalized Correlation Coefficient with DC Component Removed (cv2.TM_CCOEFF_NORMED)

This method also calculates a correlation, but first subtracts the mean from the template and from each image window. Removing this constant (DC) component prevents bright regions from scoring high simply because their pixel values are large. The resulting correlation coefficient is bounded between -1 and 1. Larger values indicate higher similarity.

\[R(x, y) = \frac{\sum_{x', y'} \left(T'(x', y') \cdot I'(x + x', y + y')\right)}{\sqrt{\sum_{x', y'} T'(x', y')^2 \cdot \sum_{x', y'} I'(x + x', y + y')^2}}\]

Here $T'(x', y')$ and $I'(x + x', y + y')$ are the original matrices minus their means, and $w$ and $h$ are the width and height of the template:

\[T'(x', y') = T(x', y') - \frac{1}{w \cdot h} \sum_{x'', y''} T(x'', y'')\]

\[I'(x + x', y + y') = I(x + x', y + y') - \frac{1}{w \cdot h} \sum_{x'', y''} I(x + x'', y + y'')\]
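
As a sanity check on the three formulas, the sketch below evaluates each score in plain NumPy for a single position, that is, for one template and one image window of the same size. The function names and the 2x2 test values are made up for illustration; this is not how OpenCV implements the computation internally.

```python
import numpy as np

def sqdiff_normed(t, p):
    """Normalized squared difference between template t and window p."""
    t, p = t.astype(np.float64), p.astype(np.float64)
    return np.sum((t - p) ** 2) / np.sqrt(np.sum(t**2) * np.sum(p**2))

def ccorr_normed(t, p):
    """Normalized cross-correlation between template t and window p."""
    t, p = t.astype(np.float64), p.astype(np.float64)
    return np.sum(t * p) / np.sqrt(np.sum(t**2) * np.sum(p**2))

def ccoeff_normed(t, p):
    """Normalized correlation coefficient: subtract the means first."""
    t = t.astype(np.float64) - t.mean()
    p = p.astype(np.float64) - p.mean()
    return np.sum(t * p) / np.sqrt(np.sum(t**2) * np.sum(p**2))

template = np.array([[10, 20], [30, 40]])
brighter = template + 100  # same pattern with a constant brightness offset
inverted = 50 - template   # opposite pattern: [[40, 30], [20, 10]]

print("identical:", sqdiff_normed(template, template), ccorr_normed(template, template))
print("brighter: ", ccorr_normed(template, brighter), ccoeff_normed(template, brighter))
print("inverted: ", ccorr_normed(template, inverted), ccoeff_normed(template, inverted))
```

With these values the mean-subtracted score treats the brighter copy as a perfect match and the inverted pattern as a perfect mismatch, while the plain normalized cross-correlation still gives the inverted pattern a fairly high score. That is the kind of misjudgment that subtracting the mean is meant to avoid.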

3.3 Limitations of Basic Functionality

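Using cv2.matchTemplate alone has limitations. It works well for typical 2D games, but recognition becomes very difficult in 3D games, where the same object changes appearance with the viewing angle. It is also not rotation-invariant: a target that appears rotated in the screenshot relative to the template will not be matched.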

4. Game Automation Program Flow

The automation flow of the game bot relies on image recognition results and uses if and else logic to determine the next mouse or keyboard control action. The flow is mainly divided into three steps:

Prepare Game Environment: Switch the interface to the game screen and keep refreshing while waiting for the game to become ready to start.
Start Game: Use OpenCV to find the coordinates of the “Start Button”, move the mouse there and click it to enter the game.
Automatically Play Game: Start timing (adding a timer is recommended so the program cannot run away), search for target images via image recognition, and execute preset logic based on the recognition results to trigger mouse or keyboard actions (e.g., a left click or a jump).

graph TD
    A([Start]) --> B(Prepare Environment);
    
    B --> C{Find Start Button};
    C -- No, Continue Waiting --> B;
    C -- Yes --> D(Click to Start Game);
    
    D --> E[Start Countdown];
    D --> F[Play Game];
    
    E --> F;
    
    F --> G{Recognize Game Image};
    
    G -- Recognition Result 1 --> H[Left Click or Short Jump];
    G -- Recognition Result 2 --> I[Right Click or Long Jump];
    
    H --> F;
    I --> F;
    
    F --> J([End]);

    %% Light red area (Start/End/Countdown) - Acceptable in light/dark modes
    style A fill:#ff6b6b,stroke:#333,stroke-width:2px,color:#fff
    style J fill:#ff6b6b,stroke:#333,stroke-width:2px,color:#fff
    style E fill:#ff8787,stroke:#333,stroke-width:2px,color:#fff

    %% Light yellow/green area (Main Flow) - Changed to softer teal series
    style B fill:#4ecdc4,stroke:#333,stroke-width:2px,color:#fff
    style D fill:#4ecdc4,stroke:#333,stroke-width:2px,color:#fff
    style F fill:#45b7aa,stroke:#333,stroke-width:2px,color:#fff
    style H fill:#45b7aa,stroke:#333,stroke-width:2px,color:#fff
    style I fill:#45b7aa,stroke:#333,stroke-width:2px,color:#fff

    %% Orange area (Decision Points) - Changed to softer orange
    style C fill:#ffa07a,stroke:#333,stroke-width:2px,color:#fff
    style G fill:#ff8c6b,stroke:#333,stroke-width:2px,color:#fff
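
The play loop in the flow above can be sketched as a polling loop with a time limit. This is only an outline under stated assumptions: `find_target` and `act` are hypothetical callables supplied by the caller (in a real bot they would wrap the screenshot matching and the mouse/keyboard calls), and the labels "short" and "long" are placeholders for whatever the recognition step returns.

```python
import time

def run_bot(find_target, act, time_limit_s, clock=time.monotonic):
    """Poll find_target() until the time limit and act on each detection.

    find_target: hypothetical callable returning "short", "long", or None.
    act:         hypothetical callable performing the mouse/keyboard action.
    clock:       time source, injectable so the loop can be tested.
    """
    actions = 0
    deadline = clock() + time_limit_s  # the timer that prevents a runaway loop
    while clock() < deadline:
        label = find_target()
        if label == "short":
            act("short_jump")
            actions += 1
        elif label == "long":
            act("long_jump")
            actions += 1
        # Any other result (including None) means: do nothing this round.
    return actions

# Demo with scripted detections and a fake clock that advances 1 per call.
detections = iter(["short", None, "long", "short"])
ticks = iter(range(100))
log = []
count = run_bot(
    find_target=lambda: next(detections, None),
    act=log.append,
    time_limit_s=5,
    clock=lambda: next(ticks),
)
print(count, log)
```

Passing the recognition and control steps in as functions keeps the loop itself free of screen access, so it can be tried out without a running game.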

5. Practical Results and Discussion

5.1 Practical Results in Drink or Cake Recognition Game

In testing a game involving recognizing drinks or cakes, the program scans screenshots to find and click the game start button coordinates.

Performance Optimization: During execution, the Region of Interest (ROI) is restricted to the table area in the middle of the screen, which significantly reduces computation time.
Performance: The program runs very fast, with an overall recognition time of less than 0.5 seconds. Because this is faster than a human player can react, the program can “feed” the rabbit or cat to the maximum within the time limit, with 8 seconds still remaining at the end.
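
The ROI idea can be sketched with plain array slicing. The screen size and ROI rectangle below are hypothetical numbers chosen for illustration, not the values used in the project. The important detail is that a position found inside the cropped region must be shifted by the ROI offset before it is used as a screen coordinate.

```python
import numpy as np

def crop_roi(screenshot, x, y, w, h):
    """Return the sub-image covering the ROI; NumPy indexes rows (y) first."""
    return screenshot[y:y + h, x:x + w]

def roi_to_screen(point, roi_x, roi_y):
    """Convert an (x, y) point found inside the ROI to screen coordinates."""
    px, py = point
    return px + roi_x, py + roi_y

# Hypothetical full-screen grayscale capture and table area.
screenshot = np.zeros((1080, 1920), dtype=np.uint8)
roi_x, roi_y, roi_w, roi_h = 600, 400, 720, 300
roi = crop_roi(screenshot, roi_x, roi_y, roi_w, roi_h)

# Fraction of the original pixels that the matcher still has to scan.
fraction = roi.size / screenshot.size
print(roi.shape, round(fraction, 3))

# A match at (10, 20) inside the ROI corresponds to this screen position.
print(roi_to_screen((10, 20), roi_x, roi_y))
```

Since template matching evaluates a score at every candidate position, scanning a smaller region directly reduces the amount of work per frame.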

5.2 Challenges and Optimization Suggestions for Chrome Dinosaur Game

The Chrome Dinosaur Game is more difficult because game speed continuously increases over time.

Challenges: A fixed jump-trigger position eventually stops working. As the game accelerates, there is no longer enough reaction time and the dinosaur crashes into a cactus.
Parameter Complexity: The algorithm therefore has to account for the relationship between the ROI and the game speed, possibly with a global variable that increases the reaction interval over time. It also has to handle pterodactyls, sharp descents, and the choice between long and short jumps, which makes parameter tuning complex.
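
One possible way to express a reaction interval that grows with time is sketched below. It is an interpretation, not the project's implementation: the function names, the linear growth model, and all pixel values are hypothetical and would need tuning against the real game.

```python
def trigger_distance(elapsed_s, base_px=80.0, growth_px_per_s=1.5, max_px=300.0):
    """Distance ahead of the dinosaur at which an obstacle triggers a jump.

    Grows linearly with elapsed time and is capped at max_px.
    All default values are hypothetical placeholders.
    """
    return min(base_px + growth_px_per_s * elapsed_s, max_px)

def should_jump(obstacle_x, dino_x, elapsed_s):
    """Jump when the obstacle is ahead and within the current trigger distance."""
    gap = obstacle_x - dino_x
    return 0 <= gap <= trigger_distance(elapsed_s)

# The same obstacle 150 px ahead is ignored early on but triggers a jump
# once the game has been running (and accelerating) for a minute.
print(should_jump(obstacle_x=200, dino_x=50, elapsed_s=0))
print(should_jump(obstacle_x=200, dino_x=50, elapsed_s=60))
```

Passing the elapsed time as an argument instead of reading a global variable makes it easier to try out different growth rates in isolation.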

Suggested future optimization: adopt “Sequential Control”, that is, define dedicated modes for different game periods (e.g., day/night, different acceleration stages). Each mode then processes fewer images per computation, and its ROI can be adjusted more easily and precisely.

5.3 Performance Improvement Solutions

For games requiring extremely high speed, consider the following optimizations:

Port the Python program's functionality to C++ or another faster language.
Monitor computation time so that suitable parameters can be found more effectively.
Upgrade to faster computer hardware, taking GPU and memory into account.
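
For the second point, a minimal way to monitor computation time is to wrap the recognition step with `time.perf_counter`. In the sketch below, `fake_recognition` is a placeholder workload standing in for a screenshot plus template match, and `time_call` is a hypothetical helper name.

```python
import statistics
import time

def time_call(func, repeats=20):
    """Run func() several times and return the per-call durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations

def fake_recognition():
    # Placeholder workload; a real bot would capture and match an image here.
    sum(i * i for i in range(10_000))

durations = time_call(fake_recognition)
print(
    f"median {statistics.median(durations) * 1000:.2f} ms, "
    f"worst {max(durations) * 1000:.2f} ms"
)
```

Looking at the worst case as well as the median helps when choosing reaction parameters, since a single slow frame is enough to miss a jump.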

6. Extended Works

Hearthstone Automation Bot

Mobile Game Automation: Shining Nikki Auto Claim Rewards, Cats & Soup Automation


Stan Lin