
laiye rpa developer guide

in chapter 3, we discussed interface elements and how to select them as the target of a command with target. however, we often cannot identify the desired interface element accurately enough to use it as a target. for these situations, we need to learn commands without target.

when we search for and operate on interface elements, we are actually calling the application programming interfaces (apis) provided by the development frameworks underlying those elements in order to locate them. laiye rpa provides a common abstraction over all these apis so that users can work with every interface element the same way, without having to worry about the underlying details. however, some software programs do not have an api for identifying their interface elements, and some that do have one hide it from external use. some examples are:

  • virtual machine and remote desktop

    this includes citrix, vmware, hyper-v, virtualbox, remote desktop (rdp), various android emulators (such as the reliable assistant), etc. the programs shown inside them run on a separate operating system, which is completely isolated from the operating system that laiye rpa runs on. naturally, laiye rpa cannot operate interface elements in another operating system.


    it is possible, however, to install laiye rpa and the software used in the processes inside the virtual machine or on the remote computer. the interfaces can then be used directly by laiye rpa, since everything runs in the same operating system. in this setup, the local computer only serves as a display.


  • directui-based software

    originally, the development frameworks for windows software interfaces were all provided by microsoft. some examples include mfc, wtl, winform, and wpf. microsoft takes care to provide interfaces that let external software programs interact with the interface elements. however, in recent years, many development teams have launched their own windows development frameworks, collectively known as directui, to make it easier to create slicker uis. the interface elements in these frameworks are all "painted" on the screen. although the interface elements are visible to users, the operating system and other programs do not know where they are. while some directui frameworks provide external interfaces for finding interface elements, many provide no such interface, which makes it impossible for laiye rpa to find their interface elements.


    in fact, the interfaces of laiye rpa creator and laiye rpa worker are developed using a directui framework called electron. while electron does provide an api for finding its interface elements, this api is disabled by default in public releases. therefore, as you may have noticed, laiye rpa’s interface elements cannot be identified by any rpa platform, including laiye rpa itself.


  • video games

    in order to have more control over how every visual element of a video game looks, game developers use frameworks to “paint” the interface elements in their games in a way similar to directui frameworks. games built this way usually do not provide an api that reveals the location of their interface elements. unlike software programs based on directui, interfaces in video games change rapidly and therefore require fast processing. in general, rpa platforms are not optimized for video games and are therefore not suitable for automating video game processes.


    if you want to automate video games, we recommend you use quick macro, which is specifically designed for video games. it has many built-in interface searching methods for video games, such as pixel-level color comparisons, image searching, and much more, and it is optimized to run efficiently.


while we spent chapter 3 introducing commands with target, laiye rpa also provides commands without target. in figure 43, commands with target are enclosed by red boxes, and commands without target are enclosed by blue boxes.


figure 43: commands with target and commands without target

if you encounter windows programs whose interface elements you cannot select as targets, you must resort to commands without target. among the ones shown in figure 43, the most important is simulate mouse move. given a coordinate as a parameter, simulate mouse move moves the cursor to the corresponding point on the screen. we can then use simulate click to press the left mouse button at that point, for example to press a button, or to set focus on an input box before using input text to enter any text we want.

for example, there is an input box whose center point has coordinates x:200, y:300. we can first use simulate mouse move with coordinates x:200, y:300 to move the cursor to that location. then, we can use simulate click to simulate a left-click to select the textbox. finally, we can invoke input text to input the desired text. otherwise, if we use input text without the first two steps, we are likely going to enter the text somewhere else.
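the order of these three steps can be illustrated with a small python sketch. this is not laiye rpa code: the FakeScreen class, its method names, and the box rectangle are all assumptions invented for this example, and the toy model simply drops text typed without focus, whereas a real desktop would send it to whatever happens to have focus.

```python
class FakeScreen:
    """Toy model of a screen containing named input boxes (hypothetical)."""

    def __init__(self, boxes):
        # boxes maps a name to a rectangle (left, top, right, bottom).
        self.boxes = boxes
        self.cursor = (0, 0)
        self.focused = None
        self.text = {name: "" for name in boxes}

    def mouse_move(self, x, y):
        """Move the cursor to an absolute screen coordinate."""
        self.cursor = (x, y)

    def click(self):
        """Left-click at the cursor: focus the box under it, if any."""
        x, y = self.cursor
        self.focused = None
        for name, (left, top, right, bottom) in self.boxes.items():
            if left <= x <= right and top <= y <= bottom:
                self.focused = name

    def input_text(self, value):
        """Typed text goes only to the box that currently has focus."""
        if self.focused is not None:
            self.text[self.focused] += value


# An input box whose center is at x:200, y:300, as in the example above.
screen = FakeScreen({"username": (150, 290, 250, 310)})

screen.input_text("lost")      # no focus yet: the text does not reach the box
screen.mouse_move(200, 300)    # step 1: move the cursor to the box
screen.click()                 # step 2: click to give the box focus
screen.input_text("alice")     # step 3: now the text lands in the box
print(screen.text)
```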

to understand the technical details, we first need to explain the screen coordinate system of the windows operating system. if you are familiar with this, feel free to skip the following.

in windows, each pixel on the screen has a unique 2d coordinate point (x and y). the x coordinate is the number of pixels you need to count from the left edge of the screen to reach the point, starting with 0, and the y coordinate is the number of pixels you need to count from the top edge of the screen to reach the point, starting with 0. for example, the coordinate point x:200, y:300 indicates a point that is 200 pixels to the right of the leftmost pixel and 300 pixels below the topmost pixel on the screen. therefore, the coordinate point x:200, y:300 roughly corresponds to the red circle in figure 44.
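as a rough illustration of this coordinate system, the following python sketch checks whether a point lies on a screen of a given size. it is not laiye rpa code; the function name is_on_screen and the 1920x1080 screen size are assumptions chosen only for this example.

```python
def is_on_screen(x: int, y: int, width: int, height: int) -> bool:
    """Return True if pixel (x, y) lies on a screen of width x height pixels.

    The origin (0, 0) is the top-left pixel; x grows to the right and
    y grows downward, so valid values are 0..width-1 and 0..height-1.
    """
    return 0 <= x < width and 0 <= y < height


# Assumed 1920x1080 screen, chosen only for illustration.
print(is_on_screen(200, 300, 1920, 1080))   # the point from the text
print(is_on_screen(1920, 0, 1920, 1080))    # one pixel past the right edge
```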


figure 44: windows screen coordinate system

as long as we have the two integer values x and y, we know the position of a point on the screen. in laiye rpa, some commands can get the position of a point on the screen and output it to a variable. we will learn later how to use a "dictionary" to store the x and y values describing a point, but for now, just know that whenever laiye rpa outputs the coordinates of a point, it returns a special object known as a dictionary. assign this object to a variable pnt, and you can use pnt["x"] and pnt["y"] to retrieve the x and y values of the point, respectively.

if the interface element we are looking for is in a fixed position on the screen, then we can supply fixed coordinates to commands without target to simulate normal operations. however, this is rarely the case. windows get moved around and resized all the time, and the coordinates of the interface elements within them change as a result.
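python dictionaries behave in a similar way, so the idea can be sketched as follows. this is python, not laiye rpa's own language, and the values are simply the example point used earlier.

```python
# A point stored as a dictionary with "x" and "y" keys.
pnt = {"x": 200, "y": 300}

# Retrieve the two values by key.
x = pnt["x"]
y = pnt["y"]
print(x, y)
```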
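to see why fixed screen coordinates are fragile, consider this python sketch. the function to_screen_point and all the numbers are assumptions made up for illustration; it only shows that the same spot inside a window lands on different screen coordinates once the window is moved.

```python
def to_screen_point(window_left: int, window_top: int,
                    offset_x: int, offset_y: int) -> dict:
    """Convert an offset inside a window into absolute screen coordinates."""
    return {"x": window_left + offset_x, "y": window_top + offset_y}


# A button located 50 px right of and 80 px below the window's top-left corner.
before = to_screen_point(150, 220, 50, 80)   # window at its original position
after = to_screen_point(400, 100, 50, 80)    # same window after being moved
print(before, after)
```

a process that hard-coded the first point would click the wrong place after the move.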

therefore, in laiye rpa, it is generally not recommended to write fixed coordinates directly, because it is difficult to account for all the possible locations an element can be in. when using a command without target, you should use it along with other commands that find and store in variables the coordinates of your intended interface elements through techniques other than feature matching. in laiye rpa, the best partner of a command without target is an image command.

while mouse and keyboard commands are certainly very useful, laiye rpa also provides a powerful category of commands: image commands. in the command area of laiye rpa creator, find the image category, and its member commands are shown in figure 45.


figure 45: commands listed in the image category in laiye rpa creator

let's first look at the find image command, which searches the screen to find areas that match an image. you need to specify an image file (we support .bmp, .png, .jpg, and many other formats, though .png images are recommended for their lossless encoding). then, the command scans the specified area on the screen from left to right and top to bottom to check whether the image appears anywhere in it. if it does, the command returns the coordinates of the found image and stores them in a variable. otherwise, the command throws an exception.
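the scanning idea can be sketched in python as below. this is a simplified stand-in, not laiye rpa's actual implementation: it works on small grids of numbers instead of real screenshots, requires an exact match, returns the top-left corner of the match, and the names find_image and ImageNotFound are assumptions introduced for this example.

```python
class ImageNotFound(Exception):
    """Raised when the template does not appear in the scanned area."""


def find_image(screen, template):
    """Scan `screen` left to right, top to bottom, for an exact copy of `template`.

    Both arguments are lists of rows of pixel values. Returns the top-left
    corner of the first match as {"x": ..., "y": ...}.
    """
    s_h, s_w = len(screen), len(screen[0])
    t_h, t_w = len(template), len(template[0])
    for y in range(s_h - t_h + 1):          # top to bottom
        for x in range(s_w - t_w + 1):      # left to right within each row
            if all(screen[y + dy][x:x + t_w] == template[dy] for dy in range(t_h)):
                return {"x": x, "y": y}
    raise ImageNotFound("template not found in the scanned area")


screen = [
    [0, 0, 0, 0, 0],
    [0, 0, 7, 8, 0],
    [0, 0, 9, 6, 0],
    [0, 0, 0, 0, 0],
]
template = [
    [7, 8],
    [9, 6],
]
print(find_image(screen, template))
```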

this preset command can seem very complicated, since it requires both an image file and a scan area. in fact, if we use laiye rpa creator, using it is a piece of cake.

for example, consider the login interface for steam, a famous video game platform (figure 46). the interface is written using a directui framework. the username and password fields, as well as the login button, cannot be directly selected by any rpa platform using feature matching. this is where image commands come into play.

assume that we have launched steam and opened the login interface, and steam has remembered our username and password. all we need to do is to click the login button. create a new process in laiye rpa creator and enter a process block to edit it. insert a find image command and click the find target button on it (figure 47).


figure 46: login interface of steam

as with a command with target, laiye rpa creator is temporarily hidden. now, press the left mouse button and drag toward the bottom right to draw a blue box that encloses the image we want to find. release the mouse button to confirm your selection.


figure 47: using the find image command

it might seem like what you did was just drawing a blue box, but in fact, laiye rpa creator has already done two things:

  1. it has determined which window the blue box fell on and recorded the features of this window. when looking for this image in the future, laiye rpa will first find the window via feature matching and then scan for the image within the scope of this window.

  2. the part of the screen selected by your blue box is saved as a .png screenshot, and the file is saved in the "res" directory within the process directory. this will be the image used to scan the screen in the future.


left-click the find image command to select it and inspect its properties (figure 48).


figure 48: properties of the find image command

among them, the two circled properties are the most important ones, and they have also been automatically filled by laiye rpa when we boxed the image earlier. other than these two, similarity is a decimal number between 0 and 1 that determines how strictly laiye rpa requires each pixel of the screen to match with each pixel on the search image in order to confirm a match. the higher the number the stricter. the default value is 0.9, which allows for a small amount of mismatch. cursor position determines the coordinates of which pixel of the matched image to return. recall that an image is a rectangle, but the command only outputs the coordinates of a single pixel. therefore, we need to specify which pixel to return. typically, we return the point in the "center." the activate window property determines whether to bring the selected window to the foreground before searching for the image. if the selected window is covered by other windows, even if the image is in the window, the command could fail to find it. therefore, the activate window property defaults to true.
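the roles of similarity and cursor position can be illustrated with a python sketch. the matching algorithm laiye rpa actually uses is not described here, so the pixel-counting score below is only an assumed stand-in, and the names similarity, anchor_point, and "top_left" are hypothetical.

```python
def similarity(region, template) -> float:
    """Fraction of pixels in `region` equal to the matching pixel in `template`."""
    total = 0
    same = 0
    for region_row, template_row in zip(region, template):
        for a, b in zip(region_row, template_row):
            total += 1
            same += (a == b)
    return same / total


def anchor_point(left, top, width, height, position="center"):
    """Pick the single pixel to report for a matched rectangle."""
    if position == "center":
        return {"x": left + width // 2, "y": top + height // 2}
    if position == "top_left":
        return {"x": left, "y": top}
    raise ValueError(f"unsupported position: {position}")


template = [[1, 1, 1, 1, 1], [2, 2, 2, 2, 2]]
region = [[1, 1, 1, 1, 1], [2, 2, 2, 2, 0]]   # one of ten pixels differs

score = similarity(region, template)
matches_default = score >= 0.9    # tolerant threshold accepts the region
matches_strict = score >= 1.0     # exact threshold rejects it
point = anchor_point(100, 200, 40, 20)
print(score, matches_default, matches_strict, point)
```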

the other properties usually do not need to be changed, and you can just use the default values. for the output to property, a default variable name objpoint is chosen. if the command successfully finds the image, it will save the result in that variable. let’s see what the stored value actually is. insert an output debug information command (found under the basic commands category) after the find image command and set the content parameter as objpoint. take care to not enclose objpoint in double quotes: this would cause laiye rpa to simply output the string "objpoint". see figure 49.


figure 49: view the results through output debug information

assuming that the image we are searching for is found, we obtain the following result after running the process:

{ "x" : 116, "y" : 235 }


the specific values of x and y you see may be different on different computers, but the overall format remains the same. this value is of the "dictionary" data type. when this value is stored in the variable objpoint, we can write objpoint["x"] and objpoint["y"] to get the values of x and y.

now that we have obtained the coordinate of the center of the image, we can simulate a mouse click on the location. simply use the simulate mouse move and simulate click commands to complete this step.

as shown in figure 50, the most important properties of simulate mouse move specify the point on the screen to move the cursor to. we can simply enter objpoint["x"] and objpoint["y"] for the x and y fields. then, we can add a simulate click command to left-click on the center of the login button. thus we have simulated the action of clicking on the login button.


figure 50: view the results through output debug information

in figure 50, the three commands are easy to follow. even someone who has never used laiye rpa before can understand what they are doing. however, taking three commands to click a login button is too cumbersome. luckily, if you inspect all the image commands again, you can find a click image command, which actually combines find image, simulate mouse move, and simulate click. instead of what we just did, you can simply insert the click image command and use its find target button to box the login button to enable laiye rpa to find and click the button. even though this is a command without target, it is just as easy to use as a command with target.
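conceptually, the combination can be sketched in python as follows. the functions here are hypothetical stand-ins that only record what they would do, the file name login_button.png is invented for the example, and the coordinates are the sample values shown earlier; none of this is laiye rpa's real api.

```python
actions = []  # log of simulated actions, standing in for real mouse events


def find_image(image_name):
    """Hypothetical stand-in: pretend the image was found at a fixed point."""
    return {"x": 116, "y": 235}


def simulate_mouse_move(x, y):
    """Record a cursor move instead of moving a real cursor."""
    actions.append(("move", x, y))


def simulate_click():
    """Record a left-click instead of clicking for real."""
    actions.append(("click",))


def click_image(image_name):
    """Combine the three steps: find the image, move to it, click."""
    point = find_image(image_name)
    simulate_mouse_move(point["x"], point["y"])
    simulate_click()
    return point


click_image("login_button.png")
print(actions)
```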

having learned from this example, you should be able to quickly figure out what the other image commands (like mouse move to image, check image exists, etc.) do and how to use them. let’s leave it to you to try them out.

in chapter 3, we learned to use commands with target, and in this chapter we learned to use commands without target. in most cases, commands without target do not operate on a fixed location on the screen. instead, they are combined with image commands to dynamically identify where to apply the operation on the screen.

so, when creating a process, should we prefer commands with target or commands without target? our answer is that you should always use commands with target as long as you are able to select the intended interface element as a target. this is because commands without target rely heavily on image commands, which have the following disadvantages:

  • image commands are much slower than commands with target.

  • when the intended interface is covered by other windows—even if only a small part of it is covered—the accuracy of image commands is reduced greatly.

  • image commands often rely on image files and cannot run if the files are lost.

  • some special image commands require an internet connection to run.


of course, there are some tips that can help with these shortcomings.

first of all, keep your images small. when taking a screenshot during target selection, select the smallest area that still shows the distinguishing characteristics of the interface element. not only does this help the image command run faster, but it also makes occlusion less likely to affect the command’s accuracy. for example, in figure 51, instead of selecting the whole button, like the image on the left, you can select the most important distinguishing aspect, which is simply the text "sign in".


figure 51: choose a smaller area

in addition, most image commands support a similarity property. the initial value of this property is 0.9. if it is set too low, the command could match with incorrect elements; if it is set too high, the command could miss the correct element. this is analogous to the wrong selection and missing selection in commands with target (refer to chapter 3). just as before, we can adjust the value of similarity according to the specific situation and test out different values to choose an optimal value to balance accuracy and specificity.

moreover, the computer’s screen resolution and scale can have a critical effect on image commands. the interface of a software program often looks completely different on screens with different resolutions, so image commands that work on one computer often fail on another. therefore, please ensure that the computer used to create the process has the same resolution and scale as the computers that run the process. figure 52 shows the windows 10 interface for setting system resolution and scale.


figure 52: windows 10 resolution and scale interface

finally, image commands often need to reference image files. we can certainly provide absolute paths to these image files, such as d:\\l.png, but this requires the computer that runs the process to have the same file in the same location. otherwise, an error occurs. to make image dependencies easier to manage, every process’s directory has a folder named "res", in which you can put images or any other files the process might depend on. then, you can reference these files using strings like @res"1.png". when you send this process to a laiye rpa worker to run, the files in the "res" directory are packaged along with it. no matter where laiye rpa worker puts the process directory, it automatically adjusts what the @res prefix refers to, so the file references always remain valid.
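the benefit of resolving files relative to the process directory can be sketched in python. how laiye rpa resolves the @res prefix internally is not described here, so the function resolve_res and both directory paths are assumptions made up for this example.

```python
from pathlib import PureWindowsPath


def resolve_res(process_dir: str, file_name: str) -> str:
    """Build the path of a file inside the "res" folder of a process directory."""
    return str(PureWindowsPath(process_dir) / "res" / file_name)


# The same relative reference stays valid wherever the process directory lives.
on_creator = resolve_res("d:\\flows\\login", "1.png")
on_worker = resolve_res("c:\\worker\\jobs\\42\\login", "1.png")
print(on_creator)
print(on_worker)
```

because only the process directory changes between machines, the reference to 1.png itself never has to be edited.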

all the tips for using image commands also apply to ocr commands. we will introduce what ocr commands are and how to use them in a later chapter.
